Abstract: Recent generations of supercomputers have adopted different strategies in their attempts to remain competitive in the race to Exascale. In most cases, they rely on accelerators such as GPUs to deliver high arithmetic performance and memory bandwidth. But accelerators come with their own challenges, often deriving from their programming models, which can be hard for applications to take advantage of.
The current leader in the TOP500 list, the Fugaku system in Japan, has chosen a different route: instead of offloading to discrete accelerators, this system relies on a new generation of general-purpose CPUs to deliver GPU-class performance while maintaining the ease of use of a traditional CPU. Fugaku is powered by the Fujitsu A64FX, a design purpose-built for high-performance computing (HPC) based on the Arm architecture. It is able to deliver up to 1 TB/s of memory bandwidth by using the same HBM2 technology found in top-end GPUs, and it offers 512-bit-wide vectors through the Scalable Vector Extension (SVE). It is the first CPU to integrate either HBM2 or SVE.
In this talk we evaluate the performance of three recent Arm-based processors on a range of common scientific workloads: the Fujitsu A64FX, the Amazon Graviton 2, and the Ampere Altra. We use compute-bound and memory-bandwidth-bound mini-apps, and benchmarks based on widely utilised full scientific applications. These benchmarks have been successfully used in the past to quantify the performance characteristics of other emerging HPC processors, such as the Arm-based Marvell ThunderX2, or previously the many-core Intel Xeon Phi. As part of this evaluation, we look not only at raw application performance, but also at the maturity of the tools available for these Arm-based processors. We compare all four major HPC compilers that can target these platforms: Cray, GNU, Arm, and Fujitsu's own compiler.
Bio: Andrei Poenaru is a final-year PhD student with the High Performance Computing Group at the University of Bristol. His research is centred around advanced and future architectures for HPC, and he has been involved in several studies aiming to characterise performance and evaluate portability across diverse modern architectures. His current projects are focused on vectorisation in the context of Arm SVE and upcoming Arm-based high-performance processors.