DFT-Based Electronic Structure Calculations on Hybrid and Massively Parallel Computer Architectures
ORAL
Abstract
The latest generation of supercomputers is capable of multi-petaflop peak performance, achieved with thousands of multi-core CPUs that are often coupled with thousands of GPUs. However, using this computing power efficiently for electronic structure calculations presents significant challenges. We describe adaptations of the Real-Space Multigrid (RMG) code that enable it to scale well to thousands of nodes. We adopted a hybrid technique that uses one MPI process per node, rather than one per core, with OpenMP and POSIX threads used for intra-node parallelization. This reduces the number of MPI processes by an order of magnitude or more and improves memory utilization on each node. GPU accelerators are also becoming common and are capable of extremely high performance for vector workloads. However, they typically have much lower scalar performance than CPUs, so achieving good performance requires carefully partitioning the workload and optimizing data transfer between CPU and GPU. Using a hybrid approach that combines MPI, OpenMP, POSIX threads, and GPU accelerators, we achieved excellent scaling to over 100,000 cores on a Cray XE6 platform, as well as a factor of three performance improvement on a Cray XK7 system with CPU-GPU nodes.
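The intra-node half of the "one MPI process per node" scheme can be illustrated with a minimal sketch. This is not RMG code: it assumes a hypothetical 1D real-space grid slab owned by a single per-node process, a second-order finite-difference Laplacian as the workload, and POSIX threads splitting that work across cores. The MPI layer (one rank per node plus halo exchange with neighboring nodes) is deliberately omitted, and the names `apply_laplacian`, `stencil_task`, and `MAX_THREADS` are illustrative assumptions. The point it shows is that all threads share one copy of the grid, whereas a flat one-rank-per-core layout would keep a separate subgrid, with its own halo, in every process.

```c
/* Hypothetical sketch: intra-node threading under a one-process-per-node
 * layout. The MPI layer (one rank per node, halo exchange between nodes)
 * is omitted; this process owns a node-local 1D grid slab that all threads
 * share, so the grid is stored once per node rather than once per core. */
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64   /* assumed upper bound on cores per node */

typedef struct {
    const double *in;    /* shared input grid (one copy per node) */
    double *out;         /* shared output grid */
    int begin, end;      /* half-open range of interior points for this thread */
    double h2inv;        /* 1 / h^2 */
} stencil_task;

/* Thread body: second-order finite-difference Laplacian on one block. */
static void *laplacian_block(void *arg) {
    stencil_task *t = arg;
    for (int i = t->begin; i < t->end; i++)
        t->out[i] = (t->in[i - 1] - 2.0 * t->in[i] + t->in[i + 1]) * t->h2inv;
    return NULL;
}

/* Split interior points 1..n-2 into nthreads contiguous blocks and run one
 * POSIX thread per block. In a real code the boundary points would be
 * filled from neighboring nodes via MPI; here they are simply set to zero. */
void apply_laplacian(const double *in, double *out, int n, double h, int nthreads) {
    assert(n >= 3 && nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    stencil_task task[MAX_THREADS];
    int started[MAX_THREADS];
    long interior = n - 2;
    for (int k = 0; k < nthreads; k++) {
        task[k] = (stencil_task){
            .in = in, .out = out,
            .begin = 1 + (int)(interior * k / nthreads),
            .end = 1 + (int)(interior * (k + 1) / nthreads),
            .h2inv = 1.0 / (h * h),
        };
        started[k] = (pthread_create(&tid[k], NULL, laplacian_block, &task[k]) == 0);
        if (!started[k])
            laplacian_block(&task[k]);  /* thread creation failed: run inline */
    }
    for (int k = 0; k < nthreads; k++)
        if (started[k])
            pthread_join(tid[k], NULL);
    out[0] = out[n - 1] = 0.0;
}
```

Calling `apply_laplacian(in, out, n, h, 8)` from the single per-node process uses eight threads on one shared array; the same call with `nthreads = 1` gives identical results, since each block writes a disjoint range of `out`. The OpenMP equivalent would be a `#pragma omp parallel for` over the interior loop; explicit POSIX threads are used here only to keep the sketch free of compiler-specific flags.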
Authors
- Emil Briggs, Department of Physics, North Carolina State University
- Miroslav Hodak, Department of Physics, North Carolina State University
- Wenchang Lu, Department of Physics, North Carolina State University
- Jerzy Bernholc, Department of Physics, North Carolina State University