Efficient GPU parallelization of first-principles electron-phonon calculations

ORAL

Abstract

Developing scalable software that leverages exascale computing is critically important for first-principles calculations. However, for widely used workflows focusing on electron-phonon interactions and the related transport and nonequilibrium dynamics, taking advantage of GPU hardware remains challenging. In this talk, we present an efficient GPU-parallel implementation of electron-phonon algorithms that employs data structures and code optimized for GPUs. We target both transport and nonequilibrium dynamics calculations in the Boltzmann equation formalism, and achieve a significant performance improvement through a range of strategies, including grouping and sorting contributions from different scattering processes and rewiring key data structures. Benchmark tests for several materials on one GPU node with four NVIDIA A100 GPUs (40 GB) demonstrate a 40x speedup over the original CPU-based implementation running on one AMD EPYC 7763 processor with 64 cores. Additionally, the new implementation exhibits nearly ideal strong scaling up to 32 GPUs, with only a slight decrease in performance up to 64 GPUs, while requiring only a small memory overhead. The talk will also discuss details of the OpenACC implementation in the Perturbo code, as well as building the code on advanced supercomputers using the nvfortran compiler.

*This work was supported by the National Science Foundation under Grant No. OAC-2209262.
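
The grouping-and-sorting strategy named in the abstract can be illustrated with a minimal sketch. This is not the Perturbo implementation, which the abstract describes as written with OpenACC. The function names, the `(target_state, contribution)` tuple layout, and the sample values are all hypothetical, chosen only to show why sorting contributions by the state they update can help: each group of contributions becomes contiguous and can be reduced independently, with a single write per target instead of many conflicting updates.

```python
from itertools import groupby
from operator import itemgetter


def accumulate_naive(n_states, processes):
    """Add each contribution in input order.

    On a GPU this pattern would have many threads updating the same
    entry concurrently, which requires atomic operations.
    """
    rates = [0.0] * n_states
    for target, weight in processes:
        rates[target] += weight
    return rates


def accumulate_grouped(n_states, processes):
    """Sort contributions by target state, then reduce each group once."""
    rates = [0.0] * n_states
    # Stable sort: contributions to the same target become contiguous.
    ordered = sorted(processes, key=itemgetter(0))
    for target, group in groupby(ordered, key=itemgetter(0)):
        # Each group is independent of the others, so one worker could
        # own it and perform a single write to rates[target].
        rates[target] = sum(weight for _, weight in group)
    return rates


# Hypothetical (target_state, contribution) pairs in arbitrary units.
processes = [(2, 0.5), (0, 1.0), (2, 0.25), (1, 2.0), (0, 3.0)]

naive = accumulate_naive(3, processes)
grouped = accumulate_grouped(3, processes)
```

Both functions return the same totals for this input. The grouped version only changes the order of the work, which is the property a parallel reduction relies on.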

Presenters

  • Shiyu Peng
    • Caltech

Authors

  • Shiyu Peng
    • Caltech
  • Donnie Pinkston
    • Caltech
  • Jia Yao
    • Caltech
  • Sergei Kliavinek
    • Caltech
  • Ivan Maliyov
    • EPFL
    • CNRS
    • Aix-Marseille Université
    • Caltech
  • Marco Bernardi
    • Caltech