Implementing particle and ray-based algorithms in heterogeneous environments for plasma simulations
POSTER
Abstract
The proliferation of CPU-GPU heterogeneous computing presents an opportunity to accelerate high-fidelity plasma physics simulations. The ray-tracing package of the TriForce code [1] is being developed to leverage this architecture and uses a parallel implementation of a recently developed fixed-point iteration algorithm for including the effect of crossed-beam energy transfer (CBET) [2]. A hybrid object-oriented and grid-based dereferencing scheme allows simultaneous tracking of ray and grid parameters while preserving data locality and enabling low-synchronization SIMD iteration kernels. Introducing OpenMP vectorization resulted in a 10× speedup on a 12-thread processor, and adding GPU computation increased the speedup further.
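The abstract does not specify the CBET coupling model or the TriForce data structures, so the following is only a minimal sketch under assumed simplifications. Rays are kept in a structure-of-arrays store in which each ray carries the index of the grid cell it occupies (a hypothetical stand-in for the hybrid object/grid dereferencing scheme). A toy exponential gain law replaces the real CBET physics of Follett et al. A damped fixed-point loop alternates a grid deposit pass with an independent per-ray update pass. All names (`RaySoA`, `deposit`, `update`, `solve_cbet_toy`, `gain`, `relax`) are hypothetical. Pure Python is used for readability. In C or C++, the per-ray loop is the kind of loop one would annotate with `#pragma omp parallel for simd` or move to a GPU kernel. The deposit loop, by contrast, needs atomics or a reduction, because several rays can write to the same cell.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RaySoA:
    """Hypothetical structure-of-arrays ray store.

    Every per-ray attribute lives in its own contiguous list, and each ray
    carries the integer index of the grid cell it occupies, so ray data and
    grid data are linked by index lookups rather than object pointers.
    """
    beam: list[int]   # which of two crossing beams (0 or 1) the ray belongs to
    cell: list[int]   # index of the grid cell the ray passes through
    i0: list[float]   # launch (uncoupled) intensity
    intensity: list[float] = field(default_factory=list)  # current iterate


def deposit(rays, n_cells):
    """Grid pass: sum ray intensity per (beam, cell).

    Several rays may write the same cell, so a threaded version of this
    scatter needs atomics or a per-thread reduction.
    """
    grid = [[0.0] * n_cells for _ in range(2)]
    for b, c, inten in zip(rays.beam, rays.cell, rays.intensity):
        grid[b][c] += inten
    return grid


def update(rays, grid, gain, relax):
    """Ray pass: each ray reads only the frozen grid value of its own cell.

    Iterations are independent of one another, which is what makes this loop
    a candidate for a SIMD/OpenMP or GPU kernel. Returns the largest change.
    """
    max_change = 0.0
    for k in range(len(rays.i0)):
        other = grid[1 - rays.beam[k]][rays.cell[k]]  # intensity of the crossing beam
        sign = 1.0 if rays.beam[k] == 1 else -1.0     # toy convention: beam 1 gains, beam 0 loses
        target = rays.i0[k] * math.exp(sign * gain * other)
        new = (1.0 - relax) * rays.intensity[k] + relax * target  # damped update
        max_change = max(max_change, abs(new - rays.intensity[k]))
        rays.intensity[k] = new
    return max_change


def solve_cbet_toy(rays, n_cells, gain=0.1, relax=0.5, tol=1e-10, max_iter=500):
    """Alternate grid and ray passes until the ray intensities stop changing."""
    rays.intensity = list(rays.i0)  # initial guess: no energy transfer
    for it in range(1, max_iter + 1):
        grid = deposit(rays, n_cells)
        if update(rays, grid, gain, relax) < tol:
            return it
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Two cells, each crossed by one ray from each beam.
    rays = RaySoA(beam=[0, 1, 0, 1], cell=[0, 0, 1, 1], i0=[1.0, 1.0, 2.0, 0.5])
    n_iter = solve_cbet_toy(rays, n_cells=2)
    print(n_iter, [round(v, 6) for v in rays.intensity])
```

Splitting each iteration into a scatter pass over the grid and a gather pass over the rays keeps synchronization to one barrier per iteration. The relaxation factor `relax` is an assumed knob for damping the iteration and is not a parameter taken from the source.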

Furthermore, implementations of particle-based algorithms are in development for multi-node, distributed-memory GPU clusters to enable large-scale particle-in-cell (PIC) plasma simulations on high-performance computing systems. Previous efforts have been limited to either shared-memory GPU codes or distributed-memory CPU-only packages. We present progress on using both approaches in concert to create a scalable, heterogeneous platform for particle- and ray-based scientific computing.

References:
[1] A. B. Sefkow et al., Bulletin of the APS 64, JP10.125 (2019).
[2] R. K. Follett et al., Phys. Rev. E 98, 043202 (2018).
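The abstract says only that particle-based algorithms are being ported to multi-node, distributed-memory GPU clusters. It does not describe how particles are partitioned. One task common to domain-decomposed PIC codes is migrating particles that cross subdomain boundaries after each push. The sketch below illustrates that bookkeeping under stated assumptions:

- a periodic 1D box split into equal slabs, one per rank;
- a free-streaming push standing in for the field-driven push;
- all "ranks" simulated in a single Python process instead of communicating over MPI.

The names `Domain`, `make_domains`, `owner`, `push`, and `migrate` are hypothetical and are not taken from TriForce.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """One rank's share of a periodic 1D box (hypothetical layout)."""
    rank: int
    x: list[float] = field(default_factory=list)  # particle positions
    v: list[float] = field(default_factory=list)  # particle velocities


def make_domains(n_ranks):
    """One empty Domain per rank; rank r owns the slab [r, r+1) * length/n_ranks."""
    return [Domain(r) for r in range(n_ranks)]


def owner(x, length, n_ranks):
    """Rank that owns position x; min() guards the x == length rounding edge."""
    return min(int(x / length * n_ranks), n_ranks - 1)


def push(dom, dt, length):
    """Free-streaming push with periodic wrap (stand-in for a field-driven push)."""
    for k in range(len(dom.x)):
        dom.x[k] = (dom.x[k] + dom.v[k] * dt) % length


def migrate(domains, length):
    """Move particles that left their rank's slab to the owning rank.

    Step 1 packs leavers into per-destination buffers; step 2 delivers them.
    In an MPI code step 2 would be a buffer exchange between ranks (for
    example with MPI_Sendrecv); here every "rank" lives in one process.
    This sketch allows any destination rank, not only nearest neighbours.
    Returns the number of particles that moved.
    """
    n = len(domains)
    outbox = [[] for _ in range(n)]  # outbox[dest] holds (x, v) pairs
    for dom in domains:
        keep_x, keep_v = [], []
        for x, v in zip(dom.x, dom.v):
            dest = owner(x, length, n)
            if dest == dom.rank:
                keep_x.append(x)
                keep_v.append(v)
            else:
                outbox[dest].append((x, v))
        dom.x, dom.v = keep_x, keep_v
    for dest, parcels in enumerate(outbox):
        for x, v in parcels:
            domains[dest].x.append(x)
            domains[dest].v.append(v)
    return sum(len(p) for p in outbox)


if __name__ == "__main__":
    L, n_ranks = 1.0, 4
    doms = make_domains(n_ranks)
    doms[0].x, doms[0].v = [0.05, 0.10, 0.20], [-1.0, 0.5, 1.0]  # all start on rank 0
    for d in doms:
        push(d, dt=0.1, length=L)
    moved = migrate(doms, L)
    print(moved, [len(d.x) for d in doms])
```

Packing leavers into contiguous per-destination buffers before any communication is what lets a distributed code send one message per neighbour rather than one per particle. On GPU clusters the packing step would itself typically run on the device. That is an assumption about a production design, not something the abstract states.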
*Funding provided by DOE OFES
Presenters
Matthew Burns
- University of Rochester Departments of Mechanical Engineering, Physics, and Computer Science