Refactoring GENE for improved parallel scalability on current and upcoming supercomputers
POSTER
Abstract
GENE is one of the constituent codes of the WDMApp (Whole Device Modeling Application) ECP project, designated to simulate gyrokinetic microturbulence in the core of a fusion device.
Legacy GENE uses a single global domain decomposition for the 6-d arrays representing the distribution functions (3-d configuration space, 2-d velocity space, and 1 dimension for species). The actual decomposition among MPI processes is determined in an auto-tuning phase. This works very well for computing the main right-hand-side terms of the gyrokinetic equations. However, these terms also involve lower-dimensional quantities, such as the electrostatic potential and the parallel vector potential. These quantities are calculated through a sequence of dimensionality-lowering operations: integration over v-parallel, gyro-averaging, solving the 3-d field equations, etc. The decomposition chosen for the 6-d domain is not necessarily optimal for these steps, so we propose a refactoring of GENE that allows individual phases of the integrator to be performed on different decompositions. This requires remapping data between those decompositions, so the cost of each remapping must be taken into account.
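To make the remapping cost concrete, here is a minimal, hypothetical Python sketch (not taken from GENE). It assumes a contiguous block distribution of each dimension over a Cartesian process grid, row-major rank numbering, and an illustrative dimension order (x, y, z, v_par, mu, species). It computes which sub-block each source rank must send to each destination rank when switching decompositions. The function names `block_ranges`, `decompose`, `remap_plan` and `remap_cost`, as well as the grid sizes, are made up for illustration.

```python
from itertools import product
from math import prod


def block_ranges(n, p):
    """Split range(n) into p contiguous blocks; the first n % p blocks get one extra element."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(shape, procs):
    """Map each rank to its box, a tuple of (lo, hi) per dimension.

    Ranks are numbered row-major over the Cartesian process grid `procs`
    (an assumption for this sketch).
    """
    per_dim = [block_ranges(n, p) for n, p in zip(shape, procs)]
    return {
        rank: tuple(per_dim[d][c] for d, c in enumerate(coords))
        for rank, coords in enumerate(product(*(range(p) for p in procs)))
    }


def intersect(a, b):
    """Overlap of two boxes, or None if they do not overlap."""
    box = tuple((max(lo1, lo2), min(hi1, hi2)) for (lo1, hi1), (lo2, hi2) in zip(a, b))
    return box if all(lo < hi for lo, hi in box) else None


def volume(box):
    """Number of grid points in a box."""
    return prod(hi - lo for lo, hi in box)


def remap_plan(shape, procs_src, procs_dst):
    """List of (src_rank, dst_rank, box) pieces needed to go from one decomposition to another."""
    src, dst = decompose(shape, procs_src), decompose(shape, procs_dst)
    plan = []
    for s, sbox in src.items():
        for d, dbox in dst.items():
            box = intersect(sbox, dbox)
            if box is not None:
                plan.append((s, d, box))
    return plan


def remap_cost(plan):
    """Messages and grid points that actually cross rank boundaries (s != d)."""
    off_rank = [(s, d, box) for s, d, box in plan if s != d]
    return len(off_rank), sum(volume(box) for _, _, box in off_rank)


if __name__ == "__main__":
    # Illustrative 6-d grid: (x, y, z, v_par, mu, species).
    shape = (8, 4, 16, 32, 8, 2)
    # 16 ranks each: the source splits velocity space; the destination keeps
    # v_par and mu rank-local, so integration over velocity needs no communication.
    plan = remap_plan(shape, (1, 1, 2, 4, 2, 1), (4, 1, 4, 1, 1, 1))
    msgs, elems = remap_cost(plan)
    print(f"{msgs} off-rank messages moving {elems} of {prod(shape)} grid points")
```

Because both decompositions are contiguous blocks, every overlap is a single rectangular box. Each rank can therefore compute the full plan locally without communicating. In an MPI code, each off-rank entry would correspond to one message, for example one block of an `MPI_Alltoallv` exchange. The moved volume from `remap_cost` gives a first estimate of the remapping cost that has to be weighed against the speedup of running a phase on a better-suited decomposition.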
This approach comes with further advantages. A component can be initialized on a different decomposition than the one used when that operation is applied in the solver. This can improve performance and avoid recomputation, especially in parts of the code where the solver runs on the GPU but initialization does not. In addition, it allows us to build more self-contained components that can be used to investigate various options for tight and loose coupling in the context of the core-edge coupled WDMApp.
*This work is supported by the US Department of Energy as part of the "Whole Device Modeling Application" ECP project.
Presenters
Kai Germaschewski, University of New Hampshire