Improving the Performance and Portability of VPIC
POSTER
Abstract
VPIC is a Particle-in-Cell (PIC) code that delivers novel science at unprecedented scale, most notably attaining petaflop performance while simulating trillions of particles. Such performance has historically been achieved through vectorization using explicit compiler intrinsics. This approach offers good performance, but the code must be rewritten for each new vector width or hardware architecture. In this work we investigate how modern coding techniques and auto-vectorization can enable VPIC to scale automatically to new vector widths and hardware platforms from a single codebase, with no manual adaptation. To achieve this, we express the core PIC algorithm in a form from which the compiler can generate vectorized instructions, which includes adapting the algorithm so that it no longer contains data dependencies. We also present a performance study of the new code variant, showing that it achieves vector performance comparable to that of the historic hand-coded intrinsics on both Intel Knights Landing and traditional Intel Xeon platforms.
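As a rough illustration of the idea of removing data dependencies so that a compiler can vectorize without intrinsics, the following minimal C++ sketch may help. It is not taken from VPIC: the `Particles` layout, the `push` and `deposit` functions, and the one-dimensional simplification are all assumptions made for this example. The particle update is written so that each iteration touches only its own element, and the deposition step is split so that only the grid scatter, where two particles may write to the same cell, remains scalar.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle storage (an assumption for this
// sketch, not VPIC's actual data layout). One dimension only, for brevity.
struct Particles {
    std::vector<float> x;     // positions
    std::vector<float> v;     // velocities
    std::vector<int>   cell;  // index of the grid cell holding each particle
};

// Advance positions by one time step. Iteration i reads and writes only
// element i, so there is no loop-carried dependency and the compiler may
// vectorize at whatever width the target supports.
void push(Particles& p, float dt) {
    const std::size_t n = p.x.size();
    float* x = p.x.data();
    const float* v = p.v.data();
    #pragma omp simd  // a hint only; ignored if OpenMP SIMD is not enabled
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += v[i] * dt;
    }
}

// Deposit a per-particle quantity (here q * v) onto the grid in two phases.
// Phase 1 is dependency-free and vectorizable. Phase 2 is a scalar scatter,
// because several particles can share a cell and their writes would collide.
void deposit(const Particles& p, float q, std::vector<float>& rho) {
    const std::size_t n = p.x.size();
    std::vector<float> contrib(n);

    float* c = contrib.data();
    const float* v = p.v.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = q * v[i];
    }

    for (std::size_t i = 0; i < n; ++i) {
        rho[p.cell[i]] += contrib[i];
    }
}
```

Because neither loop names a vector width, the same source can in principle be recompiled for a different instruction set without modification. Whether a given compiler actually vectorizes these loops depends on its flags and should be checked in its optimization report.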
*Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory, under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.