Toward Petaflop First Principles Kinetic Plasma Simulation
COFFEE_KLATCH · Invited
Abstract
Because of physical limitations such as the speed of light, moving data between and even within modern microprocessors takes more time than performing computations. As a result, individual processor core performance is stagnant, multicore processors are ubiquitous, and traditional programming styles cannot fully exploit the potential of modern computers. This talk will discuss the architecture and implementation of the 3D electromagnetic relativistic particle-in-cell code VPIC for LANL's Roadrunner supercomputer. Roadrunner is expected to have 13,000 IBM Cell microprocessors (each Cell contains a dual-threaded Power core and 8 specialized vector cores) and to be capable of over a petaflop ($10^{15}$ floating-point operations per second). VPIC minimizes data movement and allows the vector extensions of modern processors to be used portably. This made it possible to port VPIC quickly while achieving unprecedented performance. The initial port pushed and accumulated 0.13 billion particles per second per Cell, equivalent to 1.0 billion per second per 8-Cell node, or to sustaining Roadrunner at 0.4 petaflop. Higher performance is likely as the port is refined. Regardless, the performance already demonstrated will enable previously intractable simulations in numerous areas of plasma physics, including magnetic reconnection and laser-plasma interactions.
*LA-UR-07-4830. Thanks to Brian Albright, Ben Bergen, Lin Yin, and Thomas Kwan. This work was performed under the auspices of the U.S. Department of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory, under contract DE-AC52-06NA25396.
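The throughput figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the flops-per-particle value is merely what the quoted numbers imply if the 0.4 petaflop figure corresponds to all 13,000 Cells running at the quoted per-Cell rate. That correspondence is an assumption, not something the abstract states.

```python
# Figures quoted in the abstract.
PARTICLES_PER_SEC_PER_CELL = 0.13e9  # particles pushed and accumulated per second per Cell
CELLS_PER_NODE = 8                   # Cells per node, as quoted
TOTAL_CELLS = 13_000                 # expected Cell count for Roadrunner
SUSTAINED_FLOPS = 0.4e15             # quoted sustained rate (0.4 petaflop)


def node_rate(per_cell=PARTICLES_PER_SEC_PER_CELL, cells=CELLS_PER_NODE):
    """Particles per second for one node."""
    return per_cell * cells


def machine_rate(per_cell=PARTICLES_PER_SEC_PER_CELL, cells=TOTAL_CELLS):
    """Particles per second for the full machine.

    Assumes every Cell sustains the quoted per-Cell rate (perfect scaling).
    """
    return per_cell * cells


def implied_flops_per_particle(sustained=SUSTAINED_FLOPS):
    """Floating-point operations per particle implied by the quoted figures.

    A derived estimate under the perfect-scaling assumption above,
    not a number taken from the abstract.
    """
    return sustained / machine_rate()


if __name__ == "__main__":
    print(f"per node:     {node_rate():.3g} particles/s")
    print(f"full machine: {machine_rate():.3g} particles/s")
    print(f"implied cost: {implied_flops_per_particle():.0f} flops per particle")
```

The per-node product, 0.13 billion times 8, is about 1.04 billion, which matches the rounded 1.0 billion quoted above. Under the stated assumption, the figures imply a cost on the order of a few hundred floating-point operations per particle.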