A machine-learning checkpoint/restart algorithm for particle-in-cell simulations.

Luis Chacon; Guangye Chen


POSTER

Abstract

With ever-increasing computing power and memory capacity, particle checkpointing for fault recovery in particle-in-cell simulations is placing heavy stress on I/O subsystems and is becoming prohibitively expensive. Because future exascale computers are expected to be significantly more vulnerable to hard faults than current HPC systems, a fast and accurate recovery strategy is essential. In this study, we consider compressing the particle distribution function (PDF) with unsupervised machine-learning techniques.\footnote{G. Chen and L. Chac\'on, ``A machine-learning checkpoint/restart algorithm for particle-in-cell simulations'', in preparation} Specifically, we approximate the PDF with a Gaussian mixture.\footnote{Geoffrey McLachlan and David Peel. Finite Mixture Models. John Wiley \& Sons, 2004.} The Gaussian mixture is obtained from the maximum-likelihood principle combined with an information criterion, the minimum-message-length principle, which is used to determine an optimal density estimate of the PDF.$^2$ Restart is performed by moment-matching sampling of the Gaussian mixture, which strictly conserves charge/mass, momentum, and energy. We demonstrate the effectiveness of the method on a range of electrostatic and electromagnetic particle-in-cell simulations in 1D and 2D.
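The checkpoint/restart idea in the abstract can be illustrated with a small sketch: compress a set of weighted particle velocities into Gaussian-mixture parameters, then regenerate particles and correct them so the conserved moments match exactly. This is not the authors' implementation. Several assumptions are made for illustration only: the function names `fit_gmm_em` and `sample_moment_matched` are hypothetical; the number of components K is fixed by hand and fit with plain weighted expectation-maximization, whereas the abstract selects the mixture with the minimum-message-length criterion, which is not implemented here; the moment matching is a simple shift-and-rescale of the sampled velocities (particle mass m = 1), which may differ from the scheme used in the paper.

```python
import numpy as np


def fit_gmm_em(v, w, K, n_iter=200, seed=0):
    """Weighted EM fit of a K-component Gaussian mixture to particle velocities.

    v: (N, d) velocities, w: (N,) particle weights. K is fixed by the caller;
    minimum-message-length selection of the mixture is NOT implemented here.
    """
    rng = np.random.default_rng(seed)
    N, d = v.shape
    wn = w / w.sum()
    mu = v[rng.choice(N, K, replace=False, p=wn)]            # initial means
    base = np.cov(v.T, aweights=w).reshape(d, d) + 1e-6 * np.eye(d)
    cov = np.array([base] * K)                                # initial covariances
    pi = np.full(K, 1.0 / K)                                  # initial mixing weights
    for _ in range(n_iter):
        # E-step: log of pi_k * N(v_n | mu_k, cov_k), normalized per particle
        logp = np.empty((N, K))
        for k in range(K):
            diff = v - mu[k]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov[k]), diff)
            _, logdet = np.linalg.slogdet(cov[k])
            logp[:, k] = np.log(pi[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibilities weighted by the particle weights
        rw = r * wn[:, None]
        nk = rw.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (rw.T @ v) / nk[:, None]
        for k in range(K):
            diff = v - mu[k]
            c = (rw[:, k, None] * diff).T @ diff / nk[k]
            cov[k] = 0.5 * (c + c.T) + 1e-6 * np.eye(d)       # keep symmetric, regularized
    return pi, mu, cov


def sample_moment_matched(pi, mu, cov, n_new, W, P, E, seed=1):
    """Draw n_new equal-weight particles from the mixture, then shift and rescale
    them so total weight W, momentum P and kinetic energy E (mass m = 1) match."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n_new, pi)
    v = np.concatenate([rng.multivariate_normal(mu[k], cov[k], size=c)
                        for k, c in enumerate(counts)])
    w = np.full(n_new, W / n_new)                  # conserves total charge/mass
    u = P / W                                      # target mean (drift) velocity
    thermal_target = E - 0.5 * W * (u @ u)         # energy in the drift frame
    fluct = v - v.mean(axis=0)                     # zero-mean fluctuations
    thermal_sample = 0.5 * (w[:, None] * fluct**2).sum()
    alpha = np.sqrt(thermal_target / thermal_sample)
    return u + alpha * fluct, w                    # momentum and energy now match


# Toy usage: a two-beam velocity distribution in 2D velocity space.
rng = np.random.default_rng(42)
v0 = np.concatenate([rng.normal([-1.0, 0.0], 0.3, (5000, 2)),
                     rng.normal([1.0, 0.5], 0.3, (5000, 2))])
w0 = np.full(len(v0), 1e-3)
W = w0.sum()
P = (w0[:, None] * v0).sum(axis=0)
E = 0.5 * (w0 * (v0**2).sum(axis=1)).sum()
pi, mu, cov = fit_gmm_em(v0, w0, K=2)                        # checkpoint: store pi, mu, cov, W, P, E
v1, w1 = sample_moment_matched(pi, mu, cov, 4000, W, P, E)  # restart
```

In this toy case the checkpoint holds fewer than twenty numbers (mixture weights, means, covariances, and the conserved moments W, P, E) instead of 10,000 particle records. Because the restart particles share equal weights and their fluctuations have zero mean, shifting by the target drift fixes the momentum, and a single scale factor on the fluctuations then fixes the energy without disturbing mass or momentum.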

$^*$This work was performed under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory, managed by Triad National Security, LLC under contract 89233218CNA000001.

Authors

Luis Chacon
- Los Alamos National Laboratory
Guangye Chen
- Los Alamos National Laboratory