Scientific Data Services Framework for Plasma Physics

ORAL

Abstract

Plasma physics experiments and simulations are producing petabytes of data. Hundreds of diagnostic tools are used with thousands of different analysis tasks on these datasets to generate scientific insight, and I/O operations are often the bottleneck in these analyses. This work addresses the I/O efficiency problem by developing techniques tailored to common data access patterns, deep storage hierarchies, and massive parallelism.

Additionally, we present a thorough theoretical analysis of data access costs, which we use to exploit structural locality and to select the best array partitioning strategy for a given operation. In a series of performance tests on large scientific datasets, we observed that our framework outperforms Spark by as much as 2070 times on the same tasks.
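The abstract does not spell out the cost model, so the following is only an illustrative sketch of the general idea of cost-driven partition selection, not the authors' analysis. It assumes a stencil-style operation with halo width h on a 2D array and uses the number of halo cells an interior process must read from its neighbours as a stand-in for data access cost; it then tries every way of arranging p processes as a pr × pc grid and keeps the cheapest. The names Plan, halo_cost, candidate_grids and choose_partition, and the halo-only cost model itself, are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str              # human-readable partitioning strategy
    grid: tuple            # (process rows pr, process columns pc)
    halo_cells: int        # hypothetical cost: halo cells read per interior process


def halo_cost(shape, grid, h):
    """Halo cells an interior process fetches for a stencil of halo width h.

    Hypothetical model: only edge-adjacent halos are counted; corner cells
    and boundary processes (which have fewer neighbours) are ignored.
    """
    n_rows, n_cols = shape
    pr, pc = grid
    block_rows = -(-n_rows // pr)  # ceiling division
    block_cols = -(-n_cols // pc)
    cost = 0
    if pr > 1:  # neighbours above and below
        cost += 2 * h * block_cols
    if pc > 1:  # neighbours to the left and right
        cost += 2 * h * block_rows
    return cost


def candidate_grids(p):
    """All factorisations p = pr * pc, i.e. every 2D process grid."""
    return [(pr, p // pr) for pr in range(1, p + 1) if p % pr == 0]


def choose_partition(shape, p, h):
    """Return the candidate plan with the lowest modelled access cost."""
    plans = []
    for pr, pc in candidate_grids(p):
        if pr == 1 and pc == 1:
            name = "single block"
        elif pc == 1:
            name = "row slabs"
        elif pr == 1:
            name = "column slabs"
        else:
            name = "2D blocks"
        plans.append(Plan(name, (pr, pc), halo_cost(shape, (pr, pc), h)))
    return min(plans, key=lambda plan: plan.halo_cells)
```

For example, `print(choose_partition((1024, 1024), 16, 1))` prints `Plan(name='2D blocks', grid=(4, 4), halo_cells=1024)`, whereas a tall, narrow array such as `(100000, 8)` on 4 processes favours row slabs. The point of the sketch is that the best layout depends on both the array shape and the operation, which is why a per-operation cost analysis is useful; a realistic model would also weigh storage-tier latency and bandwidth.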

*This effort was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research under contract number DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility.

Presenters

  • Kesheng Wu

    • Lawrence Berkeley National Laboratory

Authors

  • Kesheng Wu

    • Lawrence Berkeley National Laboratory
  • Bin Dong

    • Lawrence Berkeley National Laboratory
  • Surendra Byna

    • Lawrence Berkeley National Laboratory