Toward Intelligent Fusion Data Workflows: Harmonized Labeling with dFL and Multiscale Simulation Infrastructure via MGKDB
POSTER
Abstract
Progress in fusion energy science depends on turning large volumes of heterogeneous data into usable insight. As one pillar of the Fusion Data Platform (FDP) initiative, the Data Fusion Labeler (dFL) streamlines harmonization, preprocessing, and labeling across experimental diagnostics and multiscale simulations. To address fusion-specific challenges such as noise, sparsity, imbalance, and temporal skew, dFL provides adaptive smoothing, normalization, resampling, and schema-aware I/O (TokSearch, CMF, IMAS/OMAS). Manual and automated interfaces, paired with built-in dimensionality-reduction and surrogate-model helpers, shorten the path from raw shots to AI/ML-ready datasets while preserving expert oversight.
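As a generic illustration of the harmonization steps listed above (resampling, smoothing, normalization), the sketch below brings two synthetic diagnostics sampled at different rates onto a common time base. It is not dFL's API. The names `harmonize` and `moving_average` are hypothetical. Linear interpolation and a fixed-width centered moving average stand in for dFL's resampling and adaptive smoothing, and the signals are made-up test data.

```python
import numpy as np


def moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Centered moving average (window should be odd and no longer than x).
    Near the edges the window shrinks instead of zero-padding, so the ends
    are not biased toward zero."""
    kernel = np.ones(window)
    total = np.convolve(x, kernel, mode="same")
    count = np.convolve(np.ones_like(x, dtype=float), kernel, mode="same")
    return total / count


def harmonize(signals: dict, t_common: np.ndarray, window: int = 5) -> dict:
    """Hypothetical helper: resample each (t, y) pair onto t_common, smooth it,
    then z-score it.

    Assumes each t is strictly increasing (np.interp requires this); values
    outside a signal's time range are clamped to its endpoint values.
    """
    out = {}
    for name, (t, y) in signals.items():
        y_rs = np.interp(t_common, t, y)      # 1) resample onto the common time base
        y_sm = moving_average(y_rs, window)   # 2) suppress high-frequency noise
        mu, sigma = y_sm.mean(), y_sm.std()
        # 3) z-score normalization; a constant signal is only mean-centered
        out[name] = (y_sm - mu) / sigma if sigma > 0 else y_sm - mu
    return out


# Illustrative synthetic "diagnostics" sampled at different rates.
rng = np.random.default_rng(0)
t_fast = np.linspace(0.0, 1.0, 1001)
t_slow = np.linspace(0.0, 1.0, 51)
signals = {
    "density": (t_fast, np.sin(2 * np.pi * 3 * t_fast) + 0.1 * rng.standard_normal(t_fast.size)),
    "temperature": (t_slow, 1.0 + t_slow**2),
}
t_common = np.linspace(0.0, 1.0, 201)
result = harmonize(signals, t_common)
```

Each output array now shares the 201-point time base and has zero mean and unit variance. These properties matter when feeding signals from different diagnostics into a single ML model. A production pipeline would also need to handle gaps, non-monotonic timestamps, and per-signal smoothing choices, which this sketch omits.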
Within the SMARTS project (Surrogate Models for Accurate and Rapid Transport Solutions), the complementary open-source Multiscale GyroKinetics DataBase (MGKDB) provides a schema-driven, metadata-rich repository for multi-resolution gyrokinetic outputs from GENE, CGYRO, TGLF, GX, GS2, and QuaLiKiz. A flexible MongoDB backend, shell and GUI clients, and IMAS hooks support unified code metadata, provenance tracking, benchmarking records, and evolving file formats. MGKDB thus provides durable infrastructure for cross-code comparison, validation, and downstream modeling, and links simulation data with the experimental streams prepared by dFL.
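To make "schema-driven, metadata-rich" concrete, the sketch below builds and validates a single simulation record as a plain Python dict, the kind of document a MongoDB client such as pymongo's `Collection.insert_one` accepts. The schema, field names, and helpers (`SCHEMA`, `make_record`, `validate`) are hypothetical and are not MGKDB's actual schema or client API. The input and output values are illustrative only.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical minimal schema: required field -> expected Python type.
SCHEMA = {
    "code": str,
    "code_version": str,
    "inputs": dict,
    "outputs": dict,
    "provenance": dict,
}
# Codes named in the abstract; a real schema would define this list itself.
SUPPORTED_CODES = {"GENE", "CGYRO", "TGLF", "GX", "GS2", "QuaLiKiz"}


def make_record(code: str, code_version: str, inputs: dict, outputs: dict, user: str) -> dict:
    """Build a record whose provenance includes a content hash for deduplication."""
    if code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported code: {code}")
    # sort_keys makes the serialization, and therefore the hash, deterministic
    payload = json.dumps({"inputs": inputs, "outputs": outputs}, sort_keys=True)
    return {
        "code": code,
        "code_version": code_version,
        "inputs": inputs,
        "outputs": outputs,
        "provenance": {
            "uploaded_by": user,
            "uploaded_at": datetime.now(timezone.utc).isoformat(),
            "content_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        },
    }


def validate(record: dict) -> list:
    """Return a list of schema violations (empty if the record is valid)."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


# Illustrative values only; not real simulation output.
rec = make_record(
    "GENE", "x.y",
    inputs={"ky": 0.3, "a_LT": 6.0},
    outputs={"growth_rate": 0.12, "frequency": -0.45},
    user="analyst",
)
errors = validate(rec)
```

Validating records before insertion keeps heterogeneous uploads from different codes comparable. The content hash lets identical runs be detected regardless of who uploaded them or when.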
Together, dFL and MGKDB advance a reproducible, extensible data ecosystem that accelerates scalable machine learning, multiscale physics integration, and simulation–experiment synergy for the fusion community.
*This work was partially supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under Award No. DE-FC02-04ER54698, along with Office of Fusion Energy Sciences Awards No. DE-SC0024426 and No. DE-SC0024399.
Presenters
Craig Michoski, SapientAI LLC