Scientific Foundation Models - Toward Automated Learning of Physics Directly From Sensory Datasets in HEP

ORAL  · Invited

Abstract

Most AI models in neutrino physics are trained on simulated data using truth labels that are inaccessible in real data, making them vulnerable to discrepancies between data and simulation. Such data shifts are particularly problematic in accelerator neutrino experiments, where calibration samples are limited and high-resolution sensors can amplify sensitivity to systematic differences between simulation and reality. These challenges are compounded by the use of task-specific models, in which separate AI models address different physics tasks on the same data (e.g., one model identifies neutrino flavor while another separates electron and gamma showers). Each task-specific model may be affected differently by data shifts, complicating the assessment of systematic uncertainties and potentially introducing correlated biases across the analysis pipeline.

Foundation Models (FMs) offer a potential solution through two key attributes: (1) they can be optimized on unlabeled datasets through self-supervised learning, and can therefore be trained directly on real experimental data, and (2) they learn general, transferable representations that capture underlying physics patterns applicable across multiple downstream tasks. These are the same attributes that have driven the success of FMs, including Large Language Models (LLMs), in other domains.

In this talk, I discuss unique research challenges, opportunities, and progress toward developing FMs optimized directly on raw, sensor-level datasets from high-precision particle imaging detectors in neutrino experiments. If successful, these "sensor-level FMs" would enable direct learning of physics from experimental data and provide a unified foundation for all downstream analyses, representing a paradigm shift from current task-specific AI methods in particle physics analysis pipelines.

*This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics under Contract DE-AC02-76SF00515.

Publication: https://iopscience.iop.org/article/10.1088/2632-2153/ae47b8
https://arxiv.org/abs/2512.01324
https://openreview.net/forum?id=N9xLKbMXKW

Presenters

  • Kazuhiro Terao

    • SLAC

Authors

  • Kazuhiro Terao

    • SLAC
  • Samuel Young

    • Stanford University / SLAC National Accelerator Laboratory