Scientific Foundation Models - Toward Automated Learning of Physics Directly From Sensory Datasets in HEP

Kazuhiro Terao; Samuel Young

Invited-In-person · Invited

Abstract

Most AI models in neutrino physics are trained on simulated data using truth labels that are inaccessible in real data, making them vulnerable to data-simulation discrepancies. Such data shifts are particularly problematic in accelerator neutrino experiments, where calibration samples are limited and high-resolution sensors can amplify sensitivity to systematic differences between simulation and reality. These challenges are compounded by the use of task-specific models, where separate AI models address different physics tasks on the same data (e.g., one model identifies neutrino flavor, another separates electron/gamma showers). Each task-specific model may be affected differently by data shifts, complicating systematic uncertainty assessment and potentially introducing correlated biases across the analysis pipeline.

Foundation Models (FMs) offer a potential solution through two key attributes: (1) the ability to be optimized on unlabeled datasets through self-supervised learning, directly using real experimental data, and (2) the learning of general, transferable representations that capture underlying physics patterns applicable across multiple downstream tasks. These are the same attributes that have driven the success of FMs, including Large Language Models (LLMs), in other domains.

In this talk, I discuss unique research challenges, opportunities, and progress toward developing FMs optimized directly on raw, sensor-level datasets from high-precision particle imaging detectors in neutrino experiments. If successful, these "sensor-level FMs" would enable direct learning of physics from experimental data and provide a unified foundation for all downstream analyses, representing a paradigm shift from current task-specific AI methods in particle physics analysis pipelines.

March 19, 2026, 10:30 AM – 11:06 AM

Publication: https://iopscience.iop.org/article/10.1088/2632-2153/ae47b8
https://arxiv.org/abs/2512.01324
https://openreview.net/forum?id=N9xLKbMXKW

Presenters

Kazuhiro Terao
- SLAC

Authors

Kazuhiro Terao
- SLAC
Samuel Young
- Stanford University / SLAC National Accelerator Laboratory