Particle Jet Representations via a Joint Embedding Predictive Architecture

Zihan Zhao; Haoyang Li; Subash Katel; Raghav Kansal; Farouk Mokhtar; Javier M Duarte


ORAL

Abstract

In high-energy physics, self-supervised learning methods could help build machine learning models for a variety of tasks without labeled datasets. These include tasks involving jets, which are narrow sprays of particles produced by quarks and gluons in high-energy particle collisions. This study introduces an approach to learning augmentation-independent jet representations using a Jet-based Joint Embedding Predictive Architecture (J-JEPA). The approach predicts various physical targets from an informative context, using the target positions as joint information. Because J-JEPA is augmentation-free, it avoids introducing biases that could harm downstream tasks, which often require invariance under augmentations different from those used in pretraining. This augmentation-independent training enables versatile applications and offers a pathway toward a cross-task foundation model. We fine-tuned the representations learned by J-JEPA for jet tagging and benchmarked them against task-specific representations.
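The abstract describes the training objective only at a high level. The following is a rough, hypothetical illustration of a generic JEPA-style objective applied to the subjets of a single jet. It is not the authors' J-JEPA implementation. The toy linear encoders, the mean-pooling predictor, the exponential-moving-average (EMA) target encoder, the assumed feature layout (pT, eta, phi, mass), and all names and sizes are assumptions chosen for illustration. The sketch shows the general pattern: a predictor receives the context embedding together with each target's position, and it must reproduce the target's embedding without any data augmentation.

```python
import numpy as np

# Hypothetical sizes: a jet clustered into 8 subjets, each described by
# 4 features assumed to be (pT, eta, phi, mass); 16-dim embeddings.
N_SUBJETS, N_FEATS, D_EMB = 8, 4, 16


def encode(W, subjets):
    """Toy per-subjet encoder (linear map + tanh), a stand-in for a transformer."""
    return np.tanh(subjets @ W)                          # (n, D_EMB)


def predict(W_pred, context_emb, target_pos):
    """Predict each target subjet's embedding from the pooled context embedding
    concatenated with that target's (eta, phi) position."""
    pooled = context_emb.mean(axis=0)                    # (D_EMB,)
    tiled = np.tile(pooled, (len(target_pos), 1))        # (n_tgt, D_EMB)
    return np.concatenate([tiled, target_pos], axis=1) @ W_pred  # (n_tgt, D_EMB)


def jepa_loss(W_ctx, W_tgt, W_pred, jet, context_idx, target_idx):
    """Mean squared error between predicted and target-encoder embeddings.
    No augmentations: the only change between views is hiding the target subjets."""
    context_emb = encode(W_ctx, jet[context_idx])
    target_emb = encode(W_tgt, jet[target_idx])          # treated as fixed targets
    target_pos = jet[target_idx, 1:3]                    # assumed (eta, phi) columns
    pred = predict(W_pred, context_emb, target_pos)
    return float(np.mean((pred - target_emb) ** 2))


def ema_update(W_tgt, W_ctx, decay=0.996):
    """Target encoder tracks the context encoder as an exponential moving average."""
    return decay * W_tgt + (1.0 - decay) * W_ctx


rng = np.random.default_rng(0)
W_ctx = rng.normal(scale=0.1, size=(N_FEATS, D_EMB))
W_tgt = W_ctx.copy()
W_pred = rng.normal(scale=0.1, size=(D_EMB + 2, D_EMB))

jet = rng.normal(size=(N_SUBJETS, N_FEATS))              # one synthetic jet
perm = rng.permutation(N_SUBJETS)
context_idx, target_idx = perm[:5], perm[5:]             # 5 context, 3 target subjets

loss = jepa_loss(W_ctx, W_tgt, W_pred, jet, context_idx, target_idx)
W_tgt = ema_update(W_tgt, W_ctx)
print(f"loss = {loss:.4f}")
```

In an actual training loop, the context encoder and predictor would be updated by gradient descent on this loss, typically through an autodiff framework. The target encoder would change only through the moving average. Because the loss is computed in embedding space rather than on raw particle features, the model is not asked to reconstruct low-level detail.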

This work was supported by the Research Corporation for Science Advancement (RCSA) under grant #CS-CSA-2023-109, the Alfred P. Sloan Foundation under grant #FG-2023-20452, the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics Early Career Research Program under Award No. DE-SC0021187, the DOE, Office of Advanced Scientific Computing Research under Award No. DE-SC0021396 (FAIR4HEP), and the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for Accelerating AI Algorithms for Data-Driven Discovery (A3D3) under Cooperative Agreement OAC-2117997. This work was performed using the Pacific Research Platform Nautilus HyperCluster supported by NSF awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, the University of California Office of the President, and the University of California San Diego's California Institute for Telecommunications and Information Technology/Qualcomm Institute. Thanks to CENIC for the 100 Gbps networks.

March 19, 2025, 10:57 AM – 11:09 AM

Publication: Accepted by the Machine Learning and the Physical Sciences workshop at NeurIPS 2024.

Presenters

Zihan Zhao
- University of California, San Diego

Authors

Zihan Zhao
- University of California, San Diego
Haoyang Li
- University of California, San Diego
Subash Katel
- University of California, San Diego
Raghav Kansal
- California Institute of Technology
Farouk Mokhtar
- University of California, San Diego
Javier M Duarte
- University of California, San Diego