Statistical Manifold Geometry of VAE Latent Spaces

Yulin Zhai; Nick Manganelli


Poster-In-person

Abstract

Machine learning tools are frequently used in particle physics to analyze the massive datasets produced by high-energy collisions, but they often overlook a key property of these data: their geometry. Variational Autoencoders (VAEs) are neural networks widely used for unsupervised anomaly detection at particle colliders such as the Large Hadron Collider (LHC) at CERN. A VAE compresses each collision event into a point in a "latent space," where similar events cluster together. This latent space is traditionally assumed to be flat and Euclidean, an assumption that ignores the uncertainty information the model itself learns during training. In this project, I developed a method to uncover and use the intrinsic geometry of the VAE's latent space. Specifically, I interpret each event's learned uncertainty as defining a local Riemannian metric: a way of measuring distance that stretches space according to the model's uncertainty. By smoothly stitching these local metrics together, I construct a continuous statistical manifold that reflects the global structure of the real collision data. Finally, I introduce an "intrinsic prior" distribution that is uniform with respect to the manifold's volume and use Hamiltonian Monte Carlo (HMC), a gradient-based sampling method, to generate new points that stay on the learned manifold. The goal of this approach is to produce more realistic synthetic collision events and to reveal how the model organizes known physics processes, potentially improving searches for new physics.
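The pipeline described above can be illustrated with a short NumPy sketch. This is a minimal example under our own assumptions, not the authors' implementation. We assume each event's local metric is the Fisher information of its Gaussian encoder posterior with respect to the mean, diag(1/σ_i²). We stitch the local metrics with unnormalized Gaussian kernel weights of a hypothetical bandwidth h, define the intrinsic prior as p(z) ∝ sqrt(det G(z)), and sample it with textbook HMC (leapfrog integrator, identity mass matrix, Metropolis correction). The function names `make_intrinsic_log_density` and `hmc`, and the toy two-cluster latent codes, are hypothetical stand-ins for a trained encoder's outputs (for a real VAE, σ would typically be exp(0.5 · logvar)).

```python
import numpy as np


def make_intrinsic_log_density(mu, sigma, bandwidth):
    """Return (log_p, grad_log_p) for a volume-uniform prior on a stitched metric.

    Hypothetical construction: event i contributes a local metric diag(1 / sigma_i**2),
    blended with kernel weights k_i(z) = exp(-||z - mu_i||^2 / (2 h^2)):
        G(z) = diag(g(z)),  g_d(z) = sum_i k_i(z) / sigma_{i,d}^2.
    log p(z) = log sqrt(det G(z)) = 0.5 * sum_d log g_d(z), up to an additive constant.
    """
    mu = np.asarray(mu, dtype=float)                          # (N, D) encoder means
    log_a = -2.0 * np.log(np.asarray(sigma, dtype=float))     # (N, D) log(1 / sigma^2)
    h2 = float(bandwidth) ** 2

    def _terms(z):
        diff = mu - z                                         # (N, D)
        log_k = -0.5 * np.sum(diff**2, axis=1) / h2           # (N,) log kernel weights
        s = log_k[:, None] + log_a                            # (N, D)
        m = s.max(axis=0)                                     # (D,) for stable log-sum-exp
        e = np.exp(s - m)
        log_g = m + np.log(e.sum(axis=0))                     # (D,) log g_d(z)
        r = e / e.sum(axis=0)                                 # (N, D) softmax over events
        return diff, log_g, r

    def log_p(z):
        _, log_g, _ = _terms(z)
        return 0.5 * log_g.sum()

    def grad_log_p(z):
        diff, _, r = _terms(z)
        # d log g_d / dz = sum_i r_{i,d} (mu_i - z) / h^2, summed over d and halved
        return 0.5 * (r.sum(axis=1) @ diff) / h2

    return log_p, grad_log_p


def hmc(log_p, grad_log_p, z0, n_samples, step_size=0.05, n_leapfrog=20, seed=0):
    """Standard HMC with identity mass matrix; potential energy U(z) = -log p(z)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).copy()
    samples = np.empty((n_samples, z.size))
    accepted = 0
    for t in range(n_samples):
        p = rng.standard_normal(z.size)                       # resample momentum
        z_new, p_new = z.copy(), p.copy()
        p_new += 0.5 * step_size * grad_log_p(z_new)          # half momentum step
        for _ in range(n_leapfrog - 1):
            z_new += step_size * p_new                        # full position step
            p_new += step_size * grad_log_p(z_new)            # full momentum step
        z_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_p(z_new)          # final half momentum step
        # Metropolis correction on the Hamiltonian H = -log p(z) + |p|^2 / 2
        log_accept = (log_p(z_new) - 0.5 * p_new @ p_new) - (log_p(z) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_accept:
            z, accepted = z_new, accepted + 1
        samples[t] = z
    return samples, accepted / n_samples


# Toy stand-in for encoder outputs: two clusters of 2-D latent means with
# per-event, per-dimension standard deviations (hypothetical data, not collisions).
rng = np.random.default_rng(1)
mu = np.concatenate([rng.normal(-2.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
sigma = rng.uniform(0.1, 0.5, size=mu.shape)
log_p, grad_log_p = make_intrinsic_log_density(mu, sigma, bandwidth=0.5)
samples, acc = hmc(log_p, grad_log_p, z0=mu[0], n_samples=500)
```

Leaving out a constant floor term (such as G + εI) is a deliberate choice in this sketch: without it, sqrt(det G) decays away from the encoded events, so the prior is normalizable and HMC stays near the data. With a floor, the density would flatten to a constant in the tails, which is improper on an unbounded latent space.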

March 15, 2026, 6:00 AM

Presenters

Yulin Zhai
- Claremont McKenna College

Authors

Yulin Zhai
- Claremont McKenna College
Nick Manganelli
- Northeastern University