Looking into the black box: probing internal activations in a data-driven weather model reveals interpretable physical features

Theodore MacMillan; Nicholas Ouellette

Oral-In-person

Abstract

Large data-driven physics models like DeepMind’s GraphCast have empirically succeeded in parameterizing time-evolution operators for complex dynamical systems with an accuracy that matches or, in some cases, exceeds that of classical physics-based solvers. However, how these data-driven models perform their computations is largely unknown, and whether their internal representations correspond to anything interpretable or physically plausible remains an open question. In this work, we combine tools from interpretability research on large language models with sparsity-promoting methods from dynamical systems to analyze intermediate computational layers in the weather model GraphCast. In particular, we use sparse autoencoders and dictionary learning to discover preferred directions (features) in the neuron space of the model. We uncover distinct features across a wide range of length and time scales, including features corresponding to tropical cyclones, atmospheric rivers, diurnal behavior, large-scale precipitation patterns, and specific geographical coding, among others. We further demonstrate that precise interventions on these internal features lead to sparse and interpretable changes in model outputs, opening the possibility of explaining model predictions in a human-understandable manner or even uncovering unknown physical mechanisms with causal equivalences. As a case study, we sparsely modify internal features in GraphCast to alter the strength of evolving hurricanes.
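
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a sparse autoencoder applied to a single activation vector, followed by an intervention that rescales one learned feature. It is an illustration under stated assumptions and not the authors' implementation: the function names, the ReLU encoder, the L1 sparsity penalty, the choice to edit the activation along a decoder direction, and the toy weights are all hypothetical. In practice the weights would be trained on activations collected from an intermediate GraphCast layer.

```python
# Hypothetical sketch: a sparse autoencoder (SAE) over one activation vector
# and a single-feature intervention. All names and weights are illustrative.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encode(x, W_enc, b_enc, b_dec):
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]

def decode(f, W_dec, b_dec):
    """Reconstruction x_hat = W_dec f + b_dec.

    Each column of W_dec is one dictionary direction (feature) in neuron space.
    """
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]

def sae_loss(x, f, x_hat, l1_coeff):
    """Squared reconstruction error plus an L1 penalty that promotes sparsity."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return recon + l1_coeff * sparsity

def intervene(x, W_enc, b_enc, W_dec, b_dec, feature, scale):
    """Rescale one feature by `scale`, editing x only along that feature's
    decoder direction so the rest of the activation is left untouched."""
    f = encode(x, W_enc, b_enc, b_dec)
    delta = (scale - 1.0) * f[feature]
    direction = [row[feature] for row in W_dec]
    return [xi + delta * d for xi, d in zip(x, direction)]

# Toy example: 2 neurons, 3 dictionary features (hand-picked, untrained weights).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # shape: features x neurons
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -0.5], [0.0, 1.0, -0.5]]     # shape: neurons x features
b_dec = [0.0, 0.0]

x = [2.0, -1.0]                                  # one activation vector
f = encode(x, W_enc, b_enc, b_dec)               # only feature 0 is active
x_hat = decode(f, W_dec, b_dec)
loss = sae_loss(x, f, x_hat, l1_coeff=0.1)

# Halve feature 0 (in the abstract's case study, the analogous step would be
# weakening a feature associated with a hurricane).
x_weakened = intervene(x, W_enc, b_enc, W_dec, b_dec, feature=0, scale=0.5)
```

In this sketch the intervention adds a multiple of one decoder column to the original activation rather than replacing the activation with the full reconstruction, so whatever the autoencoder fails to reconstruct is carried through unchanged. The edited vector would then be passed on through the remaining layers of the model to observe the effect on the forecast.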

March 19, 2026, 4:54 PM – March 19, 2026, 5:06 PM

Presenters

Theodore MacMillan
- Stanford University

Authors

Theodore MacMillan
- Stanford University
Nicholas Ouellette
- Stanford University