Analytic Theories of Creativity and Generalization in Diffusion Models

ORAL  · Invited

Abstract

Diffusion models appear to generalize in high dimensions: they produce novel samples that are far from their training set (as all novel samples in high dimensions must be) by mixing and matching attributes from different training images, while still adhering to the implicit rules and constraints of the dataset so that these combinations remain sensible. Adherence to these rules sometimes fails, however, as when models generate samples with spatial inconsistencies such as garbled text or a hand with the wrong number of fingers. Strikingly, for any fixed training set, there is an optimal solution to the diffusion model objective in function space; this optimal solution, known as the "empirical score function," always reproduces memorized samples, which is inconsistent with the phenomena of interest. We find that introducing two simple regularizing constraints, 1) equivariance and 2) locality, yields another exactly solvable minimum in function space, which a) is manifestly generalizing, producing samples that mix and match patches from the training set while retaining local consistency, and b) is strikingly correlated with the behavior of both small convolutional diffusion models with architectural biases towards locality and more powerful diffusion models early in training. To our knowledge, this is the first time such generations have been robustly predicted with such accuracy from first principles. The theory also reproduces and predicts spatial consistency issues as a direct consequence of excessive spatial locality. I will then discuss the statistical and information-theoretic implications of this model, which yield both novel insight into the origins of diffusion model universality and generalizations that help bridge the gap to the phenomenology of more complex and powerful diffusion models.
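As an illustration of why the empirical score function memorizes, the following is a minimal sketch, not the authors' implementation. It assumes a variance-preserving forward process in which a training point x_i is noised to sqrt(alpha_bar) * x_i plus Gaussian noise of variance 1 - alpha_bar; the abstract does not specify a parameterization, and the names `posterior_weights`, `empirical_score`, and `alpha_bar` are hypothetical. Under that assumption, the noised training distribution is a Gaussian mixture, and its score is a softmax-weighted pull toward the noised training points.

```python
import numpy as np

def posterior_weights(x, train, alpha_bar):
    """Softmax weights over training points for a noisy sample x.

    Assumes (hypothetically) x = sqrt(alpha_bar) * x_i + sqrt(1 - alpha_bar) * eps.
    """
    centers = np.sqrt(alpha_bar) * train        # means of the noised training points
    var = 1.0 - alpha_bar                       # shared isotropic noise variance
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    logits = -sq_dist / (2.0 * var)
    logits -= logits.max()                      # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def empirical_score(x, train, alpha_bar):
    """Score (gradient of the log density) of the noised empirical distribution at x."""
    centers = np.sqrt(alpha_bar) * train
    w = posterior_weights(x, train, alpha_bar)
    # Weighted average of the centers, minus x, scaled by the inverse variance.
    return (w @ centers - x) / (1.0 - alpha_bar)

# Toy "training set" of three 2-D points (illustrative data only).
train = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([3.5, 0.2])

weights = posterior_weights(x, train, alpha_bar=0.99)
score = empirical_score(x, train, alpha_bar=0.99)
```

At low noise (alpha_bar close to 1) the weights become nearly one-hot on the closest training point, so following this score pulls a sample onto a single memorized example rather than toward a novel combination.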

*M.K. acknowledges support from the NSF Graduate Research Fellowship Program.

Publication: Kamb, M., & Ganguli, S. An analytic theory of creativity in convolutional diffusion models. In Forty-second International Conference on Machine Learning.

Presenters

  • Mason Kamb

    • Stanford University

Authors

  • Mason Kamb

    • Stanford University
  • Surya Ganguli

    • Stanford University