On the learnability of probability distributions

ORAL

Abstract

Organisms and algorithms learn stimulus distributions from previous observations, either over evolutionary time or on the fly. If the observed events are discrete, then the probability distribution is a list of numbers, one for each possible outcome; in the absence of regularities, estimating the underlying distribution from data requires an organism to observe each possible outcome many times. The number of possible outcomes is astronomical, even in small systems. In a binary image with just N = 100 pixels, the number of possible images (2^N ≈ 10^30) is larger than the age of the universe measured in seconds. Learning requires discovery of regularities in the data, and exploitation of these regularities by formulating simplified models. Here we show that two conditions support efficient learning. First, the mutual information between two partitions of the system should be sub-extensive. Second, this shared information should be compressible, so that it can be represented by a number of bits equal to the information rather than the subsystems' entropy. We argue that, under these conditions, the distribution can be learned with a number of parameters that grows linearly with N.
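
As an illustration only (not part of the submitted abstract), the short Python sketch below first checks the counting argument for N = 100 binary pixels, and then evaluates the mutual information between the two halves of an assumed toy model in which every pixel is a noisy copy of a single hidden fair bit; the model, the flip probability eps = 0.1, and the 13.8 Gyr age of the universe are all assumptions made here for illustration, not quantities taken from the abstract.

from math import comb, log2

# Counting argument: 2^100 distinct binary images vs. seconds since the Big Bang.
N = 100
n_images = 2 ** N
age_universe_s = 13.8e9 * 365.25 * 24 * 3600      # ~4.4e17 s, assuming a 13.8 Gyr age
print(f"possible images: {n_images:.2e}   seconds since Big Bang: {age_universe_s:.2e}")

# Toy model (an assumption, not the authors' model): each pixel equals a hidden
# fair bit Z, flipped independently with probability eps.  The two halves of the
# image share only the information carried by Z, so I(A;B) stays below 1 bit for
# any N, while the entropy of each half grows linearly with N.
def block_entropy(k, eps=0.1):
    """Exact entropy (bits) of k conditionally-iid binary pixels in the toy model."""
    H = 0.0
    for m in range(k + 1):
        # probability of any one configuration containing exactly m ones
        p = 0.5 * eps**m * (1 - eps)**(k - m) + 0.5 * (1 - eps)**m * eps**(k - m)
        H -= comb(k, m) * p * log2(p)
    return H

for n in (4, 10, 20, 50, 100):
    half, full = block_entropy(n // 2), block_entropy(n)
    print(f"N={n:3d}   H(half) = {half:7.3f} bits   I(A;B) = {2*half - full:.4f} bits")

In this toy setting the printed mutual information saturates near 1 bit as N grows while H(half) grows roughly linearly, which is one concrete example of the sub-extensive shared information the abstract refers to.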

Presenters

  • David Schwab

Institute for Theoretical Science, The Graduate Center, CUNY; Department of Biology, City University of New York

Authors

  • William Bialek

Department of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University; The Graduate Center, CUNY

  • Stephanie Palmer

Physics, University of Chicago

  • David Schwab

Institute for Theoretical Science, The Graduate Center, CUNY; Department of Biology, City University of New York