Analytical corrections to entropy for under-sampled discrete distributions.

ORAL

Abstract

Estimating the entropy of the probability distributions underlying data sets is a common task in modern data analysis. A frequent difficulty is that the number of independent samples obtained in experiments is limited, so that many states are under-sampled and naïve entropy estimators are inaccurate. Previous studies found that the statistics of states that occur multiple times in a data set (coincidences) provide useful corrections to entropy estimates in the extremely under-sampled regime. However, these corrections are largely numerical in nature and therefore provide little insight into which features of the data set cause them. Here, we present analytical approximations to coincidence-based entropy estimators, which shed some light on this question.
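
For intuition only, a minimal sketch (not taken from this work) contrasting the naïve plug-in estimator with a simple Ma-style coincidence-based estimate in the under-sampled regime; the synthetic uniform distribution, the sample size, and all variable names are illustrative assumptions, not the estimator analyzed in the abstract.

```python
# Illustrative sketch: plug-in vs. coincidence-based entropy estimate
# on an under-sampled uniform distribution (assumed example only).
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

K = 10_000           # number of states; true entropy = ln K for uniform
N = 300              # samples: deeply under-sampled, N << K
samples = rng.integers(K, size=N)

counts = np.array(list(Counter(samples).values()))

# Naive plug-in estimator: strongly biased downward when N << K.
p_hat = counts / N
S_plugin = -np.sum(p_hat * np.log(p_hat))

# Ma-style estimate: pairs of samples landing in the same state
# (coincidences) estimate the coincidence probability, hence the entropy.
pairs_coincident = np.sum(counts * (counts - 1) / 2)
pairs_total = N * (N - 1) / 2
S_ma = np.log(pairs_total / pairs_coincident) if pairs_coincident else np.nan

print(f"true entropy   : {np.log(K):.2f} nats")
print(f"plug-in        : {S_plugin:.2f} nats (biased down)")
print(f"coincidence/Ma : {S_ma:.2f} nats")
```

Even with only a few coincidences, the coincidence-based estimate lands near the true entropy, while the plug-in value cannot exceed ln N; this is the regime in which the analytical corrections discussed here apply.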

Presenters

  • Ahmed Roman

    Physics, Emory University

Authors

  • Damian Hernandez

    Physics, Emory University, Centro Atómico Bariloche

  • Ahmed Roman

    Physics, Emory University

  • Ilya Nemenman

Physics, Emory University