Using machine learning to interpret ground-level PM2.5 distribution in East Asia during 2004-2013

ORAL

Abstract

Airborne particulate matter has substantial health and climatic impacts. In this study we use machine learning to aid the analysis of a comprehensive global particulate dataset. Our focus is on the average annual cycle at a 10 km spatial resolution and a 10-day temporal resolution, covering the period 2004-2013 (i.e. each grid point has an annual cycle described by thirty-six 10-day averages). We use an unsupervised classification (a self-organizing map) to objectively group the shapes of the annual cycles into 100 classes, so that locations in a given class have annual cycles of very similar shape. The classes are depicted geographically using different colors on a map. The self-organizing map clearly separates urban and rural areas in Sichuan, China. To relate the shapes of these annual cycles to their meteorological context, we use random forests to rank the candidate variables and identify the 20 most important in determining the shape of the PM$_{2.5}$ annual cycle. Machine learning is a useful assistant in giving the data a voice.
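
The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only the Python standard library and synthetic data, and it makes several assumptions that the abstract does not state: the 100 classes are arranged as a 10 x 10 grid, each 36-point cycle is scaled to zero mean and unit variance so that only its shape matters, and the learning-rate and neighbourhood schedules are arbitrary illustrative choices. The names `normalize_cycle`, `best_matching_unit`, `train_som` and `synthetic_cycle` are hypothetical.

```python
import math
import random


def normalize_cycle(cycle):
    """Scale a cycle to zero mean and unit variance so only its shape remains."""
    mean = sum(cycle) / len(cycle)
    std = math.sqrt(sum((v - mean) ** 2 for v in cycle) / len(cycle))
    if std == 0.0:
        return [0.0] * len(cycle)
    return [(v - mean) / std for v in cycle]


def best_matching_unit(weights, sample):
    """Return the index of the node whose weight vector is closest to sample."""
    best_index, best_dist = 0, float("inf")
    for index, w in enumerate(weights):
        dist = sum((a - b) ** 2 for a, b in zip(w, sample))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def train_som(samples, rows=10, cols=10, epochs=10, seed=0):
    """Train a rows x cols self-organizing map; returns one weight vector per node."""
    rng = random.Random(seed)
    # Assumption: initialise every node from a randomly chosen training sample.
    weights = [list(rng.choice(samples)) for _ in range(rows * cols)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for sample in order:
            frac = step / total_steps
            # Assumption: linearly decaying learning rate and neighbourhood radius.
            learning_rate = 0.5 * (1.0 - frac)
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 1.0
            bmu = best_matching_unit(weights, sample)
            bmu_r, bmu_c = divmod(bmu, cols)
            for index, w in enumerate(weights):
                r, c = divmod(index, cols)
                grid_dist2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                # Pull the node toward the sample, weighted by grid proximity to the BMU.
                for k in range(len(w)):
                    w[k] += learning_rate * influence * (sample[k] - w[k])
            step += 1
    return weights


def synthetic_cycle(peak_step, rng, n_steps=36):
    """Hypothetical annual cycle: a sinusoid peaking at peak_step plus noise."""
    cycle = []
    for t in range(n_steps):
        seasonal = 30.0 * math.cos(2.0 * math.pi * (t - peak_step) / n_steps)
        cycle.append(50.0 + seasonal + rng.gauss(0.0, 2.0))
    return cycle


# Synthetic stand-in for grid points: thirty-six 10-day averages per location.
rng = random.Random(42)
winter = [normalize_cycle(synthetic_cycle(0, rng)) for _ in range(30)]
summer = [normalize_cycle(synthetic_cycle(18, rng)) for _ in range(30)]

weights = train_som(winter + summer)  # 10 x 10 grid -> 100 classes

# The class of a location is the index of its best-matching node.
winter_classes = {best_matching_unit(weights, s) for s in winter}
summer_classes = {best_matching_unit(weights, s) for s in summer}
```

In this toy setting, locations whose cycles peak at opposite times of year fall into different classes. Coloring each grid point by its class index would give the kind of map described above.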

Authors

  • Daji Wu

    The University of Texas at Dallas