Uncovering interpretable low-dimensional geometric structures in gene expression using curvature regularized variational autoencoders
ORAL
Abstract
Biological systems inhabit low-dimensional manifolds in high-dimensional spaces. Despite the tens of billions of neurons in the human brain and the tens of thousands of genes in the human genome, the principled study of these systems in neuroscience and genomics is fruitful because biological processes rely on their coordinated organization along significantly lower-dimensional pathways. To uncover this organization, many dimensionality-reduction techniques and manifold-learning methods have successfully embedded high-dimensional data into low-dimensional spaces. However, these embeddings often rely on preserving similarities between data points, so the learned manifold exists in data-similarity space, which makes it difficult to interpret, interrogate, and generalize the manifold in the natural coordinates of neurons and genes. Here, we develop tools based on the variational autoencoder (VAE), a model from variational inference, that learn an explicitly geometric and nonlinear manifold in these natural coordinates. By explicitly regularizing the curvature and metric of VAE manifolds, we find that the nonlinear geometric structure of gene-expression data organizes along key biological processes. Further, by controlling the curvature and metric of these manifolds, we demonstrate superior embedding consistency and out-of-sample generalizability. Taken together, we provide methods that capture nonlinear geometric structure in data through interpretable, interrogatable, and generalizable manifolds.
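As a rough illustration of what "regularizing the curvature and metric" of a decoder manifold can mean, the sketch below computes two penalties for a toy decoder using finite differences. It is not the authors' regularizer, whose exact form the abstract does not state. The names `decoder`, `metric_penalty`, and `bending_penalty` are hypothetical, the paraboloid decoder stands in for a trained VAE decoder, and the second-derivative "bending" term is only a simple proxy for curvature.

```python
def decoder(z):
    # Hypothetical toy decoder: 2-D latent point -> 3-D data space (a paraboloid).
    # A trained VAE decoder would take its place in practice.
    z1, z2 = z
    return [z1, z2, z1 * z1 + z2 * z2]


def jacobian(f, z, eps=1e-5):
    # Central finite differences: J[i][j] approximates d f_i / d z_j.
    n_out = len(f(z))
    J = [[0.0] * len(z) for _ in range(n_out)]
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = f(zp), f(zm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


def pullback_metric(f, z):
    # Metric induced on latent space by the decoder: G = J^T J.
    J = jacobian(f, z)
    d = len(z)
    return [[sum(row[a] * row[b] for row in J) for b in range(d)]
            for a in range(d)]


def metric_penalty(f, z):
    # Assumed penalty: squared Frobenius distance between G and the identity,
    # which is zero where the decoder is locally an isometry.
    G = pullback_metric(f, z)
    d = len(z)
    return sum((G[a][b] - (1.0 if a == b else 0.0)) ** 2
               for a in range(d) for b in range(d))


def bending_penalty(f, z, eps=1e-3):
    # Assumed curvature proxy: squared norm of the second derivative of the
    # decoder along each latent axis, from a second-order finite difference.
    f0 = f(z)
    total = 0.0
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = f(zp), f(zm)
        total += sum(((fp[i] - 2 * f0[i] + fm[i]) / eps ** 2) ** 2
                     for i in range(len(f0)))
    return total
```

In a training loop, terms like these would typically be averaged over sampled latent points and added to the usual VAE objective with tunable weights. For this toy decoder the metric penalty vanishes at the origin, where the paraboloid is flat to first order, and grows away from it.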
Presenters
- Jason Z Kim, Cornell University
Authors
- Jason Z Kim, Cornell University
- Nicolas Perrin-Gilbert, Curie Institute
- Paul Klein, Curie Institute
- Erkan Narmanli, Curie Institute
- Chris Myers, Cornell University
- Itai Cohen, Cornell University
- Joshua J Waterfall, Curie Institute
- James P Sethna, Cornell University