Extracting Gene Expression Patterns from High-Dimensional Transcriptomic Data

ORAL

Abstract

Whole-embryo transcriptomic analysis can reveal coordinated gene expression programs that underlie morphogenetic development. A central challenge in embryo-scale transcriptomics is to distinguish biologically meaningful heterogeneity from the technical noise inherent to high-dimensional datasets, while simultaneously recovering the global topological structure that organizes cellular diversity. Here, we present a method for extracting low-dimensional representations of large-scale gene expression datasets in two steps: (i) an initial dimensionality reduction step that isolates biological signal from experimental noise; and (ii) an autoencoder-based machine learning framework that learns a low-dimensional parameterization of the data manifold, with an implicit bias toward preserving global neighborhood relationships. We validate our approach in Drosophila melanogaster by comparing inferred marker-gene expression patterns to in situ hybridization data, and we demonstrate the applicability of our method to higher-resolution, single-cell transcriptomic datasets.
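
As a rough illustration of the two-step structure described above, the following sketch chains a linear reduction with a small autoencoder. The abstract does not specify the reduction method, the network architecture, or how the bias toward global neighborhood relationships arises, so everything here is an assumption: PCA via SVD stands in for step (i), a single-hidden-layer tanh autoencoder trained by plain gradient descent stands in for step (ii), and the function names (pca_reduce, train_autoencoder), hyperparameters, and synthetic "expression" data are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Step (i), assumed here to be PCA: keep the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # (n_samples, n_components)

def train_autoencoder(Z, latent_dim=2, lr=0.02, epochs=500, seed=0):
    """Step (ii), assumed minimal form: tanh encoder, linear decoder."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)                 # latent coordinates
        err = H @ W2 + b2 - Z                    # reconstruction error
        losses.append(float(np.mean(np.sum(err**2, axis=1))))
        G = 2.0 * err / n                        # dLoss/dReconstruction
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dA = (G @ W2.T) * (1.0 - H**2)           # backprop through tanh
        dW1, db1 = Z.T @ dA, dA.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(Z_new):
        return np.tanh(Z_new @ W1 + b1)

    return encode, losses

# Synthetic stand-in data (not real expression data): 200 samples lying on
# a noisy circle embedded in a 50-dimensional "gene" space.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
X = circle @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(200, 50))

Z = pca_reduce(X, n_components=5)
Z = Z / Z.std()                                  # rescale for stable training
encode, losses = train_autoencoder(Z)
latent = encode(Z)                               # (200, 2) embedding
```

In this toy setting, the PCA step discards most of the isotropic noise before the autoencoder sees the data, and the two-dimensional latent layer is then fit to the remaining low-dimensional structure. A plain reconstruction loss like this one carries no explicit neighborhood-preserving term; any such bias in the actual method would come from design choices that the abstract does not detail.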

*National Science Foundation, Grant PHY-2210612

Presenters

  • Jeremy Lauro

    • University of California, Santa Barbara

Authors

  • Jeremy Lauro

    • University of California, Santa Barbara
  • Boris I Shraiman

    • University of California, Santa Barbara
  • Nicholas Noll

    • Google Quantum AI