Machine learning approaches for classifying temporal patterns and tracing causality in RNA-Seq datasets

ORAL

Abstract

A complete understanding of cell-fate choice requires inferring the underlying gene regulatory network from detailed time-series gene expression data. Machine learning algorithms are valuable for identifying features or patterns in such large datasets. We analyzed a genome-wide RNA-Seq time-series dataset from an in vitro macrophage-neutrophil differentiation process. We performed feature extraction using non-negative matrix factorization (NMF), an unsupervised machine learning technique. We implemented an NMF algorithm and applied it to the expression levels of 36255 genes at 29 timepoints under two experimental conditions, one favoring differentiation into macrophages and the other into neutrophils. The algorithm provides a lower-dimensional representation of the dataset in terms of ten groups of genes (metagenes) whose expression levels exhibit similar time dependence (e.g., early upregulation or late upregulation). Future work will consider the biological functions of the genes within each metagene and how these correlate with macrophage/neutrophil phenotypes.
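
As an illustration of the factorization described above, the following is a minimal sketch of NMF using the standard Lee-Seung multiplicative updates for the Frobenius norm. It is not the authors' implementation: the update rule, the function name `nmf`, the random toy matrix, and the choice to concatenate the two conditions along the column axis (29 timepoints x 2 conditions = 58 columns) are all assumptions made for this example. A small gene count is used in place of the 36255 genes in the dataset so that the example runs quickly.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    W (genes x rank) holds each gene's weight on every metagene;
    H (rank x samples) holds each metagene's expression time course.
    Uses Lee-Seung multiplicative updates (an assumed choice of algorithm).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data standing in for the real expression matrix.
n_genes, n_timepoints, n_conditions, rank = 500, 29, 2, 10
rng = np.random.default_rng(1)
V = rng.random((n_genes, rank)) @ rng.random((rank, n_timepoints * n_conditions))

W, H = nmf(V, rank)

# Relative reconstruction error of the low-rank approximation.
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# One simple (assumed) way to group genes: assign each gene to the
# metagene on which it has the largest weight.
dominant_metagene = W.argmax(axis=1)
```

In this sketch each row of `H` is a metagene time course spanning both conditions, and `dominant_metagene` gives one possible hard assignment of genes to the ten groups; other grouping rules are equally plausible.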

* This research was supported by the National Science Foundation under NSF EPSCoR Track-1 Cooperative Agreement OIA #1946202.

Presenters

  • Nimasha Samarawickrama

    University of North Dakota

Authors

  • Nimasha Samarawickrama

    University of North Dakota

  • Yen Lee Loh

    University of North Dakota

  • Manu Manu

    University of North Dakota

  • Andrea Repele

    University of North Dakota