Predicting characterization of microbiome taxonomy from imaging using machine learning approaches
ORAL
Abstract
Taxonomic diversity is a useful metric for describing microbial communities and can be used as a measure of ecosystems’ health, resilience, and biological interactions. Currently, microbial diversity can be determined either using traditional staining methods that are limited to pure cultures or using sequencing methods that require high investment in cost, time, and expertise. In this study, we demonstrate an innovative method that employs microscopy images of bacterial communities and machine learning to predict taxonomic diversity and the dominant bacterial classes of bacterial communities.
Forty-seven mock human skin microbiome communities were created from microorganisms collected from human donors and grown in vitro for 8 to 32 days. Ten mL of each sample was used to determine the community's taxonomy, with metatranscriptomics and Kraken2 providing population-level taxonomic information; five mL of each sample was used for imaging. The resulting micrographs served as the basis for a new analysis pipeline that applies two machine learning methods and one statistical technique in sequence: (1) confocal microscopy images were segmented into individual cells using Cellpose, a publicly available deep learning model; (2) continuous probability density functions describing the joint distribution of cell area and eccentricity were obtained by kernel density estimation; (3) these probability density functions were used as input to train convolutional neural networks to predict both the taxonomic diversity and the most common bacterial class, independently of metatranscriptomics.
Pre-print available at https://doi.org/10.1101/2025.02.03.636311
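As an illustration of step (2) only, the following minimal sketch evaluates a two-dimensional Gaussian kernel density estimate of cell area and eccentricity on a regular grid, producing an image-like array of the kind a convolutional network could take as input. It is not the authors' implementation: the function name `kde_density_map`, the grid size, the Scott's-rule bandwidth, the diagonal (product) kernel, and the synthetic per-cell measurements are all assumptions made for the example.

```python
import numpy as np


def kde_density_map(area, ecc, grid_size=64, area_range=None, ecc_range=(0.0, 1.0)):
    """Evaluate a 2-D Gaussian product-kernel KDE of (area, eccentricity) on a grid.

    Hypothetical helper: bandwidths follow Scott's rule per axis, which is an
    assumption for this sketch and not a setting taken from the study.
    """
    area = np.asarray(area, dtype=float)
    ecc = np.asarray(ecc, dtype=float)
    n = area.size
    if area_range is None:
        area_range = (area.min(), area.max())

    # Scott's rule for d = 2 dimensions: factor = n ** (-1 / (d + 4))
    factor = n ** (-1.0 / 6.0)
    h_area = factor * area.std(ddof=1)
    h_ecc = factor * ecc.std(ddof=1)

    grid_area = np.linspace(area_range[0], area_range[1], grid_size)
    grid_ecc = np.linspace(ecc_range[0], ecc_range[1], grid_size)

    # One-dimensional Gaussian kernels, each of shape (grid_size, n)
    norm = np.sqrt(2.0 * np.pi)
    k_area = np.exp(-0.5 * ((grid_area[:, None] - area[None, :]) / h_area) ** 2) / (h_area * norm)
    k_ecc = np.exp(-0.5 * ((grid_ecc[:, None] - ecc[None, :]) / h_ecc) ** 2) / (h_ecc * norm)

    # Sum over cells of the product of the two kernels, averaged over n.
    # Rows index area, columns index eccentricity.
    density = k_area @ k_ecc.T / n
    return grid_area, grid_ecc, density


# Synthetic stand-ins for per-cell measurements from a segmented micrograph
rng = np.random.default_rng(0)
cell_area = rng.lognormal(mean=0.0, sigma=0.4, size=500)   # arbitrary units
cell_ecc = rng.beta(5, 2, size=500)                        # values in (0, 1)

grid_area, grid_ecc, density = kde_density_map(cell_area, cell_ecc)

# Approximate probability mass captured on the grid (Riemann sum)
d_area = grid_area[1] - grid_area[0]
d_ecc = grid_ecc[1] - grid_ecc[0]
mass = density.sum() * d_area * d_ecc
print(density.shape, round(float(mass), 3))
```

The resulting `density` array is a fixed-size, smooth representation of a community's cell-shape distribution regardless of how many cells were segmented. A library routine such as `scipy.stats.gaussian_kde`, which estimates a full covariance bandwidth, could replace the hand-written kernel here.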
*Air Force Office of Scientific Research (Grant FA9550-20-1-0131), Welch Foundation (Grant F-1756), Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via Contract No. W911NF-22-C-0051. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, ARO, or the U.S. Government.
Presenters
- Vernita Gordon, University of Texas at Austin