XGBoost-Based Nucleotide Identification for MoS<sub>2</sub> Nanopore DNA Sequencing: An Efficient Alternative to Deep Learning
POSTER
Abstract
Nanopore-based DNA sequencing enables real-time analysis of single molecules by measuring how nucleotides disrupt the ionic current as they pass through a nanopore. These current shifts encode which base is present, but they are also noisy and variable, which makes accurate base classification challenging. Deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have achieved high accuracy on this task, but they require large datasets and substantial computational resources. To reduce these requirements, we developed a more efficient approach based on an optimized extreme gradient boosting (XGBoost) classifier. After cleaning the data with statistical outlier removal, we engineered new features from translocation times and current signals. Using these features, our model achieved ~96% accuracy on a small dataset, outperforming traditional classifiers and rivaling deep learning methods. These results demonstrate that gradient-boosted decision trees provide a lightweight, interpretable, and scalable solution for nucleotide classification, one that is particularly well suited to real-time or resource-constrained environments. Future work will focus on capturing temporal dynamics and validating the model in live nanopore sequencing workflows.
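The preprocessing described above (statistical outlier removal followed by feature engineering on translocation times and current signals) can be illustrated with a minimal sketch. The abstract does not specify the outlier rule or the engineered features, so everything here is an assumption made for illustration: the Tukey (IQR) fence rule, the field names `dwell_ms` and `blockade_nA`, the units, the two derived features, and the toy data.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def remove_outliers(events, keys=("dwell_ms", "blockade_nA")):
    """Drop events that fall outside the Tukey fences on any listed key."""
    bounds = {key: iqr_bounds([e[key] for e in events]) for key in keys}
    return [
        e for e in events
        if all(bounds[k][0] <= e[k] <= bounds[k][1] for k in keys)
    ]


def add_features(event):
    """Return a copy of the event with two hypothetical derived features."""
    out = dict(event)
    out["log_dwell"] = math.log(event["dwell_ms"])
    out["blockade_per_ms"] = event["blockade_nA"] / event["dwell_ms"]
    return out


# Toy translocation events (made-up values, not data from this work).
events = [
    {"dwell_ms": 1.0, "blockade_nA": 0.50, "base": "A"},
    {"dwell_ms": 1.1, "blockade_nA": 0.52, "base": "A"},
    {"dwell_ms": 0.9, "blockade_nA": 0.48, "base": "C"},
    {"dwell_ms": 1.2, "blockade_nA": 0.55, "base": "G"},
    {"dwell_ms": 1.0, "blockade_nA": 0.50, "base": "T"},
    {"dwell_ms": 0.8, "blockade_nA": 0.47, "base": "C"},
    {"dwell_ms": 1.1, "blockade_nA": 0.51, "base": "G"},
    {"dwell_ms": 50.0, "blockade_nA": 0.50, "base": "T"},  # implausibly long dwell
]

clean = [add_features(e) for e in remove_outliers(events)]
```

Feature rows like those in `clean` could then be passed, with `base` as the label, to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`; the tuned hyperparameters used in this work are not given in the abstract.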
*B. O. Tayo, B. Banjara and C. E. Ekuma were funded by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health, under Award No. 1R15GM140445-01A1.
Presenters
- Benjamin O Tayo, University of Central Oklahoma