Modeling intrinsic biases in high-throughput sequencing data for chromatin accessibility

ORAL

Abstract

Genome-wide profiling of chromatin accessibility with the assay for transposase-accessible chromatin using sequencing (ATAC-seq) or DNase I hypersensitive sites sequencing (DNase-seq) is widely used to study regulatory DNA elements and transcriptional regulation in many cellular systems. Efficient and thorough computational analysis is essential for extracting biological information from such high-throughput sequencing data. DNase cleavage of DNA has been reported to have sequence preferences that can significantly affect the footprint patterns at transcription factor binding sites in genomic profiles. We found that enzymatic sequence biases are common in both bulk and single-cell chromatin accessibility profiling data. Using a regular simplex encoding model, we developed a quantitative approach for accurate characterization and systematic correction of the intrinsic sequence biases in ATAC-seq and DNase-seq data. This approach can be applied in bioinformatics to improve the analysis of high-throughput chromatin accessibility sequencing data.
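The abstract does not give the details of the regular simplex encoding model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each of the four nucleotides is placed at a vertex of a regular tetrahedron in three dimensions (so that all pairs of bases are equidistant), that a k-mer around a cleavage site is encoded by concatenating these vectors, and that a log-scale bias value is modeled as a linear function of the encoding. The vertex coordinates, the first-order linear form, and the names `SIMPLEX`, `simplex_encode`, `fit_bias_weights`, and `predict_log_bias` are all illustrative assumptions; the actual model may include additional terms, such as interactions between positions.

```python
import numpy as np

# Assumed vertex assignment: four corners of a regular tetrahedron
# centred at the origin. Any rotation of this simplex would work equally well.
SIMPLEX = {
    "A": np.array([1.0, -1.0, -1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([-1.0, -1.0, 1.0]),
    "T": np.array([1.0, 1.0, 1.0]),
}


def simplex_encode(kmer):
    """Encode a k-mer as a vector of length 3k by concatenating vertices."""
    return np.concatenate([SIMPLEX[base] for base in kmer.upper()])


def fit_bias_weights(kmers, log_bias):
    """Least-squares fit of log bias as intercept + linear simplex terms."""
    X = np.array([simplex_encode(k) for k in kmers])
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    weights, *_ = np.linalg.lstsq(X, np.asarray(log_bias, dtype=float), rcond=None)
    return weights


def predict_log_bias(weights, kmer):
    """Predicted log bias for one k-mer under the fitted linear model."""
    return weights[0] + weights[1:] @ simplex_encode(kmer)
```

In a workflow of this kind, `log_bias` would be estimated from data (for example, from observed versus expected cleavage counts per k-mer), and the predicted bias could then be used to rescale raw cut counts. How the bias is estimated and applied in the presented approach is not stated in the abstract.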

Presenters

  • Chongzhi Zang

    University of Virginia

Authors

  • Shengen Hu

    University of Virginia

  • Chongzhi Zang

    University of Virginia