Genotype to Phenotype Mapping of the E. coli lac Promoter

ORAL

Abstract

Genotype-to-phenotype maps and the related fitness landscapes that include epistatic interactions are difficult to measure because of their high-dimensional structure. Here we construct such a map using recently collected corpora of high-throughput sequence data from the 75-base-pair mutagenized E. coli lac promoter region, where each sequence is associated with an induced transcriptional activity measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance, while pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, there is no evidence for third-order epistatic contributions, and our inferred fitness landscape is essentially single-peaked, with a small amount of antagonistic epistasis. We identify the transcription factor (CRP) and RNA polymerase binding sites in the promoter region, as well as their interactions. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data.
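To make the notion of "variance explained by additive versus pairwise terms" concrete, the following is a minimal toy sketch, not the inference procedure used in this work. It assumes a complete landscape over three binary sites (the real data are sampled sequences over a four-letter alphabet), and the function names `walsh_coefficients`, `variance_by_order`, and `toy_phenotype`, as well as all effect sizes, are hypothetical. Under uniform sampling of genotypes, the squared Walsh coefficients of each interaction order sum to that order's share of the phenotype variance.

```python
from itertools import product


def walsh_coefficients(phenotype, n_sites):
    """Return {subset_of_sites: coefficient} for a complete binary landscape."""
    genotypes = list(product((0, 1), repeat=n_sites))
    coeffs = {}
    for mask in product((0, 1), repeat=n_sites):
        total = 0.0
        for g in genotypes:
            # +1/-1 encoding: the sign flips once per mutated site in the subset
            sign = (-1) ** sum(m * x for m, x in zip(mask, g))
            total += sign * phenotype[g]
        subset = tuple(i for i, m in enumerate(mask) if m)
        coeffs[subset] = total / len(genotypes)
    return coeffs


def variance_by_order(coeffs):
    """Fraction of phenotype variance carried by each interaction order."""
    by_order = {}
    for subset, c in coeffs.items():
        if subset:  # the empty subset is the mean and carries no variance
            order = len(subset)
            by_order[order] = by_order.get(order, 0.0) + c * c
    total = sum(by_order.values())
    return {order: v / total for order, v in sorted(by_order.items())}


def toy_phenotype(g):
    # Hypothetical effects: three additive terms plus one antagonistic
    # pairwise interaction between sites 0 and 1; no third-order term.
    effects = (1.0, 0.8, 0.5)
    return sum(e * x for e, x in zip(effects, g)) - 0.6 * g[0] * g[1]


landscape = {g: toy_phenotype(g) for g in product((0, 1), repeat=3)}
fractions = variance_by_order(walsh_coefficients(landscape, 3))

for order, frac in fractions.items():
    print(f"order {order}: {frac:.3f} of variance")
```

In this toy landscape the first-order (additive) terms carry most of the variance, the pairwise term carries a small remainder, and the third-order share is zero by construction. With sampled rather than exhaustive data, the same decomposition would have to be estimated by regression, and the estimates would then depend on how the sequences were sampled.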

Authors

  • Jakub Otwinowski

    University of Pennsylvania, Biology Dept.

  • Ilya Nemenman

    Emory University, Physics Dept. and Biology Dept., Atlanta, GA