Effects of ascertainment and library size on epistasis in fitness landscapes

ORAL

Abstract

High-throughput methods that measure the phenotypes of all 2^K genotypes formed by combinations of K mutations, for increasingly large K, have provided a powerful framework for quantifying epistasis. Mutations used to construct empirical libraries may be chosen because they occurred along an adaptive trajectory between ancestral and evolved sequences, because they are beneficial in a particular genetic background, or at random. Conclusions drawn from these sub-landscapes can be biased by the ascertainment scheme as well as by the size of the library. We develop a theoretical framework to quantify the effects of these choices on summary statistics of empirical fitness landscapes, including the variance explained by regression models of increasing interaction order, the number of local maxima, and the number of accessible evolutionary paths. We show that adaptive and beneficial sampling inflate the fraction of variance attributed to lower-order epistatic interactions relative to random libraries, with the magnitude of the effect in beneficial libraries depending on the ruggedness of the underlying landscape. In addition, restricting measurements to a small combinatorial subset of the full genotype space effectively projects the high-dimensional landscape onto a lower-dimensional subspace, which itself tends to elevate apparent lower-order epistasis. These results demonstrate that inferred epistasis reflects both the underlying landscape structure and the experimental design, emphasizing the importance of accounting for ascertainment when interpreting empirical fitness landscapes.
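As an illustration only, the three summary statistics named above can be sketched for a complete 2^K landscape. This is not the authors' framework: the function names are hypothetical, the genotype encoding (bit i of the index set when mutation i is present) is an assumption, and the variance decomposition assumes a Walsh-Hadamard basis, for which the cumulative sum over orders equals the variance explained by a regression model including interactions up to that order.

```python
import numpy as np

def variance_fraction_by_order(fitness, K):
    """Fraction of fitness variance at each interaction order 0..K.

    Assumes `fitness` has length 2**K, indexed by genotype integer,
    and uses a Walsh-Hadamard decomposition (an assumed basis choice).
    """
    f = np.asarray(fitness, dtype=float).copy()
    n = 1 << K
    # In-place fast Walsh-Hadamard transform.
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = f[j], f[j + h]
                f[j], f[j + h] = a + b, a - b
        h *= 2
    coeffs = f / n
    # The order of a coefficient is the number of mutations it involves.
    power = np.zeros(K + 1)
    for s in range(1, n):
        power[bin(s).count("1")] += coeffs[s] ** 2
    total = power.sum()
    return power / total if total > 0 else power

def count_local_maxima(fitness, K):
    """Count genotypes fitter than all K single-mutation neighbors."""
    f = np.asarray(fitness, dtype=float)
    return sum(
        all(f[g] > f[g ^ (1 << i)] for i in range(K))
        for g in range(1 << K)
    )

def count_accessible_paths(fitness, K):
    """Count paths from genotype 0 to the full K-mutant along which
    fitness strictly increases with every added mutation."""
    f = np.asarray(fitness, dtype=float)
    paths = [0] * (1 << K)
    paths[0] = 1
    # Removing a mutation gives a smaller index, so predecessors are done first.
    for g in range(1, 1 << K):
        for i in range(K):
            if (g >> i) & 1:
                prev = g ^ (1 << i)
                if f[g] > f[prev]:
                    paths[g] += paths[prev]
    return paths[-1]

# Toy additive landscape: fitness equals the number of mutations carried.
K = 3
additive = [bin(g).count("1") for g in range(1 << K)]
fractions = variance_fraction_by_order(additive, K)
n_maxima = count_local_maxima(additive, K)
n_paths = count_accessible_paths(additive, K)
```

On this purely additive toy landscape all variance sits at first order, there is a single peak, and every ordering of the K mutations is accessible. Applying the same functions to sub-landscapes built from differently chosen mutation sets is one way to see how ascertainment could shift these statistics.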

*NSF GRFP 

Presenters

  • Caelan Brooks

    • Harvard University

Authors

  • Caelan Brooks

    • Harvard University
  • Maryn Carlson

    • University of Chicago
  • Michael Desai

    • Harvard University