An interpretable alphabet for local protein structure search based on amino acid neighborhoods

ORAL

Abstract

Recent advances in protein structure prediction have vastly increased the size of protein structure databases, necessitating fast methods for protein structure comparison. Search methods that find structurally similar proteins can be applied to find remote homologs, study the functional relationships among proteins, and aid in protein engineering tasks.

We design a “3Dn” structural alphabet that encodes the local neighborhood around each amino acid in an interpretable way. In a search benchmark task, a combination of our alphabet and Foldseek’s 3Di alphabet outperforms each alphabet individually and ranks best among local search methods that do not require amino acid identity information. We provide software tools that enable the exploration of novel alphabets and combinations of alphabets for protein structure search.

*This work was supported by the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard (award number 1764269) and a Burroughs-Wellcome Careers at the Scientific Interface (CASI) award.

Presenters

  • Pramesh Singh

    • Tufts University

Authors

  • Pramesh Singh

    • Tufts University
  • Saba Zerefa

    • Harvard University
  • Jesse Cool

    • Tufts University
  • Samantha Petti

    • Tufts University