Learning biophysical energy functions from protein structure data with physically-informed equivariant neural networks

ORAL

Abstract

Understanding protein structure and function is crucial both for biology and for the development of a host of medical and non-medical technologies. Machine learning (ML) has been a driving force in this effort. However, data linking protein structure to function are relatively sparse, which makes training robust models difficult, and many current ML approaches can only work with a simplified representation of proteins. Here we show that a rotationally symmetric neural network trained on protein structure data, which has demonstrated the ability to learn an effective biophysical potential from data, can be used to modify the input coordinates of a protein structure to reflect new desired outputs. Specifically, when combined with our energy model, the input coordinates can be optimized with respect to the network output, providing a means to relax atomic environments (e.g., to accommodate novel mutations in a protein). Using a gradient-based optimization scheme, we can reconstruct 3D coordinate sets of 30-100 atoms with an accuracy of 1-2 Å RMSD. Such a differentiable, invertible energy-based model would provide a strong foundation for learning a biophysical energy function for proteins, which could be used to study open problems in protein research such as predicting mutational effects and binding affinity.
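The idea of optimizing input coordinates against a differentiable energy can be illustrated with a minimal sketch. This is not the network or the optimizer used in this work: the learned energy model is replaced here by a hypothetical toy energy (squared deviations of pairwise distances from those of a reference structure), the gradient is written out by hand rather than obtained by automatic differentiation, and all names (`toy_energy`, `relax`, `drmsd`) and parameter values are illustrative assumptions. The error measure is a distance-based RMSD, which needs no superposition, rather than the coordinate RMSD quoted in the abstract.

```python
import math
import random


def pair_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def toy_energy(coords, target):
    """Toy stand-in for a learned energy: squared pairwise-distance deviations."""
    n = len(coords)
    return sum(
        (pair_distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def toy_energy_grad(coords, target):
    """Analytic gradient of toy_energy with respect to every coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pair_distance(coords[i], coords[j])
            if d < 1e-12:  # skip coincident points to avoid division by zero
                continue
            coef = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                g = coef * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def relax(coords, target, step=0.01, n_steps=500):
    """Plain gradient descent on the input coordinates (assumed hyperparameters)."""
    coords = [list(p) for p in coords]  # work on a copy
    for _ in range(n_steps):
        grad = toy_energy_grad(coords, target)
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= step * g[k]
    return coords


def drmsd(coords, target):
    """Root-mean-square deviation of pairwise distances (rotation invariant)."""
    n = len(coords)
    return math.sqrt(toy_energy(coords, target) / (n * (n - 1) // 2))


# Synthetic example: perturb a random 10-point "structure" and relax it.
rng = random.Random(0)
reference = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
target = [[pair_distance(a, b) for b in reference] for a in reference]
start = [[x + rng.gauss(0, 0.5) for x in p] for p in reference]
relaxed = relax(start, target)
print(drmsd(start, target), drmsd(relaxed, target))
```

In an actual model of this kind, the hand-written gradient would presumably be replaced by automatic differentiation through the trained network, and the plain descent loop by whichever gradient-based optimizer is used in practice.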

* This work has been supported by the National Institutes of Health MIRA award (R35 GM142795) and the National Science Foundation CAREER award (grant no. 2045054).

Presenters

  • Kevin A Borisiak

    University of Washington

Authors

  • Kevin A Borisiak

    University of Washington

  • Armita Nourmohammad

    University of Washington

  • Michael N Pun

    University of Washington

  • Gian Marco Visani

    University of Washington