Graph Neural Networks as Protein-Protein Interface Scoring Functions

ORAL

Abstract

Machine learning techniques have numerous applications in the biological sciences, including scoring computational models of protein-protein interfaces (PPIs). Graph neural networks (GNNs) have been used to develop deep learning-based PPI scoring functions, since they can map three-dimensional protein structures onto a set of nodes and edges with no loss of information. However, after comparing the performance of current state-of-the-art PPI scoring functions on a large dataset of computational models based on high-resolution X-ray crystal structures of protein-protein heterodimers, we find that the GNN-based scoring functions GNN-DOVE and Deeprank-GNN-ESM are outperformed by much simpler physics- and knowledge-based functions. We propose that the lower performance of GNN-based scoring functions stems from an imbalance in the quality of the computational models within their training sets: three of the most frequently used training sets average only 1-10% near-native models per target. We generate a balanced dataset of computational models that are uniformly distributed across the ground-truth score, DockQ. We use this balanced dataset to retrain both GNN-DOVE and Deeprank-GNN-ESM and quantify the improvement in scoring accuracy after retraining. In addition, we train a new GNN scoring function on the balanced dataset, using a novel architecture and node feature representation, to improve on the current state of the art in PPI scoring functions.
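The abstract does not describe how the balanced dataset is constructed. As a minimal sketch of one possible approach, the Python below bins models by DockQ (which ranges from 0 to 1) and downsamples every non-empty bin to the size of the smallest one, so that the retained models are roughly uniform across DockQ. The function name `balance_by_dockq`, the `(model_id, dockq)` tuple layout, and the choice of 10 bins are all illustrative assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def balance_by_dockq(models, n_bins=10, seed=0):
    """Downsample (model_id, dockq) pairs so each non-empty DockQ bin has equal size.

    Hypothetical helper for illustration only; not the authors' method.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for model_id, dockq in models:
        if not 0.0 <= dockq <= 1.0:
            raise ValueError(f"DockQ must lie in [0, 1], got {dockq}")
        # Map DockQ to a bin index; a score of exactly 1.0 goes in the last bin.
        idx = min(int(dockq * n_bins), n_bins - 1)
        bins[idx].append((model_id, dockq))
    if not bins:
        return []
    # Every non-empty bin is cut down to the size of the smallest non-empty bin.
    per_bin = min(len(members) for members in bins.values())
    balanced = []
    for idx in sorted(bins):
        balanced.extend(rng.sample(bins[idx], per_bin))
    return balanced
```

Downsampling discards models from over-represented bins; generating additional models in sparsely populated DockQ ranges would be an alternative that this sketch does not attempt.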

* Funding: NIH T32GM145452

Presenters

  • Naomi Brandt

    Yale University

Authors

  • Naomi Brandt

    Yale University

  • Jake Sumner

    Yale University

  • Alex T Grigas

    Yale University

  • Corey S O'Hern

    Yale University