Graph Neural Networks as Protein-Protein Interface Scoring Functions

Naomi Brandt; Jake Sumner; Alex T Grigas; Corey S O'Hern

ORAL

Abstract

Machine learning techniques have numerous applications in the biological sciences, including scoring computational models of protein-protein interfaces (PPIs). Graph neural networks (GNNs) have been used to develop deep learning-based PPI scoring functions, since they can map three-dimensional protein structures onto a series of nodes and edges with no loss of information. However, after comparing the performance of current state-of-the-art PPI scoring functions on a large dataset of computational models based on high-resolution X-ray crystal structures of protein-protein heterodimers, we find that the GNN-based scoring functions GNN-DOVE and Deeprank-GNN-ESM are outperformed by much simpler physics- and knowledge-based functions. We propose that the lower performance of GNN-based scoring functions stems from an imbalance in the quality of the computational models within the training set: three of the most frequently used training sets average only 1-10% near-native models per target. We generate a balanced dataset of computational models that are uniformly distributed across the ground-truth score, DockQ. We use this balanced dataset to retrain both GNN-DOVE and Deeprank-GNN-ESM and determine the improvement in scoring accuracy after retraining. In addition, we train a new GNN scoring function on the balanced dataset, using a novel architecture and node feature representation to improve on the current state of the art in PPI scoring functions.
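
As an illustration of what "uniformly distributed across DockQ" can mean in practice, the following is a minimal sketch that downsamples a pool of models so that every occupied DockQ bin contributes the same number of models. It is not the authors' procedure, which the abstract does not describe; the function name balance_by_dockq, the (model_id, dockq) tuple layout, and the default of 10 bins are all assumptions made for this example.

```python
import random
from collections import defaultdict


def balance_by_dockq(models, n_bins=10, per_bin=None, seed=0):
    """Downsample models so each occupied DockQ bin has equal size.

    models: list of (model_id, dockq) pairs, with dockq in [0, 1].
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for model_id, dockq in models:
        # Clamp the index so that DockQ == 1.0 falls in the last bin.
        idx = min(int(dockq * n_bins), n_bins - 1)
        bins[idx].append((model_id, dockq))
    if not bins:
        return []
    # Default: shrink every bin to the size of the smallest occupied bin.
    if per_bin is None:
        per_bin = min(len(members) for members in bins.values())
    balanced = []
    for idx in sorted(bins):
        members = bins[idx]
        k = min(per_bin, len(members))
        balanced.extend(rng.sample(members, k))
    return balanced
```

Bins that contain no models stay empty, so downsampling alone cannot make a heavily skewed pool fully uniform; in that case additional models would have to be generated in the sparse DockQ ranges.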

*Funding: NIH T32GM145452

March 7, 2024, 4:36 PM – March 7, 2024, 4:48 PM

Presenters

Naomi Brandt

Yale University

Authors

Naomi Brandt

Yale University

Jake Sumner

Yale University

Alex T Grigas

Yale University

Corey S O'Hern

Yale University