Graph Neural Networks as Protein-Protein Interface Scoring Functions
ORAL
Abstract
There are numerous applications of machine learning in the biological sciences, including scoring computational models of protein-protein interfaces (PPIs). Graph neural networks (GNNs) have been used to develop deep learning-based PPI scoring functions because they can represent three-dimensional protein structures as a set of nodes and edges with no loss of information. However, after comparing the performance of current state-of-the-art PPI scoring functions on a large dataset of computational models based on high-resolution X-ray crystal structures of protein-protein heterodimers, we find that the GNN-based scoring functions GNN-DOVE and Deeprank-GNN-ESM are outperformed by much simpler physics- and knowledge-based functions. We propose that the lower performance of GNN-based scoring functions stems from an imbalance in the quality of the computational models in their training sets: three of the most frequently used training sets contain, on average, only 1-10% near-native models per target. We therefore generate a balanced dataset of computational models that are uniformly distributed across the ground-truth score, DockQ. We use this balanced dataset to retrain both GNN-DOVE and Deeprank-GNN-ESM and quantify the improvement in scoring accuracy after retraining. In addition, we train a new GNN scoring function on the balanced dataset, using a novel architecture and node feature representation, to improve on the current state of the art in PPI scoring.
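Because DockQ serves as the ground-truth label, it may help to see how the score is computed and what a "balanced" set means in practice. The sketch below uses the standard DockQ definition (the mean of Fnat and two scaled RMSD terms with d1 = 1.5 Å for interface RMSD and d2 = 8.5 Å for ligand RMSD). Everything else is an assumption for illustration: the helpers `near_native_fraction` and `balance_by_dockq` are hypothetical, "near-native" is taken to mean DockQ ≥ 0.23 (the CAPRI "acceptable" cutoff), and balancing is done by subsampling an existing pool into equal-width DockQ bins. This is not the authors' procedure, which generates new models rather than subsampling.

```python
import random
from collections import defaultdict


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """Standard DockQ: average of Fnat and two scaled RMSD terms (iRMS, LRMS in Å)."""
    scaled_irms = 1.0 / (1.0 + (irms / 1.5) ** 2)  # d1 = 1.5 Å
    scaled_lrms = 1.0 / (1.0 + (lrms / 8.5) ** 2)  # d2 = 8.5 Å
    return (fnat + scaled_irms + scaled_lrms) / 3.0


def near_native_fraction(scores: list[float], threshold: float = 0.23) -> float:
    """Hypothetical helper: fraction of models at or above an assumed DockQ cutoff."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


def balance_by_dockq(models: dict[str, float], n_bins: int = 10,
                     per_bin: int = 5, seed: int = 0) -> dict[str, float]:
    """Hypothetical helper: keep at most `per_bin` models per equal-width DockQ bin."""
    bins: dict[int, list[str]] = defaultdict(list)
    for model_id, score in models.items():
        # DockQ lies in [0, 1]; clamp so a score of exactly 1.0 lands in the last bin
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append(model_id)
    rng = random.Random(seed)
    kept: dict[str, float] = {}
    for idx in sorted(bins):
        ids = sorted(bins[idx])  # sort first so sampling is reproducible for a given seed
        for model_id in rng.sample(ids, min(per_bin, len(ids))):
            kept[model_id] = models[model_id]
    return kept


# Toy pool skewed toward incorrect models, mimicking an imbalanced training set
pool = {f"decoy_{i}": 0.02 * (i % 5) for i in range(90)}               # DockQ 0.00-0.08
pool.update({f"native_like_{i}": 0.5 + 0.04 * i for i in range(10)})  # DockQ 0.50-0.86

print(near_native_fraction(list(pool.values())))  # → 0.1
balanced = balance_by_dockq(pool, n_bins=10, per_bin=5)
print(len(balanced), near_native_fraction(list(balanced.values())))
```

In this toy pool only 10% of models are near-native; capping every DockQ bin at the same count removes most of the low-quality decoys, so near-native models become the majority of the subsampled set. With a large enough pool per bin, the same cap yields the roughly uniform DockQ distribution described above.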
* Funding: NIH T32GM145452
Presenters
Naomi Brandt, Yale University
Authors
Naomi Brandt, Yale University
Jake Sumner, Yale University
Alex T Grigas, Yale University
Corey S O'Hern, Yale University