Dryad

Protein structure alignments and structural similarity score code for: Benchmarking methods of protein structure alignment

Data files

May 31, 2021 version files 4.58 GB

Abstract

The function of a protein is primarily determined by its structure and amino acid sequence. Many biological questions of interest rely on accurately determining the group of structures to which the domains of a protein belong (often referred to as their 'fold'); this can be done by aligning and comparing protein structures. This fundamental task underpins the prediction of function, the identification of homology, and the testing of hypotheses about the way "protein-space" is organised. Dozens of different methods for Protein Structure Alignment (PSA) have been proposed, using a wide range of techniques. The aim of this study is to determine the ability of PSA methods to identify pairs of protein domains known to share differing levels of structural similarity, and to assess their utility for clustering domains from several different folds into known groups.

We present the results of a comprehensive investigation into eighteen PSA methods; to our knowledge, this is the largest piece of independent research on this topic. Overall, SP-AlignNS (nonsequential) was found to be the best method for classification and one of the best-performing methods for clustering.

Where possible, methods were split into the algorithm used to find the optimal alignment and the score used to assess similarity. This allowed us to largely separate each algorithm from the score it maximises and thus to assess the effectiveness of each independently. Surprisingly, we found that some hybrids of mismatched scores and algorithms performed better than either of the native methods at classification and, in some cases, at clustering as well. We hope that this investigation and the accompanying discussion will be useful for researchers selecting or designing methods to align protein structures.
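
The separation of algorithm from score can be illustrated with a minimal sketch. It is not the code distributed with this dataset. It assumes that an alignment is a list of residue index pairs, that the two structures have already been superposed, and that RMSD and a TM-score-style function stand in for the scores under study (the abstract does not name them). All function names here are hypothetical.

```python
import math

def pair_distances(coords_a, coords_b, alignment):
    """Distances between aligned residue pairs.

    Assumes both coordinate lists are already superposed and that
    `alignment` is a list of (index_in_a, index_in_b) pairs, however
    the alignment algorithm produced them.
    """
    return [math.dist(coords_a[i], coords_b[j]) for i, j in alignment]

def rmsd_score(distances):
    """Root-mean-square deviation over the aligned pairs (lower is better)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def tm_like_score(distances, target_length):
    """TM-score-style similarity in (0, 1] (higher is better).

    Uses the usual length-dependent scale d0; clamping d0 at 0.5 for
    short chains is a simplifying assumption of this sketch.
    """
    d0 = max(0.5, 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def score_alignment(coords_a, coords_b, alignment, score_fn):
    """Evaluate any alignment with any score: the 'hybrid' idea."""
    return score_fn(pair_distances(coords_a, coords_b, alignment))

# Toy example: two 2-residue structures that differ at the second residue.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
aln = [(0, 0), (1, 1)]
rmsd = score_alignment(a, b, aln, rmsd_score)
tm = score_alignment(a, b, aln, lambda d: tm_like_score(d, len(a)))
```

Because `score_alignment` takes the alignment and the score as independent arguments, an alignment found by one method can be re-evaluated under another method's score, which is the kind of mismatched pairing described above.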