Data from: SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences
Wong, Ka-Chun; Zhang, Zhaolei (2014), Data from: SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences, Dryad, Dataset, https://doi.org/10.5061/dryad.n7m28
Recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; consequently, it is of immense interest and importance to predict whether such substitutions are functionally neutral or have deleterious effects. The accuracy of such prediction algorithms depends on the quality of the multiple-sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and reduce prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and thereby improve prediction performance.
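To illustrate what "diluting the conservation signal" can mean, here is a minimal Python sketch that scores a single alignment column by one minus its normalized Shannon entropy. This is not the SNPdryad scoring method; the function name `column_conservation`, the entropy-based score, and the toy residue columns are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 - normalized Shannon entropy of one alignment column.

    Hypothetical helper for illustration only. Gap characters ("-") are
    ignored, and entropy is normalized by log2(20), the maximum for the
    20 standard amino acids.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


# Toy data (invented): one column where every ortholog carries arginine,
# and paralogs that have diverged at the same position.
orthologs = ["R", "R", "R", "R", "R", "R"]
paralogs = ["K", "Q", "K", "H"]

ortholog_only = column_conservation(orthologs)
mixed = column_conservation(orthologs + paralogs)

print(f"orthologs only:       {ortholog_only:.3f}")
print(f"orthologs + paralogs: {mixed:.3f}")
```

In this toy column the ortholog-only score is maximal, while adding the diverged paralogs lowers it, so a position that is invariant among orthologs looks more tolerant of substitution than it may actually be.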