Dryad

The Dayhoff Exchange Score: A new metric to quantify site saturation in amino acid datasets prior to phylogenetic analysis

Data files

Aug 12, 2024 version files: 8.82 GB

Abstract

Site saturation is a persistent problem in phylogenetic analyses, where it can hinder accurate reconstruction of the topology. It is fundamentally caused by large amounts of independent change along branches, which leave the model unable to distinguish phylogenetic signal from noise. The Dayhoff Exchange Score (DE-Score) is a new metric for assessing site saturation within and between amino acid datasets. It provides both a whole-dataset overview and taxon-specific values that represent the contribution of a given taxon to whole-dataset saturation. We first assess the efficacy of the score at detecting increased site saturation over 20,000 simulated datasets and compare it to the existing Slope R2 score. We then assess its efficacy in the face of potentially confounding factors: increasing taxon number, number of positions in the alignment, missing data, and noise. Finally, we use the DE-Score to re-evaluate a previously published dataset by Kocot et al. (2017) to illustrate its efficacy.
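To make the notion of saturation concrete, the sketch below shows how the observed proportion of differing sites between two sequences levels off as the true amount of change grows. It is not the DE-Score, whose implementation is in the repository linked below. It assumes a simple equal-rates (Poisson) substitution model over 20 amino acid states, and the function name `expected_p_distance` is hypothetical, chosen only for illustration.

```python
import math


def expected_p_distance(d, n_states=20):
    """Expected proportion of differing sites between two sequences.

    d is the true number of substitutions per site separating them.
    ASSUMPTION: an equal-rates (Poisson) model with n_states equally
    frequent states; this is an illustration, not the DE-Score.
    """
    k = n_states
    # Multiple hits at the same site hide earlier changes, so the
    # observed difference approaches (k - 1) / k instead of growing
    # in proportion to d.
    return (k - 1) / k * (1.0 - math.exp(-k * d / (k - 1)))


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        p = expected_p_distance(d)
        print(f"true d = {d:5.1f}   expected observed p = {p:.3f}")
```

Under this assumed model the observed difference can never exceed 19/20, so very divergent sequence pairs become nearly indistinguishable from one another. That plateau is the loss of signal that saturation metrics are meant to detect.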

The DE-Score is available at:

https://github.com/JFFleming/DEScore