Six-state amino acid recoding is not an effective strategy to offset compositional heterogeneity and saturation in phylogenetic analyses

Published Apr 29, 2021 on Dryad. https://doi.org/10.5061/dryad.5mkkwh757

Data files

Apr 29, 2021 version files 5.83 GB

Hernandez_Ryan_2021_Recoding_simulated_data.tar.gz (5.83 GB)
Hernandez_Ryan_2021_supplementary_figs_v2.pdf (1.04 MB)
Recoding_supplemental_commands_v4.pdf (419.12 KB)
Recoding_Supplementary_Analyses_v3.pdf (131.68 KB)
Recoding_supplementary_pseudocode.pdf (37.44 KB)

Abstract

Six-state amino acid recoding strategies are commonly applied to combat the effects of compositional heterogeneity and substitution saturation in phylogenetic analyses. While these methods have been endorsed from a theoretical perspective, their performance has never been extensively tested. Here, we test the effectiveness of 6-state recoding approaches by comparing the performance of analyses on recoded and non-recoded datasets that have been simulated under gradients of compositional heterogeneity or saturation. In our simulation analyses, non-recoding approaches consistently outperform 6-state recoding approaches. Our results suggest that 6-state recoding strategies are not effective in the face of high saturation. Further, while recoding strategies do buffer the effects of compositional heterogeneity, the loss of information that accompanies 6-state recoding outweighs its benefits. In addition, we evaluate recoding schemes with 9, 12, 15, and 18 states and show that these consistently outperform 6-state recoding. Our analyses of other recoding schemes suggest that under conditions of very high compositional heterogeneity, it may be advantageous to apply recoding using more than 6 states, but we caution that applying any recoding should include sufficient justification. Our results have important implications for the more than 90 published papers that have incorporated 6-state recoding, many of which have significant bearing on relationships across the tree of life.
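To make the information loss described in the abstract concrete, the following is a minimal sketch of what six-state recoding does to a protein sequence. It assumes the widely used Dayhoff-6 grouping; the abstract does not name the specific six-state schemes that were tested, and the names `DAYHOFF6_GROUPS`, `RESIDUE_TO_STATE`, and `recode` are hypothetical and are not taken from the dataset's pseudocode or command files.

```python
# Assumption: Dayhoff-6 grouping, one common six-state scheme.
# The abstract does not specify which schemes the authors used.
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}

# Invert the grouping into a residue -> state lookup table.
RESIDUE_TO_STATE = {
    residue: state
    for state, residues in DAYHOFF6_GROUPS.items()
    for residue in residues
}


def recode(sequence: str, unknown: str = "-") -> str:
    """Map each amino acid to its six-state class.

    Gaps and ambiguous or unrecognized characters become `unknown`.
    """
    return "".join(RESIDUE_TO_STATE.get(aa, unknown) for aa in sequence.upper())


if __name__ == "__main__":
    # Two different peptides collapse to the same recoded string,
    # which illustrates the loss of information under recoding.
    print(recode("ILMV"))
    print(recode("VVVV"))
```

Under this grouping, substitutions within a class (for example, isoleucine to valine) become invisible after recoding, which is the loss of signal that the abstract weighs against any reduction in compositional heterogeneity.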
