The implications of incongruence between gene tree and species tree topologies for divergence time estimation
Carruthers, Tom et al. (2022), The implications of incongruence between gene tree and species tree topologies for divergence time estimation, Dryad, Dataset, https://doi.org/10.5061/dryad.zw3r2287m
Phylogenetic analyses are increasingly being performed with datasets that incorporate hundreds of loci. Due to incomplete lineage sorting, hybridization, and horizontal gene transfer, the gene trees for these loci may often have topologies that differ from each other and from the species tree. The effect of these topological incongruences on divergence time estimation has not been fully investigated. Using a series of simulation experiments and empirical analyses, we demonstrate that when topological incongruence between gene trees and the species tree is not accounted for, the temporal duration of branches in regions of the species tree that are affected by incongruence is underestimated, whilst the duration of other branches is considerably overestimated. This effect becomes more pronounced with higher levels of topological incongruence. We show that this pattern results from erroneous estimation of the number of substitutions along branches in the species tree, although the effect is modulated by the assumptions inherent to divergence time estimation, such as those relating to the fossil record or among-branch substitution rate variation. By only analysing loci with gene trees that are topologically congruent with the species tree, or only taking into account the branches from each gene tree that are topologically congruent with the species tree, we demonstrate that the effects of topological incongruence can be ameliorated. Nonetheless, even when topologically congruent gene trees or topologically congruent branches are selected, error in divergence time estimates remains. This stems from temporal incongruences between divergence times in species trees and those in gene trees and, more importantly, from the difficulty of incorporating the assumptions necessary for divergence time estimation.
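The two filtering strategies described above (keeping whole loci whose gene trees match the species tree, or keeping only the matching branches of each gene tree) can be illustrated by comparing clades. The following is a minimal sketch, not the authors' pipeline: the abstract does not state how congruence was assessed. It assumes rooted trees that share the same taxon set with one tip per species, and it uses a hypothetical nested-tuple tree representation and hypothetical function names (`clades`, `congruent_branches`, `is_topologically_congruent`, `filter_congruent_loci`). Each non-root internal branch is identified by the set of taxa below it. Unrooted gene trees would require comparing bipartitions instead.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a
# tuple of child subtrees. Assumes rooted trees with one tip per species.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every non-root internal node (one per internal branch)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the root has no subtending branch
    return found


def congruent_branches(gene_tree: Tree, species_tree: Tree) -> Set[FrozenSet[str]]:
    """Internal gene-tree branches whose clade also occurs in the species tree."""
    return clades(gene_tree) & clades(species_tree)


def is_topologically_congruent(gene_tree: Tree, species_tree: Tree) -> bool:
    """True if the gene tree contains exactly the species tree's clades."""
    return clades(gene_tree) == clades(species_tree)


def filter_congruent_loci(gene_trees: List[Tree], species_tree: Tree) -> List[Tree]:
    """Keep only the loci whose gene trees match the species tree topology."""
    return [g for g in gene_trees if is_topologically_congruent(g, species_tree)]


species = ((("A", "B"), "C"), "D")
gene = ((("A", "C"), "B"), "D")  # A and C grouped, e.g. by incomplete lineage sorting
print(is_topologically_congruent(gene, species))  # False
print(len(filter_congruent_loci([species, gene], species)))  # 1
```

Here the gene tree conflicts with the species tree on the (A, B) clade but still shares the (A, B, C) clade, so under the branch-level strategy only the branch subtending (A, B, C) would be retained from this locus. Terminal branches are trivially congruent under these assumptions and are omitted.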