
Data from: Is recombination a problem for species-tree analyses?

Cite this dataset

Lanier, Hayley C.; Knowles, L. Lacey (2011). Data from: Is recombination a problem for species-tree analyses? [Dataset]. Dryad.


The transition of phylogenetics into phylogenomics has spurred a shift in the general paradigm for data analysis, whereby specific dataset attributes can now be considered in a model-based framework. Much effort has been put into modeling the effects of nucleotide substitution within a genealogy (i.e., modeling mutations; Felsenstein 2005) and the sorting of genes between lineages (i.e., coalescence; Knowles and Kubatko 2010), both of which are inherent properties of all species histories. Improvements in the accuracy of species-tree estimation from models that account for these two sources of uncertainty have been well documented (Carstens and Knowles 2007; Edwards et al. 2007; Kubatko and Degnan 2007), but these are clearly not the only sources of gene-tree/species-tree discordance. As the field moves towards multilocus data, the role of other inherent dataset properties that may be sources of uncertainty, such as intralocus recombination, needs to be examined and quantified. Because all species-tree methods in current use rely on the simplifying assumption that recombination occurs between but not within loci, ignoring recombination represents a ubiquitous violation of species-tree models. Recombination within a locus may be a greater problem for species-tree methods than for concatenation because species-tree methods more accurately model the patterns arising from coalescent stochasticity. Concatenation assumes that all genes share a common underlying tree, a model violated by most multilocus data (Knowles and Carstens 2007; Cranston 2010; Linnen 2010; Wiens et al. 2010). Relative to such a gross model violation, errors introduced by within-locus recombination are likely to be minor for datasets analyzed by concatenation.
Coalescent methods for species-tree estimation explicitly model each gene tree separately, which increases the relative contribution of recombination to the uncertainty in the estimated species tree. Challenges posed by recombination may also be greatest for species-tree methods that rely on estimating parameters related to the scaled mutation rate (Liu 2008; Liu et al. 2008; Heled and Drummond 2010), because recombination-induced heterogeneity may interfere with branch-length estimates. Lastly, the effects of recombination may interact with aspects of the speciation history, such as divergence time, population size, and the length of time between speciation events, as well as with sampling effort (McCormack et al. 2009). For example, in recently diverged groups, too few mutations may have accumulated since a recombination event for its effects to be problematic (see Fig. 1).
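The claim that a single shared tree is violated by most multilocus data follows from coalescent stochasticity alone, even before recombination is considered. As a minimal illustration (not part of this dataset or the authors' analyses), the sketch below uses the standard three-taxon multispecies-coalescent result: for a species tree ((A,B),C) with an internal branch of length T coalescent units, the probability that an unlinked locus has the matching gene-tree topology is 1 - (2/3)e^(-T). The function names, the seed, and the number of simulated loci are illustrative assumptions. A Monte Carlo check is included so the formula can be compared against direct simulation.

```python
import math
import random


def p_concordant(T):
    """Analytic probability that a three-taxon gene tree matches species tree ((A,B),C).

    T is the internal branch length in coalescent units (hypothetical helper name).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def simulate_concordance(T, n_loci=100_000, seed=1):
    """Monte Carlo estimate of gene-tree/species-tree concordance for independent loci."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_loci):
        # Lineages A and B coalesce at rate 1 while in their ancestral population.
        if rng.expovariate(1.0) < T:
            matches += 1
        # Otherwise all three lineages reach the root population, where each of the
        # three pairs is equally likely to coalesce first; only (A,B) is concordant.
        elif rng.randrange(3) == 0:
            matches += 1
    return matches / n_loci


for T in (0.1, 0.5, 1.0, 2.0):
    print(f"T={T}: analytic={p_concordant(T):.3f} simulated={simulate_concordance(T):.3f}")
```

Short internal branches, as in recently diverged groups, push concordance toward 1/3, so many loci disagree with the species tree; this is the heterogeneity that concatenation ignores and that coalescent methods model locus by locus.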

Usage notes