Dryad

Data from: Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: a revised method and evaluation on simulated data sets

Cite this dataset

Fujisawa, Tomochika; Barraclough, Timothy G. (2013). Data from: Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: a revised method and evaluation on simulated data sets [Dataset]. Dryad. https://doi.org/10.5061/dryad.0hv88

Abstract

DNA barcoding-type studies assemble single-locus data from large samples of individuals and species, and have provided new kinds of data for evolutionary surveys of diversity. An important goal of many such studies is to delimit evolutionarily significant species units, especially in biodiversity surveys from environmental DNA samples. The Generalized Mixed Yule Coalescent (GMYC) method is a likelihood method for delimiting species by fitting within- and between-species branching models to reconstructed gene trees. Although the method has been widely used, it has not previously been described in detail or evaluated fully against simulations of alternative scenarios of true patterns of population variation and divergence between species. Here, we present important reformulations to the GMYC method as originally specified, and demonstrate its robustness to a range of departures from its simplifying assumptions. The main factor affecting the accuracy of delimitation is the mean population size of species relative to divergence times between them. Other departures from the model assumptions, such as varying population sizes among species, alternative scenarios for speciation and extinction, and population growth or subdivision within species, have relatively smaller effects. Our simulations demonstrate that support measures derived from the likelihood function provide a robust indication of when the model performs well and when it leads to inaccurate delimitations. Finally, the so-called single-threshold version of the method outperforms the multiple-threshold version of the method on simulated data: we argue that this might represent a fundamental limit due to the nature of evidence used to delimit species in this approach. Together with other studies comparing its performance relative to other methods, our findings support the robustness of GMYC as a tool for delimiting species when only single-locus information is available.

Usage notes