Data from: The spectre of too many species

Cite this dataset

Leache, Adam D.; Zhu, Tianqi; Rannala, Bruce; Yang, Ziheng (2018). Data from: The spectre of too many species [Dataset]. Dryad.


Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the BPP program have suggested that BPP may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the protracted speciation model is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in BPP tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in BPP provide much more reliable inference under the gdi than the approximate method PHRAPL. We distinguish between Bayesian model selection and parameter estimation, and suggest that the model selection approach is useful for identifying sympatric cryptic species while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.
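As a rough illustration of the gdi criterion mentioned above, the sketch below computes the index from a population divergence time and a population size parameter, assuming the commonly used definition gdi = 1 - exp(-2τ/θ), with τ and θ both measured in expected substitutions per site. The function names and the cutoffs of 0.2 and 0.7 are assumptions for illustration (rule-of-thumb values from the gdi literature), not something stated in this record, and the input values are hypothetical rather than taken from the dataset.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index, assuming gdi = 1 - exp(-2*tau/theta).

    tau   -- population divergence time (expected substitutions per site)
    theta -- population size parameter, 4*N*mu, of the focal population
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(g: float, low: float = 0.2, high: float = 0.7) -> str:
    """Apply rule-of-thumb cutoffs (assumed here, not from this record)."""
    if g < low:
        return "single species"
    if g > high:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical parameter values, e.g. posterior means of tau and theta.
    for tau, theta in [(0.0005, 0.01), (0.005, 0.01), (0.01, 0.005)]:
        g = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={g:.3f} ({classify(g)})")
```

In practice the index would be evaluated on posterior samples of τ and θ rather than on single point values, so that uncertainty in the parameter estimates carries over to the gdi.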

Funding

National Science Foundation, Award: DEB-1456098