Data from: Comparison of methods for molecular species delimitation across a range of speciation scenarios
Data files
Feb 13, 2018 version: 16 files, 299.78 MB total
- supplementary appendix 1.tar.gz (294.76 MB)
- supplementary appendix 10.pdf (764.83 KB)
- supplementary appendix 11.xlsx (41.56 KB)
- supplementary appendix 12.xlsx (40.31 KB)
- supplementary appendix 13.xlsx (43.35 KB)
- supplementary appendix 14.xlsx (45.45 KB)
- supplementary appendix 15.xlsx (44.78 KB)
- supplementary appendix 16.xlsx (43.08 KB)
- supplementary appendix 2.pdf (2.14 MB)
- supplementary appendix 3.pdf (151.21 KB)
- supplementary appendix 4.tar.gz (113.84 KB)
- supplementary appendix 5.tar.gz (57.86 KB)
- supplementary appendix 6.tar.gz (9.65 KB)
- supplementary appendix 7.pdf (754.69 KB)
- supplementary appendix 8.xlsx (48.01 KB)
- supplementary apppendix 9.pdf (723.86 KB)
Abstract
Species are fundamental units in biological research and can be defined on the basis of various operational criteria. There has been growing use of molecular approaches for species delimitation. Among the most widely used methods, the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) were designed for the analysis of single-locus data but are often applied to concatenations of multilocus data. In contrast, the Bayesian multispecies coalescent approach in the software BPP explicitly models the evolution of multilocus data. In this study, we compare the performance of GMYC, PTP, and BPP using synthetic data generated by simulation under various speciation scenarios. We show that in the absence of gene flow, the main factor influencing the performance of these methods is the ratio of population size to divergence time, while number of loci and sample size per species have smaller effects. Given appropriate priors and correct guide trees, BPP shows lower rates of species overestimation and underestimation, and is generally robust to various potential confounding factors except high levels of gene flow. The single-threshold GMYC and the best strategy that we identified in PTP generally perform well for scenarios involving more than a single putative species when gene flow is absent, but PTP outperforms GMYC when fewer species are involved. Both methods are more sensitive than BPP to the effects of gene flow and potential confounding factors. Case studies of bears and bees further validate some of the findings from our simulation study, and reveal the importance of using an informed starting point for molecular species delimitation. Our results highlight the key factors affecting the performance of molecular species delimitation, with potential benefits for using these methods within an integrative taxonomic framework.