Species are fundamental units in biological research and can be defined on the basis of various operational criteria. There has been growing use of molecular approaches for species delimitation. Among the most widely used methods, the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) methods were designed for the analysis of single-locus data but are often applied to concatenated multilocus data. In contrast, the Bayesian multispecies coalescent approach implemented in the software BPP explicitly models the evolution of multilocus data. In this study, we compare the performance of GMYC, PTP, and BPP using synthetic data generated by simulation under various speciation scenarios. We show that, in the absence of gene flow, the main factor influencing the performance of these methods is the ratio of population size to divergence time, while the number of loci and the sample size per species have smaller effects. Given appropriate priors and correct guide trees, BPP shows lower rates of species overestimation and underestimation than GMYC and PTP, and is generally robust to various potential confounding factors except high levels of gene flow. The single-threshold GMYC and the best strategy that we identified in PTP generally perform well for scenarios involving more than a single putative species when gene flow is absent, but PTP outperforms GMYC when fewer species are involved. Both methods are more sensitive than BPP to the effects of gene flow and potential confounding factors. Case studies of bears and bees further validate some of the findings from our simulation study and reveal the importance of using an informed starting point for molecular species delimitation. Our results highlight the key factors affecting the performance of molecular species delimitation, with potential benefits for using these methods within an integrative taxonomic framework.
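Why the ratio of population size to divergence time matters can be illustrated with a textbook coalescent result. This is not the simulation or inference procedure used in the study. Assume a diploid species with constant effective size N and a split t generations ago, under the standard Kingman coalescent. Two gene copies sampled from one descendant species then coalesce before reaching the split with probability 1 - exp(-t / (2N)). The function name `prob_coalesce_before_split` is hypothetical. Because the probability depends only on t / N, scaling both quantities by the same factor leaves the chance of a gene tree that tracks the species boundary unchanged.

```python
import math


def prob_coalesce_before_split(n_e: float, t_gen: float) -> float:
    """Probability that two gene copies from one diploid species coalesce
    within t_gen generations, i.e. before reaching the species split.

    Assumes a constant effective size n_e and the Kingman coalescent, in which
    a pair of lineages coalesces at rate 1 / (2 * n_e) per generation.
    """
    return 1.0 - math.exp(-t_gen / (2.0 * n_e))


if __name__ == "__main__":
    # Illustrative (N, t) pairs; t is measured in generations.
    for n_e, t_gen in [(1e4, 1e3), (1e4, 1e4), (1e4, 1e5), (1e5, 1e5)]:
        p = prob_coalesce_before_split(n_e, t_gen)
        print(f"N={n_e:.0e}, t={t_gen:.0e}, N/t={n_e / t_gen:g}: P={p:.3f}")
```

The second and fourth pairs share the same N/t and therefore give the same probability (about 0.393). Large N/t, as in the first pair, means lineages from a single species often remain unsorted at the split. That is the regime in which delimitation is expected to be hardest.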
SUPPLEMENTARY APPENDIX 1
Original data files generated by simulation.
supplementary appendix 1.tar.gz
SUPPLEMENTARY APPENDIX 2
Detailed methods and results of variations of Scenario II.
supplementary appendix 2.pdf
SUPPLEMENTARY APPENDIX 3
Chronograms of the five species in Scenario IV, with speciation rate r of (a) 10^-5, (b) 10^-6, (c) 10^-7, and (d) 10^-8 speciation events per year. The timescale of each tree is indicated by the scale axis, in millions of years before the present.
supplementary appendix 3.pdf
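The role of the speciation rate r in these chronograms can be sketched with a generic pure-birth (Yule) process. The simulator actually used for Scenario IV is not described here, so this is only an illustration. It assumes that r is a per-lineage rate in events per year and that the tree starts with two lineages at the root split. With k lineages, the waiting time to the next speciation event is exponential with rate k·r. The function names are hypothetical. The sketch returns only the times of the events that add lineages three to five. It omits the final interval from the last event to the present, so it is not a complete chronogram.

```python
import random


def yule_event_times(rate: float, n_tips: int, seed: int = 0) -> list[float]:
    """Times (years after the root split) at which a pure-birth process,
    started from two lineages, adds lineages 3..n_tips.

    With k lineages, each speciating at `rate` per year, the waiting time
    to the next event is Exponential(k * rate).
    """
    rng = random.Random(seed)
    times, t = [], 0.0
    for k in range(2, n_tips):
        t += rng.expovariate(k * rate)
        times.append(t)
    return times


def expected_span(rate: float, n_tips: int) -> float:
    """Expected time from the root split to the n_tips-th lineage:
    the sum of 1 / (k * rate) over k = 2 .. n_tips - 1."""
    return sum(1.0 / (k * rate) for k in range(2, n_tips))


if __name__ == "__main__":
    for r in (1e-5, 1e-6, 1e-7, 1e-8):
        times = yule_event_times(r, n_tips=5, seed=1)
        myr = [round(x / 1e6, 3) for x in times]
        print(f"r={r:g}: events at {myr} Myr; "
              f"expected span {expected_span(r, 5) / 1e6:.3f} Myr")
```

Every waiting time scales as 1/r, so each tenfold decrease in r stretches the tree tenfold. This is consistent with the four panels being drawn on very different timescales.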
SUPPLEMENTARY APPENDIX 4
Data files from the case study of bears.
supplementary appendix 4.tar.gz
SUPPLEMENTARY APPENDIX 5
Data files from the case study of Apidae bees.
supplementary appendix 5.tar.gz
SUPPLEMENTARY APPENDIX 6
Data files from the case study of Osmia bees.
supplementary appendix 6.tar.gz
SUPPLEMENTARY APPENDIX 7
Results of analyses of data sets from Scenario I. (a) Species delimitations estimated by the Bayesian coalescent method in BPP. Boxplots show the posterior probabilities of the one-species delimitation model (P1) across the 10 replicates under each set of conditions. Fifteen combinations of sample size n and population size N are shown along the top. The x-axis represents the number of loci, and the y-axis gives the posterior probability. (b) Species delimitations estimated by the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) methods. Stacked rows show results for GMYC, Bayesian PTP heuristic (bPTP-h), Bayesian PTP maximum likelihood (bPTP-ML), and PTP heuristic (PTP-h), respectively. Twelve combinations of sample size n and population size N are shown along the top of the panels. The x-axis represents the number of loci. The y-axis represents the number of cases classified as false positive (dark blue) or not available (grey) among the 10 replicates under each set of conditions.
supplementary appendix 7.pdf
SUPPLEMENTARY APPENDIX 8
Results of supplementary data sets to Scenario II.
supplementary appendix 8.xlsx
SUPPLEMENTARY APPENDIX 9
Species delimitations estimated by PTP-h for data sets from Scenario II. Panels show nine combinations of population size N and divergence time t along the top, and four values of sample size n on the right. The x-axis represents the number of loci. The y-axis represents the number of cases classified as correct delimitation (dark green), false positive (dark blue), complex false positive (light blue), or not available (grey) among the 10 replicates under each set of conditions.
supplementary appendix 9.pdf
SUPPLEMENTARY APPENDIX 10
Species delimitations estimated by bPTP-h for data sets from Scenario II. Panels show nine combinations of population size N and divergence time t along the top, and four values of sample size n on the right. The x-axis represents the number of loci. The y-axis represents the number of cases classified as correct delimitation (dark green), false positive (dark blue), complex false positive (light blue), or not available (grey) among the 10 replicates under each set of conditions.
supplementary appendix 10.pdf
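Outcome categories like those in these figures can be assigned by comparing each inferred partition of samples with the true species partition. The exact definitions used in the study are not given in these captions, so the sketch uses its own labels. It returns "oversplit" when the inferred groups strictly refine the true species, "lumped" when true species are merged, and "mixed" otherwise. It returns "not available" when a method produced no result. The name `classify_delimitation` and the possible correspondence of "oversplit" to false positive and "mixed" to complex false positive are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def _blocks(assign: Dict[str, Hashable]) -> set:
    """Turn a sample -> label mapping into a set of frozenset blocks."""
    groups: Dict[Hashable, set] = {}
    for sample, label in assign.items():
        groups.setdefault(label, set()).add(sample)
    return {frozenset(g) for g in groups.values()}


def classify_delimitation(true_sp: Dict[str, Hashable],
                          inferred: Optional[Dict[str, Hashable]]) -> str:
    """Compare an inferred delimitation with the true species assignment.

    Labels need not match between the two mappings; only the grouping of
    samples is compared. `inferred` is None when the method gave no result.
    """
    if inferred is None:
        return "not available"
    if set(true_sp) != set(inferred):
        raise ValueError("both assignments must cover the same samples")
    t_blocks, i_blocks = _blocks(true_sp), _blocks(inferred)
    if t_blocks == i_blocks:
        return "correct"
    # Every inferred group lies inside a single true species: species were split.
    if all(any(b <= tb for tb in t_blocks) for b in i_blocks):
        return "oversplit"
    # Every true species lies inside a single inferred group: species were merged.
    if all(any(tb <= b for b in i_blocks) for tb in t_blocks):
        return "lumped"
    return "mixed"


if __name__ == "__main__":
    truth = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    replicates = [
        {"a1": 1, "a2": 1, "b1": 2, "b2": 2},  # matches the truth
        {"a1": 1, "a2": 2, "b1": 3, "b2": 3},  # species A split in two
        None,                                   # analysis failed
    ]
    print(Counter(classify_delimitation(truth, r) for r in replicates))
```

Comparing sets of sample blocks rather than labels makes the check independent of how each method names its putative species.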
SUPPLEMENTARY APPENDIX 11
Results of analyses of data sets from Scenario III.
supplementary appendix 11.xlsx
SUPPLEMENTARY APPENDIX 12
Results of analyses of data sets from Scenario IV.
supplementary appendix 12.xlsx
SUPPLEMENTARY APPENDIX 13
Results of analyses of full data sets from Scenario V.
supplementary appendix 13.xlsx
SUPPLEMENTARY APPENDIX 14
Results of analyses of separate data sets from Scenario V.
supplementary appendix 14.xlsx
SUPPLEMENTARY APPENDIX 15
Inferred species delimitations for Apidae bees.
supplementary appendix 15.xlsx
SUPPLEMENTARY APPENDIX 16
Inferred species delimitations for Osmia bees.
supplementary appendix 16.xlsx