Data from: Impact of model violations on the inference of species boundaries under the multispecies coalescent
Data files
Sep 01, 2017 version files 105.28 KB
- Comus_BPP.py (4.06 KB)
- Comus_stacey.py (3.14 KB)
- IbdSettings_file.txt (1.74 KB)
- msSimulations_BPP.py (3.95 KB)
- msSimulations_stacey.py (2.84 KB)
- posteriorprediction_BPP.py (8.15 KB)
- PPDTestStatistics.py (16.01 KB)
- TableS1.pdf (24.32 KB)
- TableS2.pdf (41.09 KB)
Abstract
The use of genetic data for identifying species-level lineages across the tree of life has received increasing attention in the field of systematics over the past decade. The multispecies coalescent model provides a framework for understanding the process of lineage divergence, and has become widely adopted for delimiting species. However, because these studies lack an explicit assessment of model fit, in many cases, the accuracy of the inferred species boundaries is unknown. This is concerning given the large amount of empirical data and theory that highlight the complexity of the speciation process. Here, we seek to fill this gap by using simulation to characterize the sensitivity of inference under the multispecies coalescent to several violations of model assumptions thought to be common in empirical data. We also assess the fit of the multispecies coalescent model to empirical data in the context of species delimitation. Our results show substantial variation in model fit across datasets. Posterior predictive tests find the poorest model performance in datasets that were hypothesized to be impacted by model violations. We also show that while the inferences assuming the multispecies coalescent are robust to minor model violations, such inferences can be biased under some biologically plausible scenarios. Taken together, these results suggest that researchers can identify individual datasets in which species delimitation under the multispecies coalescent is likely to be problematic, thereby highlighting the cases where additional lines of evidence to identify species boundaries are particularly important to collect. Our study supports a growing body of work highlighting the importance of model checking in phylogenetics, and the usefulness of tailoring tests of model fit to assess the reliability of particular inferences.