Bayesian inference operates under the assumption that the empirical data are a good statistical fit to the analytical model, but this assumption can be challenging to evaluate. Here, we introduce P2C2M, a novel R package that uses posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the use of simulation replication in reducing error rates, and the utility of parallel process invocation in improving computation times. We also test P2C2M on two empirical data sets in which hybridization and gene flow are suspected of contributing to shared polymorphism, in violation of the coalescent model: Tamias chipmunks and Myotis bats. Our results indicate that (i) probability-based summary statistics display the lowest error rates, (ii) simulation replication decreases the rate of type II errors, and (iii) our R package displays improved statistical power compared to previous implementations of this approach. When probabilistic summary statistics are used, P2C2M corroborates the assumption that genealogies collected from Tamias and Myotis are not a good fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.
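The comparison of posterior and posterior predictive distributions described above can be illustrated with a minimal Python sketch. It is not the P2C2M implementation. It assumes that a summary statistic has already been computed once per posterior sample and once per matching posterior predictive sample, and it assumes a simple decision rule: the model is flagged as a poor fit when the central interval of the paired differences excludes zero. The function name `predictive_check` and the parameter `alpha` are hypothetical.

```python
import random


def predictive_check(posterior_stats, predictive_stats, alpha=0.05):
    """Compare paired summary statistics from posterior and posterior
    predictive samples (assumed decision rule, not the P2C2M code).

    Returns the central (1 - alpha) interval of the paired differences
    and a flag that is True when that interval excludes zero.
    """
    if len(posterior_stats) != len(predictive_stats):
        raise ValueError("both inputs must contain the same number of samples")
    if not posterior_stats:
        raise ValueError("at least one sample is required")

    # Paired differences between the two distributions, sorted for quantiles.
    diffs = sorted(p - q for p, q in zip(posterior_stats, predictive_stats))
    n = len(diffs)

    # Simple empirical quantiles taken by index (no interpolation).
    lower = diffs[int((alpha / 2) * (n - 1))]
    upper = diffs[int((1 - alpha / 2) * (n - 1))]

    return {"lower": lower, "upper": upper, "poor_fit": not (lower <= 0.0 <= upper)}


if __name__ == "__main__":
    # Toy values only: both sets are drawn from the same distribution.
    rng = random.Random(1)
    posterior = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    predictive = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print(predictive_check(posterior, predictive))
```

Repeating such a check over several independent simulation replicates, in the spirit of the simulation replication evaluated in the abstract, would amount to calling the function once per replicate and summarizing the resulting flags.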
Figure 2a - Results of sim. 1 - subst. rate s1
Complete file set of data set 1 that was simulated under the multispecies coalescent model and under substitution rate s1. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace s1 of Figure 2a.
Fig2a_Rate.s1_Sim.001.tar.gz
Figure 2a - Results of sim. 1 - subst. rate s2
Complete file set of data set 1 that was simulated under the multispecies coalescent model and under substitution rate s2. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace s2 of Figure 2a.
Fig2a_Rate.s2_Sim.001.tar.gz
Figure 2b - Results of sim. 1 - subst. rate s1
Complete file set of data set 1 that was simulated in the presence of migration between species and under substitution rate s1. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace s1 of Figure 2b.
Fig2b_Rate.s1_Sim.001.tar.gz
Figure 2b - Results of sim. 1 - subst. rate s2
Complete file set of data set 1 that was simulated in the presence of migration between species and under substitution rate s2. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace s2 of Figure 2b.
Fig2b_Rate.s2_Sim.001.tar.gz
Simulated Data behind Figure 2a - subst. rate s1
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model and substitution rate s1. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST.
SimulatedData_Fig2a_Rate.s1.tar.gz
Simulated Data behind Figure 2a - subst. rate s2
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model and substitution rate s2. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST.
SimulatedData_Fig2a_Rate.s2.tar.gz
Simulated Data behind Figure 2b - subst. rate s1
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated in the presence of migration between species and under substitution rate s1. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST.
SimulatedData_Fig2b_Rate.s1.tar.gz
Simulated Data behind Figure 2b - subst. rate s2
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated in the presence of migration between species and under substitution rate s2. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST.
SimulatedData_Fig2b_Rate.s2.tar.gz
Figure 3 - Loci 5 - Results of sim. 1
Complete file set of data set 1 that comprises a total of five genes. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace 'N of loci: 5' of Figure 3.
Fig3_Loci.5_Sim.01.tar.gz
Figure 3 - Loci 10 - Results of sim. 1
Complete file set of data set 1 that comprises a total of ten genes. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace 'N of loci: 10' of Figure 3.
Fig3_Loci.10_Sim.01.tar.gz
Figure 3 - Loci 15 - Results of sim. 1
Complete file set of data set 1 that comprises a total of 15 genes. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace 'N of loci: 15' of Figure 3.
Fig3_Loci.15_Sim.01.tar.gz
Figure 3 - Loci 20 - Results of sim. 1
Complete file set of data set 1 that comprises a total of 20 genes. The uncompressed folder constitutes the input for the R package P2C2M to generate the first column of columnspace 'N of loci: 20' of Figure 3.
Fig3_Loci.20_Sim.01.tar.gz
Simulated Data behind Figure 3 - Loci 5
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model using a total of five separate genes. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST. This set of data was used to generate columnspace 'N of loci: 5' of Figure 3.
SimulatedData_Fig3_Loci.5.tar.gz
Simulated Data behind Figure 3 - Loci 10
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model using a total of ten separate genes. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST. This set of data was used to generate columnspace 'N of loci: 10' of Figure 3.
SimulatedData_Fig3_Loci.10.tar.gz
Simulated Data behind Figure 3 - Loci 15
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model using a total of 15 separate genes. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST. This set of data was used to generate columnspace 'N of loci: 15' of Figure 3.
SimulatedData_Fig3_Loci.15.tar.gz
Simulated Data behind Figure 3 - Loci 20
Complete set of nucleotide sequence data alignments simulated on genealogies that were themselves simulated under the multispecies coalescent model using a total of 20 separate genes. The individual files constitute the input for the Python script BEAUTiAutomator.py to generate the XML-formatted input files for *BEAST. This set of data was used to generate columnspace 'N of loci: 20' of Figure 3.
SimulatedData_Fig3_Loci.20.tar.gz
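Each archive listed above has to be uncompressed before its contents can serve as input. A minimal sketch of that step, using only the Python standard library, follows. The helper name `extract_archive` and the destination folder name are hypothetical, and the sketch makes no assumption about the files inside an archive.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(path=dest)
    return names


if __name__ == "__main__":
    # File name taken from the listing above; the destination folder
    # name is an arbitrary choice made for this example.
    archive = Path("Fig2a_Rate.s1_Sim.001.tar.gz")
    if archive.exists():
        for name in extract_archive(archive, "Fig2a_Rate.s1_Sim.001"):
            print(name)
```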