Revisiting the multispecies coalescent model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae)
Data files
Aug 01, 2025 version files 10.04 MB
-
12S.nex
180.88 KB
-
2logBF_P2C2M2.xml
1.53 MB
-
2logBF.xml
1.41 MB
-
A12D.nex
44.86 KB
-
A1D.nex
36.05 KB
-
A4B.nex
24.53 KB
-
A9C.nex
23.09 KB
-
CMOS.nex
32.10 KB
-
CYTB.nex
130.66 KB
-
DMXL.nex
56.01 KB
-
DNAH3.nex
56.42 KB
-
Eulaemus_MPL_h2.nex
127.99 KB
-
EXPH5.nex
56.37 KB
-
Gene_Trees_Net.xml
1.27 MB
-
h2_network_dendroscope
596 B
-
KIF24.nex
56.11 KB
-
MXRA5.nex
38.93 KB
-
ND4.nex
125.39 KB
-
Phylonet_h2_10runs.out
181.07 KB
-
PNN.nex
57.55 KB
-
PRLR.nex
46.51 KB
-
README.md
5.32 KB
-
RLC.xml
1.41 MB
-
SNCAIP.nex
32.25 KB
-
species_2logBF.tree
31.49 KB
-
species_RLC.tree
31.38 KB
-
species_Strict.tree
31.34 KB
-
Strict_KIF24_MLE.xml
80.70 KB
-
Strict_P2C2M2.xml
1.52 MB
-
Strict.xml
1.41 MB
Abstract
Departures from the Multispecies Coalescent (MSC) assumptions could cause artefactual topologies and node heights estimates, and therefore, trees inferred without MSC model fit testing could potentially misrepresent an accurate approximation of the evolutionary history of a group. The current implementation of the MSC model testing for non-genomic level molecular markers cannot process trees estimated from BEAST 2, limiting its application for large datasets of sequence-based markers. Here we recode functions of the R package P2C2M to assess model fit to the MSC and apply this new implementation, which we named P2C2M2, to test the MSC model in a 16-loci dataset of 42 lizard species focused on the Liolaemus wiegmannii group. We found strong evidence of model departures in several loci, possibly due to historical gene flow, which could also be causing an unexpected position of the L. wiegmannii group within the L. montanus section of Eulaemus, when hybridization is not accounted for. The L. anomalus group is inferred as the closest to the L. wiegmannii group when gene flow is incorporated via a Multispecies Network Coalescent model, and a reticulation, suggesting historical gene flow between the L. wiegmannii and L. montanus groups is inferred, which has not been previously reported. We argue that there are at least three sources of discrepancy between the literature and the node ages estimated in our study: the use of strict molecular clocks without statistical justification, misplaced fossil calibrations, and the estimation of coalescent times instead of species divergence times. We encouraged systematists to routinely test the fit of the MSC model when estimating species trees using sequence-based markers, and to follow a phylogenetic network approach when both this test is significant and when historical gene flow is considered one plausible source of the departure from the MSC model.
General information of the Dataset
- Sequence alignments of this Dataset were generated via Sanger sequencing and include originally generated sequences for this study, as well as published data available in GenBank.
- Taxonomic Sampling was focused in the Liolaemus montanus section of the Eulaemus subgenus of Liolaemus (Squamata: Liolaemidae), but members of the L. lineomaculatus section and Liolaemus sensu stricto are also included.
- XML for replicating the BEAST 2 run for Species Trees estimates under different molecular clock model settings are included, as well as their respective resulting tree files after a 500 million MCMC length run. These tree files were annotated via the treeannotator Software distributed with BEAST, with 10 % of burnin and using the Maximum clade credibility tree and Median heights options.
- XML files to estimate trees for run P2C2M2 were manually edited to BEAST 2 to save branch rate information into the gene trees. The XML example for running a Marginal Likelihood Estimation for a given gene tree-molecular clock combination was also manually edited, in order to run the Path Sampling analysis of the Model Selection package of BEAST 2 ( see https://www.beast2.org/path-sampling/ for general instructions in XML edition to run this analysis).
- A control nexus file with the specifications of the maximum pseudolikelihood phylogenetic network analysis of PhyloNet 3.8.2, considering 2 reticulation events in the net, is also provided here. This file also contains the 14 gene/partition trees used for estimate the phylogenetic network. The output of this run includes estimated networks with branch information, their associated logLikelihood, and a phylogenetic Network in Dendroscope format. Runs for infer networks with different numbers of reticulations can be replicated by replacing the argument h of the PhyloNet script (which is 2 in this case).
Data and file overview
DNA alignments in nexus format:
12S.nex
A1D.nex
A4B.nex
A9C.nex
A12D.nex
CMOS.nex
CYTB.nex
DMXL.nex
DNAH3.nex
EXPH5.nex
KIF24.nex
MXRA5.nex
ND4.nex
PNN.nex
PRLR.nex
SNCAIP.nex
XML files for species tree estimation and divergence times:
2logBF.xml
: XML for estimating the species tree under the preferred combination of clock models.Strict.xml
: XML for estimating the species tree under a strict clock for all loci/partitions.RLC.xml
: XML for estimating the species tree under a Random Local Clock (RLC) for those loci in which the strict clock was rejected (based on Bayes factors).
Species trees under alternative clock model settings:
species_2logBF.tree
: Species tree of the Liolaemus montanus section of Eulaemus inferred under the preferred combination of molecular clock models (based on Bayes factors).species_Strict.tree
: Species tree of the Liolaemus montanus section of Eulaemus inferred under a strict molecular clock model for all loci/partitions.species_RLC.tree
: Species tree of the Liolaemus montanus section of Eulaemus inferred under a Random Local Clock (RLC) for those loci in which the strict clock was rejected (based on Bayes factors).
XML files to generate the P2C2M2's input species and gene trees files (see also [https://github.com/arleyc/P2C2M2] for instructions on how to run P2C2M2):
2logBF_P2C2M2.xml
: XML for generating the input species and gene trees files under the preferred combination of clock models.Strict_P2C2M2.xml
: XML for generating the input species and gene trees files under a strict clock for all loci/partitions.
XML example to run a Marginal Likelihood Estimation (MLE) for a particular molecular clock model using the Path Sampling method of the Model Selection package of BEAST 2:
Strict_KIF24_MLE.xml
: In this example, the analysis will estimate the marginal likelihood of the Strict Molecular Clock model for the KIF24 loci.
PhyloNet nexus control files, outputs, and XML to infer gene trees:
Eulaemus_MPL_h2.nex
: This control file have all the specifications and input data (gene trees) to infer the 2 reticulations phylogenetic network in PhyloNet 3.8.2, using a maximum pseudolikelihood (MPL) approach. Replacing the number of reticulations (h) with values between 0 and 6 will generate the rest of the networks inferred in our original study.Phylonet_h2_10runs.out
: This is the resulting output of the complete run specified in the control file described above. The file contains inferred networks considering 2 reticulations, branch information included in the nets, and output in Dendroscope format, and the logLikelihood for each estimated topology.h2_network_dendroscope
: The preferred phylogenetic network (based on logLikelihood values) for the Liolaemus montanus section of Eulaemus.Gene_Trees_Net.xml
: XML for estimating the gene trees used for the species phylogenetic network inference in PhyloNet 3.8.2.
Details on data collection and analysis can be found in the Material and Methods section of the article: Revisiting the problem of Multispecies Coalescent Model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae)
In general, the primary data available here includes DNA sequence data for 16 loci (13 nuclear loci and 3 mitochondrial) generated by Sanger sequencing. Species sampled for this dataset were focused on members of the Liolaemus montanus section of the Eulaemus subgenus of Liolaemus (Squamata: Liolaemidae), including all the known species/lineages of the Liolaemus wiegmannii group. These alignments comprise both published data from GenBank and originally generated sequences for this study.
Several XML files for replicating BEAST 2 runs are also included. XML files for infer input trees to run P2C2M2, and XML for marginal likelihood estimates were manually edited. Instructions on these editions can be found at https://github.com/arleyc/P2C2M2 and https://www.beast2.org/path-sampling/, respectively.
Phylogenetic networks inference were made using a maximum pseudolikelihood approach implemented in PhyloNet 3.8.2. Networks were inferred based on Bayesian gene trees, allowing the number of reticulations (h) from 0 to 6. The nexus control file supplied in this dataset will replicate a 10 runs inference for h=2, which was the number of reticulations chosen based on a loglikelihood criteria. Alternative networks for h values different than 2 can be replicated by replacing this parameter value from the control file provided here.