Data from: Concatenation fails to describe the anomalous radiation of giant cockroaches (Blattodea: Blaberidae) despite moderate to low discordance
Data files
Jul 25, 2025 version, files total 1.94 MB
Alignments.zip (653.28 KB)
allLnlsCombined.11222023.csv (281.81 KB)
bestLNLresults.asof_11202023.csv (19.93 KB)
codonModels.csv (2.65 KB)
GeneTrees.zip (825.97 KB)
GTDiscordances.xlsx (20.32 KB)
locusInformation.xlsx (22.64 KB)
README.md (4.20 KB)
SelACSoftwareInput.zip (79.54 KB)
SpeciesTrees.zip (32.32 KB)
Abstract
Patterns of discordance between gene trees and the species trees they reside in are crucial to the coalescent vs. concatenation debate and may be key to resolving rapid radiations. However, errors in estimating gene tree topologies complicate the issue. Gene trees may appear erroneously discordant with the species tree when they have errors. In this study, we evaluate the prevalence of discordance between gene trees and their species tree using an empirical dataset for a clade with a rapid radiation (Blaberidae). One key advance of our study is the use of complex, computationally intensive, selection-based codon models (FMutSel0 and SelAC) to identify the maximum likelihood gene tree. Our main hypothesis predicted that, if there are two competing topologies for a particular gene tree, the one that is less discordant with the species tree will have less error. Our experimental framework failed to show evidence for this, but only when discordance was measured in reference to a concatenation topology. In follow-up tests we see that the best candidate gene set yielded a coalescent species tree that was less discordant with gene trees. We conclude from these tests that, although discordance is generally rare, it still must be accounted for in order to achieve a biologically realistic outcome. The results suggest a few key improvements to the Blaberidae phylogeny, including identification of an anomaly zone that potentially spans eight or more backbone nodes. These results allow us to support other relationships among blaberid cockroaches that were previously in flux as they now demonstrate molecular and morphological congruence.
https://doi.org/10.5061/dryad.0cfxpnw8n
Description of the data and file structure
Supplement - BlaberidaeSelac.vFeb22.2024 #Supplementary material (methods, results, and discussion)
allLnlsCombined.11222023.csv
#lnL and run information for each individual gene tree optimization in SelAC R package
bestLNLresults.asof_11202023.csv
#best lnL information for each locus optimized in the SelAC R package
codonModels.csv
#list of model combinations chosen by IQTREE’s partitionfinder when specifying ECMS05
locusInformation.xlsx
#biological information for all genomic loci in our analysis. The column labelled “Genome?” notes whether a locus is part of the nuclear (NU) or mitochondrial (MT) genome. “+” indicates loci that are confirmed to be protein coding. The remaining columns hold UniProt information for each locus, including the likely names of the protein and gene and the hypothesized molecular functions.
GTDiscordances.xlsx
#Discordances between estimated gene trees and the concatenation topology. Pairwise distances between the trees (e.g., concat-GTR, concat-codon) are given; the comparisons are based on Robinson-Foulds (RF) and path distances.
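The two lnL tables listed above (per-run results in allLnlsCombined.11222023.csv, per-locus best results in bestLNLresults.asof_11202023.csv) suggest a simple "keep the highest lnL per locus" reduction. The sketch below illustrates that reduction on toy data only. The column names `locus`, `run`, and `lnL` are hypothetical and are not taken from the deposited files, so check the actual CSV headers before adapting it; it is not necessarily how the authors produced their table.

```python
import csv
import io


def best_lnl_per_locus(rows, locus_col="locus", lnl_col="lnL"):
    """Return {locus: row}, keeping the row with the highest lnL per locus.

    Column names are hypothetical placeholders; pass the real header
    names from the CSV you are reading.
    """
    best = {}
    for row in rows:
        try:
            locus = row[locus_col]
            lnl = float(row[lnl_col])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric lnL
        if locus not in best or lnl > float(best[locus][lnl_col]):
            best[locus] = row
    return best


# Toy table standing in for a per-run lnL file (values are made up).
demo_csv = """locus,run,lnL
L001,1,-10234.5
L001,2,-10230.1
L002,1,-8841.0
L002,2,NA
"""

demo_rows = list(csv.DictReader(io.StringIO(demo_csv)))
demo_best = best_lnl_per_locus(demo_rows)
for locus, row in sorted(demo_best.items()):
    print(locus, row["run"], row["lnL"])
```

To run it on a real file, replace the `io.StringIO(demo_csv)` source with an opened file handle and pass the actual column names through `locus_col` and `lnl_col`.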
Alignments.zip/
|- Alignments/
| |- 40AlignmentSet/ #FASTA-formatted individual gene alignments for the 40 loci used in the main experiment
| |- AllAlignments/ #FASTA-formatted individual gene alignments for all loci used in tests
GeneTrees.zip/
|- GeneTrees/
| |- 40GeneTreeSet/ #multi-tree files for Est.GT.1 topologies and Est.GT.2 topologies
| |- AllGeneTrees/ #individual gene tree hypotheses for all 100 genes across 4 methods (concatenation, GTRG model estimated, GTRG median model estimated, ECM optimized model estimated)
| |- 26untestedFMUTSELGeneTrees.tre #gene trees for 26 loci chosen by FMutSel0, but not used in the main experiment
| |- 27untestedSELACGeneTrees.tre #gene trees for 27 loci chosen by SelAC, but not used in the main experiment
| |- 40FMutSel0_trees.trees #gene trees chosen by FMutSel0 from main experiment
| |- 40SelacGTS.trees #gene trees chosen by SelAC from main experiment
| |- 60untestedCODONGeneTrees.tre #gene trees estimated from the optimized empirical codon model, but not used in the main experiment
| |- 60untestedGTRGeneTrees.tre #gene trees estimated from GTRG, but not used in the main experiment
| |- congruenceTree_trees.trees #40 gene trees from the main experiment that exhibited the least discordance with the concatenation topology
| |- IncongruenceTree_trees.trees #40 gene trees from the main experiment that exhibited the most discordance with the concatenation topology
SelACSoftwareInput.zip/
|- SelACSoftwareInput #Example R scripts to initiate choice of gene trees in the SelAC software package. One example is given for each locus in the main experiment.
SpeciesTrees.zip/
|- Species trees/
| |- raw #7 species trees used in the final suite of tests. Each one has tip names matching the organism, not the original sample names. Tree 1 is the concatenation topology with UF bootstrap values. Trees 2-5 are species trees estimated from gene tree topologies chosen by different models. Trees 6 and 7 are control trees. Trees 2-7 were inferred using ASTRAL-III and have local posterior probabilities.
| |- treesWithModifiedSupportValues #species tree topologies from the “raw” folder with node support values added. All “NullValidation” files are species tree topologies with node support values representing the frequency of splits occurring in 120 gene trees. The “concord.gcf” file is the SelAC species tree with gene concordance factors from 40 gene trees. “AnomalyFinder” files are species tree topologies with node annotations from AnomalyFinder.
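For readers who want to compare the deposited gene trees and species trees themselves, the following is a minimal sketch of the standard unnormalized Robinson-Foulds distance, the number of bipartitions found in one tree but not the other. It assumes plain Newick strings with unquoted tip labels and identical tip sets; whether the .tre and .trees files are plain Newick is an assumption to check. The function names are illustrative, and this is not necessarily the tool or RF variant used to produce GTDiscordances.xlsx.

```python
import re

PUNCT = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a Newick string into nested lists of tip names.

    Branch lengths and internal labels (e.g. support values) are dropped.
    Quoted labels are not supported in this sketch.
    """
    text = re.sub(r"\s+", "", text)
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1  # skip internal label and/or branch length
            return children
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]

    return node()


def bipartitions(tree):
    """Return (tip set, set of non-trivial bipartitions) for a parsed tree."""
    clades = set()

    def tips(n):
        if isinstance(n, str):
            return frozenset([n])
        below = frozenset().union(*(tips(child) for child in n))
        clades.add(below)
        return below

    all_tips = tips(tree)
    splits = set()
    for clade in clades:
        rest = all_tips - clade
        if len(clade) > 1 and len(rest) > 1:
            # Unordered pair, so rooting does not change the split set.
            splits.add(frozenset([clade, rest]))
    return all_tips, splits


def rf_distance(newick_a, newick_b):
    """Unnormalized Robinson-Foulds distance between two trees."""
    tips_a, splits_a = bipartitions(parse_newick(newick_a))
    tips_b, splits_b = bipartitions(parse_newick(newick_b))
    if tips_a != tips_b:
        raise ValueError("trees must share the same tip set")
    return len(splits_a ^ splits_b)


# Toy trees with made-up tip names, not taken from the dataset.
tree_1 = "((A,B),(C,D),E);"
tree_2 = "((A,C),(B,D),E);"
print(rf_distance(tree_1, tree_2))
```

Because each split is stored as an unordered pair of tip sets, a rooted and an unrooted version of the same topology compare as identical. Gene trees with missing taxa would need to be pruned to a shared tip set first, which this sketch does not do.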
Sharing/Access information
Data was derived from the following sources: