Data from: Whole-genomes illuminate the drivers of gene tree discordance and the tempo of tinamou diversification (Aves: Tinamidae)
Data files
Nov 21, 2025 version files 633.75 MB
- README.md (5.68 KB)
- Tinamou-phylogenomics2.zip (633.74 MB)
Abstract
As an old group that has diversified in South America over millions of years, the tinamous (Palaeognathae: Tinamidae) are of high interest for understanding the evolution of birds and the assembly of the Neotropical biota. However, there are currently no complete species-level phylogenies of this group. Most prior work has been based on either morphological data or a small number of molecular markers, each of which has limited capability for reconstructing the tinamou phylogeny. Therefore, the interrelationships of most tinamou species are uncertain. We analyzed 80 whole genomes from a mix of historical study skins and frozen tissues, including all 46 recognized species of tinamous, to (1) reconstruct their interrelationships, (2) estimate the timeframe of tinamou evolution, and (3) examine the effects of incomplete lineage sorting (ILS) and ancestral introgression on genome evolution. We compared results for coding (BUSCO) and ultraconserved element (UCE) loci, as well as sex-linked and autosomal markers, and used fossil-calibrated tip-dating to estimate divergence times. Tinamous diverged from their sister-group, the extinct moas, 50-60 mya, and their crown divergence occurred roughly 30-40 mya, followed by constant diversification rates until the present. Phylogenetic reconstructions were largely robust across methods and datasets. Only one clade in the genus Crypturellus displayed substantial species-tree discordance across the different datasets. To investigate the impacts of introgression on this discordance, we quantified introgression for 100 kb non-overlapping windows across the genome and identified pervasive genome-wide introgression. The distribution of this introgression across the genome was dependent on the assumed phylogeny applied to the f-branch model. When assuming one of these topologies in the f-branch model, patterns of introgression matched theoretical predictions about genome architecture.
Overall, we present the most complete phylogeny for tinamous to date, identify an unrecognized species, and provide a case study for species-level phylogenomic analysis using whole genomes.
This repository contains scripts and code for Musher et al.
The main file is Tinamou-phylogenomics2.zip, which contains multiple folders with scripts and data.
Folder details:
"ABBA-BABA-windows" contains the input data and the results of the 100 kb window introgression analysis, including an R script to generate Figure 4B.
- Abba-baba-windows.R = script for plotting introgression statistics
- ABBABABAwindows.w100k.T1.csv = output from introgression analysis assuming Tree "T1"
- ABBABABAwindows.w100k.T2.csv = output from introgression analysis assuming Tree "T2"
- ABBABABAwindows.w100k.T3.csv = output from introgression analysis assuming Tree "T3"
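
As an illustration of what "100 kb non-overlapping windows" means, the sketch below maps a genomic position to the window that contains it. It is not taken from the repository scripts: the function name `window_bounds` and the use of 1-based inclusive coordinates are assumptions made for this example, and the actual coordinate convention in the CSV files may differ.

```python
WINDOW_SIZE = 100_000  # 100 kb, the window size stated for this analysis


def window_bounds(position, window_size=WINDOW_SIZE):
    """Return the (start, end) of the non-overlapping window holding `position`.

    Assumes 1-based, inclusive coordinates (an assumption for this sketch).
    """
    if position < 1:
        raise ValueError("position must be a positive, 1-based coordinate")
    index = (position - 1) // window_size  # 0-based window index
    start = index * window_size + 1
    end = start + window_size - 1
    return start, end
```

Under these assumptions, positions 1 through 100,000 fall in the first window and position 100,001 opens the second.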
"ASTRAL" contains files with all collapsed gene trees in a single file for each dataset used as input for ASTRAL
- Files ending in .trees are input gene trees for ASTRAL-III
- Files ending in .log are output log files from ASTRAL-III
- Files ending in .tre are output tree results from ASTRAL-III
- File names include information about missing data (75p=75%; 100p=100%), dataset (e.g., UCE100=UCEs with 100 bp of flanking region)
"phylonet" contains input and output files for phylonet, along with breakpoint analysis for determining the optimal m-value
- "breakpoint_analysis.R" = R-script for finding optimal m-value
- "model_likelihoods.xlsx" = list of likelihoods for each output network
- Input files for the phylonet analysis have a ".nex" extension and cover the autosomal and Z-chromosome datasets at different values of m (migration). For example, "uce1000.cladeA.autosomes.m0.nex" is the phylonet input for the autosomal UCE dataset assuming m=0, and "uce1000.cladeA.chrz.m1.nex" is the input for the Z-chromosome UCE dataset assuming m=1.
- Output files from phylonet analysis include a ".log" extension following the same structure as the input.
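
The naming scheme described above can be unpacked programmatically. The following is a minimal sketch, not part of the repository: the function `parse_phylonet_name` and the field names are hypothetical, and the pattern assumes every file follows the `dataset.clade.partition.mN.ext` layout of the two example names, with ".log" outputs mirroring the ".nex" inputs.

```python
import re

# Assumed layout: <dataset>.<clade>.<autosomes|chrz>.m<integer>.<nex|log>
PHYLONET_NAME = re.compile(
    r"^(?P<dataset>[^.]+)\.(?P<clade>[^.]+)\."
    r"(?P<partition>autosomes|chrz)\.m(?P<m>\d+)\.(?P<ext>nex|log)$"
)


def parse_phylonet_name(filename):
    """Split a phylonet file name into its labelled parts."""
    match = PHYLONET_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    fields = match.groupdict()
    fields["m"] = int(fields["m"])  # m value assumed in the run
    return fields
```

For instance, `parse_phylonet_name("uce1000.cladeA.chrz.m1.nex")` would report the dataset `uce1000`, the clade `cladeA`, the `chrz` partition, and `m` equal to 1.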
"RF.distances" contains a script for replicating several statistics and figures in the manuscript, "PIS_rf_dist_collapsed.R", output tables, and alignment/genetree files for each dataset. The R script also contains code for looking at node heights of phylogenetic triplets.
- Sub-directories include gene trees from each of four datasets: BUSCOs (cds prefix), UCE100, UCE300, and UCE1000 (see supplemental materials).
- "PIS_rf_dist_collapsed.R" is the R script used for estimating RF distances and modeling them against alignment attributes like GC content variation or parsimony-informative sites
- "Topologies_Support.csv" contains gene concordance and bootstrap values for each species tree inferred from each dataset
- .csv files contain output from the R-script, including columns for:
  - aln: the alignment name
  - pars.site: the number of parsimony informative sites
  - tot.sites: the alignment length
  - pars.site.prop: the proportion of parsimony informative sites
  - rf.dist.coll: the distance between the gene tree for each alignment and the inferred species tree of a given dataset.
  - gc.cont: variance in GC content for each alignment
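
The column layout above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the authors' workflow: the two sample rows are invented purely for illustration, the column order is assumed, and `load_rows` and `prop_is_consistent` are hypothetical helper names. It checks that pars.site.prop equals pars.site divided by tot.sites, which follows from the column descriptions.

```python
import csv
import io

# Invented rows for illustration only; not taken from the dataset.
SAMPLE = """aln,pars.site,tot.sites,pars.site.prop,rf.dist.coll,gc.cont
locus_a.nexus,50,1000,0.05,4,0.0012
locus_b.nexus,30,600,0.05,10,0.0030
"""


def load_rows(handle):
    """Read the table and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "aln": row["aln"],
            "pars.site": int(row["pars.site"]),
            "tot.sites": int(row["tot.sites"]),
            "pars.site.prop": float(row["pars.site.prop"]),
            "rf.dist.coll": float(row["rf.dist.coll"]),
            "gc.cont": float(row["gc.cont"]),
        })
    return rows


def prop_is_consistent(row, tol=1e-6):
    """True when pars.site.prop matches pars.site / tot.sites."""
    expected = row["pars.site"] / row["tot.sites"]
    return abs(expected - row["pars.site.prop"]) <= tol


rows = load_rows(io.StringIO(SAMPLE))
```

To run the same check on one of the real output tables, pass an open file handle to `load_rows` in place of the in-memory sample.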
"MSCquartets" contains data and a script that will replicate quartet analyses for assessing the effect of ILS
- "MSCquartents.R" = R-script for running MSCquartets
- "autosomes-cladeA-100p.trees" = autosomal gene trees for clade A only sampling
- "chrz-cladeA-100p.trees" = Z-chromosome gene trees for clade A only sampling
"beast" contains input and results from the BEAST analysis
- Beuti_2.7_23_uces_6fossils+Moa_final.xml is the input file generated in BEAUti.
- The remaining three files are outputs from the BEAST analysis (.trees=posterior tree set, .log=posterior log-file, and .tre=summarized tree from TreeAnnotator)
"Filter_loci_for_beast" contains autosomal UCEs and gene trees for filtering prior to the BEAST analysis
- "Autosomes-mafft-nexus-clean-trimmed-100p" is a directory containing all alignments and trees for autosomal UCEs used in the filtering exercise
- "Filter_loci.R" is the R script used to filter loci
- "Filtered_fastas" is the list of alignments in fasta format used for BEAST
- "tinamous_27Aug24.csv" shows the details of the filtering exercise on all UCEs from "Autosomes-mafft-nexus-clean-trimmed-100p", including columns for:
  - loc: alignment file used
  - Tre: tree-file used
  - seq.len: alignment length
  - var.sites: number of parsimony informative sites
  - var.frac: proportion of parsimony informative sites
  - lambda: the inferred value of lambda from the branch-length smoothing
  - log10lambda: log10 transformed lambda
  - LR: the likelihood ratio of the clock to the non-clock model of sequence evolution
  - RF: the RF distance between gene and species tree
  - p: the p-value of the likelihood ratio test
  - df: the degrees of freedom in the likelihood ratio test
- "filtered_loci.csv" is the same table as above, but only including final loci after filtering
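
Two of the columns in these tables are derived from others: var.frac from var.sites and seq.len, and log10lambda from lambda. The sketch below recomputes them from the raw values. The function name `derived_columns` is hypothetical, and the code only restates the column definitions given above; it does not reproduce the filtering criteria applied in "Filter_loci.R".

```python
import math


def derived_columns(seq_len, var_sites, lam):
    """Recompute var.frac and log10lambda from the raw column values."""
    if seq_len <= 0:
        raise ValueError("seq.len must be positive")
    if lam <= 0:
        raise ValueError("lambda must be positive to take its log10")
    return {
        "var.frac": var_sites / seq_len,   # proportion of informative sites
        "log10lambda": math.log10(lam),    # log10-transformed lambda
    }
```

Comparing these recomputed values with the stored columns is a quick way to confirm that a table was read with the intended column types.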
"Trees" contains all concatenated and astral phylogenies for each dataset. Those with the prefix "concord" contain gene concordance values at all nodes. Those with the prefix "crypt" show only relationships for clade A.
"TESS" contains a script, "TESS.R", for replicating the TESS analysis of diversification rates
Table S1 is the supplementary table 1 referenced in the article text, containing a list of samples used in the study and BUSCO statistics for each genotype.
Table S6 is a table of clade ages inferred by BEAST2
Tinamou_assembly_pipeline.txt is a list of commands used for bioinformatic assembly of tinamou whole genomes
Supplemental_Materials.pdf: This PDF contains supplemental methods, tables, and figures supporting the studies of Musher et al.
