Data from: The macroevolutionary dynamics of pharyngognathy in fishes fail to support the key innovation hypothesis
Data files
Sep 09, 2025 version: 75.71 MB total
- Borstein_et_al_2024_DRYAD.zip (75.70 MB)
- README.md (11.22 KB)
Sep 18, 2025 version: 75.73 MB total
- Borstein_et_al_2024_DRYAD.zip (75.72 MB)
- README.md (11.52 KB)
Abstract
Key innovations, traits that provide species access to novel niches, are thought to be a major generator of biodiversity. One commonly cited example of key innovation is pharyngognathy, a set of modifications to the pharyngeal jaws found in some highly species-rich fish clades such as cichlids and wrasses. Here, using comparative phylogenomics and phylogenetic comparative methods, we investigate the genomic basis of pharyngognathy and the impact of this innovation on diversification. Whole genomes resolve the relationships of fish clades with this innovation and their close relatives, but high levels of topological discordance suggest the innovation may have evolved fewer times than previously thought. Closer examination of the topology of noncoding elements accelerated in clades with the pharyngognathy innovation reveals hidden patterns of shared ancestry across putatively independent transitions to pharyngognathy. When our updated phylogenomic relationships are used alongside large-scale phylogenetic and ecological datasets, we find no evidence pharyngognathy consistently modifies the macroevolutionary landscape of trophic ecology nor does it increase diversification. Our results highlight the necessity of incorporating genomic information in studies of key innovation.
Citation: Borstein SR, Hammer MP, O'Meara BC, and McGee MD. 2024. The macroevolutionary dynamics of pharyngognathy in fishes fail to support the key innovation hypothesis. Nature Communications 15:10325. doi:10.1038/s41467-024-53141-4
Description of the data and file structure
Directories
DietAnalysis: Data, results, and scripts related to phylogenetic comparative analysis of trophic levels
Files
- BBMV_Data.RData: Phylogeny and formatted data used in BBMV analysis. Used in BBMV_MCMC.R and BBMV_Scripts.R
- BBMV_MCMC.R: Script to perform MCMC BBMV analyses
- BBMV_Scripts.R: Script to run BM, OU, and BBMV model comparisons.
- BBMV_Scripts.RData: Output for models in BBMV_Scripts.R
- BBMV_Summary_Scripts.R: Scripts to calculate 95% CI for BBMV parameter estimates and make figures of macroevolutionary landscapes.
- nonpharyngognath_TL.BBMV.Dat: R data object for BBMV MCMC analysis for non-pharyngognathous fishes.
- pharyngognath_PJA.TL.LOO_Cichlidae.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding the family Cichlidae.
- pharyngognath_PJA.TL.LOO_Embiotocidae.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding the family Embiotocidae.
- pharyngognath_PJA.TL.LOO_Exocoetidae.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding the family Exocoetidae.
- pharyngognath_PJA.TL.LOO_Labridae.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding the family Labridae.
- pharyngognath_PJA.TL.LOO_MalawiVic.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding Malawi and Victorian Cichlids.
- pharyngognath_PJA.TL.LOO_Pomacentridae.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes excluding the family Pomacentridae.
- pharyngognath_TL.BBMV.Dat: R data object for BBMV MCMC analysis for pharyngognathous fishes.
- PJA_Diet_Analyses.R: Scripts for formatting data for BBMV and performing tests of BM, OU, and EV models of evolution for trophic level in mvMorph.
- PJA_Diet_Analyses.RData: R data object of results from mvMorph tests of evolutionary models for trophic level evolution.
- PJA_Tree.tree: Time-calibrated phylogeny used in scripts for analyses of trophic level evolution.
- TrophicLevel_Data.csv: Trophic levels for species from individual studies used in analyses.
  - Species: Scientific name of species
  - TrophicLevel: Calculated trophic level obtained from dietr
  - n: Number of individuals used for gut content analysis
  - PredState: Life-history stage of the specimens used in gut content analysis. Either adults or a mix of juveniles and adults (juv./adults).
  - FullCitation: Full citation for source of the data.
- TrophicLevel_Data_Species_Averages.csv: Average trophic levels for species used in analyses.
  - Species: Scientific name of species
  - TrophicLevel: Mean trophic level across studies
  - SE: Standard error around the mean trophic level calculation. When a species has only a single observation, this value was set to the global standard error for the dataset.
  - nObs: The number of dietary records used to calculate the mean trophic level per species.
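As an illustration of how the two trophic-level tables above relate, the sketch below collapses per-study records into per-species means with a standard error, falling back to a dataset-wide value for species with a single observation. It is written in Python for illustration only (the archive's own scripts are in R), the sample rows are made up, and the definition of the "global" standard error (here, the standard error of all observations pooled) is an assumption rather than something this README specifies.

```python
import csv
import io
import math
import statistics
from collections import defaultdict

# Hypothetical rows in the layout of TrophicLevel_Data.csv (values are made up).
SAMPLE = """Species,TrophicLevel,n,PredState,FullCitation
Species_a,3.0,10,adults,Hypothetical citation 1
Species_a,3.4,12,adults,Hypothetical citation 2
Species_b,2.0,8,juv./adults,Hypothetical citation 3
"""


def species_averages(handle):
    """Return {species: (mean, se, n_obs)} from per-study trophic levels."""
    by_species = defaultdict(list)
    for row in csv.DictReader(handle):
        by_species[row["Species"]].append(float(row["TrophicLevel"]))

    # ASSUMPTION: the "global" SE is the SE of all observations pooled together.
    pooled = [value for values in by_species.values() for value in values]
    global_se = statistics.stdev(pooled) / math.sqrt(len(pooled))

    averages = {}
    for species, values in by_species.items():
        if len(values) > 1:
            se = statistics.stdev(values) / math.sqrt(len(values))
        else:
            se = global_se  # single observation: fall back to the global SE
        averages[species] = (statistics.mean(values), se, len(values))
    return averages


averages = species_averages(io.StringIO(SAMPLE))
for species, (mean, se, n_obs) in sorted(averages.items()):
    print(f"{species}\t{mean:.3f}\t{se:.3f}\t{n_obs}")
```

The three values per species correspond to the TrophicLevel, SE, and nObs columns of TrophicLevel_Data_Species_Averages.csv; the authors' exact calculation is in PJA_Diet_Analyses.R.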
Diversification: Data, results, and scripts related to lineage diversification analyses
Files
- fissPJA.R: Scripts used to perform FiSSE diversification analyses.
- FisseResults.RData: R data object containing results of analyses performed in fissePJA.R.
- HiSSE_Models.zip: R saves of HiSSE models. Model numbers correspond to those named in HiSSE_Scripts.R
- HiSSE_Scripts.R: Scripts to run and process HiSSE models.
- Oval_HiSSE_Scripts.R: Scripts to run and process HiSSE models for the clade Ovalentaria.
- Ovalentaria_HiSSE_Models.zip: R saves of HiSSE models for Ovalentaria. Model numbers correspond to those named in Oval_HiSSE_Scripts.R
- PJA_Presence.csv: Binary presence (1) or absence (0) of pharyngognathy for each species in the phylogeny PJA_Tree.tree used for analyses of lineage diversification.
  - taxon: Species name
  - state: Binary presence (1) or absence (0) of pharyngognathy for each species in the phylogeny PJA_Tree.tree used for analyses of lineage diversification.
- PJA_Tree.tree: Time-calibrated phylogeny used in scripts for analyses of lineage diversification.
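State-dependent diversification analyses generally need one state per tip, so a quick consistency check on the presence table can be useful before running the scripts above. The following is a minimal Python sketch on made-up rows in the `taxon`/`state` layout of PJA_Presence.csv; the helper names and the tip labels are hypothetical, and in practice the labels would be read from PJA_Tree.tree.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the layout of PJA_Presence.csv (taxon names are made up).
SAMPLE = """taxon,state
Species_a,1
Species_b,0
Species_c,1
"""


def read_states(handle):
    """Return {taxon: 0 or 1}, rejecting any other state value."""
    states = {}
    for row in csv.DictReader(handle):
        state = int(row["state"])
        if state not in (0, 1):
            raise ValueError(f"unexpected state {state!r} for {row['taxon']}")
        states[row["taxon"]] = state
    return states


def tips_without_state(tip_labels, states):
    """List tree tips that have no row in the presence table."""
    return [tip for tip in tip_labels if tip not in states]


states = read_states(io.StringIO(SAMPLE))
counts = Counter(states.values())
print(f"pharyngognathous: {counts[1]}, non-pharyngognathous: {counts[0]}")

# Hypothetical tip labels; in practice these would come from PJA_Tree.tree.
print(tips_without_state(["Species_a", "Species_d"], states))
```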
Genomic_PhylogenomicAnalyses: Scripts and data used in genomic assembly and analysis of hemiplasy
Files
- Alfaro2018Tree_Rooted.tre: ASTRAL derived coalescent tree from Alfaro et al. (2018) Nat. Ecol. Evol. 2:688-696, in Newick format, used in hemiplasy risk factor analysis (Fig. 4 & Fig. S5).
- AlfaroTips.csv: CSV file with columns for the tip/species, family, and pharyngognath status used to subset the Alfaro2018Tree_Rooted.tre for hemiplasy risk factor analysis (Fig. 4 & Fig. S5).
  - Species: Species name
  - Family: Family assignment
  - PJ: Binary presence (1) or absence (0) of pharyngognathy for each species in the phylogeny.
- astral_eytan3b_rooted.tre: ASTRAL derived coalescent tree from Anchored Hybrid Enrichment sequence data in Eytan et al. (2015) BMC Evol. Biol. 15:113, in Newick format, used in hemiplasy risk factor analysis (Fig. 4 & Fig. S3).
- ASTRAL_Phararyngognath_32Tip_rooted.tre: ASTRAL derived coalescent tree from whole genomes of 32 species in Newick format used in hemiplasy risk factor analysis (Fig. 1A-B).
- ASTRAL_Phararyngognath_64Tip__Rooted.tre: ASTRAL derived coalescent tree from whole genomes of 64 species in Newick format used in hemiplasy risk factor analysis (Fig. S2).
- astral_thin100k_rooted.tree: ASTRAL derived coalescent tree from loci of at least 300 base pairs in length in our 32 taxa, after randomly removing loci within 100 kb or less of each other. Supplied in Newick format and used in hemiplasy risk factor analysis (Fig. S7).
- Fig2SIMMAP.Rmd: R Markdown file with code to generate simmaps and perform BRMS regression shown in Figure 2.
- Fig2SIMMAP.Robj: R object containing data used in scripts in Fig2SIMMAP.Rmd.
- full_coalescent_tree_calculated_from_sCF_Rooted.tre: Coalescent tree in Newick format derived from IQ-Tree2 site-concordance factors used in hemiplasy risk factor analysis (Fig. 1C).
- Ghezelayagh2022_Tree_Rooted.tre: ASTRAL derived coalescent tree from Ghezelayagh et al. (2022) Nat. Ecol. Evol. 6:1211-1220, in Newick format, used in hemiplasy risk factor analysis (Fig. 4 & Fig. S6).
- GhezelayaghTips.csv: CSV file with columns for the tip/species, family, and pharyngognath status used to subset the Ghezelayagh2022_Tree_Rooted.tre for hemiplasy risk factor analysis (Fig. 4 & Fig. S6).
  - Species: Species name
  - Family: Family assignment
  - PJ: Binary presence (1) or absence (0) of pharyngognathy for each species in the phylogeny.
- HPC_script_examples.Rmd: R Markdown file containing bash scripts for genomic assemblies, alignment, phylogenetic breakpoints, coalescent tree construction, and analysis of gene tree-species tree discordance.
- Hughes2018_RootedNames.tre: ASTRAL derived coalescent tree from Hughes et al. (2018) Proc. Natl. Acad. Sci. 115:6249-6254, in Newick format, used in hemiplasy risk factor analysis (Fig. 4 & Fig. S4).
- HughesTips.csv: CSV file with columns for the tip/species, family, and pharyngognath status used to subset the Hughes2018_RootedNames.tre for hemiplasy risk factor analysis (Fig. 4 & Fig. S4).
  - Species: Species name
  - Family: Family assignment
  - PJ: Binary presence (1) or absence (0) of pharyngognathy for each species in the phylogeny.
- s0.0005_results.csv: Output from HeIST using a mutation rate of 0.0005.
  - Tree: Tree ID for trees with correct evolutionary history from HeIST simulations
  - NumberMutations: The number of mutations on a given tree
- s0.005_results.csv: Output from HeIST using a mutation rate of 0.005.
  - Tree: Tree ID for trees with correct evolutionary history from HeIST simulations
  - NumberMutations: The number of mutations on a given tree
- s0.05_results.csv: Output from HeIST using a mutation rate of 0.05.
  - Tree: Tree ID for trees with correct evolutionary history from HeIST simulations
  - NumberMutations: The number of mutations on a given tree
- sCF_GTR_F0_R8_Rooted.tre: IQ-Tree2 phylogeny derived from site-concordance factors with branch lengths estimated under GTR+F0+R8 substitution model in Newick format (Fig. S1).
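The HeIST result tables listed above share a simple two-column layout (`Tree`, `NumberMutations`), so a first look at any of them can be a tally of how many retained trees carry each mutation count. The sketch below does this in Python on made-up rows; the function name is hypothetical, and how a given count should be interpreted is left to the paper and the HeIST documentation.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the layout of the s*_results.csv files (values are made up).
SAMPLE = """Tree,NumberMutations
1,1
2,3
3,1
4,2
"""


def mutation_tally(handle):
    """Count how many simulated trees carry each number of mutations."""
    return Counter(int(row["NumberMutations"]) for row in csv.DictReader(handle))


tally = mutation_tally(io.StringIO(SAMPLE))
total = sum(tally.values())
single_fraction = tally[1] / total  # share of trees with exactly one mutation

for n_mutations in sorted(tally):
    print(f"{n_mutations} mutation(s): {tally[n_mutations]} tree(s)")
print(f"fraction with a single mutation: {single_fraction:.2f}")
```

Running the same tally on each of the three files would allow the mutation-count distributions to be compared across the mutation rates used.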
Megaphylogeny: Data used in constructing the megaphylogeny
Files
- Calibrations.docx: Table and references for calibrations used for time-calibration.
- PJA_Accessions_MegaPhylo.csv: Accession table for BOLD and NCBI accessions used to generate the phylogeny PJA_Tree.tree used in analyses.
  - Taxon: Species name
  - 12S RNA: Accessions for 12S rRNA
  - 16S RNA: Accessions for 16S rRNA
  - ATP6: Accessions for mitochondrially encoded ATP synthase membrane subunit 6
  - ATP8: Accessions for mitochondrially encoded ATP synthase membrane subunit 8
  - CYTB: Accessions for Cytochrome b
  - UBE3A: Accessions for ubiquitin protein ligase E3A
  - COI: Accessions for Cytochrome oxidase subunit I
  - COII: Accessions for Cytochrome oxidase subunit II
  - COIII: Accessions for Cytochrome oxidase subunit III
  - D-loop: Accessions for the mitochondrial D-loop/control region
  - ENC1: Accessions for Ectoderm-Neural Cortex 1
  - FICD: Accessions for FIC domain protein adenylyltransferase
  - GCS1: Accessions for GCS1
  - GLYT: Accessions for glycine transporter 1
  - GPR85: Accessions for G Protein-Coupled Receptor 85
  - H3: Accessions for histone 3A
  - KIAA: Accessions for KIAA
  - MYH6: Accessions for myosin heavy chain 6
  - ND1: Accessions for NADH dehydrogenase 1
  - ND2: Accessions for NADH dehydrogenase 2
  - ND3: Accessions for NADH dehydrogenase 3
  - ND4: Accessions for NADH dehydrogenase 4
  - ND4l: Accessions for NADH dehydrogenase 4L
  - ND5: Accessions for NADH dehydrogenase 5
  - ND6: Accessions for NADH dehydrogenase 6
  - PANX2: Accessions for Pannexin 2
  - PLAGL2: Accessions for Zinc finger protein PLAGL2
  - PTR: Accessions for patched domain-containing 4
  - RAG1: Accessions for Recombination activating gene 1
  - RAG2: Accessions for Recombination activating gene 2
  - RIPK4: Accessions for Receptor Interacting Serine/Threonine Kinase 4
  - RNF213: Accessions for Ring finger protein 213
  - SIDKEY: Accessions for sidkey
  - SLC10A3: Accessions for Solute Carrier Family 10 Member 3
  - SNX33: Accessions for Sorting Nexin 33
  - SVEP1: Accessions for Sushi, Von Willebrand Factor Type A, EGF And Pentraxin Domain Containing 1
  - TBR1: Accessions for T-box brain 1
  - TMO-4C4: Accessions for TMO-4C4
  - VCPIP: Accessions for Valosin Containing Protein Interacting Protein
  - ZIC1: Accessions for zinc finger protein 1
  - ZNF503: Accessions for Zinc Finger Protein 503
- PJA_MitogenomeAccessions_MegaPhylo.csv: Mitogenome accession numbers used to generate the phylogeny PJA_Tree.tree used in analyses.
  - Species: Species name
  - Accession: Accession for mitogenome used in generating megaphylogeny
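Because the accession table has one row per taxon and one column per locus, per-locus sampling can be summarized by counting non-empty cells. The sketch below shows one way to do this in Python on made-up rows; the function name and the accession strings are hypothetical, and it assumes that a locus with no sequence for a taxon is left as an empty cell, which this README does not state.

```python
import csv
import io

# Hypothetical rows in the layout of PJA_Accessions_MegaPhylo.csv, reduced to
# three loci; the accession strings are placeholders, not real accessions.
SAMPLE = """Taxon,12S RNA,CYTB,RAG1
Species_a,ACC0001,ACC0002,
Species_b,,ACC0003,ACC0004
"""


def locus_coverage(handle, taxon_column="Taxon"):
    """Return {locus: number of taxa with a non-empty accession cell}."""
    reader = csv.DictReader(handle)
    loci = [name for name in reader.fieldnames if name != taxon_column]
    coverage = dict.fromkeys(loci, 0)
    for row in reader:
        for locus in loci:
            # ASSUMPTION: a missing sequence is recorded as an empty cell.
            if (row[locus] or "").strip():
                coverage[locus] += 1
    return coverage


coverage = locus_coverage(io.StringIO(SAMPLE))
for locus, n_taxa in coverage.items():
    print(f"{locus}: {n_taxa} taxa")
```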
Change Log:
9/18/25: Ghezelayagh2022_Tree_Rooted.tre, although described in the README, was accidentally omitted from the submitted zip archive; all other files and scripts needed to analyze that dataset were included. This has been resolved, and the tree has now been added to the archive.
