Data from: The many origins of extremophile fishes
Data files
Apr 07, 2025 version files 4.29 GB
- Anomaly_Zone.zip (5.15 KB)
- ASTRAL.zip (1.02 GB)
- BAMM.zip (27.42 MB)
- biogeobears.zip (4.17 MB)
- bioinformatics.zip (26.27 KB)
- Body_shape_PCA_and_Disparity_Through_Time.zip (362.69 KB)
- iqtree_concatenated.zip (1.61 MB)
- iqtree_concordance_factors_3.zip (49.95 KB)
- iqtree_partitioned.zip (2.01 GB)
- R_code.zip (20.88 KB)
- README.md (7.83 KB)
- Sanger.zip (27.22 KB)
- tess_analysis.zip (166.66 KB)
- trait_plotting.zip (85.93 KB)
- Zoarcoid_posterior_Sum.tree (152.96 KB)
- Zoarcoid_posterior_Sum.trees (1.22 GB)
Abstract
Extremophiles survive in environments that are considered uninhabitable for most living things. The evolution of extremophiles is of great interest because of how they may have contributed to the assembly of ecosystems, yet the evolutionary dynamics that drive extremophile evolution remain obscure. Here, we investigate the evolution of extremophiles in Zoarcoidea, a lineage of over 300 species of fishes that have colonized both poles, the deep sea, and hydrothermal vents. We show that a pulse of habitat invasion occurred across 23 different zoarcoid lineages within the last eight million years, far after the origin of their prototypical innovation for surviving in cold water: antifreeze protein III. Instead, a secondary burst of anatomical, physiological, and life history traits and a handful of founder-events in extreme ecosystems appear to have propelled zoarcoid diversification. These results decentralize the role of prototypical changes to organismal biology in shaping extremophile radiations and provide a clear example of how a combination of ancient adaptations and recent contingency shape the origination of lineages in challenging habitats.
https://doi.org/10.5061/dryad.9kd51c5tt
Description of the data and file structure
Ultraconserved element (UCE) data were collected from several sources: previous studies (Ghezelayagh et al., 2022, Nature Ecology & Evolution), genome assemblies mined from Genbank, and UCEs assembled from newly extracted tissues. Trait data were collected from the literature.
Files and variables
File: R_code.zip
Description: R code for all analyses conducted in the paper, including BioGeoBears, phytools, and geiger analyses.
File: bioinformatics.zip
Description: .txt and .xlsx files with contig % completeness for each individual.
- contig_list.txt #contigs used
- contig_summary #completeness of each contig, including how many UCEs were sequenced
File: trait_plotting.zip
Description: Data for trait-based analyses and ancestral state reconstruction.
- Antifreeze.csv #antifreeze type III protein presence/absence
- Caudal_fin.csv #non-continuous caudal fin presence/absence
- Fangs.csv #fangs presence/absence
- Jellyfish_association.csv #presence/absence of cnidarian associating juveniles
- Molariform.csv #molariform teeth presence/absence
- Predatory_Burrowing.csv #presence/absence of complex predatory burrowing behavior
- Sexual_Dimorphism_Heterodonty.csv #sexually dimorphic heterodont dentition presence/absence
- Air.csv #air breathing presence/absence
- Viviparity.csv #viviparity presence/absence
- ages #output csvs giving the age of origin of each trait for which ancestral state reconstruction was conducted. All values in these csvs are clade ages in millions of years before present.
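As a rough illustration of how the presence/absence files listed above might be summarized, the sketch below tallies coded states from a CSV. The column names (`species`, `state`), the 0/1 coding, and the sample rows are assumptions made for this example; check the actual headers of files such as Antifreeze.csv before adapting it.

```python
import csv
import io

# Hypothetical layout: one row per species with a 0/1 state column.
# The real column names and codings in the trait CSVs may differ.
SAMPLE = """species,state
Species_A,1
Species_B,0
Species_C,1
"""


def tally_states(handle, state_column="state"):
    """Count how many rows carry each coded state (e.g. 0 = absent, 1 = present)."""
    counts = {}
    for row in csv.DictReader(handle):
        value = row[state_column].strip()
        counts[value] = counts.get(value, 0) + 1
    return counts


counts = tally_states(io.StringIO(SAMPLE))
print(counts)
```

To run this on a real file, replace `io.StringIO(SAMPLE)` with an open file handle and pass the file's actual state column name.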
File: Sanger.zip
Description: Sanger-sequenced loci matrix from Genbank (with Genbank numbers) and output from IQ-TREE analysis.
- Zoarcoidei_sanger_ambiguous_pruned.fasta #input fasta file with Genbank sequences
- 240910231656 #directory containing output from the IQ-TREE maximum likelihood analysis of the Genbank fasta
  - Zoarcoidei_sanger_ambiguous_pruned.fasta.contree #consensus tree file
  - Zoarcoidei_sanger_ambiguous_pruned.fasta.iqtree #iqtree file
  - Zoarcoidei_sanger_ambiguous_pruned.fasta.log #log file
  - Zoarcoidei_sanger_ambiguous_pruned.fasta.treefile #maximum likelihood tree file
File: tess_analysis.zip
Description: Input and output from the TESS CoMET analysis. These are all standard TESS input and output files; for additional information, see the manuscript and https://github.com/hoehna/TESS
- ExtinctionRateChanges.txt #extinction rate changes textfile
- ExtinctionRates.txt #extinction rate textfile
- input_tree.tre #input tree file
- MassExtinctionTimes.txt #mass extinction times files
- Rplot.pdf #Output plot
- Rplot01.pdf #Output plot 2
- samples_numCategories.txt #generated sample category file
- SpeciationRateChanges.txt #speciation rate changes file
- SpeciationRates.txt #Speciation rates
- SurvivalProbabilities.txt #Survival probability input
File: Body_shape_PCA_and_Disparity_Through_Time.zip
Description: Input measurements and output PCA and node height test results from the analyses of body shape dimensions.
- CTraits.csv #body shape traits input csv, with measurements of each trait in mm
- dtt_plots #output disparity through time plots for each trait
- LN_TRAITS.csv #ln transformed body shape measurements
- PCA_loadings.csv #output pca loadings
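The relationship between CTraits.csv (measurements in mm) and LN_TRAITS.csv (ln-transformed measurements) can be sketched as a natural-log transform applied to every measurement column. The column names and values below are invented for illustration and are not taken from the dataset; the authors' own transformation is in R_code.zip.

```python
import csv
import io
import math

# Hypothetical measurement table in mm; the real CTraits.csv columns may differ.
SAMPLE = """species,standard_length,body_depth
Species_A,100.0,20.0
Species_B,250.0,35.5
"""


def ln_transform(handle, id_column="species"):
    """Return rows with every non-identifier column replaced by its natural log."""
    rows = []
    for row in csv.DictReader(handle):
        out = {id_column: row[id_column]}
        for name, value in row.items():
            if name != id_column:
                # math.log with one argument is the natural logarithm.
                out[name] = math.log(float(value))
        rows.append(out)
    return rows


transformed = ln_transform(io.StringIO(SAMPLE))
for row in transformed:
    print(row)
```

The sketch assumes all measurements are positive and present; missing or zero values would need handling before the transform.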
File: iqtree_concatenated.zip
Description: IQ-TREE output from the analysis where the UCEs were concatenated and treated as one partition.
- Zoarcoid_concat.bionj #output bionj file
- Zoarcoid_concat.ckp.gz #output ckp file
- Zoarcoid_concat.contree #output contree file
- Zoarcoid_concat.iqtree #output iqtree file
- Zoarcoid_concat.log #output log file
- Zoarcoid_concat.mldist #output maximum likelihood distance matrix file
- Zoarcoid_concat.model.gz #output model fit file
- Zoarcoid_concat.splits.nex #output splits file
- Zoarcoid_concat.treefile #output tree file
File: iqtree_concordance_factors_3.zip
Description: IQ-TREE output from the concordance factor analysis.
- concordance_factors_Jan1.cf.branch #branch output file
- concordance_factors_Jan1.cf.stat #output files containing all stats, including concordance factors
- concordance_factors_Jan1.cf.tree #tree file with concordance factors on branches annotated
- concordance_factors_Jan1.cf.tree.nex #nexus output file
- concordance_factors_Jan1.log #log file
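For readers who want to screen branches by concordance, the sketch below reads a tab-delimited statistics table and lists branches whose gene concordance factor falls below a threshold. It assumes the `.cf.stat` file has `#` comment lines followed by a tab-separated header containing `ID` and `gCF` columns; the sample rows and the 50% threshold are invented for illustration, so verify the layout against concordance_factors_Jan1.cf.stat.

```python
import csv
import io

# Hypothetical excerpt; the real file has more columns and real values.
SAMPLE = (
    "# Concordance factor statistics\n"
    "# ID: Branch ID\n"
    "ID\tgCF\tgN\n"
    "10\t85.5\t200\n"
    "11\t32.1\t187\n"
    "12\tNA\t0\n"
)


def low_concordance(handle, column="gCF", threshold=50.0):
    """Return branch IDs whose value in `column` is below `threshold`."""
    # Skip comment lines so the first remaining line is the header row.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    flagged = []
    for row in reader:
        value = row[column]
        if value in ("", "NA"):
            continue  # no estimate for this branch
        if float(value) < threshold:
            flagged.append(row["ID"])
    return flagged


flagged = low_concordance(io.StringIO(SAMPLE))
print(flagged)
```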
File: biogeobears.zip
Description: BioGeoBears input and output files.
- geography.txt #input geography file with binary codings
- Habitat_Biogeography.csv #csv file containing areas for each species
- Habitat_Biogeography.xlsx #xlsx file containing areas for each species
- model_output #output for models from BioGeoBears
- plots #output BioGeoBears plots with historical biogeography reconstructions for each model (pie chart and squares)
- stochastic_mapping #output files from stochastic mapping analysis, including output biogeographic events tables (dispersals, vicariance, founder event speciation, all per million years). For information on this method, see: http://phylo.wikidot.com/printer--friendly//biogeographical-stochastic-mapping-example-script
- support #model likelihood test output tables and R data
- zoarcoid.newick #input newick tree
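The binary codings in geography.txt can be read with a few lines of code. The sketch below assumes the usual BioGeoBears geography layout: a header line giving the number of taxa, the number of areas, and the area letters in parentheses, followed by one line per taxon with a string of 0/1 digits. The taxon names and area letters here are placeholders, not values from the dataset.

```python
# Hypothetical example in the assumed layout; real taxa and areas will differ.
SAMPLE = """3 2 (A B)
Species_A\t10
Species_B\t01
Species_C\t11
"""


def parse_geography(text):
    """Map each taxon to the list of areas coded as present (digit '1')."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    n_taxa = int(header.split()[0])
    # Area names sit between the parentheses, separated by whitespace.
    area_names = header[header.index("(") + 1:header.index(")")].split()
    ranges = {}
    for line in lines[1:]:
        taxon, coding = line.split()
        if len(coding) != len(area_names):
            raise ValueError(f"{taxon}: expected {len(area_names)} digits, got {len(coding)}")
        ranges[taxon] = [area for area, bit in zip(area_names, coding) if bit == "1"]
    if len(ranges) != n_taxa:
        raise ValueError(f"header lists {n_taxa} taxa but {len(ranges)} were read")
    return ranges


ranges = parse_geography(SAMPLE)
print(ranges)
```

The two checks catch the most common formatting problems: a coding string whose length does not match the number of areas, and a taxon count that disagrees with the header.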
File: BAMM.zip
Description: BAMM input and output files, including plots.
- chain_swap.txt #chain swap output file
- event_data.txt #event data output file
- mcmc_out.txt #mcmc output file
- myPriors.txt #prior input file
- Rplot.pdf #prior v posterior number of shifts
- Rplot01.pdf #phylorate plot
- Rplot02.pdf #marginal shift phylorate plots
- Rplot03.pdf #diversification rate curve
- run_info.txt #run info file
- template_diversification.txt #input diversification file
- Ultra_Min.tree #input tree
File: ASTRAL.zip
Description: ASTRAL directory and output tree file.
- test_data #input for ASTRAL
- iqtree_gene #other output for IQ-TREE gene tree analysis
- Anomaly_TREE_ASTRAL.tree #output ASTRAL-III tree file
File: iqtree_partitioned.zip
Description: IQ-TREE output from the analysis where the UCEs were concatenated and treated as multiple partitions.
- Zoarcoid_partitioned.treefile #output tree file
- Zoarcoid_partitioned.splits.nex #output splits file
- Zoarcoid_partitioned.model.gz #output model fit file
- Zoarcoid_partitioned.mldist #output mldist file
- Zoarcoid_partitioned.log #output log file
- Zoarcoid_partitioned.iqtree #output iq tree file
- Zoarcoid_partitioned.contree #output consensus tree file
- Zoarcoid_partitioned.ckp.gz #output ckp file
- Zoarcoid_partitioned.bionj #output bionj file
- Zoarcoid_partitioned.best_scheme.nex #output best partition scheme file
- Zoarcoid_partitioned.best_scheme #output best partition scheme file2
- Zoarcoid_partitioned.best_model.nex #output best model scheme file
File: Zoarcoid_posterior_Sum.tree
Description: Time-calibrated phylogeny with median node heights.
File: Zoarcoid_posterior_Sum.trees
Description: Posterior tree set.
File: Anomaly_Zone.zip
Description: Phylogeny annotated with anomaly zone pairs.
- az.tree #tree annotated with anomaly zones
Code/software
Software used includes:
R and the (key) packages phytools, ape, BioGeoBears, optimx, BAMMTools, TESS, ggplot2 and the tidyverse suite, evoBiR, strap
BAMM
ASTRAL-III
BEAST2 suite
IQ-TREE2
Access information
Data were derived from the following sources:
- Genbank