Reduced evolutionary constraint accompanies ongoing radiation in deep-sea anglerfishes

Data files (Nov 26, 2024 version, 15.18 GB total):

dryad_package_Sept25_2024.zip (15.18 GB)
README.md (16.34 KB)

Abstract

Colonization of a novel habitat is often followed by phenotypic diversification in the wake of ecological opportunity. However, some habitats should be inherently more constraining than others if the challenges of that environment offer few evolutionary solutions. We examined this push-and-pull on macroevolutionary diversification following habitat transitions in the anglerfishes (Lophiiformes). We present a new phylogeny with unprecedented taxonomic sampling (1,092 loci and ~38% of species), combined with three-dimensional phenotypic data from museum specimens. We used these datasets to examine the tempo and mode of phenotypic diversification. The deep-sea pelagic anglerfishes originated from a benthic ancestor and shortly after experienced rapid lineage diversification rates. This transition incurred shifts towards larger jaws, smaller eyes, and a more laterally compressed body plan. Despite these directional trends, this lineage still evolved high phenotypic disparity in body, skull and jaw shapes. In particular, bathypelagic anglerfishes show high variability in body elongation while benthic anglerfishes are constrained around optimal shapes. Within this radiation, rates of phenotypic evolution were highest among recently diverged lineages, especially those that deviated from the archetypical globose body plan. Taken together, these results demonstrate that spectacular evolutionary radiations can unfold even within environments with few ecological resources and demanding physiological challenges.

https://www.nature.com/articles/s41559-024-02586-3

Dryad package prepared by Elizabeth Miller on September 25, 2024

Contact with questions: lizmiller2633@gmail.com

All R scripts are designed to work with the R project "comparative_methods_jan2024.Rproj". Open this file first!
The R project keeps all folders within a common directory so that the scripts can all "talk" to each other.

Description of files:

SUPPLEMENTARY APPENDICES

These appendices are referred to (and described) within the journal article

Appendix A1: Additional phylogeny figures (all in single PDF)

Figure A1: Maximum likelihood phylogram including all individuals used for quality control (validating species identifications).

Figure A2: Maximum likelihood phylogram made using IQ-TREE using set of representative individuals for each species.

Figure A3: Coalescent phylogram made using ASTRAL using set of representative individuals for each species.

Figure A4: Comparing the topology from IQ-TREE and ASTRAL.

Figure A5: RelTime tree with 95% confidence intervals.

Appendix A5: Disparity by habitat, full pairwise comparisons (Excel document)

MOLECULAR ALIGNMENTS AND PHYLOGRAMS

Folder "molecular_alignments_and_genetrees" contains aligned molecular data and gene trees used for ASTRAL

Zip folder "final_gene_alignments" contains fasta-formatted alignments for each gene. Genes are named according to the format "E" and a number, following Hughes et al. 2021 (doi: https://doi.org/10.1111/1755-0998.13287)

Zip folder "final_gene_trees" contains gene trees constructed using IQ-TREE from the final_gene_alignments fasta files. These gene trees were used to construct the ASTRAL coalescent consensus tree.

"concatenated_alignment.out" is the concatenated alignment of all genes in fasta format that was used for IQ-TREE analyses.

Folder "phylograms_before_timecalib" contains phylogeny files with branch lengths scaled in substitution units. Note that trees are rooted and tip labels have been renamed for consistency prior to time calibration

"IQtree_main_rooted_renamed.tre" is the phylogram constructed using IQ-TREE (using concatenated gene alignment). Support values are visible in Appendix A1: Figure A2.

"ASTRAL_main_rooted_renamed.tre" is the phylogram constructed using ASTRAL (coalescent approach based on individual gene trees). Support values are visible in Appendix A1: Figure A3.

PHYLOGENETIC COMPARATIVE METHODS SCRIPTS AND INPUTS

INPUT TREES FOR COMPARATIVE ANALYSES

Folder "trees_for_pcms" contains all trees needed as input to replicate comparative analyses.

Subfolder "main/with_outgroups/" contains newick-formatted time-calibrated trees including outgroups (not for use with comparative methods)

Subfolder "main/no_outgroups/" contains newick tree files without outgroups (Lophiiformes only) used for comparative methods

Subfolder "main/taxonomic_inflation_pruned/" contains trees pruned for taxonomic inflation, which were used in alternative MiSSE analyses

Subfolder "trimmed_nexus_for_BayesTraits" contains nexus-formatted files with trees trimmed for use with body shape or CT scan datasets. Used as input with BayesTraits analyses

ANCESTRAL HABITATS USING BIOGEOBEARS

Folder "biogeo_depth" contains all scripts to replicate BioGeoBEARS analyses (Fig. 2A), as well as all outputs of BioGeoBEARS

R script "run_habitat_biogeo.R" fits the alternative models on eight different input trees. The results output folder needs to be set manually according to each input tree.

There are eight subfolders that contain inputs/outputs for each of the eight trees. Analyses are run using the same R script; the only thing that changes is the working directory for each tree.

LINEAGE DIVERSIFICATION RATES WITH MISSE

Folder "MiSSE" contains all scripts, inputs and outputs of MiSSE analyses (Fig. 2B)

Subfolder "scripts" contains all relevant scripts:

Script "run all MiSSE analyses.R" is the primary script for these analyses. It is designed to fit all MiSSE models on 16 trees (8 unpruned and 8 pruned for taxonomic inflation).

This script reads in accessory files:
	"misse_order_list_loph.csv" tells the script the order in which to use the input trees, and the output folder for each tree's analyses
	The R script "fit_misse_models.R" fits the 10 alternative MiSSE models and is designed so models are fit in parallel.

Subfolder "results" contains 16 folders for the outputs of the 16 input trees. All fit MiSSE models, MarginRecon objects, and model-averaged rates are found here.

PHENOTYPIC DATASETS

Folder "prep_scripts/prep_morphology/" contains phenotypic datasets and related preparation files

Subfolder "body_shape/" contains a set of scripts and files needed to go from raw data to size-corrected species means appropriate for comparative methods.

Script "prep body shape species means log shape ratios.R" does the following:

	(1) size-correction with log-shapes ratios
	
		The input file "loph_body_shape_modified.csv" contains raw measurements for all individuals in millimeters. MUSEUM INFORMATION FOR THESE SPECIMENS CAN BE FOUND HERE.
		
		The script outputs a size-corrected file called "loph_body_shape_allindiv_sizecorrected_logshaperatios.csv" that is used in later steps.
		
	(2) outlier detection
	
		Script outputs a file "loph_body_shape_allindiv_sizecorrected_logshaperatios_outlierflags.csv" that contains flagged measurements
			
			This output was checked by eye. Flagged measurements were removed and the new input was created for downstream steps: "loph_body_shape_allindiv_sizecorrected_logshaperatios_outliersremoved.csv"
	
	(3) take species means and calculate intraspecific variation
	
		Script outputs the "final" version of the data ready for comparative analyses: "loph_body_shape_logshaperatios_speciesmeans.csv"
		
	(4) statistically compare measurement error

		Subfolder "standard_error_anova/" contains outputs of post hoc tests comparing standard error by suborder for each of the 10 traits
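
Steps (1) and (3) above can be sketched in a few lines. The package's own scripts are written in R; the Python below is only an illustration of the arithmetic, not the authors' code. It assumes that "size" is the geometric mean of all measurements taken on a specimen (one common definition of a log-shape ratio) and that standard error is the sample standard deviation divided by the square root of the sample size. The trait names and values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def log_shape_ratios(measurements):
    """Return log(value / geometric mean of all values) for one specimen.

    ASSUMPTION: size is the geometric mean of every trait measured on the
    specimen; the package's R script may define size differently.
    """
    logs = {trait: math.log(value) for trait, value in measurements.items()}
    log_size = mean(logs.values())  # log of the geometric mean
    return {trait: lv - log_size for trait, lv in logs.items()}


def species_means(rows):
    """rows: list of (species, {trait: ratio}).

    Returns {species: {trait: (mean, standard_error)}}; the standard error
    is NaN when a species is represented by a single specimen.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for species, ratios in rows:
        for trait, value in ratios.items():
            grouped[species][trait].append(value)
    out = {}
    for species, traits in grouped.items():
        out[species] = {}
        for trait, values in traits.items():
            n = len(values)
            se = stdev(values) / math.sqrt(n) if n > 1 else float("nan")
            out[species][trait] = (mean(values), se)
    return out


# Hypothetical specimens (measurements in mm); trait names are made up.
specimens = [
    ("species_A", {"body_depth": 40.0, "jaw_length": 20.0, "eye_diameter": 5.0}),
    ("species_A", {"body_depth": 44.0, "jaw_length": 21.0, "eye_diameter": 5.5}),
    ("species_B", {"body_depth": 15.0, "jaw_length": 30.0, "eye_diameter": 2.0}),
]
corrected = [(sp, log_shape_ratios(m)) for sp, m in specimens]
summary = species_means(corrected)
```

Under this definition the log-shape ratios of a single specimen sum to zero, which is a quick sanity check on any size-corrected file.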
		

Script "remove_allometry.R" does the following:
	
	(1) take species means of log-transformed variables
	
		This part of the script uses some outputs from the previous script "prep body shape species means log shape ratios.R", so that script should be run first
	
		I separately exported species means of eye diameter without any size correction to use in a sensitivity analysis ("eye_diameter_spmeans_NOSIZECORRECTION.csv")
	
	(2) run a PGLS to test for an effect of allometry
	
		This is done eight times for the eight alternative input trees.
	
	(3) take residuals of the PGLS as a means of size correction and export for use with comparative methods
	
		Eight versions of the data were output corresponding to PGLS using the eight alternative trees. These are stored in the subfolder "body_shape/phylo_residual_data/"
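
The idea behind steps (2) and (3) can be sketched as a generalized least squares regression of a log-transformed trait on log-transformed size, with the expected covariance among species supplied by the tree. The Python below is a hedged illustration only: the package itself uses R, and the covariance matrix, species values and function names here are hypothetical. It assumes a Brownian-motion covariance matrix C (shared branch lengths between species) has already been computed from the tree.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pgls_residuals(y, x, C):
    """GLS fit of y = intercept + slope * x with error covariance C.

    Returns (intercept, slope, residuals). C is assumed symmetric.
    """
    ones = [1.0] * len(y)
    Ci1, Cix = solve(C, ones), solve(C, x)  # C^-1 * 1 and C^-1 * x
    # Normal equations (X' C^-1 X) beta = X' C^-1 y with X = [1, x]
    a11, a12, a22 = dot(ones, Ci1), dot(ones, Cix), dot(x, Cix)
    b1, b2 = dot(Ci1, y), dot(Cix, y)
    det = a11 * a22 - a12 * a12
    intercept = (b1 * a22 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    resid = [yi - (intercept + slope * xi) for yi, xi in zip(y, x)]
    return intercept, slope, resid


# Hypothetical 4-species example: two pairs of sister species.
C = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
log_size = [1.0, 1.2, 2.0, 2.3]
log_trait = [1.35, 1.40, 2.15, 2.30]
intercept, slope, residuals = pgls_residuals(log_trait, log_size, C)
```

The residuals are what would be carried forward as the size-corrected trait. Because they depend on C, a different input tree gives a different set of residuals, which is why eight versions of the data exist.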

The scripts beginning with "prep_input_files_" were used to generate PC axes for use with multivariate analyses of shape evolution (i.e. BayesTraits) for the three datasets (body shape, skull and jaw shapes)

The two versions of the scripts that deal with CT scans ("wholeskull" and "jaws") read in the landmark coordinates found in the file: "phenotypic_datasets/landmark_files/Loph_coords_trimmed.csv". They then perform the local superimposition procedure of Rhoda et al. 2021, based on scripts written by Kory Evans.

Folder "phenotypic_datasets/" contains inputs and outputs of the previous folder's prep scripts:

Subfolder "landmark_files" contains:

 Raw landmark coordinates placed on CT scans ("Loph_coords_trimmed.csv"). Each column is an x, y or z coordinate (in that order) for landmarks 1-145. This is the standard format readable in the geomorph R package
 
 Slide files (subfolder "slides") needed to perform local Procrustes superimposition of Rhoda et al. 2021. This is performed within the script: "prep_scripts/prep_morphology/prep_input_files_wholeskull.R" and the "jaw" equivalent
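
As a reading aid for the coordinate layout described above, the sketch below regroups one specimen's flat run of coordinates into 145 (x, y, z) triplets. It is a Python illustration, not part of the package (which uses R and geomorph). It assumes the columns are interleaved as x1, y1, z1, x2, y2, z2, and so on, and that any specimen identifier column has already been removed; check both against the actual file. The input values are placeholders.

```python
def row_to_landmarks(values, n_landmarks=145, n_dims=3):
    """Regroup a flat list of coordinates into one (x, y, z) tuple per landmark.

    ASSUMPTION: values are ordered x1, y1, z1, x2, y2, z2, ...
    """
    expected = n_landmarks * n_dims
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return [tuple(values[i:i + n_dims]) for i in range(0, expected, n_dims)]


# Placeholder coordinates standing in for one specimen's row.
flat = [float(i) for i in range(145 * 3)]
landmarks = row_to_landmarks(flat)
```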

Subfolders "coord_file_jaws/" and "coord_file_wholeskull/" contain the combined output of local Procrustes superimposition that can be used for comparative analyses. These files were generated by the script: "prep_scripts/prep_morphology/prep_input_files_wholeskull.R" and the "jaw" equivalent

Subfolder "PC_scores_bodyshape/" contains the BayesTraits inputs for body shape analyses using both size-correction methods. Files were generated by script "prep_scripts/prep_morphology/prep_input_files_bodyshape_multivariate.R"

Subfolders "PC_scores_wholeskull" and "PC_scores_jaws" contain BayesTraits inputs based on geometric morphometrics of CT scans after local superimposition. These are generated by the script: "prep_scripts/prep_morphology/prep_input_files_wholeskull.R" and the "jaw" equivalent

PHYLOMORPHOSPACE PLOTS

Folder "morphospaces" contains scripts, inputs and outputs needed to replicate phylomorphospace plots (Fig. 3)

Script "morphospace_polygons_bodyshape.R" performs analyses for linear measurements of body shape, making alternative plots for variables size-corrected using log-shapes ratios (Fig. 3) or phylogenetic residuals (Extended Data Fig. 3)

Script "morphospace_polygons_skull_and_jaws.R" performs analyses for geometric morphometrics based on CT scans (skulls and jaw shape)
These require the coord files as input, found here: "phenotypic_datasets/coord_file_jaws/" and "phenotypic_datasets/coord_file_wholeskull/"

UNIVARIATE MODEL FITTING WITH OUWIE

Folder "ouwie_univariate" contains scripts, inputs and outputs related to univariate OUwie model fitting of body shape (Fig. 6)

Scripts "ouwie_logshapesratios.R" and "ouwie_phyloresid.R" are the primary scripts to run analyses. They are separated by treatment of allometry (logshapes = keeps allometry, phyloresid = eliminates allometry)
These scripts do the following:
(1) label trees with BioGeoBEARS results (this only needs to be done once, so I did it within the "logshapesratios" script version)
(2) adjust for negative values of body shape variables after size correction
(3) fit 6 alternative OUwie models on all 10 traits individually and using 8 alternative input trees
(4) create summary tables of the results
(these summary tables begin with the names "best_model_params" and "AICc_summary_table")
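
For readers interpreting the "AICc_summary_table" files, the sketch below shows how AICc and Akaike weights are conventionally computed from a model's log-likelihood, its number of parameters and the number of species. It is a Python illustration of the standard formulas, not the package's R code. The model labels, log-likelihoods and parameter counts are hypothetical placeholders.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for k parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: weight}, with weights summing to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical fits: model label -> (log-likelihood, number of parameters).
fits = {"BM1": (-52.3, 2), "OU1": (-50.9, 3), "OUM": (-44.1, 6)}
n_species = 100  # placeholder sample size

scores = {m: aicc(ll, k, n_species) for m, (ll, k) in fits.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)
```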

Input data files:

"loph_body_shape_logshaperatios_speciesmeans_withtaxonomy_withhabitat.csv" = species mean and standard error after size correction using log-shapes ratios for 10 linear body shape traits, and the species' habitat regime.
	This file is equivalent to that in the folder "prep_scripts/prep_morphology/body_shape/loph_body_shape_logshaperatios_speciesmeans.csv" but with taxonomy and habitat information added manually
	Standard error is input into OUwie model fitting. Note that standard error can only be included when size correcting with log-shapes ratios (see Methods section)
	
"loph_body_shape_logshaperatios_speciesmeans_withtaxonomy_withhabitat_NONEGATIVES.csv" is created within the script "ouwie_logshapesratios.R" and represents the data after correcting for negative values.

Inputs of the phylogenetic residual version of this script are contained in the folder "prep_scripts/prep_morphology/"

Outputs of these scripts are included within the subfolders:

Subfolder "biogeo_trees_for_OUwie/" contains trees with the nodes labeled with the most likely habitat states inferred from BioGeoBEARS. 
	These are generated within the "ouwie" scripts and are later input within OUwie model fitting analyses. Note that 1=shallow benthic, 2=shallow+deep benthic, 3=deep benthic, 4=bathypelagic

Subfolder "model_fit_objects" contains all outputs from OUwie analyses including model fit R objects. Outputs are organized according to the input tree and treatment of allometry.

MULTIVARIATE BAYESTRAITS ANALYSES

Folder "bayestraits_loph_jan2024" contains the following:

All inputs and outputs of BayesTraits runs are contained in the compressed folder "BayesTraits_runs":

Subfolder "model_command_inputs/" contains input files with BayesTraits commands to fit alternative models, including local tree transformations:

	BayesTraits is run using an executable that can be downloaded from: http://www.evolution.reading.ac.uk/BayesTraitsV4.0.1/BayesTraitsV4.0.1.html
	The multi-thread version 4 executable was used. 
	In order to use the multithread version on a cluster, an additional input file is needed that contains the program commands. 
		These files are included in the subfolder "model_command_inputs/" and are divided by model type and whether body shape or ct scans were used (this changes the tips in the tree which are needed for local transformations)
		
The outputs of BayesTraits runs are organized in the following hierarchical way:

	(1) Whether or not effects of allometry were removed during size correction:
	
		Subfolder: "allometry_removed/" contains body shape analyses with measurements size-corrected using PGLS residuals
	
		Subfolder "allometry_not_removed/" contains all other analyses
	
	(2) Whether the input tree was time-calibrated using MCMCtree or RelTime:

		e.g., "allometry_not_removed/" folder contains two subfolders: "IQtree_noplecto_MCMCtree" and "IQtree_noplecto_RelTime"

	(3) Phenotypic dataset: body, skull or jaw shape (the latter two are based on CT scans)

		e.g. "allometry_not_removed/IQtree_noplecto_MCMCtree/" contains three subfolders: "bodyshape", "jaws", "wholeskull"
		The "allometry_removed" folder contains only body shape analyses

	(4) Evolutionary model: 8 options

		e.g., "allometry_not_removed/IQtree_noplecto_MCMCtree/bodyshape/" contains the following subfolders:
			1. SR_BM = single rate Brownian motion
			2. SR_delta = single rate early burst ("delta" tree transformation)
			3. SR_SP_OU = single rate single-optimum OU
			4. SR_MP_OU = single rate multi-optima OU with optimal shapes associated with habitat
			5. VR_BM = variable rate Brownian motion
			6. VR_delta = variable rate early burst ("delta" tree transformation)
			7. VR_SP_OU = variable rate single-optimum OU
			8. VR_MP_OU = variable rate multi-optima OU with optimal shapes associated with habitat
		
			The best-fitting models (usually VR MP OU) were run 2-3 times to confirm convergence
		
			These folders contain the inputs (PC scores and tree) and outputs of each BayesTraits "run". Please consult the BayesTraits manual for help interpreting these output files: http://www.evolution.reading.ac.uk/BayesTraitsV4.0.1/Files/BayesTraitsV4.0.0-Manual.pdf
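
The hierarchy above can be enumerated programmatically, for example to check that every expected run folder is present after unzipping. The Python sketch below covers only the "allometry_not_removed/" branch, using the folder names quoted in this README. It assumes the model folders are named exactly as the abbreviations listed above, without the numeric prefixes, and it does not account for the extra folders created by repeated runs of the best-fitting models.

```python
from itertools import product
from pathlib import PurePosixPath

# Folder names as quoted in this README; exact on-disk names are an assumption.
calibrations = ["IQtree_noplecto_MCMCtree", "IQtree_noplecto_RelTime"]
datasets = ["bodyshape", "jaws", "wholeskull"]
models = [
    "SR_BM", "SR_delta", "SR_SP_OU", "SR_MP_OU",
    "VR_BM", "VR_delta", "VR_SP_OU", "VR_MP_OU",
]


def expected_run_folders(root="allometry_not_removed"):
    """List every calibration/dataset/model folder expected under root."""
    return [
        PurePosixPath(root, calib, data, model)
        for calib, data, model in product(calibrations, datasets, models)
    ]


folders = expected_run_folders()
```

Each returned path can be joined to the location of the unzipped "BayesTraits_runs" folder and tested for existence.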

Subfolder "process_output" contains R scripts for processing output of BayesTraits runs:

	Subfolder "scripts" contains the following:
	
		(1) "1_compare_BT_modelfits.R": uses Bayes factors to compare the fit of models. Based on a similar script found here: https://github.com/EllenJCoombs/Cetacean_cranial_evolution
			
			Plots of BayesFactor comparisons are output to the subfolder: "process_output/bayes_factor_cubes/"
			
			This script also plots the posterior distributions of the OU parameter for multi-optima models. These are output to the folder: "process_output/OU_dist/"

		(2) "2_test_convergence.R". Tests convergence of multiple runs of the same models. Based on a similar script found here: https://github.com/EllenJCoombs/Cetacean_cranial_evolution
		
		(3) "3_run get mean scaled bl function across analyses.R". This script reads in the "Output.trees" file of a variable-rates BayesTraits run and calculates the mean scalar for each branch across the posterior distribution. This is done to make plotting easier.
			
			Sources a custom utility function "get_mean_scaled_bl_function.R", which computes the mean branch lengths. This is a simple task made complicated by the sheer number of trees in the posterior distribution
			
			The result is output to the folder: "process_output/mean_scaled_branch_length_trees/". I did this for two model types: VR MP OU and VR Brownian (shown in Extended Data Fig. 6)
			
		(4) "4_plot scaled branch lengths.R". This script creates the plots shown in Fig. 5 and Extended Data Fig. 6. These plots show how the scaled tree compares to the original (time-calibrated) tree. Takes as input the files in the folder "mean_scaled_branch_length_trees/"
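
Two of the calculations described in scripts (1) and (3) can be sketched briefly. The Python below is an illustration only; the package's scripts are in R. The log Bayes factor follows the convention in the BayesTraits manual: twice the difference in log marginal likelihood between the more complex and the simpler model. The branch-length averaging assumes every tree in the posterior shares one topology and one branch order, so that branches can be matched by position. All function names and numbers are hypothetical.

```python
def log_bayes_factor(log_ml_complex, log_ml_simple):
    """Log Bayes factor: 2 * (log marginal likelihood complex - simple)."""
    return 2.0 * (log_ml_complex - log_ml_simple)


def mean_scaled_branch_lengths(posterior):
    """Average each branch's scaled length across posterior trees.

    posterior: list of trees, each a list of scaled branch lengths.
    ASSUMPTION: all trees list the same branches in the same order.
    """
    n_trees = len(posterior)
    n_branches = len(posterior[0])
    if any(len(tree) != n_branches for tree in posterior):
        raise ValueError("all trees must have the same number of branches")
    return [sum(tree[i] for tree in posterior) / n_trees
            for i in range(n_branches)]


# Hypothetical log marginal likelihoods for two models.
bf = log_bayes_factor(-100.0, -106.0)

# Hypothetical posterior of three trees with four branches each.
posterior = [
    [1.0, 2.0, 0.5, 4.0],
    [1.2, 1.8, 0.7, 3.0],
    [0.8, 2.2, 0.6, 5.0],
]
mean_lengths = mean_scaled_branch_lengths(posterior)
```

Dividing each mean scaled length by the corresponding branch length in the original time-calibrated tree gives a per-branch scalar, which is one way to express how far the scaled tree departs from the original.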