All about being old and shooting hairs: Clade age and urticating hair explain the patterns of diversification in tarantulas
Data files
Oct 31, 2023 version files 2.53 MB
- README.md
- Supplementary_material_S4-_Data_and_scripts.rar
Abstract
The extreme asymmetry of species richness distribution across the tree of life has always intrigued evolutionary biologists. Two competing explanations have been proposed to explain this pattern—the clade age hypothesis and diversification rate variation. While these two scenarios may not be mutually exclusive, to what extent time and diversification rates interact to explain species richness patterns remains understudied. Here, we investigate the relative influence of these two scenarios using tarantulas (Family: Theraphosidae) as a model. Tarantulas represent a speciose group of spiders found worldwide but exceptionally diverse in South America. These spiders show two distinct patterns of microhabitat use (ground-dwelling or arboreal) and defence strategies (presence or absence of urticating hairs). Using various trait-independent and dependent diversification models, we test the clade age hypothesis, the role of microhabitat, antipredator defence strategy and geography in influencing diversification rates. Our results suggest that clade age is the primary predictor of species richness distribution across the tarantula subfamilies. However, the presence of urticating hair probably disrupted this pattern in some clades by increasing the net diversification rates, not by increasing the speciation rate but by reducing the extinction rate.
README: All about being old and shooting hairs: Clade age and urticating hair explain the patterns of diversification in tarantulas
https://doi.org/10.5061/dryad.dbrv15f76
This repository contains eight folders. Each folder represents a different analysis performed in the article. The raw data and the R script necessary to reproduce the results of an analysis are given in the respective folder.
Details of each folder are given below.
BEAST FOLDER
This folder contains the concatenated alignment of one nuclear and three mitochondrial markers in NEXUS format and the input XML file for BEAST used to infer the chronogram.
BiSSE FOLDER
This folder contains the binary discrete trait datasets for habitat and urtication in CSV format, the chronogram in Newick format, and two R scripts for running the bisse analysis on the urtication and habitat datasets. In the CSV files, the names column holds the tip labels of the phylogeny and the states column holds the trait states (0-absent, 1-present).
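Because the names column has to match the tip labels of the chronogram, it can help to check the two files against each other before running an analysis. The following is a minimal Python sketch of such a check, not part of the deposited R scripts. It assumes the CSV header is literally `names,states`, that the Newick string uses unquoted tip labels, and the example tree and trait table are hypothetical.

```python
import csv
import io
import re


def newick_tip_labels(newick: str) -> set[str]:
    """Return the tip labels of a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal node labels
    # follow ")" and branch lengths follow ":", so neither is captured.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def read_states(csv_text: str) -> dict[str, int]:
    """Read a trait table with 'names' and 'states' columns (assumed header)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["names"]: int(row["states"]) for row in reader}


def compare(tips: set[str], states: dict[str, int]):
    """Return (tips without a state, names not in the tree, non-binary names)."""
    named = set(states)
    missing = sorted(tips - named)
    extra = sorted(named - tips)
    non_binary = sorted(n for n, s in states.items() if s not in (0, 1))
    return missing, extra, non_binary


# Hypothetical example data, not taken from the repository.
example_tree = "((A:1,B:1):1,(C:1,D:1):1);"
example_csv = "names,states\nA,0\nB,1\nC,1\nE,2\n"

missing, extra, non_binary = compare(
    newick_tip_labels(example_tree), read_states(example_csv)
)
print("tips without a state:", missing)
print("names not in the tree:", extra)
print("names with a non-binary state:", non_binary)
```

With real files, the two strings would be read from the Newick and CSV files in this folder; all three lists should be empty before the trait data are passed to an analysis.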
Clade age testing folder
This folder contains the script and dataset for running the phylogenetic regression between species richness and crown-group age of subfamilies. The species richness values and clade age data are given in the data1.csv file and the chronogram is in Newick format.
GeoHiSSE folder
This folder contains the script and dataset for running the geohisse analysis to test if geographic ranges affect diversification rates. The geographic states are coded as a discrete binary character (American and non-American) in the geo.csv file. The names column in the CSV files represents the tip labels in the phylogeny and the states column represents the trait states (0-absent, 1-present).
HiSSE FOLDER
This folder contains the scripts and datasets for running the hisse analysis: the binary discrete trait datasets for habitat and urtication in CSV format, the chronogram in Newick format, and two R scripts for running hisse on the urtication and habitat datasets. In the CSV files, the names column holds the tip labels of the phylogeny and the states column holds the trait states (0-absent, 1-present).
Incomplete taxon sampling simulations
This folder has two subfolders – habitat and urtication. Both habitat and urtication contain two subfolders – bisse and hisse. All the bisse and hisse folders further contain 10 subfolders named according to the phylogeny (phy1, phy2 etc.). Each of these folders contains the R script for simulating the respective trees and generating the trait dataset and then running bisse and hisse on these simulated datasets.
MiSSE and PGLS ANOVA folder
This folder contains the R script for running the misse analysis on the "theraphosidae1.newick" phylogeny. The output of misse is saved as "tip rates.csv". Using this tip rate file and the "testtree.newick" phylogeny, you can then run the PGLS ANOVA and produce the boxplots with the subsequent R script.
Incomplete taxon sampling
This folder contains two subfolders: urtication and habitat. Each is further divided into two subfolders: bisse and hisse. Each of these folders contains 10 sets of simulated phylogenies (named sim 1, 2, 3, ...). Each sim# subfolder contains the code for simulating the complete phylogeny. The trait states of the phylogeny were saved in CSV format (states.csv) and sorted manually by state (state0.csv and state1.csv). Different proportions of taxa were then sampled separately from each trait state to represent an incompletely sampled phylogeny, and bisse or hisse was run on that incompletely sampled phylogeny.
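The sampling step described above, drawing a proportion of taxa separately from each trait state, can be illustrated with a short Python sketch. This is not the deposited R code: the function name, the sampling proportions, the seed and the tip names are all hypothetical, and the repository performs the split by state manually through state0.csv and state1.csv.

```python
import random


def sample_by_state(states, fraction0, fraction1, seed=0):
    """Keep a fraction of the tips in state 0 and in state 1, drawn separately.

    `states` maps tip label -> 0 or 1. The fractions are illustrative
    assumptions, not the values used in the study.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    kept = []
    for state, fraction in ((0, fraction0), (1, fraction1)):
        tips = sorted(name for name, s in states.items() if s == state)
        k = round(len(tips) * fraction)  # number of tips retained in this state
        kept.extend(rng.sample(tips, k))
    return sorted(kept)


# Hypothetical complete phylogeny: 10 tips in each state.
full_states = {f"t{i}": (0 if i < 10 else 1) for i in range(20)}

kept_tips = sample_by_state(full_states, fraction0=0.5, fraction1=0.2)
print(len(kept_tips), "of", len(full_states), "tips retained")
```

The retained tips would then be used to prune the complete simulated tree, with the per-state sampling fractions supplied to the bisse or hisse run.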
Sensitivity analysis
This folder contains the data and code for testing the effect of phylogenetic uncertainty on the hisse and clade age analyses. It has two subfolders: hisse and clade age.
The hisse folder contains 100 randomly sampled phylogenies, the trait dataset and the R script for running the hisse analysis. The "phy#.newick" files are the 100 randomly sampled phylogenies. The "hisse urtication script.R" is the script for running hisse on those 100 trees. The "phy#_hisse.csv" files are the results of each run (AIC stats and model performance). The "urt.csv" file is the trait dataset needed to run hisse. The "compiled results.xlsx" file contains the Akaike weights of all models across the 100 runs summarised in one file, whereas the "parameter estimates.xlsx" file contains the parameter values across the 100 runs estimated according to the best-fit model. The "Script.R" file is an alternate script for the hisse analysis in case "hisse urtication script.R" does not work.
The clade age folder contains 100 randomly sampled phylogenies, the CSV files containing the clade age and species richness information and the R script for running the phylogenetic regression. The "phy#.csv" files contain the crown group age (across 100 phylogenies) and species richness information for the respective phylogenies. The "phy#.pdf" files are the regression plots for the 100 randomly sampled phylogenies. The analysis script can be found in the "phylolm.R" file. The subfamily-level phylogeny needed for this analysis is "subfamily.newick". The other files contain the compiled results of the regression across the 100 trees.
Please contact the corresponding author with any questions regarding the data and code.
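For readers unfamiliar with Akaike weights, the standard calculation from a set of AIC scores is w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2), where Δ_i is the difference between model i's AIC and the lowest AIC in the set. The Python sketch below illustrates that formula only; the model names and AIC values are hypothetical and are not taken from the "phy#_hisse.csv" files, whose column layout is not described here.

```python
import math


def akaike_weights(aic):
    """Convert a mapping of model name -> AIC into Akaike weights."""
    best = min(aic.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {model: math.exp(-0.5 * (value - best)) for model, value in aic.items()}
    total = sum(rel.values())
    return {model: r / total for model, r in rel.items()}


# Hypothetical AIC scores for three candidate models from one run.
example_aic = {"model_a": 100.0, "model_b": 102.0, "model_c": 110.0}
weights = akaike_weights(example_aic)

for model, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{model}: {w:.3f}")
```

The weights sum to 1 within a run, so they can be compared directly across the 100 sampled phylogenies.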
**Aritra Biswas**
address: Centre for Ecological Sciences, Indian Institute of Science, Bangalore, India
email: aritra110@gmail.com
Methods
The dataset was assembled through a literature survey and GenBank mining. The R scripts for the analyses were developed from the package manual PDFs.