Data from: Phylogeny and biogeography of harmochirine jumping spiders (Araneae: Salticidae)
Data files
Jan 03, 2025 version files: 602.19 MB
- AdditionalAnalysesFiles.zip (430.44 MB)
- Azevedo_et_al_2024_HarmochirinePhylogenyData-20240213.zip (170.21 MB)
- Harmochirine_UCE_Tree_Results-Jan2024.nex (1.53 MB)
- README.md (5.95 KB)
Abstract
We use ultraconserved element (UCE) genomic data to study the phylogeny, age, and biogeographical history of harmochirine jumping spiders, a group that includes the species-rich genus Habronattus, whose remarkable courtship has made it the focus of studies of behaviour, sexual selection, and diversification. We recovered 1947 UCE loci from 43 harmochirine taxa and 4 outgroups, yielding a core dataset of 193 UCEs with at least 50% occupancy. Concatenated likelihood and ASTRAL analyses confirmed the separation of harmochirines into two major clades, here designated the infratribes Harmochirita and Pellenita. Most are African or Eurasian, with the notable exception of a clade of pellenites containing Habronattus and Pellenattus of the Americas and Havaika and Hivanua of the Pacific Islands. Biogeographical analysis using the DEC model favours a dispersal of the clade's ancestor from Eurasia to the Americas, from which Havaika's ancestor dispersed to Hawaii and Hivanua's ancestor to the Marquesas Islands. Divergence time analysis on 32 loci with 85% occupancy, calibrated by fossils and island age, dates the dispersal to the Americas at approximately 5 to 7 million years ago. The explosive radiation of Habronattus perhaps began only about 4 mya. The phylogeny clarifies both the evolution of sexual traits (e.g. the terminal apophysis was enlarged in Pellenes and not subsequently lost) and the taxonomy. Habronattus is confirmed as monophyletic. Pellenattus is raised to the status of genus, and 13 species are moved into it as new combinations. Bianor stepposus Logunov, 1991 is transferred to Sibianor, and Pellenes bulawayoensis Wesołowska, 2000 is transferred to Neaetha.
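The occupancy thresholds in the abstract (at least 50% for the core dataset, 85% for the dating subset) can be illustrated with a minimal Python sketch. It assumes that a locus's occupancy is the fraction of all sampled taxa that have non-missing data in that locus's alignment, and that gaps, "?" and "N" count as missing. This is not the authors' pipeline (see the paper for that); the function names and toy alignments are hypothetical.

```python
def taxa_with_data(fasta_text):
    """Return the names of FASTA records whose sequence holds at least one base.

    Gaps ('-'), '?' and 'N' are treated as missing data (an assumption).
    """
    missing = set("-?N")
    present = set()
    name, chunks = None, []
    # A trailing ">" sentinel flushes the last record through the same branch.
    for line in fasta_text.splitlines() + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if name is not None and set("".join(chunks).upper()) - missing:
                present.add(name)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    return present


def filter_by_occupancy(alignments, all_taxa, min_occupancy):
    """Keep loci whose occupancy (share of all_taxa with data) is >= min_occupancy.

    `alignments` maps a locus name to the text of its FASTA alignment.
    Returns {locus: occupancy} for the loci that pass.
    """
    kept = {}
    for locus, fasta_text in alignments.items():
        occupancy = len(taxa_with_data(fasta_text) & set(all_taxa)) / len(all_taxa)
        if occupancy >= min_occupancy:
            kept[locus] = occupancy
    return kept
```

For example, with four taxa, a locus present in three of them has occupancy 0.75: it passes a 50% threshold but not an 85% one.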
README: Data from: Phylogeny and biogeography of harmochirine jumping spiders (Araneae: Salticidae)
https://doi.org/10.5061/dryad.7wm37pw11
Description of the data, directory structure and files
This repository contains the data matrices, custom scripts, and software inputs and outputs used for the paper "Azevedo et al. 2024. Phylogeny and biogeography of harmochirine jumping spiders (Araneae: Salticidae). Molecular Phylogenetics and Evolution, 197, 108109." We generated ultraconserved element (UCE) sequencing data for 47 harmochirine jumping spiders and complemented our dataset with 2 previously published sequences from the SRA archive. We also used bioinformatic processing to recover sequences for commonly used markers (8S, 16SND1 and COI) and complemented this dataset with Sanger sequencing data for a few specimens available in the GenBank and BOLD repositories. For details on the DNA extraction, sequencing, and bioinformatic processing used to obtain the matrices available here, please see the publication associated with this dataset.
The file "Harmochirine_UCE_Tree_Results-Jan2024.nex" is a NEXUS-formatted file that contains all the trees used for the figures and supplementary figures in the published paper, as well as the bootstrap trees. The title of each tree in the file indicates the dataset and method used to infer it.
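To list the tree titles without opening the file in a tree viewer, a minimal Python sketch such as the one below may help. It assumes the titles are stored as tree names in standard NEXUS `TREE name = ...` statements (possibly single-quoted, with `''` as an escaped quote); it does not read Mesquite-style `TITLE` commands. The function name and the sample text are hypothetical.

```python
import re

# Matches "TREE <name> =" at the start of a line (case-insensitive);
# the name is either a single-quoted string or a run of non-space, non-'=' chars.
TREE_STATEMENT = re.compile(
    r"^\s*tree\s+('(?:[^']|'')*'|[^\s=]+)\s*=", re.IGNORECASE | re.MULTILINE
)


def tree_names(nexus_text):
    """Return the names of all TREE statements in a NEXUS document, in order."""
    names = []
    for match in TREE_STATEMENT.finditer(nexus_text):
        name = match.group(1)
        if name.startswith("'"):
            # Drop the enclosing quotes and un-escape doubled quotes.
            name = name[1:-1].replace("''", "'")
        names.append(name)
    return names
```

Usage: `tree_names(open("Harmochirine_UCE_Tree_Results-Jan2024.nex").read())` would return the names in file order, provided the file follows the assumption above.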
The file "Azevedo_et_al_2024_HarmochirinePhylogenyData-20240213.zip" contains three folders:
- "CustomScripts" folder contains two bash script files and one Python script file used to filter out paralogs and low-quality sequences:
  - "remove_paralogs_treeshrink.sh": prepares the input files for TreeShrink, uses it to remove possible paralogs from the alignments, and outputs a folder of alignments with the paralog sequences removed. See details at https://github.com/ghfazevedo/Poda?tab=readme-ov-file#remove_paralogs_treeshrinksh
  - "find_long_branches.py": finds branches whose length exceeds a user-specified percentage of the total tree length. See https://github.com/ghfazevedo/Poda?tab=readme-ov-file#find_long_branchespy
  - "screen_for_long_branches.sh": iterates over the results of remove_paralogs_treeshrink.sh, using find_long_branches.py to search for long branches that passed through the TreeShrink run. When a long internal branch is present, which could indicate a gene duplication, TreeShrink sometimes does not remove the affected taxa. The output lists the alignments that need to be checked, so you can decide whether to remove them. See https://github.com/ghfazevedo/Poda?tab=readme-ov-file#screen_for_long_branchessh
- "Data_Matrices" folder contains subfolders with the alignments corresponding to each dataset used in the analyses described in the published paper. The name of each folder corresponds to the name of the dataset referenced in the Materials and Methods section of the paper. The files within each folder are DNA sequence alignments in FASTA or NEXUS format.
- "Inputs_Outputs_and_Logs" folder contains the input files, scripts, logs, and output files for each analysis presented in the paper, organized into subfolders by software. For instance, the 'revbayes' folder contains the data, scripts, and main outputs for the biogeographical analyses run in RevBayes, as explained in the paper.
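The check performed by "find_long_branches.py" (flag branches longer than a given percentage of the total tree length) can be illustrated with the independent Python sketch below. It is not that script; its real interface is documented at the linked repository. The sketch assumes plain Newick input with numeric branch lengths, without quoted labels or bracketed comments, and identifies each branch by the set of tips below it, so long internal branches (the case TreeShrink can miss) are reported too. Function names are hypothetical.

```python
import re

PUNCT = set("(),;:")


def parse_newick_edges(newick):
    """Return (tips_below, branch_length) for every branch that has a length."""
    tokens = re.findall(r"[(),;:]|[^(),;:\s]+", newick)
    pos = 0
    edges = []

    def parse_clade():
        nonlocal pos
        tips = set()
        if tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                tips |= parse_clade()
                if tokens[pos] == ",":
                    pos += 1
                    continue
                break
            pos += 1  # consume ")"
            if tokens[pos] not in PUNCT:
                pos += 1  # skip an internal node label, e.g. a support value
        else:
            tips.add(tokens[pos])  # a tip label
            pos += 1
        if tokens[pos] == ":":
            edges.append((frozenset(tips), float(tokens[pos + 1])))
            pos += 2
        return tips

    parse_clade()
    return edges


def find_long_branches(newick, percent):
    """List branches longer than `percent` % of the summed branch lengths."""
    edges = parse_newick_edges(newick)
    total = sum(length for _, length in edges)
    return [(sorted(tips), length) for tips, length in edges
            if length > percent / 100 * total]
```

For example, in `((A:0.1,B:0.1):5,C:0.1,D:0.1);` the internal branch leading to A and B carries most of the tree length and is the only branch reported at a 50% threshold; an alignment showing this pattern would be one to inspect for a possible duplication.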
The file "AdditionalAnalysesFiles.zip" contains three subfolders with the input files, scripts, logs and output files for additional complementary analyses requested during the review process of the manuscript:
- "NoIslandCalibration" contains the files for the Bayesian dating analysis run in BEAST without using island ages for calibration. The PDF files are density plots of the posterior distributions of parameters from the BEAST output ".log" files in this subfolder and from the ".log" files of the main analyses in the "Inputs_Outputs_and_Logs/beast" folder of "Azevedo_et_al_2024_HarmochirinePhylogenyData-20240213.zip". The plots were made in Tracer to compare the posterior distributions of clock rates and MRCA ages (as specified in each file name and plot legend) between the additional analyses and the main dating analysis.
- "RelaxedClock" contains the files for the Bayesian dating analysis run in BEAST using a relaxed clock on the species tree. The PDF files are density plots of the posterior distributions of parameters from the BEAST output ".log" files in this subfolder and from the ".log" files of the main analyses in the "Inputs_Outputs_and_Logs/beast" folder of "Azevedo_et_al_2024_HarmochirinePhylogenyData-20240213.zip". The plots were made in Tracer to compare the posterior distributions of clock rates and MRCA ages (as specified in each file name and plot legend) between the additional analyses and the main dating analysis.
- "rjMCMC_DEC_J" contains the RevBayes script ("harmochirine.h0.unconstrainedTESTJ_rjMCMC.rev") for running the reversible-jump MCMC test for including the "j" parameter in the DEC model, as well as the .log file ("harmochirine.h0.TestJ.model.log") with the resulting posterior sample of parameters. The parameter "model_indicator" takes two values: 1 corresponds to the DEC model with "j" and 2 to the model without "j". The posterior probability of each model was assessed by visualizing the posterior density in Tracer, and the values were exported to the file "J_probability.txt".
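The comparisons above were made in Tracer. As a scriptable alternative for inspecting the same logs, here is a minimal Python sketch, not part of this dataset. It assumes tab-delimited log files with a header row (BEAST's leading "#" comment lines are skipped) and an arbitrary 10% burn-in. The function names, the burn-in fraction, and any column name other than "model_indicator" are hypothetical, so check the actual headers in the .log files. `hpd_interval` returns the shortest interval containing 95% of the samples, which is the highest posterior density interval that Tracer also reports. `model_probabilities` estimates each model's posterior probability as the share of post-burn-in samples in which "model_indicator" takes that value, the usual way to read an rjMCMC indicator.

```python
import csv
import math
from collections import Counter


def read_log_column(log_text, column, burnin=0.1):
    """Return one column of a tab-delimited MCMC log as floats, minus burn-in.

    Lines starting with '#' are skipped; `burnin` is the fraction of samples
    dropped from the start of the chain (an assumed default).
    """
    rows = [line for line in log_text.splitlines()
            if line.strip() and not line.startswith("#")]
    values = [float(row[column]) for row in csv.DictReader(rows, delimiter="\t")]
    return values[int(len(values) * burnin):]


def hpd_interval(values, mass=0.95):
    """Shortest interval containing `mass` of the samples (an HPD interval)."""
    s = sorted(values)
    width = max(1, math.ceil(mass * len(s)))  # samples the interval must hold
    start = min(range(len(s) - width + 1), key=lambda i: s[i + width - 1] - s[i])
    return s[start], s[start + width - 1]


def model_probabilities(indicator_values):
    """Posterior probability of each model: the share of samples visiting it."""
    counts = Counter(int(round(v)) for v in indicator_values)
    n = len(indicator_values)
    return {model: count / n for model, count in sorted(counts.items())}
```

For example, `model_probabilities(read_log_column(text, "model_indicator"))` on the contents of "harmochirine.h0.TestJ.model.log" should give values comparable to "J_probability.txt", although the burn-in applied in Tracer may differ from the one assumed here.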