
Phylogenomics of scorpions reveal contemporaneous diversification of scorpion mammalian predators and mammal-active sodium channel toxins

Cite this dataset

Santibanez, Carlos et al. (2022). Phylogenomics of scorpions reveal contemporaneous diversification of scorpion mammalian predators and mammal-active sodium channel toxins [Dataset]. Dryad.


Scorpions constitute a charismatic lineage of arthropods and comprise more than 2,500 described species. Found throughout various tropical and temperate habitats, these predatory arachnids have a long evolutionary history, with a fossil record extending back to the Silurian. While all scorpions are venomous, the asymmetrically diverse family Buthidae harbors nearly half the diversity of extant scorpions and all but one of the 58 species that are medically significant to humans. However, the lack of a densely sampled scorpion phylogeny has hindered broader inferences about the diversification dynamics of scorpion toxins. To address this gap, we assembled a phylogenomic dataset of 100 scorpion venom transcriptomes and/or genomes, emphasizing the sampling of highly toxic buthid genera. To infer divergence times of venom gene families, we applied a phylogenomic node dating approach to the species tree in tandem with phylostratigraphic bracketing to estimate minimum ages of mammal-specific toxins. Our analyses establish a robustly supported phylogeny of scorpions, particularly with regard to relationships among medically significant taxa. Analysis of venom gene families shows that mammal-specific sodium channel toxins have evolved independently in five lineages within Buthidae. Temporal windows of mammal-specific toxin origins are correlated with the basal diversification of major mammalian predators of scorpions, such as carnivores, shrews, bats and rodents. These results suggest an evolutionary model of relatively recent diversification of buthid sodium channel toxin (NaTx) homologs in response to the diversification of scorpion predators.


Phylogenomics and dating

Scorpions were collected by hand at field sites in Brazil, Egypt, Israel, and the US, commonly with the aid of ultraviolet lighting. For 42 species, venom was milked, venom glands were dissected, RNA was extracted, and paired-end transcriptomes were sequenced on the Illumina HiSeq 2500 platform. These new datasets were combined with 45 venom gland RNA-Seq datasets and one genome that we previously generated. Orthologous loci were drawn from 3,564 orthogroups identified by MCL clustering in a larger analysis of Chelicerata and outgroup taxa (Ballesteros and Sharma 2019). Three matrices were assembled using minimum taxon occupancy thresholds: Matrix 1 (loci sampled for at least 115 species), Matrix 2 (at least 109 species), and Matrix 3 (at least 103 species).

Phylogenetic inference on these concatenated matrices was performed with IQ-TREE v. 1.6 (Nguyen et al. 2014), applying the best-fitting amino acid substitution model to each partition. Species trees were also estimated with the summary coalescent method implemented in ASTRAL v. 3 (Mirarab and Warnow 2015), using the set of orthologous gene trees as input. The smallest matrix (Matrix 3) was additionally analyzed with PhyloBayes-MPI v. 1.7 (Lartillot and Philippe 2004), running four independent chains under the CAT + GTR + G4 model. Divergence times were estimated on Matrix 2 using the approximate likelihood calculation implemented in Codeml and MCMCtree, both part of the PAML v. 4.8 package (Yang 2007; dos Reis and Yang 2019).
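Occupancy-based matrix assembly keeps an orthogroup only if at least N species are represented in it. A lower threshold therefore keeps every locus that a higher one keeps, plus possibly more. The kept loci are then concatenated, with missing taxa padded with gaps, and each locus's coordinates define one partition for per-partition model selection. The Python sketch below assumes each orthogroup is already aligned and keyed by species name. The function names and toy data are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical input: orthogroup ID -> {species name -> aligned amino acid sequence}.
Alignments = Dict[str, Dict[str, str]]


def filter_by_occupancy(alignments: Alignments, min_taxa: int) -> List[str]:
    """Return the IDs of loci sampled for at least `min_taxa` species."""
    return sorted(locus for locus, seqs in alignments.items() if len(seqs) >= min_taxa)


def concatenate(
    alignments: Alignments, loci: List[str], taxa: List[str]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate loci into a supermatrix, padding absent taxa with gaps.

    Also returns 1-based, inclusive (locus, start, end) coordinates, i.e. the
    per-locus partitions on which a substitution model could be fitted.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for locus in loci:
        seqs = alignments[locus]
        # Assumes all sequences within one alignment share the same length.
        length = len(next(iter(seqs.values())))
        for taxon in taxa:
            rows[taxon].append(seqs.get(taxon, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    toy = {
        "OG1": {"A": "MKV", "B": "MKI", "C": "MRV"},
        "OG2": {"A": "CC", "B": "CS"},
        "OG3": {"A": "W"},
    }
    # Stricter thresholds keep fewer loci, analogous to the Matrix 1-3 series.
    for threshold in (3, 2, 1):
        loci = filter_by_occupancy(toy, threshold)
        matrix, parts = concatenate(toy, loci, ["A", "B", "C"])
        print(threshold, loci, matrix, parts)
```

Padding with gaps rather than dropping taxa keeps every species in every matrix. Only the amount of missing data changes with the threshold.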

Toxin evolution

Cysteine-stabilized α-helix and β-sheet fold (CSαβ), disulphide-directed β-hairpin (DDH) and inhibitor cystine knot (ICK) homologs from scorpion venom were retrieved from the complete dataset used in the scorpion phylogenetic analyses, following our recent approaches (Santibáñez-López et al. 2018; Santibáñez-López et al. 2019b), and from UniProt. Gene trees were inferred with IQ-TREE for the entire dataset (1,353 CSαβ-ICK scorpion toxins, with 41 DDH scorpion toxins as outgroups) and for each of the four main clades recovered: (a) sodium channel toxins (NaTx); (b) potassium channel toxins (KTx); (c) chloride channel toxins (ClTx); and (d) calcins. To compare the subclades recovered within NaTx, we searched their mature peptides for recurring sequence motifs using Multiple Em for Motif Elicitation (MEME; Bailey et al. 2015). The mature peptides of the two main NaTx clades (Aah2-like and Cn2-like) were analyzed separately using CLANS clustering (Frickey and Lupas 2004). Minimum ages of mammal-specific toxins were derived from the molecular dating performed herein: the age of the most inclusive clade of taxa bearing a given toxin was used as the minimum age of that gene (phylostratigraphic bracketing). Divergence times for scorpion predators such as Herpestidae (Carnivora), Chiroptera, Eulipotyphla and Rodentia were retrieved from a recent analysis of mammal diversification times (Upham et al. 2019) and compared against the inferred origins of mammal-specific toxins.
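Phylostratigraphic bracketing reduces to a most-recent-common-ancestor lookup on the dated species tree. The Python sketch below assumes that tree is stored as child-to-parent and node-to-age (Ma) dictionaries. As in the text, the crown age of the clade spanning all toxin-bearing taxa is the minimum gene age. Treating that clade's parent (stem) node as an upper bound is an additional assumption here, and it holds only if the homolog is truly absent from the sister lineage. The resulting window is then tested for overlap with a predator clade's diversification interval. All function names, the tree, and the ages are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Window = Tuple[float, Optional[float]]  # (crown age, stem age or None at the root), in Ma


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from `node` up to the root, `node` included (tip-to-root order)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Youngest node that is an ancestor of (or equal to) every taxon in `taxa`."""
    taxa = list(taxa)
    shared = set(ancestors(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(ancestors(taxon, parent))
    # ancestors() runs tip-to-root, so the first shared node is the youngest one.
    return next(node for node in ancestors(taxa[0], parent) if node in shared)


def toxin_age_window(taxa: Iterable[str], parent: Dict[str, str], age: Dict[str, float]) -> Window:
    """Crown age of the clade spanning all toxin-bearing taxa (minimum gene age),
    plus the age of its parent node as an assumed upper bound."""
    node = mrca(taxa, parent)
    stem = age[parent[node]] if node in parent else None
    return age[node], stem


def overlaps(window: Window, interval: Tuple[float, float]) -> bool:
    """True if a toxin window and a (young, old) predator interval share any time."""
    young, old = window
    old = float("inf") if old is None else old
    return max(young, interval[0]) <= min(old, interval[1])


if __name__ == "__main__":
    # Hypothetical dated tree: ((T1, T2)Y, T3)X, T4)root
    parent = {"T1": "Y", "T2": "Y", "Y": "X", "T3": "X", "X": "root", "T4": "root"}
    age = {"root": 100.0, "X": 60.0, "Y": 30.0, "T1": 0.0, "T2": 0.0, "T3": 0.0, "T4": 0.0}
    window = toxin_age_window(["T1", "T2"], parent, age)
    print(window, overlaps(window, (40.0, 70.0)))
```

Keeping the stem age optional makes the minimum-age result, which is the quantity the text actually uses, independent of the stronger absence assumption.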