Data from: Climatic niche divergence and high-elevation adaptation promoted rapid diversification of Pimoa spiders in Pan-Himalaya
Data files
Jun 01, 2026 version (total file size: 55.74 MB)
- 7896_orthologs.zip (10.19 MB)
- README.md (4.78 KB)
- Scripts_of_positively_selected_genes.zip (3.34 KB)
- Scripts_of_rapidly_evolving_genes.zip (4.17 KB)
- Supplementary_Data.zip (45.54 MB)
Abstract
The Pan-Himalayan region harbors exceptional biodiversity, yet the origins and underlying mechanisms driving this remarkable diversity remain poorly understood, especially among species-rich invertebrates. Pimoa spiders exhibit an intercontinental disjunct distribution across three major mountain regions: the Rockies, the Alps, and the Pan-Himalaya. Notably, in the Pan-Himalaya, Pimoa occurs at significantly higher elevations and exhibits far greater species diversity than in the other two regions, making it an ideal model for studying invertebrate diversification in this area. We integrated extensive genetic data (59 new transcriptomes and over 22,700 newly generated DNA sequences from 393 samples), along with distribution and climatic data, to explore the origins and drivers of the rich diversity of Asian Pimoa in the Pan-Himalaya. Our findings indicate that Pan-Himalayan Pimoa spiders originated from widely distributed ancestors in North America and Asia. During the Miocene, they dispersed southward from Northeast Asia to South China and the Pan-Himalaya, experiencing rapid diversification. Climatic niche modeling and ancestral reconstruction analyses revealed niche divergence between Pan-Himalayan Pimoa and their ancestors, particularly in elevation and temperature. Gene selection analyses showed that high-elevation adaptations primarily involve enhanced energy metabolism, hypoxia resistance, and DNA repair mechanisms. These results suggest that ecological niche differentiation, high-elevation adaptation, and rapid diversification during the Miocene are crucial factors shaping the rich diversity of Asian Pimoa in the Pan-Himalaya. Our study, which integrates genetic, distributional, and climatic data, provides a novel framework for understanding invertebrate diversification in the Pan-Himalaya.
Dataset DOI: 10.5061/dryad.k98sf7mkk
Description of the data and file structure
This dataset contains all the data needed to reproduce the results of the research described in the manuscript titled:
'Climatic niche divergence and high-elevation adaptation promoted rapid diversification of Pimoa spiders in Pan-Himalaya'
The data comprises raw data from de novo sequencing, published literature, and the corresponding code for gene selection analysis.
In summary, this dataset includes:
- two .zip files containing the raw data used for phylogenetic tree inference and gene selection analysis: Supplementary_Data.zip and 7896_orthologs.zip.
- two .zip files with the code for gene selection analysis: Scripts_of_positively_selected_genes.zip and Scripts_of_rapidly_evolving_genes.zip.
Files and variables
Supplementary_Data.zip:
This zip file includes three dataset folders for phylogenetic tree inference: Dataset I, Dataset II, and Dataset III.
Dataset I folder (Two datasets were used for phylogenetic tree inference based on 106 genes)
- Subfolder 1 Name: 103samples_Data Matrices
- Subfolder 1 Description: The dataset (106gene103samplesDNA.fasta) used for phylogenetic tree inference based on 106 genes and 103 samples, together with its partitioning scheme (106genes103samplesDNA_gene_partition.txt).
- Subfolder 2 Name: 103samples_Sequence Files
- Subfolder 2 Description: Raw sequences of the 106 genes in 103 samples used for phylogenetic tree inference.
- Subfolder 3 Name: 290samples_Data Matrices
- Subfolder 3 Description: The dataset (106gene290samplesDNA.fasta) used for phylogenetic tree inference based on 106 genes and 290 samples, together with its partitioning scheme (106genes290samplesDNA_gene_partition.txt).
- Subfolder 4 Name: 290samples_Sequence Files
- Subfolder 4 Description: Raw sequences of the 106 genes in 290 samples used for phylogenetic tree inference.
Dataset II folder (The dataset was used for phylogenetic tree inference based on 103 transcriptomes)
- Subfolder 1 Name: 103samples_Data Matrices
- Subfolder 1 Description: The dataset (874genes103samplesDNA.fasta) used for phylogenetic tree inference based on 874 genes and 103 samples, together with its partitioning scheme (874genes103samplesDNA_gene_partition.txt).
- Subfolder 2 Name: 103samples_Sequence Files
- Subfolder 2 Description: Raw sequences of the 874 genes in 103 samples used for phylogenetic tree inference.
Dataset III folder (The dataset was used for total-evidence tree inference based on 34 extant species and 3 fossil pimoids in Baltic amber)
- Subfolder 1 Name: 37samples_Data_Matrices
- Subfolder 1 Description: The dataset used for total-evidence tree inference. It comprises 30 genes for 34 samples (pimo_34spp30gene.phy), 78 morphological characters for 37 samples (pimo_morph2_1.phy), and 1 ordered character for 36 samples (pimo_morph2_2_order.phy), together with the partitioning scheme (pimo_TE0524.nex).
- Subfolder 2 Name: 37samples_Files
- Subfolder 2 Description: Raw files of the 30 genes (34 samples), the 78 morphological characters (37 samples), and the 1 ordered character (36 samples) used for total-evidence tree inference.
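After unzipping, the FASTA matrices in Dataset I and Dataset II can be sanity-checked against the sample counts stated above. The following is a minimal sketch, not part of the deposited scripts. The helper names read_fasta and summarize_matrix are hypothetical, and it assumes the matrices are ordinary FASTA files with one record per sample.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: concatenated sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # full header text is used as the name
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def summarize_matrix(records: Dict[str, str]) -> dict:
    """Report the number of samples and whether all sequences share one length."""
    lengths = sorted({len(seq) for seq in records.values()})
    return {
        "n_samples": len(records),
        "aligned": len(lengths) == 1,
        "lengths": lengths,
    }


if __name__ == "__main__":
    # Tiny in-memory stand-in; for real use, pass an open file handle,
    # e.g. open("106gene103samplesDNA.fasta") from the unzipped folder.
    demo = [">sample_A", "ACGT--", "AC", ">sample_B", "ACGTACGT"]
    print(summarize_matrix(read_fasta(demo)))
```

For 106gene103samplesDNA.fasta, for example, one would expect n_samples to equal 103 and aligned to be true if the file is a concatenated alignment.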
7896_orthologs.zip:
This zip file includes 7896 orthologs for gene selection analysis.
- We used the translated coding sequences of nine pimoid species, all with BUSCO transcriptome completeness scores > 90% and contig N50 > 1,500 bp. One-to-one orthologous genes were identified with the reciprocal best-hit method in BLASTP (E-value < 1E-10), using P. gyaca as the anchor species. Nucleotide sequence alignments were obtained using MAFFT v7.455 with the G-INS-i method. Poorly aligned columns, defined as those with gaps in more than 10% of the sequences or a similarity score < 0.001, were trimmed using trimAl v1.4.rev15 (-gt 0.9 -st 0.001 -backtrans). This procedure yielded the 7,896 orthologs provided here.
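The reciprocal best-hit step described above can be illustrated with a short sketch. This is not the authors' pipeline. It assumes BLASTP results in the standard 12-column tabular format (-outfmt 6), takes the hit with the highest bit score as the "best" hit, and uses the hypothetical function names best_hits and reciprocal_best_hits.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(blast_lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Return {query: best subject} from BLAST tabular (outfmt 6) lines.

    A hit is kept only if its E-value is below max_evalue; among the
    kept hits, the one with the highest bit score wins.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed rows
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    anchor_vs_other: Iterable[str],
    other_vs_anchor: Iterable[str],
    max_evalue: float = 1e-10,
) -> List[Tuple[str, str]]:
    """Pairs (anchor gene, other gene) that are each other's best hit."""
    forward = best_hits(anchor_vs_other, max_evalue)
    backward = best_hits(other_vs_anchor, max_evalue)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

In a nine-species setting with one anchor species, a gene would be retained as a one-to-one ortholog only if a reciprocal pair is found between the anchor and each of the other eight species. That intersection step is left out here for brevity.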
Scripts_of_positively_selected_genes.zip:
This zip file includes two Python files, batch_codemlBSM.py and batch_codemlBSM0.py, used for screening positively selected genes in the manuscript.
- For positive selection analysis, the CODEML program from PAML was executed to calculate the significance of genes among high-elevation pimoids via the branch-site model.
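Significance under the branch-site model is customarily assessed with a likelihood ratio test that compares the log-likelihood of the alternative model with that of the null model (foreground omega fixed at 1). The sketch below shows that calculation only; it is not taken from the deposited scripts. The function name lrt_pvalue_df1 and the log-likelihood values are hypothetical, and a chi-square distribution with one degree of freedom is assumed as the reference distribution.

```python
import math
from typing import Tuple


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test with one degree of freedom.

    Returns (test statistic, p-value). The statistic is 2 * (lnL_alt - lnL_null),
    floored at zero because small negative values can arise from
    optimisation noise. For one degree of freedom, the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods as they might be read from two codeml runs.
    stat, p = lrt_pvalue_df1(lnl_null=-12345.67, lnl_alt=-12340.12)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4g}")
```

When thousands of orthologs are tested, the resulting p-values would normally also be corrected for multiple testing. The correction used in the study is not stated in this description.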
Scripts_of_rapidly_evolving_genes.zip:
This zip file contains two Python files (batch_codemlBSM.py and batch_codemlBSM0.py) and one shell script (REG.sh), all used for screening rapidly evolving genes in the manuscript.
- To scan for rapidly evolving genes, we employed the branch model through CODEML in PAML.
