Data from: Climatic niche divergence and high-elevation adaptation promoted rapid diversification of Pimoa spiders in Pan-Himalaya
Data files
Jun 01, 2026 version files (55.74 MB)
- 7896_orthologs.zip (10.19 MB)
- README.md (4.78 KB)
- Scripts_of_positively_selected_genes.zip (3.34 KB)
- Scripts_of_rapidly_evolving_genes.zip (4.17 KB)
- Supplementary_Data.zip (45.54 MB)
Jun 22, 2026 version files (62.36 MB)
- 7896_orthologs.zip (10.19 MB)
- README.md (18.83 KB)
- Scripts_of_positively_selected_genes.zip (3.34 KB)
- Scripts_of_rapidly_evolving_genes.zip (4.17 KB)
- Supplementary_Data.zip (45.54 MB)
- Supplementary_Tables.zip (6.61 MB)
Abstract
The Pan-Himalayan region harbors exceptional biodiversity, yet the origins and underlying mechanisms driving this remarkable diversity remain poorly understood, especially among species-rich invertebrates. Pimoa spiders exhibit an intercontinental disjunct distribution across three major mountain regions: the Rockies, the Alps, and the Pan-Himalaya. Notably, in the Pan-Himalaya, Pimoa occurs at significantly higher elevations and exhibits far greater species diversity than in the other two regions, making it an ideal model for studying invertebrate diversification in this area. We integrated extensive genetic data (59 new transcriptomes and over 22,700 newly generated DNA sequences from 393 samples), along with distribution and climatic data, to explore the origins and drivers of the rich diversity of Asian Pimoa in the Pan-Himalaya. Our findings indicate that Pan-Himalayan Pimoa spiders originated from widely distributed ancestors in North America and Asia. During the Miocene, they dispersed southward from Northeast Asia to South China and the Pan-Himalaya, experiencing rapid diversification. Climatic niche modeling and ancestral reconstruction analyses revealed niche divergence between Pan-Himalayan Pimoa and their ancestors, particularly in elevation and temperature. Gene selection analyses showed that high-elevation adaptations primarily involve enhanced energy metabolism, hypoxia resistance, and DNA repair mechanisms. These results suggest that ecological niche differentiation, high-elevation adaptation, and rapid diversification during the Miocene are crucial factors shaping the rich diversity of Asian Pimoa in the Pan-Himalaya. Our study, which integrates genetic, distributional, and climatic data, provides a novel framework for understanding invertebrate diversification in the Pan-Himalaya.
Dataset DOI: 10.5061/dryad.k98sf7mkk
Description of the data and file structure
This dataset contains all the data needed to reproduce the results of the manuscript:
'Climatic niche divergence and high-elevation adaptation promoted rapid diversification of Pimoa spiders in Pan-Himalaya'
The data comprise raw data from de novo sequencing and published literature, together with the code used for gene selection analysis.
In summary, this dataset includes:
- two .zip files containing the raw data used for phylogenetic tree inference and gene selection analysis: Supplementary_Data.zip and 7896_orthologs.zip.
- two .zip files with the code for gene selection analysis: Scripts_of_positively_selected_genes.zip and Scripts_of_rapidly_evolving_genes.zip.
- one .zip file containing the supplementary tables: Supplementary_Tables.zip.
Files and variables
Supplementary_Data.zip:
This zip file includes three dataset folders for phylogenetic tree inference: Dataset I, Dataset II, and Dataset III.
Dataset I folder (two datasets used for phylogenetic tree inference based on 106 genes)
- Subfolder 1 Name: 103samples_Data Matrices
- Subfolder 1 Description: The data matrix (106gene103samplesDNA.fasta) used for phylogenetic tree inference based on 106 genes and 103 samples, and the partitioning scheme (106genes103samplesDNA_gene_partition.txt) for this dataset.
- Subfolder 2 Name: 103samples_Sequence Files
- Subfolder 2 Description: Raw sequences of the 106 genes from 103 samples used for phylogenetic tree inference.
- Subfolder 3 Name: 290samples_Data Matrices
- Subfolder 3 Description: The data matrix (106gene290samplesDNA.fasta) used for phylogenetic tree inference based on 106 genes and 290 samples, and the partitioning scheme (106genes290samplesDNA_gene_partition.txt) for this dataset.
- Subfolder 4 Name: 290samples_Sequence Files
- Subfolder 4 Description: Raw sequences of the 106 genes from 290 samples used for phylogenetic tree inference.
Dataset II folder (dataset used for phylogenetic tree inference based on 103 transcriptomes)
- Subfolder 1 Name: 103samples_Data Matrices
- Subfolder 1 Description: The data matrix (874genes103samplesDNA.fasta) used for phylogenetic tree inference based on 874 genes and 103 samples, and the partitioning scheme (874genes103samplesDNA_gene_partition.txt) for this dataset.
- Subfolder 2 Name: 103samples_Sequence Files
- Subfolder 2 Description: Raw sequences of the 874 genes from 103 samples used for phylogenetic tree inference.
Dataset III folder (dataset used for total-evidence tree inference based on 34 extant species and 3 fossil pimoids in Baltic amber)
- Subfolder 1 Name: 37samples_Data_Matrices
- Subfolder 1 Description: The data matrices used for total-evidence tree inference: 30 genes for 34 samples (pimo_34spp30gene.phy), 78 morphological characters for 37 samples (pimo_morph2_1.phy), and 1 ordered character for 36 samples (pimo_morph2_2_order.phy), together with the partitioning scheme (pimo_TE0524.nex) for this dataset.
- Subfolder 2 Name: 37samples_Files
- Subfolder 2 Description: Raw files of the 30 genes for 34 samples, the 78 morphological characters for 37 samples, and the 1 ordered character for 36 samples used for total-evidence tree inference.
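After unzipping, a quick way to confirm that a FASTA data matrix has the expected number of samples and a single alignment length is sketched below. This is a minimal illustration, not part of the deposited scripts: the function names are hypothetical, and the example input is a made-up two-sequence alignment rather than content from the dataset.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)  # sequences may wrap over several lines
    return {n: "".join(parts) for n, parts in records.items()}


def summarize_alignment(records: Dict[str, str]) -> Tuple[int, int]:
    """Return (number of sequences, alignment length); raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences are not aligned; lengths found: {sorted(lengths)}")
    return len(records), (lengths.pop() if lengths else 0)


# Made-up in-memory example; a real matrix would be read with open(path).
example = [">sample_A", "ATG-CC", "GT", ">sample_B", "ATGACCGT"]
print(summarize_alignment(read_fasta(example)))
```

For a file such as 106gene103samplesDNA.fasta, the first value returned should match the sample count stated in the folder description.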
7896_orthologs.zip:
This zip file includes 7896 orthologs for gene selection analysis.
- We used the translated coding sequences of nine pimoid species, all with BUSCO transcriptome completeness scores > 90% and contig N50 > 1,500 bp. One-to-one orthologous genes were identified with the reciprocal best-hit method in BLASTP (E-value < 1E-10), using P. gyaca as the anchor species. Nucleotide sequence alignments were obtained using MAFFT v7.455 with the G-INS-i method. Poorly aligned regions, defined as alignment columns with gaps in more than 10% of the sequences or a similarity score < 0.001, were trimmed using trimAl v1.4.rev15 (-gt 0.9 -st 0.001 -backtrans). This procedure yielded the 7,896 orthologs.
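The reciprocal best-hit step described above can be sketched as follows. This is an illustrative sketch only and not the authors' pipeline: it assumes each BLASTP hit has already been reduced to a (query, subject, E-value, bit score) tuple, and it assumes the "best" hit is the one with the highest bit score, which the description does not specify. All gene names in the example are invented.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, E-value, bit score)


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10) -> Dict[str, str]:
    """For each query, keep the subject with the highest bit score (assumed
    criterion) among hits with E-value below the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue  # enforce E-value < cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-10) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b, max_evalue)
    backward = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented hits: anchor species genes (g*) against another species (h*), and back.
anchor_vs_other = [("g1", "h1", 1e-50, 200.0), ("g1", "h2", 1e-20, 90.0),
                   ("g2", "h2", 1e-30, 120.0), ("g3", "h3", 1e-5, 40.0)]
other_vs_anchor = [("h1", "g1", 1e-48, 198.0), ("h2", "g1", 1e-20, 90.0),
                   ("h3", "g3", 1e-5, 40.0)]
print(reciprocal_best_hits(anchor_vs_other, other_vs_anchor))
```

In the example, only g1 and h1 form a reciprocal pair: g2 points to h2, but h2 points back to g1, and the g3/h3 hits fail the E-value cutoff.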
Scripts_of_positively_selected_genes.zip:
This zip file includes two Python files (batch_codemlBSM.py and batch_codemlBSM0.py) used to screen for positively selected genes in the manuscript.
- For positive selection analysis, we ran the CODEML program from PAML under the branch-site model to test the significance of each gene in high-elevation pimoids.
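A branch-site test is commonly evaluated with a likelihood ratio test that compares the alternative-model log-likelihood with the null-model log-likelihood, which correspond to the lnL and lnL0 columns of Table S4.1. The sketch below shows that calculation using only the Python standard library. It assumes a chi-square reference distribution with 1 degree of freedom, a widespread convention that this README does not confirm, so the values it produces may differ from the deposited P-values. The function names and example log-likelihoods are hypothetical.

```python
import math


def lrt_statistic(lnl_alt: float, lnl_null: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of
    freedom, using the identity P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(lnl_alt: float, lnl_null: float) -> float:
    """P-value of the likelihood ratio test under the assumed df = 1."""
    return chi2_sf_df1(lrt_statistic(lnl_alt, lnl_null))


# Invented log-likelihoods for one gene (not taken from Table S4.1).
stat = lrt_statistic(-1000.0, -1003.0)
print(stat, round(lrt_p_value(-1000.0, -1003.0), 4))
```

Flooring the statistic at zero guards against small negative values that can arise when the optimizer stops slightly short of the maximum under the alternative model.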
Scripts_of_rapidly_evolving_genes.zip:
This zip file contains two Python files (batch_codemlBSM.py and batch_codemlBSM0.py) and one shell script (REG.sh), all used for screening rapidly evolving genes in the manuscript.
- To screen for rapidly evolving genes, we used the branch model in CODEML from PAML.
Supplementary_Tables.zip:
This zip file contains four Excel files: Sup-0003-appendixs3.xlsx, Sup-0004-appendixs4.xlsx, Sup-0005-appendixs5.xlsx and Sup-0006-appendixs6.xlsx.
Sup-0003-appendixs3.xlsx: The phylogenetic supplementary table contains four sub-tables:
Table S1.1 The molecular dataset I includes 290 samples.
Variables:
- Voucher: The specimen number.
- Taxon: The specimen's name and number.
- Species Name: The species name of the specimen.
- Locality: The locality of the specimen.
- Coordinate: The latitude and longitude of the specimen's locality.
- Elevation: The elevation of the specimen's locality.
- GenBank accession numbers: The GenBank accession numbers of the specimen.
Table S1.2 The molecular dataset II includes 103 samples.
Variables:
- Taxon: The specimen's number.
- Voucher: Our specimen's number.
- Taxon: The specimen's name and number.
- Family: The family name of the specimen.
- Species Name: The species name of the specimen.
- Locality: The locality of the specimen.
- Coordinate: The latitude and longitude of the specimen's locality.
- Elevation: The elevation of the specimen's locality.
- GenBank accession numbers: The GenBank accession numbers of the specimen.
- Busco transcriptome: The BUSCO values of our transcriptomes.
Table S1.3 The 106 primers and conditions used in this study.
Variables:
- Gene: The name of the gene.
- F/R: Whether the primer is forward (F) or reverse (R).
- Primer: The sequence of the primer.
- Assay: The steps of the PCR.
- Tm: The PCR conditions for the primers.
- Product size: The sequence length of the gene.
- References: The references for the primers.
Table S1.4 The morphological character matrix.
Variables:
- Taxon: The specimen's name and number.
- 1~79: The 79 morphological characters.
Sup-0004-appendixs4.xlsx: The biogeographic table contains four sub-tables:
Table S2.1 The eight geographical areas used in historical biogeographical analyses.
Variables:
- Area Code: The code of each of the eight geographical areas.
- Description: The description of each geographical area.
- Circumscription and Geography: The detailed description of each geographical area.
- References: The references for each geographical area.
Table S2.2 Dispersal probabilities between areas under three time slices from the BioGeoBEARS analysis.
Variables:
- Areas: The codes of the eight geographical areas.
- A: Nearctic.
- B: Italian Peninsula.
- C: Northeast Asia.
- D: South China.
- E: Hengduan Mountains.
- F: Tibet.
- G: Eastern Himalayas.
- H: Western Himalayas.
Table S2.3 Dispersal limitations between areas under three time slices from the BioGeoBEARS analysis.
Variables:
- Areas: The codes of the eight geographical areas.
- A: Nearctic.
- B: Italian Peninsula.
- C: Northeast Asia.
- D: South China.
- E: Hengduan Mountains.
- F: Tibet.
- G: Eastern Himalayas.
- H: Western Himalayas.
Table S2.4 A conservative estimation of Pimoa species richness for each clade in diversification-rate analyses.
Variables:
- Clades: The clades of Pimoa species.
- Sampled: The number of sampled Pimoa species.
- Unsampled: The number of unsampled Pimoa species.
- Estimated undiscovered: The estimated number of undiscovered Pimoa species.
- Sampling Rate: The sampling rate of Pimoa species in the clade.
Sup-0005-appendixs5.xlsx: The table for ecological niches and ancestral state reconstruction contains three sub-tables:
Table S3.1 The 579 occurrence records and climatic variable values of pimoids.
Variables:
- Voucher: The specimen number.
- Sample number: The specimen number.
- Species Name: The species name of the specimen.
- Sample Name: The specimen's name and number.
- Locality: The locality of the specimen.
- Coordinate: The latitude and longitude of the specimen's locality.
- Elevation: The elevation of the specimen's locality.
- Sources: The references for the 579 occurrence records.
- Bio1: Annual Mean Temperature.
- Bio2: Mean Diurnal Range (Mean of monthly (max temp - min temp)).
- Bio3: Isothermality (Bio2/Bio7) (×100).
- Bio4: Temperature Seasonality (standard deviation ×100).
- Bio5: Max Temperature of Warmest Month.
- Bio6: Min Temperature of Coldest Month.
- Bio7: Temperature Annual Range (Bio5-Bio6).
- Bio8: Mean Temperature of Wettest Quarter.
- Bio9: Mean Temperature of Driest Quarter.
- Bio10: Mean Temperature of Warmest Quarter.
- Bio11: Mean Temperature of Coldest Quarter.
- Bio12: Annual Precipitation.
- Bio13: Precipitation of Wettest Month.
- Bio14: Precipitation of Driest Month.
- Bio15: Precipitation Seasonality (Coefficient of Variation).
- Bio16: Precipitation of Wettest Quarter.
- Bio17: Precipitation of Driest Quarter.
- Bio18: Precipitation of Warmest Quarter.
- Bio19: Precipitation of Coldest Quarter.
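Two of the variables above are derived from others in the list: Bio7 is Bio5 minus Bio6, and Bio3 is Bio2 divided by Bio7, multiplied by 100. The short sketch below shows the arithmetic with made-up input values; the function names are hypothetical, and the inputs are not taken from Table S3.1.

```python
def temperature_annual_range(bio5: float, bio6: float) -> float:
    """Bio7 = Bio5 - Bio6 (max of warmest month minus min of coldest month)."""
    return bio5 - bio6


def isothermality(bio2: float, bio7: float) -> float:
    """Bio3 = (Bio2 / Bio7) * 100."""
    if bio7 == 0:
        raise ValueError("Bio7 is zero; isothermality is undefined")
    return 100.0 * bio2 / bio7


# Made-up values in the same temperature unit.
bio7 = temperature_annual_range(bio5=30.0, bio6=5.0)
bio3 = isothermality(bio2=10.0, bio7=bio7)
print(bio7, bio3)
```

Recomputing Bio3 and Bio7 from the other columns in this way offers a simple consistency check on extracted climate values, provided the columns share the same units and scaling.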
Table S3.2 The Pearson’s r correlation coefficients between 19 bioclimatic layers.
Variables:
- Bio1: Annual Mean Temperature.
- Bio2: Mean Diurnal Range (Mean of monthly (max temp - min temp)).
- Bio3: Isothermality (Bio2/Bio7) (×100).
- Bio4: Temperature Seasonality (standard deviation ×100).
- Bio5: Max Temperature of Warmest Month.
- Bio6: Min Temperature of Coldest Month.
- Bio7: Temperature Annual Range (Bio5-Bio6).
- Bio8: Mean Temperature of Wettest Quarter.
- Bio9: Mean Temperature of Driest Quarter.
- Bio10: Mean Temperature of Warmest Quarter.
- Bio11: Mean Temperature of Coldest Quarter.
- Bio12: Annual Precipitation.
- Bio13: Precipitation of Wettest Month.
- Bio14: Precipitation of Driest Month.
- Bio15: Precipitation Seasonality (Coefficient of Variation).
- Bio16: Precipitation of Wettest Quarter.
- Bio17: Precipitation of Driest Quarter.
- Bio18: Precipitation of Warmest Quarter.
- Bio19: Precipitation of Coldest Quarter.
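Each cell of Table S3.2 is a Pearson's r between two bioclimatic layers. A minimal standard-library sketch of that coefficient is given below for readers who want to recompute a value from the occurrence records. The function name and the example values are hypothetical and are not drawn from the dataset.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return numerator / denominator


# Made-up values for two variables at four localities.
var_a = [1.0, 2.0, 3.0, 4.0]
var_b = [8.0, 6.0, 4.0, 2.0]
print(pearson_r(var_a, var_b))
```

The example variables are perfectly negatively related, so the coefficient is -1 up to floating-point rounding.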
Table S3.3 The percent contribution and the gain of nine climatic variables in the MaxEnt analysis, as well as the AUC values of the predicted results for each species.
Variables:
- Species Name: The species name.
- N: The number of distribution points used for MaxEnt.
- AUC: The area under the curve (AUC) for each of the 25 Pimoa species.
- AUCdiff: The AUC difference for each of the 25 Pimoa species.
- Percent contribution: The percent contribution of each of the nine climatic variables.
- The jackknife test of variable importance: The jackknife test of variable importance for the nine climatic variables.
Sup-0006-appendixs6.xlsx: The table for comparative transcriptome analyses contains five sub-tables:
Table S4.1 The 507 positively selected genes (PSGs) in five high-elevation Pimoa spiders.
Variables:
- Gene_ID: The gene_ID of the 507 positively selected genes (PSGs) in five high-elevation Pimoa spiders.
- lnL: The log-likelihood value in positive selection analysis.
- lnL0: The log-likelihood of the null model in positive selection analysis.
- P-value: The P-value in positive selection analysis.
- FDR: The false discovery rates in positive selection analysis.
- w0: The dN/dS ratio for site class 0.
- w1: The dN/dS ratio for site class 1.
- propo1: The proportion of sites in site class 1.
- propo2: The proportion of sites in site class 2.
- wnull: The omega under the null model.
- site: The codon site of genes.
- NT_id: The Nucleotide database Accession ID.
- NT_score: The Nucleotide database Bit Score.
- NT_evalue: The Nucleotide database Expect Value.
- NT_description: The Nucleotide database subject description.
- NR_id: The Non-redundant protein database subject Accession ID.
- NR_score: The Non-redundant protein database Bit Score.
- NR_evalue: The Non-redundant protein database Expect Value.
- NR_description: The Non-redundant protein database subject description.
- swissprot_id: The Swiss-Prot Subject ID.
- swissprot_score: The Swiss-Prot Bit Score.
- swissprot_evalue: The Swiss-Prot Expect Value.
- swissprot_description: The Swiss-Prot subject description.
- Pfam_ID: The Protein Family Database Domain Accession ID.
- Pfam_Domain: The Protein Family Database Domain Name.
- Pfam_Description: The Protein Family Database Domain Functional Description.
- seed_eggNOG_ortholog: The evolutionary genealogy of genes: Non-supervised Orthologous Groups Ortholog Protein ID.
- best_tax_level: The Best Annotation Taxonomic Level.
- Preferred_name: The Preferred Protein Gene Name.
- GOs: Gene Ontology Terms.
- KEGG_ko: The KEGG Orthology identifier.
- KEGG_Pathway: The KEGG Pathway Accession & Name.
- COG Functional cat.: The Clusters of Orthologous Groups Functional Category Code.
- eggNOG free text desc.: The eggNOG Orthologous Group Free Text Description.
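The FDR column adjusts the P-values for multiple testing across genes. The README does not state which procedure was used; the sketch below shows the Benjamini-Hochberg adjustment purely as an assumed, commonly used choice. The function name and example P-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P-values for four genes.
example_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example_p))
```

Genes whose adjusted value falls below a chosen threshold would then be retained; the threshold itself is not given in this README.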
Table S4.2 The 434 rapidly evolving genes (REGs) in five high-elevation Pimoa spiders.
Variables:
- Gene_ID: The gene_ID of the 434 rapidly evolving genes (REGs) in five high-elevation Pimoa spiders.
- lnL: The log-likelihood value in the rapidly evolving gene analysis.
- lnL0: The log-likelihood of the null model in the rapidly evolving gene analysis.
- P-value: The P-value in the rapidly evolving gene analysis.
- FDR: The false discovery rates in the rapidly evolving gene analysis.
- omega: The Nonsynonymous substitution rate/Synonymous substitution rate ratio.
- w0: The dN/dS ratio for site class 0.
- w1: The dN/dS ratio for site class 1.
- diff_w1_w0: The difference between ω1 and ω0.
- Max_ds_one: The maximum dS for group one.
- Max_ds_two: The maximum dS for group two.
- N_dN_one: The number of non-synonymous substitutions on non-synonymous sites for group one.
- N_dS_one: The number of synonymous substitutions on synonymous sites for group one.
- N_dN_two: The number of non-synonymous substitutions on non-synonymous sites for group two.
- N_dS_two: The number of synonymous substitutions on synonymous sites for group two.
- NT_id: The Nucleotide database Accession ID.
- NT_score: The Nucleotide database Bit Score.
- NT_evalue: The Nucleotide database Expect Value.
- NT_description: The Nucleotide database subject description.
- NR_id: The Non-redundant protein database subject Accession ID.
- NR_score: The Non-redundant protein database Bit Score.
- NR_evalue: The Non-redundant protein database Expect Value.
- NR_description: The Non-redundant protein database subject description.
- swissprot_id: The Swiss-Prot Subject ID.
- swissprot_score: The Swiss-Prot Bit Score.
- swissprot_evalue: The Swiss-Prot Expect Value.
- swissprot_description: The Swiss-Prot subject description.
- Pfam_ID: The Protein Family Database Domain Accession ID.
- Pfam_Domain: The Protein Family Database Domain Name.
- Pfam_Description: The Protein Family Database Domain Functional Description.
- seed_eggNOG_ortholog: The evolutionary genealogy of genes: Non-supervised Orthologous Groups Ortholog Protein ID.
- best_tax_level: The Best Annotation Taxonomic Level.
- Preferred_name: The Preferred Protein Gene Name.
- GOs: Gene Ontology Terms.
- KEGG_ko: The KEGG Orthology identifier.
- KEGG_Pathway: The KEGG Pathway Accession & Name.
- COG Functional cat.: The Clusters of Orthologous Groups Functional Category Code.
- eggNOG free text desc.: The eggNOG Orthologous Group Free Text Description.
Table S4.3 The 36 significantly enriched GO terms of 507 positively selected genes (PSGs) in five high-elevation Pimoa spiders.
Variables:
- Term: The name of the enriched GO term.
- Database: The name of the database.
- ID: The Gene Ontology (GO) term ID.
- Input numbers: The number of input genes.
- Background numbers: The number of background genes.
- P-value: The P-value in the enrichment analysis.
- Corrected P-value: The corrected P-value in the enrichment analysis.
- Log of corrected P-value: The log of the corrected P-value in the enrichment analysis.
- Input: The names of the input genes.
- Hyperlink: The hyperlink to the GO term.
Table S4.4 The 49 significantly enriched GO terms of 434 rapidly evolving genes (REGs) in five high-elevation Pimoa spiders.
Variables:
- Term: The name of the enriched GO term.
- Database: The name of the database.
- ID: The Gene Ontology (GO) term ID.
- Input numbers: The number of input genes.
- Background numbers: The number of background genes.
- P-value: The P-value in the enrichment analysis.
- Corrected P-value: The corrected P-value in the enrichment analysis.
- Log of corrected P-value: The log of the corrected P-value in the enrichment analysis.
- Input: The names of the input genes.
- Hyperlink: The hyperlink to the GO term.
Table S4.5 The detailed gene functions of P. gyaca.
Variables:
- Gene_ID: The gene ID.
- NT_id: The Nucleotide database Accession ID.
- NT_score: The Nucleotide database Bit Score.
- NT_evalue: The Nucleotide database Expect Value.
- NT_description: The Nucleotide database subject description.
- NR_id: The Non-redundant protein database subject Accession ID.
- NR_score: The Non-redundant protein database Bit Score.
- NR_evalue: The Non-redundant protein database Expect Value.
- NR_description: The Non-redundant protein database subject description.
- swissprot_id: The Swiss-Prot Subject ID.
- swissprot_score: The Swiss-Prot Bit Score.
- swissprot_evalue: The Swiss-Prot Expect Value.
- swissprot_description: The Swiss-Prot subject description.
- Pfam_ID: The Protein Family Database Domain Accession ID.
- Pfam_Domain: The Protein Family Database Domain Name.
- Pfam_Description: The Protein Family Database Domain Functional Description.
- seed_eggNOG_ortholog: The evolutionary genealogy of genes: Non-supervised Orthologous Groups Ortholog Protein ID.
- best_tax_level: The Best Annotation Taxonomic Level.
- Preferred_name: The Preferred Protein Gene Name.
- GOs: Gene Ontology Terms.
- KEGG_ko: The KEGG Orthology identifier.
- KEGG_Pathway: The KEGG Pathway Accession & Name.
- COG Functional cat.: The Clusters of Orthologous Groups Functional Category Code.
- eggNOG free text desc.: The eggNOG Orthologous Group Free Text Description.
Changes after Jun 1, 2026: The newly uploaded file, Supplementary_Tables.zip, contains the supplementary tables required for this paper.
