Data from: Species diversification in the sky islands of southwestern China revealed by genomic, introgression and demographic analyses of Asian shrew moles
Data files
Aug 20, 2025 version files 74.90 GB
- 15_mitochondria_genes_concatenation.phy (546.36 KB)
- astral_input_gene_trees.treefile (46.24 MB)
- BPP_control_files.zip (10.49 KB)
- BPP_input_loci.phy (6.53 MB)
- consensus_seq_for_dfoil.zip (9.83 GB)
- iqtree_input_CDS_4dtv.fas (34.96 MB)
- mcmc_input_CDS_4dtv.phy (34.39 MB)
- mcmctree_input_cali.tree (205 B)
- NJ_IBS.txt (12.87 KB)
- psmc_fq_files.zip (36.53 GB)
- raxml_input_SNPs.phy (56.05 MB)
- README.md (2.10 KB)
- SNAPP_input.xml (186.49 KB)
- SNPs2CF_input_SNP_thin1000.phy (69.22 MB)
- SNPs2CF.csv (189.06 KB)
- Uropsilus_filtered_SNP.vcf.gz (28.30 GB)
Abstract
The mountains of Southwest China, a global biodiversity hotspot, have a unique “sky island” landscape with high diversity of both ancient and recently formed species. While their distribution patterns offer significant insights into diversification processes, the complex geological and climatic history, combined with dynamic histories of gene flow in endemic taxa, makes unravelling this history challenging. This study focuses on Asian shrew moles (genus Uropsilus), an ancient group endemic to this region with an unresolved taxonomic system. By combining phylogenomic, introgression, and demographic history analyses, we investigated the historical patterns of species diversification in this genus. We detected phylogenetic discordances among rapidly diverged lineages, driven by incomplete lineage sorting, both recent and ancient gene flow, and ghost introgression. The gene flow patterns revealed strong genetic isolation in the Hengduan Mountains region, contrasted by more extensive dispersal or connectivity in areas to its east, while suggesting potential ring-like diversification around the Sichuan Basin. Demographic history indicated that rapidly diverged lineages south of the Yangtze River exhibited significantly different responses to climatic fluctuations compared to other lineages, with the East Asian monsoon likely driving their radiative differentiation and dispersal. Our study demonstrates the impacts of mountain uplift, climatic changes, and the connectivity of sky island refugia in shaping the diverse patterns of species differentiation and their distribution.
The dataset contains the input files used in the analyses of the study.
The files are described below:
"Uropsilus_filtered_SNP.vcf.gz"
The whole-genome high-quality biallelic single-nucleotide polymorphisms (SNPs) of 39 samples.
"raxml_input_SNPs.phy"
The SNPs file used for constructing the maximum likelihood (ML) tree, converted to .phy format for input into RAxML 8.2.12.
"NJ_IBS.txt"
Pairwise estimates of identity-by-state (IBS) scores for all samples obtained by PLINK 1.90, used for constructing a neighbor-joining (NJ) tree with Phylip 3.698.
"iqtree_input_CDS_4dtv.fas"
The four-fold degenerate sites obtained from single-copy orthologous genes (CDS-4dtv) of 16 Uropsilus cryptic species and the outgroup Talpa occidentalis, used for constructing an ML tree with IQ-TREE 2.1.4.
"15_mitochondria_genes_concatenation.phy"
A concatenated dataset comprising 13 protein-coding genes and two ribosomal RNA (rRNA) genes from the mitogenomes of all 39 samples, used for constructing an ML tree with RAxML 8.2.12.
"SNAPP_input.xml"
The input file for SNAPP, run in BEAST 2.6.6.
"mcmc_input_CDS_4dtv.phy"
The same CDS-4dtv dataset as "iqtree_input_CDS_4dtv.fas", converted to .phy format for input into PAML-MCMCTree.
"mcmctree_input_cali.tree"
The ML tree constructed from the CDS-4dtv dataset, used as the input tree for PAML-MCMCTree.
"SNPs2CF_input_SNP_thin1000.phy"
The pruned SNPs used for calculating the Concordance Factors (CFs).
"SNPs2CF.csv"
The CF table used as input for SNaQ.
"astral_input_gene_trees.treefile"
Gene trees inferred for sliding windows with IQ-TREE v2.4.0, used as input for the species-tree and concordance-factor analyses in IQ-TREE v2.4.0 and ASTRAL v5.7.8.
"consensus_seq_for_dfoil.zip"
The genomic consensus sequence files used for DFOIL analysis.
"psmc_fq_files.zip"
The fastq files used for PSMC analysis.
"BPP_input_loci.phy"
The 1000 loci used for BPP analysis.
"BPP_control_files.zip"
The control files for the analysis of each introgression model in BPP.
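
Several of the files are large, so it may be worth checking a download before running any analysis. The following is a minimal Python sketch, not part of the original dataset, that reads only the headers of two of the formats described above. The function names `vcf_sample_names` and `phylip_dimensions` are hypothetical helpers introduced here for illustration. It assumes the VCF has a standard `#CHROM` header line and that each PHYLIP file begins with a line giving the number of taxa and the number of sites.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample IDs from the #CHROM header line of a VCF file.

    Works on plain or gzip/bgzip-compressed files; only the header is read.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The VCF format fixes nine columns (CHROM..FORMAT)
                # before the per-sample columns.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data records without seeing a header
    raise ValueError(f"no #CHROM header line found in {path}")


def phylip_dimensions(path):
    """Return (n_taxa, n_sites) from the first line of a PHYLIP alignment."""
    with open(path) as handle:
        n_taxa, n_sites = handle.readline().split()[:2]
    return int(n_taxa), int(n_sites)


if __name__ == "__main__":
    samples = vcf_sample_names("Uropsilus_filtered_SNP.vcf.gz")
    print(len(samples), "samples in the VCF header")
    print(phylip_dimensions("raxml_input_SNPs.phy"))
```

The description of "Uropsilus_filtered_SNP.vcf.gz" states 39 samples, so a different count would suggest an incomplete or mismatched download. Note that `phylip_dimensions` reads only the first header line; a multi-locus file such as "BPP_input_loci.phy" is expected to contain one header per locus, so this reports the first locus only.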
