
Data from: Population genomics reveal deep divergence and strong geographical structuring in the Hengduan Mountains

Cite this dataset

Fu, Peng-Cheng (2022). Data from: Population genomics reveal deep divergence and strong geographical structuring in the Hengduan Mountains [Dataset]. Dryad.


We used restriction site-associated DNA sequencing (RAD-seq) to generate 1,907 single nucleotide polymorphisms (SNPs) and 4 kb of plastid sequence in species of the Gentiana hexaphylla complex (Gentianaceae). We performed genetic clustering with spatial and non-spatial models, phylogenetic reconstructions, and ancestral range estimation to address the processes influencing the diversification of G. hexaphylla in the Hengduan Mountains (HM). The SNP data and plastid sequence alignments are provided here.


For RAD library construction and sequencing (Miller, Dunham, Amores, Cresko, & Johnson, 2007), each sample was digested with the restriction enzyme EcoRI, and the P1 adapter was then ligated with T4 ligase. Fragments were pooled, randomly sheared and size-selected to 350–550 bp. A second adapter (P2) was then ligated. The ligation products were purified and PCR-amplified, followed by gel purification and a second size selection for fragments of 350–550 bp. Paired-end reads of 150 bp were generated on the Illumina NovaSeq 6000 platform (Tianjin, China).
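
To make the digestion step concrete, the following is a minimal in silico sketch of cutting a sequence at EcoRI sites. EcoRI recognizes GAATTC and cuts between the G and the AATTC; the function names and the toy sequence are hypothetical and are not part of the deposited data or the authors' workflow.

```python
from typing import List

# EcoRI recognition sequence; the enzyme cuts between G and AATTC (G^AATTC).
ECORI_SITE = "GAATTC"


def ecori_cut_positions(sequence: str) -> List[int]:
    """Return 0-based indices at which EcoRI would cut `sequence`.

    A returned index i means the cut falls between sequence[i - 1] and
    sequence[i], i.e. immediately after the G of each GAATTC site.
    """
    seq = sequence.upper()
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start + 1)  # cut just after the leading G
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def ecori_digest(sequence: str) -> List[str]:
    """Split `sequence` into the fragments produced by a complete EcoRI digest."""
    fragments = []
    previous = 0
    for cut in ecori_cut_positions(sequence):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Hypothetical toy sequence containing two EcoRI sites.
toy_sequence = "TTGAATTCAAGGGAATTCTT"
cuts = ecori_cut_positions(toy_sequence)
fragments = ecori_digest(toy_sequence)
print(cuts)
print(fragments)
```

In a RAD library, the P1 adapter is ligated to the cut ends, so the sequenced reads start at these restriction sites.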

Raw reads were filtered and trimmed with Trimmomatic v0.32 (Bolger, Lohse & Usadel, 2014) using default parameters to remove adaptor sequences and low-quality reads and sites, and read quality was then checked with FastQC v0.11.2. We used Stacks v2.0 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011; Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) to identify orthologous loci across individuals. Clean sequences were assembled de novo using denovo_map with a minimum stack depth of three (m = 3), and we tested a range of mismatches allowed between stacks within and between individuals (M = n = 2, 3 or 4). A locus was retained only if it was present in at least 75% of the individuals in a population (-r 0.75), and SNPs with a minor allele frequency (MAF) below 5% across all individuals were removed (--min-maf 0.05). Using VCFtools v0.1.13 (Danecek et al., 2011), we retained SNPs with less than 50% missing data among individuals (--max-missing 0.5). To reduce linkage disequilibrium (LD) among SNPs, sites were also thinned in VCFtools so that no two retained SNPs were closer than 100 bp (--thin 100). Heterogeneous loci were filtered out in TASSEL 5 (Bradbury et al., 2007) to exclude SNPs originating from different paralogs.
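
The three site-level filters described above (minimum MAF, maximum missingness, and distance-based thinning) can be sketched in plain Python as follows. This is an illustration of the filtering logic under simplifying assumptions, not the Stacks or VCFtools implementation: genotypes are coded as alternate-allele counts (0, 1, 2) with None for missing calls, all filters are applied in a single pass, and every name and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One SNP site: (locus name, position in bp, genotypes per individual).
# Genotypes are alternate-allele counts (0, 1 or 2); None marks a missing call.
Site = Tuple[str, int, Sequence[Optional[int]]]


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Proportion of individuals with a non-missing genotype."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_sites(
    sites: List[Site],
    min_maf: float = 0.05,        # mirrors the 5% MAF threshold in the text
    min_call_rate: float = 0.5,   # mirrors the 50% missing-data threshold
    thin_distance: int = 100,     # mirrors the 100 bp thinning distance
) -> List[Tuple[str, int]]:
    """Return (locus, position) for sites passing all three filters."""
    kept: List[Tuple[str, int]] = []
    last_kept: Dict[str, int] = {}  # last retained position per locus
    for locus, position, genotypes in sorted(sites, key=lambda s: (s[0], s[1])):
        if call_rate(genotypes) < min_call_rate:
            continue  # too much missing data
        if minor_allele_frequency(genotypes) < min_maf:
            continue  # rare or monomorphic variant
        if locus in last_kept and position - last_kept[locus] < thin_distance:
            continue  # too close to the previously retained SNP
        kept.append((locus, position))
        last_kept[locus] = position
    return kept


# Hypothetical toy data: four individuals, two loci.
example_sites: List[Site] = [
    ("locus1", 10, [0, 1, 2, 0]),
    ("locus1", 60, [0, 1, 1, 0]),           # within 100 bp of position 10
    ("locus1", 150, [0, 0, 1, None]),
    ("locus2", 20, [0, 0, 0, 0]),           # monomorphic, MAF = 0
    ("locus2", 30, [1, None, None, None]),  # 75% missing
    ("locus2", 40, [2, 2, 1, 2]),
]
kept_sites = filter_sites(example_sites)
print(kept_sites)
```

Thinning here is greedy within each locus: a SNP is kept only if it lies at least the thinning distance from the last SNP retained on the same locus, which leaves at most one SNP per 100 bp window.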

To obtain plastid sequences for each sample, clean reads were assembled using the GetOrganelle pipeline (Jin et al., 2018) with default parameters. We used the published plastome of G. hexaphylla (MG192305) (Sun et al., 2018) as the reference. Sequences were aligned using MAFFT (Katoh, Misawa, Kuma, & Miyata, 2002).


National Natural Science Foundation of China, Award: 31600296

China Scholarship Council