Data from: Allopatric divergence and hybridization within Cupressus chengiana (Cupressaceae), a threatened conifer in the northern Hengduan Mountains of western China
Data files
Jul 23, 2020 version files 208 MB
Abstract
A comprehensive understanding of population structure, genetic differentiation and demographic history is important for the conservation and management of threatened species. High‐throughput sequencing (HTS) provides exciting opportunities to address a wide range of questions in conservation genetics. Here, we generated HTS data and identified 266,884 high‐quality single nucleotide polymorphisms (SNPs) from 82 individuals of Cupressus chengiana to assess population genomics across the species' full range, comprising the Daduhe River (DDH), Minjiang River (MJR) and Bailongjiang River (BLJ) catchments in western China. ADMIXTURE, principal components analysis and phylogenetic analyses indicated that each region contains a distinct lineage, with high levels of differentiation between them (DDH, MJR and BLJ lineages). MJR was newly distinguished compared to previous surveys, and evidence including coalescent simulations supported a hybrid origin of MJR during the Quaternary. Each of these three lineages should be recognized as an evolutionarily significant unit (ESU), due to isolation, differing genetic adaptations and different demographic histories. Currently, each ESU faces distinct threats and will require different conservation strategies. Our work shows that population genomic approaches using HTS can reconstruct the complex evolutionary history of threatened species in mountainous regions, and hence inform conservation efforts and contribute to the understanding of high biodiversity in mountains.
Methods
We identified 266,884 high-quality SNPs from 82 individuals to assess population genomics of Cupressus chengiana across its full range.
To obtain a high-quality reference, single-molecule real-time (SMRT) sequencing was used to generate the full-length transcriptome of C. chengiana. A total of 82 samples were collected for RNA-seq. We used bwa-mem to align the quality-filtered reads of each individual to the reference sequences, and the “mpileup” command in SAMTOOLS to identify SNPs. SNPs were then filtered out if they had a mapping quality <30, a mapping depth <10, a genotyping rate <50% per group, or a minor allele frequency (MAF) <5%, or if they fell within 5-bp windows around any indel.
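The filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce these data: the `Snp` record, the `passes_filters` function and the example values are hypothetical, and two points the description leaves open are treated as assumptions here (depth is taken as a single per-site value, and the indel window is taken as 5 bp on either side of the indel position). Only the thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Snp:
    """Hypothetical per-site summary; field names are illustrative only."""
    pos: int                              # position on the reference sequence
    mapq: float                           # mapping quality at the site
    depth: int                            # mapping depth (assumed per-site)
    maf: float                            # minor allele frequency, 0..0.5
    call_rate_by_group: Dict[str, float]  # genotyping rate per group, 0..1


def passes_filters(snp: Snp, indel_positions: Iterable[int], window: int = 5) -> bool:
    """Return True if the SNP survives all five filters described in the text."""
    if snp.mapq < 30:                     # mapping quality <30
        return False
    if snp.depth < 10:                    # mapping depth <10
        return False
    if snp.maf < 0.05:                    # MAF <5%
        return False
    # genotyping rate <50% in any group (DDH, MJR, BLJ in this example)
    if any(rate < 0.5 for rate in snp.call_rate_by_group.values()):
        return False
    # assumed interpretation: within `window` bp on either side of an indel
    if any(abs(snp.pos - p) <= window for p in indel_positions):
        return False
    return True


# Made-up example records, one failing each kind of filter.
good_rates = {"DDH": 0.9, "MJR": 0.9, "BLJ": 0.9}
snps = [
    Snp(100, 60, 25, 0.20, good_rates),                               # passes
    Snp(200, 20, 25, 0.20, good_rates),                               # low mapping quality
    Snp(303, 60, 25, 0.20, good_rates),                               # 3 bp from an indel
    Snp(400, 60, 25, 0.01, good_rates),                               # low MAF
    Snp(500, 60, 25, 0.20, {"DDH": 0.9, "MJR": 0.4, "BLJ": 0.8}),     # low call rate in MJR
    Snp(600, 60, 5, 0.20, good_rates),                                # low depth
]
kept = [s.pos for s in snps if passes_filters(s, indel_positions=[300])]
print(kept)
```

In practice these filters would normally be applied to the VCF or BCF output of the variant caller rather than to hand-built records; the sketch only makes the order and meaning of the thresholds explicit.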