
Evolution of the correlated genomic variation landscape across a divergence continuum in the genus Castanopsis

Cite this dataset

Chen, Xueyan et al. (2024). Evolution of the correlated genomic variation landscape across a divergence continuum in the genus Castanopsis [Dataset]. Dryad.


The heterogeneous landscape of genomic variation has been well documented in population genomic studies. However, disentangling the intricate interplay of evolutionary forces influencing the genetic variation landscape over time remains challenging. In this study, we assembled a chromosome-level genome for Castanopsis eyrei and sequenced the whole genomes of 276 individuals from 12 Castanopsis species, spanning a broad divergence continuum. We found highly correlated genomic variation landscapes across these species. Furthermore, variations in genetic diversity and differentiation along the genome were strongly associated with recombination rates and gene density. These results suggest that long-term linked selection and conserved genomic features have contributed to the formation of a common genomic variation landscape. By examining how correlations between population summary statistics change throughout the species divergence continuum, we determined that background selection alone does not fully explain the observed patterns of genomic variation; the effects of recurrent selective sweeps must be considered. We further revealed that extensive gene flow has significantly influenced patterns of genomic variation in Castanopsis species. The estimated admixture proportion correlated positively with recombination rate and negatively with gene density, supporting a scenario of selection against gene flow. Additionally, putative introgression regions exhibited strong signals of positive selection, an enrichment of functional genes, and reduced genetic burdens, indicating that adaptive introgression has played a role in shaping the genomes of hybridizing species. This study provides insights into how different evolutionary forces have interacted in driving the evolution of the genomic variation landscape.

README: Evolution of the correlated genomic variation landscape across a divergence continuum in the genus Castanopsis

We have submitted SNP data (267ind.Chr0.het.again.mac.recode.vcf.gz to 267ind.Chr12.het.again.mac.recode.vcf.gz), chromosome information (id_conversion.tsv), and custom scripts (script.txt).

1. SNP data

The 13 gzip-compressed VCF files contain 52,385,983 high-quality single nucleotide polymorphisms (SNPs) called from whole-genome resequencing data of 267 Castanopsis individuals. The files are in VCF format and were generated using GATK. Sample information is given in Table S1 of the Supplementary Information of the manuscript. For details on how the VCF files were created, see the Materials and Methods section of the manuscript. The first 12 files (267ind.Chr0.het.again.mac.recode.vcf.gz to 267ind.Chr11.het.again.mac.recode.vcf.gz) contain SNPs called from chromosomes 1-12, respectively. The 13th file (267ind.Chr12.het.again.mac.recode.vcf.gz) contains SNPs called from all remaining scaffolds that were not assigned to the 12 chromosomes.
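The file naming scheme above can be sketched in a few lines of Python. This is a minimal, hypothetical helper (the function names are illustrative, not part of the dataset) that builds each file name, reports which part of the genome it covers in the VCF numbering (the manuscript orders chromosomes by length instead), and counts variant records in a compressed VCF, which is one way to check the SNP total stated above. It assumes only that the files are gzip or bgzip compressed, as their .vcf.gz suffix indicates.

```python
import gzip


def vcf_name(index: int) -> str:
    """File name of the VCF with the given index (0-12) in this dataset."""
    if not 0 <= index <= 12:
        raise ValueError("index must be between 0 and 12")
    return f"267ind.Chr{index}.het.again.mac.recode.vcf.gz"


def describe(index: int) -> str:
    """Chr0-Chr11 hold chromosomes 1-12 (VCF numbering);
    Chr12 holds all scaffolds not assigned to the 12 chromosomes."""
    if 0 <= index <= 11:
        return f"chromosome {index + 1}"
    if index == 12:
        return "unplaced scaffolds"
    raise ValueError("index must be between 0 and 12")


def count_snps(path: str) -> int:
    """Count variant records (non-header lines) in a gzip/bgzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if not line.startswith("#"))


if __name__ == "__main__":
    for i in range(13):
        print(vcf_name(i), "->", describe(i))
```

Python's gzip module reads multi-member (bgzip) files, so no extra dependency is needed; for anything beyond counting, a dedicated VCF parser is the better choice.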

2. Chromosome information (id_conversion.tsv)

In the figures and tables of the manuscript, the 12 chromosomes are ordered by length. This file maps the chromosome IDs used in the VCF files to the chromosome numbers used in the manuscript.
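A minimal sketch of how id_conversion.tsv might be used to relabel the CHROM values from the VCFs. The column layout is an assumption, not documented here: the code supposes two tab-separated columns (VCF ID first, manuscript ID second) and skips blank lines and lines starting with '#'. The function names and the chromosome IDs in the usage example are hypothetical; inspect the real file and adjust before use.

```python
import csv
from typing import Dict


def load_id_conversion(path: str) -> Dict[str, str]:
    """Read id_conversion.tsv into {VCF chromosome ID: manuscript chromosome ID}.

    ASSUMPTION: two tab-separated columns (VCF ID, manuscript ID);
    blank lines and lines starting with '#' are skipped.
    """
    mapping: Dict[str, str] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def to_manuscript(vcf_chrom: str, mapping: Dict[str, str]) -> str:
    """Translate a VCF CHROM value; IDs absent from the table are returned unchanged."""
    return mapping.get(vcf_chrom, vcf_chrom)
```

For example, with a hypothetical table row `Chr01<TAB>Chr05`, `to_manuscript("Chr01", mapping)` returns `"Chr05"`, while unplaced scaffold IDs pass through untouched.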

3. Custom scripts (script.txt)

Custom scripts (Python and Bash) used for data analyses in this study.

The reference genome and short-read sequences have been deposited in GenBank (accession numbers PRJNA1097334 and PRJNA1097337) and NGDC (accession numbers PRJCA026947 and PRJCA026948).


Individuals (N = 267) were collected from 12 Castanopsis species, including: 21 C. carlesii; 25 C. fargesii; 25 C. eyrei; 24 C. lamontii; 28 C. fabri; 19 C. hystrix; 20 C. fordii; 26 C. tibetana; 10 C. chinensis; 23 C. sclerophylla; 24 C. jucunda; and 22 C. fissa (Supplementary Table S1). For each individual, genomic DNA was extracted from silica-dried leaves using a Plant DNA Kit (Bioteke, Beijing, China) and sequenced on the Illumina NovaSeq 6000 platform (150-bp paired-end reads) with a target coverage of 30×.
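As a quick consistency check, the per-species sample sizes listed above can be tallied in Python; they sum to the stated N = 267 across 12 species.

```python
# Per-species sample sizes, copied from the sampling description above.
sample_sizes = {
    "C. carlesii": 21, "C. fargesii": 25, "C. eyrei": 25, "C. lamontii": 24,
    "C. fabri": 28, "C. hystrix": 19, "C. fordii": 20, "C. tibetana": 26,
    "C. chinensis": 10, "C. sclerophylla": 23, "C. jucunda": 24, "C. fissa": 22,
}

total = sum(sample_sizes.values())
print(f"{len(sample_sizes)} species, {total} individuals")  # 12 species, 267 individuals
```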

Raw sequencing data were cleaned using Trimmomatic v.0.38 (Bolger et al. 2014) to remove low-quality sequences. Cleaned reads were then aligned to the C. eyrei reference genome using BWA v.0.7.15 (Li and Durbin 2010), and genotypes were called using HaplotypeCaller implemented in GATK v.4.1 (DePristo et al. 2011). All individuals included in this study exhibited high mapping rates (90.26%-98.32%), and relatively low mapping rates appeared to be individual-specific rather than species-specific (Supplementary Table S1 and Fig. S20). This indicates no species-specific bias arising from divergence from the reference, suggesting that the effects of reference bias were likely minimal in this study. To further minimize bias in SNP and genotype calling, SNPs that met any of the following conditions were discarded: (1) located within repetitive regions of the C. eyrei reference genome; (2) more than two alleles present; (3) sequencing depth > 100 or < 5; (4) missing rate ≥ 0.3; (5) heterozygosity rate (proportion of heterozygotes among all genotypes) > 0.5; (6) indels. Additionally, homozygous genotypes were retained only if supported by ≥ 4 reads. For heterozygous genotypes, the minor allele was required to be supported by ≥ 2 reads, and the read ratio (number of reads supporting the minor allele / number of reads supporting the major allele) was required to be > 0.1 and < 0.9.
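The filtering rules above can be expressed as two small predicates. This is a minimal Python sketch, not the authors' pipeline (their own scripts are in script.txt). Several details are assumptions because the text does not state them: depth is taken as mean per-individual depth at a site, the heterozygosity denominator is non-missing genotypes, homozygous support counts reads for the called allele, and genotype-level checks are assumed to run before site-level ones. The `Site` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A diploid genotype as a pair of allele indices (0 = REF, 1 = ALT); None = missing.
Genotype = Optional[Tuple[int, int]]


@dataclass
class Site:
    alleles: List[str]         # REF allele followed by ALT allele(s)
    mean_depth: float          # ASSUMPTION: mean per-individual depth at this site
    in_repeat: bool            # overlaps an annotated repeat in the reference
    genotypes: List[Genotype]  # one entry per individual


def genotype_passes(gt: Tuple[int, int], ref_reads: int, alt_reads: int) -> bool:
    """Read-support rules for a single biallelic genotype call."""
    if gt[0] == gt[1]:
        # Homozygous: ASSUMPTION that ">= 4 reads" counts reads for the called allele.
        support = ref_reads if gt[0] == 0 else alt_reads
        return support >= 4
    # Heterozygous: minor allele needs >= 2 reads and 0.1 < minor/major < 0.9,
    # applied literally as the ratio is stated in the text.
    minor, major = sorted((ref_reads, alt_reads))
    if minor < 2:
        return False
    return 0.1 < minor / major < 0.9


def site_passes(site: Site) -> bool:
    """Site-level conditions (1)-(6); a site failing any of them is discarded."""
    if site.in_repeat:                                   # (1) repetitive region
        return False
    if len(site.alleles) > 2:                            # (2) more than two alleles
        return False
    if site.mean_depth > 100 or site.mean_depth < 5:     # (3) depth outside [5, 100]
        return False
    n = len(site.genotypes)
    called = [g for g in site.genotypes if g is not None]
    if n == 0 or (n - len(called)) / n >= 0.3:           # (4) missing rate >= 0.3
        return False
    hets = sum(1 for g in called if g[0] != g[1])
    # (5) ASSUMPTION: heterozygosity rate computed over non-missing genotypes.
    if hets / len(called) > 0.5:
        return False
    if any(len(a) != 1 for a in site.alleles):           # (6) indels
        return False
    return True
```

Taken literally, the upper bound of the read ratio also rejects heterozygotes with nearly equal allele support (for example 10 versus 10 reads), so the exact definition in the manuscript should be checked before reusing this rule.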


Guangdong Flagship Project of Basic and Applied Basic Research, Award: 2023B0303050001

Guangdong Basic and Applied Basic Research Foundation, Award: 2023A1515110098

China Postdoctoral Science Foundation, Award: 2022M713197

Guangdong Natural Science Funds for Distinguished Young Scholar, Award: 2018B030306040

Guangdong Science and Technology Plan Project, Award: 2023B1212060046

Guangdong Dinghushan National Scientific Forest Ecosystem Observation and Research Field Station