Beyond cyanogenesis: Temperature gradients drive environmental adaptation in North American white clover (Trifolium repens L.)

Cite this dataset

Kuo, Wen-Hsi et al. (2024). Beyond cyanogenesis: Temperature gradients drive environmental adaptation in North American white clover (Trifolium repens L.) [Dataset]. Dryad.


Species that repeatedly evolve phenotypic clines across environmental gradients have been highlighted as ideal systems for characterizing the genomic basis of local environmental adaptation. However, few studies have assessed the importance of observed phenotypic clines for local adaptation: conspicuous traits that vary clinally may not necessarily be the most critical in determining local fitness. The present study was designed to fill this gap, using a plant species characterized by repeatedly-evolved adaptive phenotypic clines. White clover is naturally polymorphic for its chemical defense cyanogenesis (HCN release with tissue damage); climate-associated cyanogenesis clines have evolved throughout its native and introduced range worldwide. We performed landscape genomic analyses on 415 wild genotypes from 43 locations spanning much of the North American species range to assess the relative importance of cyanogenesis loci vs. other genomic factors in local climatic adaptation. We find clear evidence of local adaptation, with temperature-related climatic variables best describing genome-wide differentiation between sampling locations. The same climatic variables are also strongly correlated with cyanogenesis frequencies and gene copy number variations (CNVs) at cyanogenesis loci. However, landscape genomic analyses indicate no significant contribution of cyanogenesis loci to local adaptation. Instead, several genomic regions containing promising candidate genes for plant response to seasonal cues are identified — some of which are shared with previously-identified QTLs for locally-adaptive fitness traits in North American white clover. Our findings suggest that local adaptation in white clover is likely determined primarily by genes controlling the timing of growth and flowering in response to local seasonal cues. 
More generally, this work suggests that caution is warranted when considering the importance of conspicuous phenotypic clines as primary determinants of local adaptation.

README: Beyond cyanogenesis: Temperature gradients drive environmental adaptation in North American white clover (Trifolium repens L.)

The dataset includes the vcf files used in this study, which document the genetic polymorphisms of the 415 genotypes from the 43 locations.

Description of the data and file structure

bwa_wild_DP0_diploid_hardfilter_miss0.25_maf0.05_GT.vcf.gz - A vcf file including all the polymorphic sites after filtering. See methods for detailed filtering conditions.

bwa_wild_DP0_diploid_hardfilter_miss0.25_maf0.05_GT_imputed.vcf.gz - A vcf file with imputations for the missing data in bwa_wild_DP0_diploid_hardfilter_miss0.25_maf0.05_GT.vcf.gz.

None of the vcf files have been subjected to LD pruning.

LFMM_output_20230606.txt - LFMM result output.
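As a quick sanity check after download, the vcf files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the original analysis pipeline: the function name summarize_vcf is hypothetical, and it assumes only the standard tab-delimited VCF layout with sample IDs starting at column 10 of the #CHROM header line.

```python
import gzip


def summarize_vcf(path):
    """Return (sample_names, n_sites) for a gzip-compressed VCF file.

    Hypothetical helper for illustration; assumes a standard VCF header.
    """
    samples = []
    n_sites = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed VCF fields; sample IDs follow.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_sites += 1  # one record per polymorphic site
    return samples, n_sites
```

For example, summarize_vcf("bwa_wild_DP0_diploid_hardfilter_miss0.25_maf0.05_GT.vcf.gz") would return the list of sample IDs and the number of sites retained after filtering.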


Sample Collection, DNA Extraction, and GBS Library Preparation

Using a nationwide network of K-12 science teachers, citizen scientists, and colleagues, we obtained mature seeds or stolon cuttings for 419 wild white clover accessions across 43 locations in North America during the growing seasons of 2014-2017. Each location was represented by 6 to 11 accessions (individual genotypes), and latitude and longitude were recorded for each sample. Seeds and stolon cuttings were cultivated in the greenhouse at Washington University in St. Louis under standard greenhouse conditions (see Wright et al., 2022). Genomic DNA was extracted from young leaves using a standard DNA extraction protocol (Whitlock et al., 2008). Extraction quantity and quality were assessed using a NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer and Qubit™ dsDNA HS Assay Kits.

Genotyping-by-sequencing (GBS) libraries were prepared following Elshire et al. (2011) using the methylation-sensitive restriction enzyme ApeKI. Barcoding and protocol modifications are described in Olsen et al. (2021). Paired-end sequencing (150-bp reads) was performed on the Illumina HiSeq 2500 platform (Novogene Corp., Chula Vista, CA, USA).

Read Mapping, SNP Calling, and SNP Filtering 

Raw GBS reads were demultiplexed with SABRE and adaptor-trimmed with CUTADAPT (Martin, 2011). The processed reads were mapped to our white clover reference genome (Kuo et al., 2024) using BWA (Li & Durbin, 2009) with default paired-end settings. SNP calling from the alignments followed GATK best practices, omitting the duplicate-read removal step, as recommended for GBS data (Poplin et al., 2017). The output SNP dataset in vcf format underwent hard filtering (bcftools filter -e 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || INFO/DP < 2500'). It was then filtered for sites with missing accessions < 0.25, missing sites < 0.35, and minor allele frequency > 0.05. Missing genotypes were imputed with Beagle v5.4 (Browning et al., 2018). A relaxed Hardy-Weinberg filter was applied to remove sites with skewed heterozygosity suggestive of sequencing error (p < 1 × 10^-50). This dataset was used for GWAS and genome-wide environmental association (GEA) analyses. For population structure analyses and genomic differentiation scans, three different levels of LD pruning were performed using PLINK2 (--indep-pairwise 100kb 0.8; --indep-pairwise 200kb 0.5; --indep-pairwise 500kb 0.2) before analysis (Chang et al., 2015).
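To make the missingness and minor allele frequency thresholds concrete, the sketch below computes both statistics for a single site from its GT strings. It is an illustration only and not the software used in this study: the function names site_stats and passes_filters are hypothetical, and it assumes biallelic diploid genotypes that carry only the GT subfield, with the 0.25 threshold interpreted as the maximum fraction of accessions with a missing genotype at a site.

```python
def site_stats(genotypes):
    """Return (missing_rate, maf) for one site's diploid GT strings.

    Hypothetical helper; assumes a non-empty list of GT-only fields
    such as "0/1", "1|1", or "./." at a biallelic site.
    """
    n_missing = 0
    alt_count = 0
    called_alleles = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            n_missing += 1  # treat partially missing calls as missing
            continue
        called_alleles += len(alleles)
        alt_count += sum(1 for a in alleles if a != "0")
    missing_rate = n_missing / len(genotypes)
    if called_alleles == 0:
        return missing_rate, 0.0
    alt_freq = alt_count / called_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes, max_missing=0.25, min_maf=0.05):
    """Apply the per-site thresholds stated in the text (assumed meaning)."""
    missing_rate, maf = site_stats(genotypes)
    return missing_rate < max_missing and maf > min_maf
```

A site with genotypes ["0/0", "0/1", "1/1", "./."] has a missing rate of exactly 0.25 and would therefore be dropped under the strict inequality, even though its minor allele frequency is well above 0.05.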


National Science Foundation, Award: IOS-1557770

National Science Foundation, Award: DEB-1601641

National Science Foundation, Award: DGE-1143954

Donald Danforth Plant Science Center, William H. Danforth Plant Science Graduate Research Fellowship

Ministry of Education