Filtered SNP tables - Rangewide, Hamilton, Tejon, and Madera transects

Gugger, Paul 1 ; Fitz-Gibbon, Sorel 2 ; Albarrán-Lara, Ana2 ; Wright, Jessica3 ; Sork, Victoria2

Published Oct 20, 2020 on Dryad. https://doi.org/10.5061/dryad.5dv41ns4n

Data files

Oct 20, 2020 version files 13.29 MB

snps.ham80.forR

847.03 KB
snps.mad80.forR

1.35 MB
snps.tej80.forR

1.19 MB
SNPs80.forR

9.91 MB

Abstract

Understanding how the environment shapes genetic variation provides critical insight about the evolution of local adaptation in natural populations. At multiple spatial scales and multiple geographic contexts within a single species, such information could address a number of fundamental questions about the scale of local adaptation and whether or not the same loci are involved at different spatial scales or geographic contexts. We used landscape genomic approaches from three local elevational transects and range-wide sampling to 1) identify genetic variation underlying local adaptation to environmental gradients in the California endemic oak, Quercus lobata, 2) examine whether putatively adaptive SNPs show signatures of selection at multiple spatial scales, and 3) map putatively adaptive variation to assess the scale and pattern of local adaptation. Of over 10k single-nucleotide polymorphisms (SNPs) generated with genotyping-by-sequencing, we found signatures of natural selection by climate or local environment at over 600 SNPs (536 loci), some at multiple spatial scales across multiple analyses. Candidate SNPs identified with gene–environment tests (LFMM) at the range-wide scale also showed elevated associations with climate variables compared to the background at both range-wide and elevational transect scales with gradient forest analysis. Some loci overlap with those detected in other oak species, raising the question of whether the same loci might be involved in local climate adaptation in different congeneric species that inhabit different geographic contexts. Mapping landscape patterns of adaptive versus background genetic variation identified regions of marked local adaptation and suggests nonlinear association of candidate SNPs and environmental variables. Taken together, our results offer robust evidence for novel candidate genes for local climate adaptation at multiple spatial scales.

Sampling and processing. In fall of 2012, leaf samples from 436 adult trees were collected from throughout the range of valley oak and separated from each other by at least 50 m (Fig. 1; Table S1). Within this range-wide sampling, three areas were more intensively sampled along elevational gradients (subsets of the 436 total samples). The Hamilton transect up Mt. Hamilton spans 12 km and includes 30 samples from 383–1112 m elevation; the Madera transect from the eastern Central Valley near the town of Madera to the Sierra Nevada foothills spans 69 km and includes 45 samples from 78–607 m; and the Tejon transect from the southern Central Valley up Cordon Ridge in Tejon Ranch spans 17 km and includes 52 samples from 361–1766 m elevation (Fig. 1, S1, and S2). Fresh leaf samples were stored on ice until arriving at UCLA, where they were stored at −80°C.

Total genomic DNA was extracted from approximately 50 mg of frozen tissue using the Qiagen DNeasy Plant Mini Kit with an added “prewash” step to minimize secondary compounds in the final product (Gaddis, Zukin, Dieterich, Braker, & Sork, 2014). The prewash was as follows: after bead-based tissue grinding under liquid N2, 1 mL of prewash buffer was added, the mixture was shaken for 10–20 s at 30 Hz and then spun for 10 min at 10,000 rpm in a microcentrifuge, the supernatant was discarded, and finally the pellet was processed with the standard DNeasy protocol starting at the Buffer AP1/RNase A addition step. The prewash buffer was composed of 100 mM Tris-HCl (from pH 8.0 stock), 50 mM EDTA, 1 M NaCl, 0.01 g/mL polyvinylpyrrolidone (PVP) (k-30, SABC), and 4.5 μL/mL 2-mercaptoethanol (added immediately prior to use).

Genotyping by sequencing. Total genomic DNA was prepared for sequencing using an efficient restriction-enzyme-based approach commonly known as genotyping by sequencing (GBS) (Elshire et al., 2011). GBS is a cost-effective method that has proven fruitful in population and landscape genomic studies with related goals (Gugger, Liang, Sork, Hodgskiss, & Wright, 2018; Martins et al., 2018; Parchman, Jahner, Uckele, Galland, & Eckert, 2018). Briefly, DNA was digested with a restriction enzyme, common and unique barcoded adapters with overhangs complementary to the cut site were ligated to each sample, samples were pooled in equimolar ratios, and the pooled library was PCR-amplified and sent for Illumina sequencing. We largely followed the original protocol, including using the same adapter sequences, adapter concentration (0.036 ng/µL of each adapter), and restriction enzyme (ApeKI). However, we pooled 48 samples per library prep/sequencing lane rather than 96; adapters were added during the ligation step rather than prior to restriction digestion; AMPure XP bead-based size selection/purification steps were added after the ligation step and repeated after the PCR step to ensure a consistent distribution of fragment sizes between 200 and 500 bp (including adapters) among all preps; and we reduced the number PCR cycles to 16 from 18. Final libraries were checked for the proper size distribution on an Agilent BioAnalyzer with the High Sensitivity DNA assay and quantified using a Qubit fluorometer. Libraries were sent to the UCLA Broad Stem Cell Research Center for single-end, 100-bp sequencing on an Illumina HiSeq2000 v3.

SNP calling. Illumina reads were demultiplexed and quality filtered using Stacks 1.28 to 1.41 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011) process_radtags, which removed adapter sequence with up to two mismatches (adapter_mm), recovered barcodes with up to one mismatch to the expected barcodes (r), removed any read with an uncalled base (c), discarded low quality reads as defined by default settings (q), and trimmed all reads to 92 bp (t). Using BWA 0.7.12 (Li & Durbin, 2010), the filtered reads were aligned to the Quercus lobata reference genome version 0.5 (NCBI accession # ID 308314, also available at http//valleyoak.ucla.edu; (Sork, Fitz-Gibbon, et al., 2016)). We used GATK 3.7 (DePristo et al., 2011) to identify SNPs in each aligned sample using a minimum Phred-scaled confidence threshold of 30. We then used “VariantFiltration” and “SelectVariants” tools in GATK to exclude low quality variants, applying the following filters: QD < 10.0 (quality by depth), DP < 4 (genotype read depth), and ExcessHet > 100. We used VCFtools 0.1.15 (Danecek et al., 2011) to filter the SNPs to include only diallelic sites, present in at least 90% of individuals, minor allele frequency (MAF) ≥ 0.10, and mean depth of coverage across all samples < 80× to filter organellar DNA and sites aligning to genomic regions that may have been overcollapsed in the reference genome. Finally, SNPs were pruned in plink 1.90b4.9 (Chang et al., 2015) to remove those exhibiting high linkage disequilibrium defined by using a 5 kb window, sliding 5 SNPs, and removing a SNP from all pairs with r2 > 0.5. The above filtering was performed four times: once for all samples together to generate a range-wide data set and then once for each set of samples belonging to a transect. Our filtering strategy maximizes the number of high-quality SNPs that can be meaningfully analyzed in each data set.

Filtered SNP tables - Rangewide, Hamilton, Tejon, and Madera transects

Data files

Abstract

Methods