Skip to main content

Genomic data of Columbia spotted frog (Rana luteiventris)

Cite this dataset

Cayuela, Hugo; Forester, Brenna; Funk, Chris (2021). Genomic data of Columbia spotted frog (Rana luteiventris) [Dataset]. Dryad.


In this study, we examined the potential role of thermal adaptation in clinal shifts of life history traits (i.e., lifespan, senescence rate, and recruitment) in the Columbia spotted frog (Rana luteiventris) along a broad temperature gradient in the western USA.

We took advantage of extensive capture-recapture datasets of 20,033 marked individuals from eight populations surveyed annually for 14 – 18 years to examine how mean annual temperature and precipitation influenced demographic parameters (i.e., adult survival, lifespan, senescence rate, recruitment, and population growth). After showing that temperature was the main climatic predictor influencing demography, we used RAD-seq data (50,829 SNPs and 6,599 putative CNVs) generated for 352 individuals from 31 breeding sites to identify genomic signatures of thermal adaptation.

Our results showed that temperature was negatively associated with annual adult survival and reproductive lifespan and positively associated with senescence rate. By contrast, recruitment increased with temperature, promoting the long-term viability of most populations. These temperature-dependent demographic changes were associated with strong genomic signatures of thermal adaptation. We identified 148 SNP candidates associated with temperature including three SNPs located within protein-coding genes regulating resistance to cold and hypoxia, immunity, and reproduction in ranids. We also identified 39 CNV candidates (including within 38 transposable elements) for which normalized read depth was associated with temperature.

Our study indicates that both SNPs and structural variants are associated with temperature and could eventually be found to play a functional role in clinal shifts in senescence rate and life history strategies in R. luteiventris. These results highlight the potential role of different sources of molecular variation in the response of ectotherms to environmental temperature variation in the context of global warming.



Sampling area and DNA collection

Toe clips (from metamorph and adult frogs) and tail clips (5-10 mm, from tadpoles) were collected between 2010-2019. Individuals were captured at 31 breeding ponds across the Nevada range (Fig.1). The ponds were located within 10 hydrographic catchments (Fig.1). All individuals were released at their capture location after processing. The average number of individuals sampled in the breeding ponds was 11 (range: 4-14) (see Supplementary material S1, Table S2 for sampling, geographical, and temperature information).

DNA extraction, library preparation and RAD-sequencing

We extracted DNA from tissue samples using Qiagen DNeasy Blood and Tissue Kits following manufacturer protocols, with the addition of 4 µl of RNase after tissue digestion. We quantified total DNA concentrations using Qubit dsDNA BR Assays and verified DNA quality in a subset of individuals using electrophoresis on agarose gels.
We tested five restriction enzyme digestions on a subset of individuals to find the enzyme combination that provided the appropriate distribution of fragment sizes (250-450 base pairs). We then digested ~1μg of total DNA per individual using NspI and Eco-RI. We followed Peterson et al. (2012), using a Pippin Prep size selection of 346-456 base pairs and Dynabeads cleanup. We used KAPA HiFi Hotstart Ready Mix for PCR amplification, and degenerate barcodes on the P2 adapter to facilitate removal of PCR duplicates (Schweyen et al. 2014). We sequenced one library with the University of Oregon Genomics Core Facility (HiSeq-4000, 100 bp paired-end reads), and the remaining nine libraries with Novogene Corp. (HiSeq-4000 150 bp paired-end reads). The first two libraries contained 24 and 36 individuals, respectively. All remaining libraries contained 40 individuals. Sites were split across libraries to avoid confounding site and library effects.

We used FastQC v. 0.11.8 (Andrews 2019) to assess data quality and identify sequencing anomalies in each library. We removed PCR duplicates from the raw reads using clone_filter from Stacks v. 2.41 (Catchen et al. 2011, 2013; Rochette et al. 2019). Using Trim Galore! v. 0.6.4 (Krueger 2019) and Cutadapt v. 2.5 (Martin 2011), we removed the 10 bp degenerative barcodes from the P2 reads and applied quality screening using a Phred score cutoff of 20 and auto-detect adapter screening with stringency set to 6. We then used Stacks process_radtags to demultiplex the raw data, trim reads to 90 bp, and run additional quality control (remove reads with an uncalled base; discard reads with low quality scores (default settings); rescue barcodes and cut sites with at most one mismatch).

We used within and cross-library replicates to optimize three de novo Stacks parameters: minimum stack depth (-m), distance between stacks (-M), and the distance between catalog loci (-n). We used 19 individuals, each with a replicate sample, from 16 sites (total samples = 38), covering the geographic range of sampling effort. Thirteen combinations of -m (3-5), -M (2-6), and -n (1-6) were tested across the entire pipeline, setting the upper bound on the error rate for the SNP model (ustacks --bound-high) to 0.05. To determine the optimal combination of parameters, we evaluated mean coverage, the number of assembled loci, the number of polymorphic loci shared across 80% of individuals, and the number of SNPs (Paris et al. 2017). We also assessed population/duplicate clustering in multivariate space using multidimensional scaling plots (Gugger et al. 2018) calculated in plink v. 1.90 (Chang et al. 2015).

After optimizing Stacks parameters, we ran the Stacks pipeline on all unique samples and filtered using the populations module: -p 1 (minimum number of populations a locus must be present in to process), -R 0.3 (minimum percentage of individuals across populations required to process a locus), --min_mac 2 (minimum minor allele count required to process a SNP). We then ran the ‘populations’ module from Stacks to produce a vcf file defining the minimum number of populations a locus must be present in to process (p=1), the minimum percentage of individuals across populations required to process a locus (R=0.3), and the minimum minor allele count required to process a SNP (min_mac 2). The outputed vcf file was further filtered using python and bash scripts from the stacks workflow RADseq pipeline analysis available online ( The raw vcf was filtered keeping only SNPs that showed a minimum coverage depth of 10x, were called in at least 70% of the samples per sampling location (according a maximum of four locations that can fail this latter filter) and a minimum number of samples having the rare allele of four across all sampling locations.  Ultimately, samples showing more than 60% of missing data and SNPs showing more than 40% of missing data were discarded.


Usage notes

If you are willing to get an unfiltered vcf file, please contact Chris Funk (