Data from: Genetic, phenotypic, and environmental drivers of local adaptation and climate-change induced maladaptation in yellow warblers

Rodriguez, Marina; Bossu, Christen; Bay, Rachael; Anderson, Eric; Ruegg, Kristen

Published Oct 24, 2025 on Dryad. https://doi.org/10.5061/dryad.hmgqnk9tp

Abstract

Understanding processes driving local adaptation in wild species is a key goal in evolutionary biology, but linking genotype to phenotype to environmental drivers of natural selection remains challenging. This dataset contains the data needed to replicate the analyses in Rodriguez et al., which explore the connections between genotypes, phenotypes, and environment in yellow warblers across their breeding range. First, we conduct genome-wide association studies (GWAS) to identify loci related to bill shape and individual quality. We then conduct a gene-environment association (GEA) analysis on the resulting loci and find that precipitation underlies putative selection on bill shape. Finally, we test whether contemporary individuals whose bill shape deviates from historical relationships with precipitation exhibit increased stress, measured by telomere length, resulting from maladaptation. We collected samples from 121 yellow warblers from two reference populations in Michigan and Pennsylvania. At each site, birds were captured using mist-netting, bill depth measurements were taken, and blood samples were collected via brachial venipuncture and preserved in Queen's lysis buffer. Further, we collected an additional 171 genetic samples from 22 sites across the yellow warbler breeding range to validate associations between allele frequencies and environmental variables in key loci. Of these 171 samples, 63 samples with bill depth measurements from 10 sites across the breeding range were also used to validate the associations between bill depth and environmental variables. In addition, 169 historical yellow warbler samples were collected from museum specimens on the breeding range to run a population structure analysis asking whether local populations have shifted their geographic ranges over the last century.


Description of the data and file structure

The following datasets include the required input files needed to conduct genome-wide association studies (GWAS) to identify loci related to bill shape and telomere length in yellow warbler reference populations. This includes the .bed files (binary biallelic genotype tables), .bim files (extended variant information files), and the .fam files (sample information file) for both telomere length and bill depth, as well as the kinship matrix GWAS_relate.cXX.txt. The metadata file for the GWAS data is located in GWAS.samples_121inds_meta.csv, and includes the sample IDs, the coordinates, and the telomere and bill depth measurement for each individual.

This dataset also includes the data needed to conduct a gene-environment association (GEA) analysis on loci from the preceding GWAS. This includes the allele frequency data for each of the overlapping non-zero single nucleotide polymorphisms (SNPs) from the GWAS on bill depth and telomere length (in GEA.GF_overlappingSNPs.AF_171inds.txt). This file includes the allele frequency for each SNP (columns) at each sample site (rows). The metadata for this analysis is in GEA.samples_22pops_env.meta.csv and includes the population location, coordinates, and Bioclim data for each location extracted from WorldClim.

Finally, this dataset also includes the data needed to test the relationship between telomere length and phenotype-climate mismatch. This file (yewa.precip.TL_residuals.csv) includes the population location (LOC1), contemporary bill depth measurement (BDEPTH), average breeding precipitation measure extracted from WorldClim (clim), scaled telomere length (TL.scale), coordinates (y,x), and the distance between contemporary and historic associations between precipitation and bill depth. This was calculated by first finding the line of best fit for historic data from Wiedenfeld (1991) between precipitation and bill depth. We then found the distance between each contemporary association between precipitation and bill depth and the historic line of best fit.

Files and variables

File: yewa.filtered.imputed.TL.statesex.resids.PAMI.121inds.fam

Description: This file is a sample information file accompanying the associated .bed binary genotype table and .bim extended variant information file. This file has no header line, and one line per sample with the following six fields:

Family ID ('FID')
Population ID
Within-family ID of father
Within-family ID of mother
Sex code
Scaled telomere length
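The six-field layout above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-delimited fields and "NA" as the only missing-phenotype marker; the field names, the `read_fam` helper, and the two example rows are hypothetical and not taken from the dataset.

```python
import io
from typing import List, NamedTuple, Optional, TextIO


class FamRecord(NamedTuple):
    """One line of a .fam file, named after the six fields listed in this README."""
    family_id: str
    population_id: str
    father_id: str
    mother_id: str
    sex_code: str
    phenotype: Optional[float]  # scaled telomere length or bill depth


def read_fam(handle: TextIO) -> List[FamRecord]:
    """Parse a header-less, whitespace-delimited .fam file into records."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {line_number}: expected 6 fields, got {len(fields)}")
        # Assumption: "NA" marks a missing phenotype; everything else is numeric.
        phenotype = None if fields[5] == "NA" else float(fields[5])
        records.append(FamRecord(*fields[:5], phenotype))
    return records


# Made-up rows for illustration only.
example = io.StringIO("MI S001 0 0 1 -0.42\nPA S002 0 0 2 1.10\nPA S003 0 0 2 NA\n")
records = read_fam(example)
for record in records:
    print(record.population_id, record.phenotype)
```

To read the real file, pass an open file handle for the .fam file in place of the `io.StringIO` example.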

File: yewa.filtered.imputed.TL.statesex.resids.PAMI.121inds.bim

Description: This file is an extended variant information file accompanying the associated .bed binary genotype table and .fam sample information file. This file has no header line, and contains one line per variant with the following six fields:

Chromosome code or name
Variant identifier
Variant position in morgans or centimorgans (genetic distance)
Base-pair coordinate
Allele 1
Allele 2

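File: yewa.filtered.imputed.depth.PAMI.121inds.fam

Description: This file is the sample information file for the bill depth GWAS, accompanying the associated .bed binary genotype table and .bim extended variant information file. This file has no header line, and one line per sample with the following six fields:

Family ID ('FID')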
Population ID
Within-family ID of father
Within-family ID of mother
Sex code
Bill depth

File: GWAS_relate.cXX.txt

Description: Estimated relatedness matrix calculated from genotypes in program GEMMA. Contains an n × n matrix of estimated relatedness between all samples.

File: yewa.filtered.imputed.TL.statesex.resids.PAMI.121inds.bed

Description: Primary representation of genotype calls at biallelic variants accompanying the associated .bim extended variant information file and .fam sample information file. After a three-byte header (0x6c, 0x1b, 0x01), the file is a sequence of V blocks of N/4 (rounded up) bytes each, where V is the number of variants and N is the number of samples. The first block corresponds to the first marker in the .bim file, etc.

The low-order two bits of a block's first byte store the first sample's genotype code. ("First sample" here means the first sample listed in the accompanying .fam file.) The next two bits store the second sample's genotype code, and so on for the 3rd and 4th samples. The second byte stores genotype codes for the 5th-8th samples, the third byte stores codes for the 9th-12th, etc.
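The bit layout described above can be unpacked as follows. This is a minimal sketch, assuming the standard PLINK 1 variant-major .bed format, which starts with the three bytes 0x6c, 0x1b, 0x01 before the first block and uses the two-bit codes 00 (homozygous for allele 1), 01 (missing), 10 (heterozygous), and 11 (homozygous for allele 2). The function names and the toy bytes are hypothetical.

```python
import math
from typing import List, Optional

PLINK_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 header, variant-major mode

# Two-bit genotype code -> number of copies of allele 1 (None = missing).
A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_variant_block(block: bytes, n_samples: int) -> List[Optional[int]]:
    """Decode one variant's block into per-sample counts of allele 1."""
    genotypes = []
    for i in range(n_samples):
        byte = block[i // 4]                    # four samples per byte
        code = (byte >> (2 * (i % 4))) & 0b11   # low-order bits hold the earliest sample
        genotypes.append(A1_COUNT[code])
    return genotypes


def decode_bed(data: bytes, n_samples: int, n_variants: int) -> List[List[Optional[int]]]:
    """Decode a whole .bed byte string; sample and variant counts come from .fam and .bim."""
    if data[:3] != PLINK_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    block_size = math.ceil(n_samples / 4)
    if len(data) != 3 + block_size * n_variants:
        raise ValueError("file size does not match the sample and variant counts")
    return [
        decode_variant_block(data[3 + v * block_size: 3 + (v + 1) * block_size], n_samples)
        for v in range(n_variants)
    ]


# Toy file: 5 samples, 1 variant, so one block of 2 bytes.
# Byte 1 packs samples 1-4 as codes 00, 10, 11, 01; byte 2 holds sample 5 as code 10.
toy = PLINK_MAGIC + bytes([0b01111000, 0b00000010])
decoded = decode_bed(toy, n_samples=5, n_variants=1)
print(decoded)  # [[2, 1, 0, None, 1]]
```

For the files in this dataset, the number of samples is the number of lines in the .fam file and the number of variants is the number of lines in the .bim file.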

File: yewa.precip.TL_residuals.csv

Description: Data used to test the relationship between telomere length and phenotype-climate mismatch.

Variables

LOC1: Population name
BDEPTH: bill depth measured in mm
clim: measure of average breeding precipitation in mm
TL.scale: Telomere length measured using program TelSeq and scaled by age and mass
resids: We used the ‘lm’ function in R version 3.5.3 (https://www.R-project.org) to fit linear models to test the association between bill depth and the environment for both our historic and contemporary samples. We then calculated the residuals from the contemporary association to the historical line of best fit to get a measure of the phenotype-climate mismatch.
y: Latitude
x: Longitude

File: GEA.samples_22pops_env.meta.csv

Description: Metadata needed to run GradientForest on yellow warbler population allele frequencies. Includes Bioclim variables extracted from WorldClim.

Variables

Location: population ID
Lat: Latitude
Long: Longitude
bio_19: Precipitation of coldest quarter (mm)
bio_18: Precipitation of warmest quarter (mm)
bio_17: Precipitation of driest quarter (mm)
bio_16: Precipitation of wettest quarter (mm)
bio_15: Precipitation seasonality (coefficient of variation)
bio_14: Precipitation of driest Month (mm)
bio_13: Precipitation of wettest month (mm)
bio_12: Annual precipitation (mm)
bio_11: Mean Temperature of Coldest Quarter (°C)
bio_10: Mean Temperature of warmest Quarter (°C)
bio_9: Mean Temperature of Driest Quarter (°C)
bio_8: Mean Temperature of wettest Quarter (°C)
bio_7: Temperature annual range (°C)
bio_6: Min Temperature of Coldest Month (°C)
bio_5: Max Temperature of warmest Month (°C)
bio_4: Temperature seasonality (standard deviation × 100)
bio_3: Isothermality (%)
bio_2: Mean Diurnal Range (°C)
bio_1: Annual mean temperature (°C)
tree: Tree cover
srtm: Elevation (m)
qscat: surface moisture characteristics
ndvistd: vegetation variation
ndvimax: maximum vegetation cover
hii: human impact

File: yewa.breed.fix.shp

Description: Shapefile of yellow warbler breeding range

File: yewa.breed.fix.dbf

Description: Attribute data to accompany the shapefile of yellow warbler breeding range

File: yewa.breed.fix.shx

Description: Shape index file to accompany the shapefile of yellow warbler breeding range

File: yewa.breed.fix.prj

Description: Coordinate reference system file to accompany the shapefile of yellow warbler breeding range

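File: yewa.filtered.imputed.depth.PAMI.121inds.bim

Description: This file is the extended variant information file for the bill depth GWAS, accompanying the associated .bed binary genotype table and .fam sample information file. This file has no header line, and contains one line per variant with the following six fields:

Chromosome code or name
Variant identifier
Variant position in morgans or centimorgans (genetic distance)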
Base-pair coordinate
Allele 1
Allele 2

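File: yewa.filtered.imputed.depth.PAMI.121inds.bed

Description: Binary genotype table for the bill depth GWAS, accompanying the associated .bim extended variant information file and .fam sample information file. The format is the same as for yewa.filtered.imputed.TL.statesex.resids.PAMI.121inds.bed.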

File: GEA.GF_overlappingSNPs.AF_171inds.txt

Description: This file contains the allele frequencies for all of the overlapping non-zero SNPs found in the bill depth and telomere length GWAS. The SNP names are in the first row, and each subsequent row holds the allele frequencies for one population.

File: museum_meta_clean.csv

Description: This file contains the metadata for the historical specimens collected from museums across the breeding range of the yellow warbler.

Variables

FieldID
Museum: Three letter code for the museums from which we took historical samples: Buffalo Society of Natural Sciences, California Academy of Sciences, Carnegie Museum of Natural History, Charles R. Conner Museum, Delaware Museum of Natural History, Denver Museum of Nature & Science, Field Museum of Natural History, University of Kansas Biodiversity Institute, Natural History Museum of Los Angeles County, Museum of Comparative Zoology - Harvard University, Museum of Vertebrate Zoology - UC Berkeley, James R. Slater Museum of Natural History, Royal Ontario Museum, Museum of Wildlife and Fish Biology - UC Davis, University of Michigan Museum of Zoology, Yale Peabody Museum.
Catalog no: Identification number for each individual specimen at their host museum
GUID: museum and catalog number for each sample
Year
Month
Day
Sex
Date
Lat1
Lon1

File: PopID.ped

Description: This file contains the genotype data for historical yellow warbler samples used to ask whether local populations have shifted their geographic ranges over the last century. Samples were skin or toe pads loaned from museums. The first column is the family ID, and the second column is the individual ID. The next four columns (paternal ID, maternal ID, sex, and phenotype in the standard PLINK .ped layout) are all zeros, as we did not include this information in this analysis. After the first six columns are the genotypes for each SNP in the order they appear in the companion PopID.map file. Each SNP is represented by two alleles, separated by a space.
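The column layout of PopID.ped can be unpacked as sketched below. This assumes whitespace-delimited fields and that the number of SNPs is taken from the number of lines in PopID.map; the `parse_ped_line` helper and the example line are hypothetical.

```python
from typing import List, Tuple


def parse_ped_line(line: str, n_snps: int) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split one .ped line into family ID, individual ID, and per-SNP allele pairs."""
    fields = line.split()
    expected = 6 + 2 * n_snps  # six leading columns, then two alleles per SNP
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    family_id, individual_id = fields[0], fields[1]
    alleles = fields[6:]
    # Pair consecutive alleles; pairs follow the SNP order of the .map file.
    genotypes = [(alleles[2 * i], alleles[2 * i + 1]) for i in range(n_snps)]
    return family_id, individual_id, genotypes


# Made-up line with three SNPs; "0 0" is the usual PLINK code for a missing genotype.
line = "FAM1 IND1 0 0 0 0 A G C C 0 0"
family_id, individual_id, genotypes = parse_ped_line(line, n_snps=3)
print(individual_id, genotypes)  # IND1 [('A', 'G'), ('C', 'C'), ('0', '0')]
```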

File: PopID.map

Description: This file contains the location data for the SNPs used to find whether local populations have shifted their geographic ranges over the last century. This is a companion file to the PopID.ped file. The first column indicates the chromosome, the second column is the SNP ID, the third column is the genetic distance, and the last column indicates the base-pair position.

File: PopID.loc

Description: This file contains the location data for samples used in the test of whether local populations have shifted their geographic ranges over the last century. The first column indicates the sample ID, the second column is the town where the sample was collected, the third column is the decade in which the sample was collected, and the last two columns are the sample collection coordinates.

Code/software

The allele frequency file in this repository requires Linux to generate. The genome-wide association scripts are provided for users who wish to regenerate it; the output files are also provided, so access to a Linux machine is not required to replicate our analyses. The remaining scripts require R to run.

Access information

Data was derived from the following sources:

WorldClim database (Harris et al., 2020; Fick and Hijmans, 2017).
National Landcover Database: https://www.usgs.gov/centers/eros/science/national-land-cover-database
BYU Center for Remote Sensing: https://www.scp.byu.edu
Carroll, M. L., DiMiceli, C. M., Sohlberg, R. A., & Townshend, J. R. G. (2004). 250m MODIS normalized difference vegetation index. University of Maryland, College Park, Maryland.
Sexton, J. O., Song, X. P., Feng, M., Noojipady, P., Anand, A., Huang, C., ... & Townshend, J. R. (2013). Global, 30-m resolution continuous fields of tree cover: Landsat-based rescaling of MODIS vegetation continuous fields with lidar-based estimates of error. International Journal of Digital Earth, 6(5), 427-448.
Wiedenfeld, D. A. (1991). Geographical morphology of male Yellow Warblers. The Condor, 93(3), 712-723.

Methods

We collected samples from 121 yellow warblers from two reference populations in Michigan and Pennsylvania. At each site, birds were captured using mist-netting, bill depth measurements were taken, and blood samples were collected via brachial venipuncture and preserved in Queen's lysis buffer. Further, we collected an additional 171 genetic samples from 22 sites across the yellow warbler breeding range to validate associations between allele frequencies and environmental variables in key loci. Of these 171 samples, 63 samples with bill depth measurements from 10 sites across the breeding range were also used to validate the associations between bill depth and environmental variables.

Whole-genome sequencing libraries were prepared following modifications of Illumina’s Nextera Library Preparation protocol with a target sequencing depth of 2X per individual. We used the program Trimmomatic 0.39 to trim the sequence data to remove Illumina adapter sequences and polyG tails using a sliding window approach (SLIDINGWINDOW:4:20). We then mapped reads to the yellow warbler reference genome (NCBI BioProject PRJNA777222) using BWA 0.7.17. After mapping, the resulting SAM files were sorted, converted to BAM files, and indexed using Samtools version 1.16. We used MarkDuplicates from Picard (http://broadinstitute.github.io/picard) to mark read duplicates and clipped overlapping reads with the clipOverlap function from bamUtil. To reduce sequencing depth variation, we used the downsample function from Picard to downsample BAM files with greater than 3X coverage to 3X coverage. This resulted in an average read depth of 2.7X coverage.
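As a rough illustration of the downsampling step, the fraction of reads to retain for a given BAM file follows directly from its coverage. This is a sketch under the assumption that reads are kept at random with a fixed probability; the function name and the example coverages are hypothetical, not values from this dataset.

```python
def downsample_fraction(coverage: float, target: float = 3.0) -> float:
    """Fraction of reads to keep so that a sample ends at or below the target coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    # Samples already at or below the target are left untouched (fraction 1.0).
    return min(1.0, target / coverage)


# Hypothetical per-sample coverages before downsampling.
for coverage in (2.1, 3.0, 6.0):
    print(coverage, downsample_fraction(coverage))
```

Because samples below 3X are not changed, the average depth after this step is expected to fall below 3X, which is consistent with the 2.7X reported above.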

To identify genetic markers from low-coverage WGS data, we used the program HaplotypeCaller in the Genome Analysis Toolkit (GATK version 4.1.6.0) applying a minimum base quality score of 33 and a minimum mapping quality score of 20 to reduce lane effects. To parallelize the genotype calling process, we generated genomic databases in ~3 Mb intervals across the genome and combined and indexed the genotyped VCF files with BCFtools 1.16. To remove systematic errors, we applied a hard filter to the subsequent VCF file with the following parameters, "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0", filtering the indels separately with "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0". We then used BCFtools to keep biallelic sites (-m 2 -M 2) missing in fewer than 20% of the sampled individuals ('F_MISSING < 0.20'), with minor allele frequency of at least 0.05 (--min-af 0.05, --max-af 0.95), and with a sequencing quality score of at least 30 ('QUAL > 30'). This filtering resulted in 2,999,708 variants in 298 individuals with an average of 21% missing data.
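The two hard-filter expressions above are OR-combinations of per-site annotation thresholds: a site is filtered out if any one condition is true. The sketch below restates that logic in Python for illustration only. It is not the GATK implementation, the `fails_hard_filter` helper is hypothetical, and treating a missing annotation as "does not trigger the filter" is an assumption.

```python
from typing import Mapping

# (annotation, comparison, cutoff) triples copied from the filter expressions above.
SNP_RULES = [("QD", "<", 2.0), ("FS", ">", 60.0), ("MQ", "<", 40.0),
             ("MQRankSum", "<", -12.5), ("ReadPosRankSum", "<", -8.0)]
INDEL_RULES = [("QD", "<", 2.0), ("FS", ">", 200.0), ("ReadPosRankSum", "<", -20.0)]


def fails_hard_filter(annotations: Mapping[str, float], is_indel: bool = False) -> bool:
    """Return True if any threshold in the relevant expression is violated."""
    rules = INDEL_RULES if is_indel else SNP_RULES
    for name, comparison, cutoff in rules:
        value = annotations.get(name)
        if value is None:
            continue  # assumption: a missing annotation never triggers the filter
        if comparison == "<" and value < cutoff:
            return True
        if comparison == ">" and value > cutoff:
            return True
    return False


# Made-up annotation values for three sites.
good_snp = {"QD": 12.0, "FS": 3.1, "MQ": 59.0, "MQRankSum": 0.1, "ReadPosRankSum": -0.5}
low_qd_snp = dict(good_snp, QD=1.5)
strand_biased = {"QD": 10.0, "FS": 100.0, "ReadPosRankSum": 0.0}

print(fails_hard_filter(good_snp))                      # False
print(fails_hard_filter(low_qd_snp))                    # True
print(fails_hard_filter(strand_biased))                 # True under the SNP thresholds
print(fails_hard_filter(strand_biased, is_indel=True))  # False under the indel thresholds
```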

We measured telomere length from BAM files using TelSeq v0.0.2. We modified parameters in the TelSeq source code to adapt it to the yellow warbler genome, changing the number of chromosomal ends, read length, and total GC content (bp). The parameters TELOMERE_ENDS, READ_LENGTH and GENOME_LENGTH_AT_TEL_GC were set equal to 62, 100, and 143831148, respectively. We calculated the latter by measuring the total length of 150 base pair windows in the yellow warbler genome with a GC content between 48% and 52%.
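The GENOME_LENGTH_AT_TEL_GC calculation can be sketched as a window scan over a sequence. This is an illustration under several assumptions that the text does not state: windows are non-overlapping, both GC bounds are inclusive, a trailing partial window is ignored, and bases other than G and C (including N) simply count as non-GC. The function name and the toy sequence are hypothetical.

```python
def length_at_telomere_gc(sequence: str, window: int = 150,
                          gc_min: float = 0.48, gc_max: float = 0.52) -> int:
    """Total length (bp) of non-overlapping windows whose GC fraction lies in [gc_min, gc_max]."""
    sequence = sequence.upper()
    total = 0
    # Step through full windows only; a trailing partial window is skipped.
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        gc_fraction = (chunk.count("G") + chunk.count("C")) / window
        if gc_min <= gc_fraction <= gc_max:
            total += window
    return total


# Toy sequence: two 150 bp windows at 50% GC, one window at 0% GC, and a 10 bp remainder.
balanced = "GA" * 75
at_only = "A" * 150
toy_genome = balanced + at_only + balanced + "G" * 10
total_length = length_at_telomere_gc(toy_genome)
print(total_length)  # 300
```

For a whole genome, the same function would be applied to each chromosome or scaffold and the results summed.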

To compare the current and pre-climate-change associations between phenotype and environment, we used data from Wiedenfeld (1991), which include morphometric measurements from 153 yellow warblers captured between 1873 and 1987. We used wing chord as a proxy for body size to calculate body-size corrected bill depth in historic and current samples. Using locations of capture, we extracted historical monthly climate data from WorldClim for the breeding months of May, June, and July for each sample between the years 1901 and 1950, which we then averaged. As bioclim variables are not available for historic time periods, we used an average precipitation. We then used the ‘lm’ function in R version 3.5.3 (https://www.R-project.org) to fit linear models to test the association between bill depth and the environment for both our historic and contemporary samples. We then calculated the residuals from the contemporary association to the historical line of best fit. We used those residuals as a measure of change between the historic and contemporary relationship between bill depth and climate, where a larger residual means a bigger mismatch between bill depth and the environment, relative to what we assume is the pre-climate change optimal.
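The mismatch measure can be illustrated with a small worked example. The analysis itself was run with ‘lm’ in R; the Python sketch below only mirrors the idea under stated assumptions: the line of best fit is an ordinary least-squares fit of bill depth on precipitation, the residual is the vertical distance from that historic line, and the body-size correction is left out. The function names and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mismatch_residuals(intercept: float, slope: float,
                       x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Observed value minus the value predicted by the historic line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up historic data: precipitation (mm) and bill depth (mm).
historic_precip = [50.0, 100.0, 150.0]
historic_depth = [3.0, 3.5, 4.0]
intercept, slope = fit_line(historic_precip, historic_depth)

# Made-up contemporary birds evaluated against the historic line.
contemporary_precip = [100.0, 120.0]
contemporary_depth = [3.5, 4.0]
residuals = mismatch_residuals(intercept, slope, contemporary_precip, contemporary_depth)
print([round(r, 3) for r in residuals])
```

In this toy case the first bird sits on the historic line and the second has a bill about 0.3 mm deeper than the historic relationship predicts for its precipitation value.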

We used population structure analyses to ask whether local populations have shifted their geographic ranges over the last century. We assembled a collection of 169 historic samples of yellow warblers sampled on their breeding range. Historic samples were skin or toe pads loaned from museums (Supplementary Table ##). All samples were extracted using the Qiagen DNeasy Blood and Tissue Kit and genotyped at a set of 96 SNPs, previously identified for geographic assignment (Bay et al. 2021), using a Fluidigm 96.96 IFC controller. After SNP genotyping, we discarded individuals with poor quality data (<50% of SNPs genotyped). Genotypes from historic samples were combined with previously genotyped contemporary yellow warblers (1990-present) sampled on their breeding range (Bay et al. 2021). This left us with a final set of 551 samples (129 historical and 422 contemporary).

We performed principal components analysis (PCA) on contemporary samples only to establish the relationship between genetic variation and geography. PCA was performed using the SNPRelate package in R v4.3.2. We then predicted loadings of historical samples using the snpgdsPCASampLoading function. Historical samples were plotted alongside contemporary samples to visualize whether relationships between genetic variation and geography changed over time. We used linear models to test for effects of latitude, longitude, and time (historical vs. contemporary) on PC axes.
