Forecasting climate change response in an alpine specialist songbird reveals the importance of considering novel climate
Data files
Sep 25, 2022 version files 13.65 GB
-
bcrf.vcf.gz
-
README.txt
Jan 26, 2024 version files 13.87 GB
-
bcrf-climate.zip
-
README.md
Abstract
Species persistence in the face of climate change depends on both ecological and evolutionary factors. Here, we integrate ecological and whole-genome sequencing data to describe how populations of an alpine specialist, the Brown-capped Rosy-Finch (Leucosticte australis) may be impacted by climate change. We sampled 116 Brown-capped Rosy-Finches from 11 sampling locations across the breeding range. Using 429,442 genetic markers from whole-genome sequencing, we described population genetic structure and identified a subset of 436 genomic variants associated with environmental data. We modelled future climate change impacts on habitat suitability using ecological niche models (ENMs) and impacts on putative local adaptation using gradient forest models (a genetic-environment association analysis; GEA). We used the metric of niche margin index (NMI) to determine regions of forecasting uncertainty due to climate shifts to novel conditions. Population genetic structure was characterized by weak genetic differentiation, indicating potential ongoing gene flow among populations. Precipitation as snow had high importance for both habitat suitability and changes in genetic variation across the landscape. Comparing ENM and gradient forest models with future climate predicted suitable habitat contracting at high elevations and population allele frequencies across the breeding range needing to shift to keep pace with climate change. NMI revealed large portions of the breeding range shifting to novel climate conditions. Our study demonstrates that forecasting climate vulnerability from ecological and evolutionary factors reveals insights into population-level vulnerability to climate change that are obfuscated when either approach is considered independently. For the Brown-capped Rosy-Finch, our results suggest that persistence may depend on rapid adaptation to novel climate conditions in a contracted breeding range. Importantly, we demonstrate the need to characterize novel climate conditions that influence uncertainty in forecasting methods.
README: Forecasting climate change response in an alpine specialist songbird reveals the importance of considering novel climate
https://doi.org/10.5061/dryad.stqjq2c4r
These data are associated with the paper of the same name by DeSaix et al. (2022) published in Diversity and Distributions (https://doi.org/10.1111/ddi.13628).
Description of the data and file structure
The data provided are described here in the order of the directory they are in the provided compressed file "bcrf-climate.gz".
admixture
Directory of data related to analyses from the ADMIXTURE software
bcrf.admix.df.q2.csv = Metadata file for the 104 filtered individuals (see bcrf.nokin.ids.txt) with the ADMIXTURE results for K=2, where the file has 6 attributes (ID = unique sample ID, Lat = Latitude of sample collection site, Long = Longitude of sample collection site, Group_ID = four letter abbreviation of sample collection site, K1 = admixture proportions for cluster one, K2 = admixture proportions for cluster 2).
bcrf.admix.df.q3.csv = Metadata file for the 104 filtered individuals (see bcrf.nokin.ids.txt) with the ADMIXTURE results for K=3, where the file has 6 attributes (ID = unique sample ID, Lat = Latitude of sample collection site, Long = Longitude of sample collection site, Group_ID = four letter abbreviation of sample collection site, K1 = admixture proportions for cluster one, K2 = admixture proportions for cluster 2, K3 = admixture proportions for cluster 3).
breeding_range_shapefile_sdm
Directory of all the data associated with the shapefile for the Brown-capped Rosy-Finch species distribution model (breeding range, current time period) that is produced for and described in this research. Shapefiles are a unique type of geospatial data that can be opened with any standard GIS software (e.g., ArcGIS, QGIS) or programming languages that handle spatial data (e.g., R). The five files in this directory with the prefix "current" function together as the shapefile.
GEA
In the GEA/ directory there are additional directories:
candidate_snps/ directory has the genetic data of the subset of SNPs that were the output from the genetic-environment-association (GEA) analyses.
* bcrf.nokin.1991.cand.intersect.436.vcf.gz = the VCF file format of the 436 SNPs that were the result of the intersection of all SNPs identified by LFMM and RDA methods (described in the manuscript) for environmental data from the 1991-2020 time period.
* bcrf.nokin.1991.cand.union.6045.vcf.gz = the VCF file format of the 6,045 SNPs that were the result of the union of all SNPs identified by LFMM and RDA methods.
The ./environment directory has the environmental predictor data for the GEA analyses, these data are openly available from the Adaptwest project https://adaptwest.databasin.org/
predictor_environment.1991.mems.csv = A .csv file of the environmental predictor data for each individual sample (based on its latitude and longitude of sample collection) for the 1991-2020 time period. The attributes/column names of the file are: ID = sample ID, Near_Town = sample collection site name, MEM1 to MEM5 are the moran eigenvector map values described in the manuscript, MWMT = Mean temperature of the warmest month (Celsius), PAS = Precipitation as snow (Millimeter), Elev = Elevation (Meter), and SHM = summer heat moisture index (Mean temperature of the warmest month [Celsius] / (Mean summer precipitation [Millimeter] / 1000)).
genetic_data
In the genetic_data/ directory, there are additional directories:
filtered/ directory has the subset of SNPs that passed all bioinformatics quality controls described in the methods of this paper.
* bcrf.nokin.filtered.snps.fst01.qual30.missing.05.maf01.5_4depth12.ld02.vcf.gz = The tar gz compressed genetic data in VCF file format, these are to manipulated with specific software such as bcftools. These data are for the 104 filtered individual samples.
* bcrf.nokin.filtered.snps.fst01.qual30.missing.05.maf01.5_4depth12.ld02.vcf.stats.txt = A text file of metadata describing the genetic markers in the aforementioned *.gz file. These stats are produced by the bcftools function "stats".
raw/ directory has all of the genetic data prior to stringent filtering
* bcrf.vcf.gz = The full genetic data for all 116 individual samples in VCF file format.
* bcrf.vcf.gz.stats.txt = A text file of metadata describing the genetic markers, which is produced by the bcftools function "stats".
metadata
Directory of metadata
BCRF-meta-tidy.csv = Metadata file for all 116 samples with 5 attributes (ID = unique sample ID, Near_town = Name of sample collection site, Lat = Latitude of sample collection site, Long = Longitude of sample collection site, Group_ID = four letter abbreviation of sample collection site)
bcrf.nokin.ids.txt = Single column text file of the 104 sample IDs of the individuals that were not related and passed QC to be used in all population genetics analyses in the manuscript.
Code/Software
Scripts used to analyze these data are provided here: https://github.com/mgdesaix/bcrf-climate/
Methods
We sequenced feather and blood samples from 116 individuals spanning 11 sites across the Brown-capped Rosy-Finch breeding distribution (Table 1). Samples were collected during the breeding season of 2017 and 2018. Individuals from the Lost Man Lake and Independence Lake sites were combined as a single sampling unit for subsequent analyses based on their proximity (< 1 km) and the low sample sizes (5 and 1 individuals, respectively). Engineer Mountain and Horseshoe Basin sites were also in close proximity (< 5 km), but we retained them as separate sampling units due to the larger number of individuals per site (8 and 18 individuals, respectively).
We extracted DNA from blood samples using the standard protocol for Qiagen DNEasy Blood and Tissue Kits and we modified the protocol to maximize DNA yield from feathers. Whole genome sequencing libraries were prepared following modifications of Illumina’s Nextra Library Preparation protocol. Pooled libraries were sequenced on HiSeq 4000 lanes at Novogene Corporation Inc. All sequence data were quality filtered (GATK: McKenna et al., 2010; BCFtools: Li, 2011; Samtools: Li et al., 2009) and aligned (Burrows-Wheeler Aligner software; Li & Durbin, 2009) to a high-quality Brown-capped Rosy-Finch reference genome that was created by Dovetail Genomics through 10x de novo assembly and HiRise Scaffolding. The reference genome was created from liver samples of the Brown-capped Rosy-Finch (Denver Museum of Nature and Science samples DMNS52416 and DMNS52417). The reference genome was annotated with the most recent zebra finch annotations available (NCBI GCA_008822105.2) using the program Liftoff (Shumate & Salzberg, 2021). For the input into all subsequent analyses, we extracted high-quality single-nucleotide polymorphisms (SNPs; Supporting information).
We trimmed the sequence data to remove potential PCR artifacts using the program TrimGalore version 0.6.5 (https://github.com/FelixKrueger/TrimGalore), a wrapper for Cutadapt (Martin, 2011). PCR duplicates need to be removed to ensure high-quality sequence data in downstream processes, such as creating scaffolds in whole-genome sequencing. We used the Burrows-Wheeler Aligner software version 0.7.17 (Li & Durbin, 2009) to map reads to the reference genome. After mapping, the resulting SAM files were sorted, converted to BAM files, and then indexed using Samtools version 1.9 (Li et al., 2009). We marked read duplicates with MarkDuplicatesSpark from GATK version 4.1.4.0 (McKenna et al., 2010) and overlapping reads with the clipOverlap function from bamUtil (https://genome.sph.umich.edu/wiki/BamUtil:_clipOverlap). Depth of sequencing coverage and reference genome mapping statistics were calculated in samtools (Li, 2011). Mean and range of depth of coverage of individuals for each sampling site were summarized to determine if there was sequencing bias among sites.
To account for read errors from sequencing machines we performed base quality score recalibration. Base quality score recalibration involved identifying high confidence variants by taking the intersection of single-nucleotide polymorphism (SNP) data sets identified by two variant calling algorithms (GATK; McKenna et al., 2010; Samtools; Li & Durbin, 2009). We used the resulting set of SNPs as the reference data for recalibrating all BAM files with the BaseRecalibrator and ApplyBQSR functions in GATK (McKenna et al., 2010). Finally, the base score recalibrated BAM files were used to create VCF files with the HaplotypeCaller function of GATK (McKenna et al., 2010; Poplin et al., 2017). In order to facilitate faster computation time, BAM files were processed in approximately 1.5 MB intervals and the resulting VCF files were combined with the GatherVCFs function in GATK. This provided a single VCF file of a master set of genomic sites to be filtered appropriately for the subsequent analyses. All indexing of VCF files was performed with the index function in BCFtools version 1.9 (Li, 2011).
For the input into these analyses, we extracted high-quality single nucleotide polymorphisms (SNPs) from the master VCF file by filtering with the view function in BCFtools (Li, 2011). Specifically, we filtered for biallelic sites (-m 2 -M 2), minor allele frequency of at least 0.1 (--min-af 0.1, --max-af 0.9), sequencing quality score of at least 30 (‘QUAL > 30’) and were missing from less than 10% of the individuals sampled ('F_MISSING < 0.1'). To remove paralogs, we filtered variants by minimum and maximum depth thresholds (-i 'AVG(FORMAT/DP)>5.4 & AVG(FORMAT/DP)<12'). The minimum threshold was set as the mean depth of the sequencing data multiplied by the minimum number of individuals without missing data detailed in the previous step, and the maximum being twice the mean depth of the sequencing data multiplied by the total number of individuals. To remove linked SNPs, variants were pruned by linkage disequilibrium (r > 0.2) using Plink2.0 (Purcell et al., 2007). We filtered the data set to only include non-sex-linked autosomal variation by removing variants with pairwise FST > 0.01 for male and female birds.
Usage notes
See README file for details.