Data from: Genomic signatures of climate adaptation in bank voles
Data files
Feb 19, 2024 version files 28.07 MB
Abstract
Evidence for divergent selection and adaptive variation across the landscape can provide insight into a species’ ability to adapt to different environments. However, despite recent advances in genomics, it remains difficult to detect footprints of climate-mediated selection in natural populations. Here we analysed ddRAD sequencing data (21,892 SNPs) in conjunction with geographic climate variation to search for signatures of adaptive differentiation in twelve populations of the bank vole (Clethrionomys glareolus) distributed across Europe. To identify the loci subject to selection associated with climate variation, we applied multiple genotype-environment association (GEA) methods, two univariate and one multivariate, and controlled for the effect of population structure. In total, we identified 213 candidate loci for adaptation, 74 of which were located within genes. In particular, we identified signatures of selection in candidate genes with functions related to lipid metabolism and the immune system. Using the results of redundancy analysis (RDA), we demonstrated that population history and climate have joint effects on the genetic variation in the pan-European metapopulation. Furthermore, by examining only candidate loci, we found that annual mean temperature is an important factor shaping adaptive genetic variation in the bank vole. By combining landscape genomic approaches, our study sheds light on genome-wide adaptive differentiation and the spatial distribution of variants underlying adaptive variation influenced by local climate in bank voles.
README: Data from: Genomic signatures of climate adaptation in bank voles
This README file was generated on 2024-02-18 by Remco Folkertsma.
https://doi.org/10.5061/dryad.1c59zw42p
We used a ddRAD sequencing approach to genotype 276 bank voles (Clethrionomys glareolus) from 12 populations across Europe. This dataset includes the input files containing genomic and environmental data, as well as R scripts and sample metadata. The genomic and environmental data were used to perform genotype-environment association analyses with LFMM, Bayenv2, and redundancy analysis in R (RDA script included).
GENERAL INFORMATION
1. Title of Dataset: Genomic signatures of climate adaptation in bank voles
2. Author Information
A. Principal Investigator Contact Information
Name: Remco Folkertsma
Institution: Potsdam University
Email: remcofolkertsma@gmail.com
3. Date of data collection (single date, range, approximate date): 2012-2015
4. Geographic location of data collection: Europe (for details see SamplesMetaData.xlsx)
SHARING/ACCESS INFORMATION
1. Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
2. Links to publications that cite or use the data:
Folkertsma, R, et al. (2024). Genomic signatures of climate adaptation in bank voles. Ecology and Evolution.
3. Links to other publicly accessible locations of the data: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1035302
4. Links/relationships to ancillary data sets: None
5. Was data derived from another source? No
A. If yes, list source(s): NA
6. Recommended citation for this dataset:
Folkertsma, R, et al. (2024). Data from: Genomic signatures of climate adaptation in bank voles. Dryad Digital Repository. https://doi.org/10.5061/dryad.1c59zw42p
DATA & FILE OVERVIEW
1. File List:
A) SamplesMetaData.xlsx
B) Pops12.ind12.vcf
C) Count_Genotypes.py
D) PolygenicScores.txt
E) PolygenicScores.R
F) geo.matrix
G) gen.matrix
H) BioclimVals.txt
I) MantelTests.R
J) BioClim_Pops.csv
K) Pops12.ind12.PC1.env
L) Pops12.ind12.PC2.env
M) Pops12.ind12.lfmm_K10L31-imputed.lfmm
N) Pops12.ind12.Bayenv.SNPSFILE
O) Pops12.ind12.Bayenv.MATRIXFILE
P) Pops12.ind12.Bayenv.ENVIRONFILE
Q) PopulationValues.csv
R) RDA_Outlier_Script.R
S) ./RDA/
2. Relationship between files, if important: Relationships are described in the relevant sections for outlier analysis
3. Additional related data collected that was not included in the current data package: None
4. Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
Description of the data and file structure
.\SamplesMetaData.xlsx
Excel file containing information about samples used in this study, including: biosample_accession numbers and sample location information.
1. Number of variables: 8
2. Number of cases/rows: 276
3. Variable List:
* biosample_accession: BioSample accession under which the data are stored at https://www.ncbi.nlm.nih.gov/
* library_ID: library identifier
* sample_name: unique sample name
* collection_date: date of sample collection
* Species: sample species (here all Clethrionomys glareolus (bank vole))
* Latitude: latitude of the sampling location
* Longitude: longitude of the sampling location
* Population: population identifier (NE3.fi: Pallasjärvi; NE2.fi: Mäntyharju; NE1.fi: Vammala; N.se: Gimo; CE.pl: Urwitalt; SE.ro: Sovata; C1.de: Potsdam; C2.cz: Krušné hory; C3.cz: Litoměřice; S.it: Radicondoli; SW1.fr: La Venotière; SW2.fr: Toulouse)
4. Missing data codes: None
5. Specialized formats or other abbreviations used: None
.\Pops12.ind12.vcf
Variant Call Format file with SNP info for all 21,892 sites and all 276 individuals.
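A quick way to confirm that a downloaded copy matches the counts stated above is to tally the sample columns and variant records. The following is a minimal sketch using only the Python standard library. The function name `summarize_vcf` is hypothetical and not part of the dataset, and the sketch assumes a standard tab-separated VCF with a `FORMAT` column.

```python
from typing import Iterable, Tuple


def summarize_vcf(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of samples, number of variant records) for VCF text."""
    n_samples = 0
    n_sites = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            # The first 9 columns (CHROM..FORMAT) are fixed; the rest are samples.
            n_samples = len(line.split("\t")) - 9
            continue
        n_sites += 1  # every remaining line is one variant record
    return n_samples, n_sites


# Usage, assuming the file sits in the working directory:
# with open("Pops12.ind12.vcf") as handle:
#     print(summarize_vcf(handle))
```

If the file matches this README, the call should report 276 samples and 21,892 sites.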
.\BioClim_Pops.csv
csv file containing bioclim variable values for each population
.\Polygenic_scores.zip
Folder containing data related to polygenic score calculation
.\Polygenic scores\Count_Genotypes.py
Python script used to calculate polygenic scores from individual genotype files
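This README does not describe the scoring rule implemented in Count_Genotypes.py. As an illustration only, the sketch below uses one common additive definition: the number of putatively adaptive alleles an individual carries across a set of outlier loci, skipping missing genotypes. All names (`alt_allele_count`, `polygenic_score`, `adaptive_is_alt`) are hypothetical, and the sketch assumes biallelic, diploid genotypes in VCF `GT` notation.

```python
from typing import Dict, Optional


def alt_allele_count(gt_field: str) -> Optional[int]:
    """Count alternate alleles in a diploid VCF sample field such as '0/1:12'."""
    gt = gt_field.split(":")[0]  # keep only the GT sub-field
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing genotype
    return sum(1 for allele in alleles if allele != "0")


def polygenic_score(genotypes: Dict[str, str],
                    adaptive_is_alt: Dict[str, bool]) -> int:
    """Sum putatively adaptive allele counts over loci (assumed additive rule).

    genotypes maps locus ID -> GT string for one individual.
    adaptive_is_alt maps locus ID -> True if the alternate allele is the one
    assumed to be adaptive, False if the reference allele is.
    """
    score = 0
    for locus, alt_is_adaptive in adaptive_is_alt.items():
        n_alt = alt_allele_count(genotypes[locus])
        if n_alt is None:
            continue  # missing genotypes contribute nothing in this sketch
        score += n_alt if alt_is_adaptive else 2 - n_alt
    return score
```

Skipping missing genotypes biases scores downward for individuals with more missing data. Dividing by the number of scored loci is one alternative.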
.\Polygenic scores\PolygenicScores.txt
1. Description: Individual polygenic scores for different subsets of outlier loci.
2. Number of variables: 47
3. Number of cases/rows: 276
4. Variables list:
* Individual: individual for which polygenic scores are calculated
* Population: individual's population
* 5 population-specific climate variables derived from worldclim.org:
  * AnMTemp: Annual Mean Temperature
  * TempSeas: Temperature Seasonality (standard deviation × 100)
  * MDR: Mean Diurnal Range (mean of monthly (max temp - min temp))
  * AnPrec: Annual Precipitation
  * PrecSeas: Precipitation Seasonality (Coefficient of Variation)
* 40 variables with outlier-subset-specific polygenic scores for specific climate variables. Subsets include:
  * Candidate loci
  * All outlier loci
  * Outliers associated with climate without controlling for population structure
  * Outliers detected by redundancy analysis
  * Outliers detected by LFMM (PC1 and PC2)
  * Outliers detected by Bayenv (PC1 and PC2)
.\Polygenic scores\PolygenicScores.R
R script with code to calculate correlations between polygenic scores and subsets of outlier loci
.\MantelTests.zip
Folder containing data related to calculating isolation by distance (IBD) and isolation by environment (IBE) using (partial) Mantel tests
.\MantelTests\geo.matrix
Matrix file containing standardized pairwise geographic distances between populations. Rows and columns are ordered by population as: NE3.fi; NE2.fi; NE1.fi; N.se; CE.pl; SE.ro; C1.de; C2.cz; C3.cz; S.it; SW1.fr; SW2.fr.
.\MantelTests\gen.matrix
Matrix file containing standardized pairwise genetic distances between populations. Rows and columns are ordered by population as: NE3.fi; NE2.fi; NE1.fi; N.se; CE.pl; SE.ro; C1.de; C2.cz; C3.cz; S.it; SW1.fr; SW2.fr.
.\MantelTests\BioclimVals.txt
Matrix file containing standardized pairwise distances between populations for specific bioclimatic variables derived from worldclim.org. Rows and columns are ordered by population as: NE3.fi; NE2.fi; NE1.fi; N.se; CE.pl; SE.ro; C1.de; C2.cz; C3.cz; S.it; SW1.fr; SW2.fr.
Variable list:
* Population: Population code
* Location: Population location name
* Latitude: Population-specific latitude
* Longitude: Population-specific longitude
* Ave_pairwise_dist: Average distance to other populations
* 19 worldclim variables (complete description: https://www.worldclim.org/data/bioclim.html)
* wc1.4_alt: Elevation above sea level of population location (m)
.\MantelTests\MantelTests.R
R script to perform Mantel and partial Mantel tests
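For readers unfamiliar with the test, a simple Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by jointly permuting the rows and columns of one of them. The sketch below is a standard-library Python illustration of that idea, not a translation of MantelTests.R. The function names, the one-sided alternative, and the default of 999 permutations are assumptions, and the partial Mantel test is not covered.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix, order: Optional[List[int]] = None) -> List[float]:
    """Flatten the upper triangle, optionally after permuting rows and columns."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Return (Mantel r, one-sided permutation p-value) for two square matrices."""
    rng = random.Random(seed)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    n = len(b)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute population labels of matrix b
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

With the files in this folder, `a` and `b` would be the 12 × 12 matrices from geo.matrix and gen.matrix, read in the population order listed above.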
.\LFMM.zip
Folder containing files for GEA using LFMM
.\LFMM\Pops12.ind12.PC1.env
Environment file with population specific values for PC1 used in the LFMM outlier detection
.\LFMM\Pops12.ind12.PC2.env
Environment file with population specific values for PC2 used in the LFMM outlier detection
.\LFMM\Pops12.ind12.lfmm_K10L31-imputed.lfmm
LFMM genotype file used in the LFMM outlier detection analyses
All files are ordered according to populations as: NE3.fi; NE2.fi; NE1.fi; N.se; CE.pl; SE.ro; C1.de; C2.cz; C3.cz; S.it; SW1.fr; SW2.fr
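Before rerunning LFMM, it can help to check that the genotype and environment files line up. The sketch below assumes the usual LEA `lfmm` layout: one whitespace-separated row per individual, one column per locus, genotypes coded 0/1/2 with 9 for missing data, and an `env` file with one row per individual. The function names are hypothetical and not part of the dataset.

```python
from typing import Dict, Iterable, List


def read_lfmm(lines: Iterable[str]) -> List[List[int]]:
    """Parse LFMM genotype text: one row per individual, one column per locus."""
    rows = [[int(v) for v in line.split()] for line in lines if line.strip()]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows differ in number of loci")
    return rows


def read_env(lines: Iterable[str]) -> List[List[float]]:
    """Parse environment text: one row per individual, one column per variable."""
    return [[float(v) for v in line.split()] for line in lines if line.strip()]


def check_lfmm_inputs(genotypes: List[List[int]],
                      env: List[List[float]]) -> Dict[str, int]:
    """Verify matching row counts and valid genotype codes; return a summary."""
    if len(genotypes) != len(env):
        raise ValueError("genotype and environment files differ in row count")
    if any(v not in (0, 1, 2, 9) for row in genotypes for v in row):
        raise ValueError("unexpected genotype code (expected 0, 1, 2 or 9)")
    return {
        "individuals": len(genotypes),
        "loci": len(genotypes[0]) if genotypes else 0,
        "missing": sum(row.count(9) for row in genotypes),  # 9 codes missing data
    }
```

Because the genotype file name indicates imputed data, the `missing` count for it would be expected to be zero, though that is an inference from the file name rather than something stated here.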
.\Bayenv2.zip
Folder containing files for GEA using Bayenv2
.\Bayenv2\Pops12.ind12.Bayenv.SNPSFILE
Bayenv2 SNP file
.\Bayenv2\Pops12.ind12.Bayenv.MATRIXFILE
Bayenv2 Matrix file used to control for effects of neutral population structure
.\Bayenv2\Pops12.ind12.Bayenv.ENVIRONFILE
Bayenv2 environmental file with population specific values for PC1 and PC2
All files are ordered according to populations as: NE3.fi; NE2.fi; NE1.fi; N.se; CE.pl; SE.ro; C1.de; C2.cz; C3.cz; S.it; SW1.fr; SW2.fr
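To inspect the SNP file outside Bayenv2, the allele counts can be converted to population allele frequencies. The sketch below assumes the standard Bayenv2 SNPSFILE layout, with two consecutive lines per SNP (counts of allele 1, then counts of allele 2) and one whitespace-separated column per population. The function name is hypothetical, and the `POPULATIONS` list simply repeats the order given above.

```python
from typing import Iterable, List, Optional

# Column order as stated in this README.
POPULATIONS = ["NE3.fi", "NE2.fi", "NE1.fi", "N.se", "CE.pl", "SE.ro",
               "C1.de", "C2.cz", "C3.cz", "S.it", "SW1.fr", "SW2.fr"]


def snpsfile_frequencies(lines: Iterable[str]) -> List[List[Optional[float]]]:
    """Return, per SNP, the frequency of allele 1 in each population.

    A population with no called alleles at a SNP gets None.
    """
    rows = [[int(v) for v in line.split()] for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two lines of allele counts per SNP")
    freqs = []
    for allele1, allele2 in zip(rows[0::2], rows[1::2]):
        if len(allele1) != len(allele2):
            raise ValueError("allele count lines differ in length")
        freqs.append([
            c1 / (c1 + c2) if (c1 + c2) > 0 else None
            for c1, c2 in zip(allele1, allele2)
        ])
    return freqs


# Usage, assuming the file sits in the working directory:
# with open("Pops12.ind12.Bayenv.SNPSFILE") as handle:
#     first_snp = dict(zip(POPULATIONS, snpsfile_frequencies(handle)[0]))
```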
.\RDA.zip
Folder containing files for GEA and partitioning variation using redundancy analysis in R
.\RDA\RDA_Script.R
R script used to detect outliers and partition variation using RDA
.\RDA\PopulationValues.csv
Population specific values used as input for outlier detection using RDA
1. Number of rows: 12
2. Number of variables: 20
3. Variable list:
* Population: Population code
* Location: Population location name
* Latitude: Standardized population latitude
* Longitude: Standardized population longitude
* 10 worldclim variables (complete description: https://www.worldclim.org/data/bioclim.html)
* PCA1: Population-specific value for principal component 1
* PCA2: Population-specific value for principal component 2
* PC1, PC2, PC3, PC4: Variables depicting population structure as detected using NGSCovar.
.\RDA\Pops12Ind12.freqs
Population allele frequencies for all 21,892 sites used as community matrix in redundancy analysis
.\RDA\LociSubsets
Folder that contains files to create subsets for specific sets of outlier loci for variance partitioning
Software versions used:
Bayenv: Bayenv2
LFMM: R package LEA v1.6.0
RDA: R package vegan v2.5-4
Sharing/Access information
Raw sequence reads are deposited in the SRA (Bioproject PRJNA1035302).
Methods
We investigated genomic adaptation in a small mammal distributed throughout Europe (3,200 km) using a multivariate, multi-method approach. We sampled 276 individuals from 12 populations and genotyped them using a ddRAD sequencing approach. We found strong spatial structuring of populations and identified candidate genes for climate adaptation using genotype-environment association methods.