Data and code from: Evaluating genomic offset predictions in a forest tree with high population genetic structure
Data files
May 22, 2024 version files (103.81 MB):
- AnnualClimaticData_FromClimateDT_PopulationLocations.csv (2.01 MB)
- AveragedClimaticVariables_FuturePeriod_PopulationLocations.csv (66.66 KB)
- AveragedClimaticVariables_ReferencePeriod_PopulationLocations.csv (20.91 KB)
- ClimateDT_CommonGardens.csv (23.35 KB)
- ClimateDT_NFIPlots_PastClimates.csv (7.23 MB)
- ClimateDT_NFIPlots_SurveyClimates.csv (12.79 MB)
- CommonGardenData_cleaned.csv (1.19 MB)
- FormattedFilteredGenomicData_AlleleCounts_withmaf.csv (9.61 MB)
- FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv (9.12 MB)
- FormattedFilteredGenomicData_AlleleFrequencies_withoutmaf.csv (3.66 MB)
- HeightIntercepts_Archambeauetal2022.csv (2.96 KB)
- ImputedGenomicData_AlleleCounts_withmaf.csv (9.49 MB)
- ImputedGenomicData_AlleleCounts_withoutmaf.csv (9.01 MB)
- ImputedGenomicData_AlleleFrequencies_withmaf.csv (3.62 MB)
- ImputedGenomicData_AlleleFrequencies_withoutmaf.csv (3.56 MB)
- NFIdata_cleaned.csv (1.24 MB)
- PopulationStructureCorrea2015.csv (42.18 KB)
- RawGenomicData.csv (29.80 MB)
- README.md (28.63 KB)
- SNPsInformation_WithOutliers.csv (1.31 MB)
Feb 21, 2025 version files (104 MB):
- AnnualClimaticData_FromClimateDT_PopulationLocations.csv (2.01 MB)
- AveragedClimaticVariables_FuturePeriod_PopulationLocations.csv (66.66 KB)
- AveragedClimaticVariables_ReferencePeriod_PopulationLocations.csv (20.91 KB)
- ClimateDT_CommonGardens.csv (23.35 KB)
- ClimateDT_NFIPlots_PastClimates.csv (7.23 MB)
- ClimateDT_NFIPlots_SurveyClimates.csv (12.79 MB)
- CommonGardenData_cleaned.csv (1.19 MB)
- FormattedFilteredGenomicData_AlleleCounts_withmaf.csv (9.61 MB)
- FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv (9.12 MB)
- FormattedFilteredGenomicData_AlleleFrequencies_withoutmaf.csv (3.66 MB)
- HeightIntercepts_Archambeauetal2022.csv (2.96 KB)
- ImputedGenomicData_AlleleCounts_withmaf.csv (9.49 MB)
- ImputedGenomicData_AlleleCounts_withoutmaf.csv (9.01 MB)
- ImputedGenomicData_AlleleFrequencies_withmaf.csv (3.62 MB)
- ImputedGenomicData_AlleleFrequencies_withoutmaf.csv (3.56 MB)
- NFIdata_cleaned.csv (1.24 MB)
- PopulationStructureCorrea2015.csv (42.18 KB)
- RawGenomicData.csv (29.80 MB)
- README.md (30.29 KB)
- SNPsInformation_WithOutliers.csv (1.50 MB)
Apr 11, 2025 version files (177.59 MB):
- AnnualClimaticData_FromClimateDT_PopulationLocations.csv (2.01 MB)
- AveragedClimaticVariables_FuturePeriod_PopulationLocations.csv (66.66 KB)
- AveragedClimaticVariables_ReferencePeriod_PopulationLocations.csv (20.91 KB)
- ClimateDT_CommonGardens.csv (23.35 KB)
- ClimateDT_Metadata.csv (2.56 KB)
- ClimateDT_NFIPlots_PastClimates.csv (7.23 MB)
- ClimateDT_NFIPlots_SurveyClimates.csv (12.79 MB)
- CommonGardenData_cleaned.csv (1.19 MB)
- FormattedFilteredGenomicData_AlleleCounts_withmaf.csv (9.61 MB)
- FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv (9.12 MB)
- FormattedFilteredGenomicData_AlleleFrequencies_withoutmaf.csv (3.66 MB)
- go_cg_predictions.csv (1.02 MB)
- go_nfi_predictions.csv (70.30 MB)
- go_predictions_climdist.csv (1 MB)
- HeightIntercepts_Archambeauetal2022.csv (2.96 KB)
- ImputedGenomicData_AlleleCounts_withmaf.csv (9.49 MB)
- ImputedGenomicData_AlleleCounts_withoutmaf.csv (9.01 MB)
- ImputedGenomicData_AlleleFrequencies_withmaf.csv (3.62 MB)
- ImputedGenomicData_AlleleFrequencies_withoutmaf.csv (3.56 MB)
- maritime_pine_coord.txt (325.22 KB)
- NFIdata_cleaned.csv (1.24 MB)
- PopulationStructureCorrea2015.csv (42.18 KB)
- RawGenomicData.csv (29.80 MB)
- README.md (39.30 KB)
- SNPsInformation_WithOutliers.csv (1.55 MB)
- SNPsInformation.csv (876.03 KB)
Abstract
Genomic offset models are increasingly popular tools for identifying populations at risk of maladaptation under climate change. These models estimate the extent of genetic change required for populations to remain adapted under future climate change scenarios, but face strong limitations and still lack broad empirical testing. Using 9,817 single nucleotide polymorphisms (SNPs) genotyped in 454 trees from 34 populations of maritime pine, a species with a marked population genetic structure, we found substantial variability across genomic offset predictions from different methods, SNP sets, and general circulation models. Using five common gardens, we mostly found positive associations between genomic offset predictions and mortality, as expected. However, contrary to our expectations, we observed very few negative monotonic associations between genomic offset predictions and height. Higher mortality rates were also observed in national forest inventory plots with high genomic offset, but only for some methods and SNP sets. The differing genomic offset patterns produced by the best-validated methods across the maritime pine range hindered drawing definitive conclusions for the species. Our study demonstrates the imperative of employing different methods and validating genomic offset predictions with independent data sources before using them as reliable metrics to inform conservation or management.
Juliette Archambeau1,2, Marta Benito-Garzón1, Marina de-Miguel1,3, Alexandre Changenet1, Francesca Bagnoli4, Frédéric Barraquand5, Maurizio Marchi4, Giovanni G. Vendramin4, Stephen Cavers2, Annika Perry2 and Santiago C. González-Martínez1
1 INRAE, Univ. Bordeaux, BIOGECO, F-33610 Cestas, France
2 UK Centre for Ecology & Hydrology, Bush Estate, Penicuik, United Kingdom
3 EGFV, Univ. Bordeaux, Bordeaux Sciences Agro, INRAE, ISVV, F-33882, Villenave d’Ornon, France
4 Institute of Biosciences and BioResources, National Research Council, 50019 Sesto Fiorentino, Italy
5 CNRS, Institute of Mathematics of Bordeaux, F-33400 Talence, France
Corresponding author: Juliette Archambeau, juli.archambeau@gmail.com
Author contributions
SCG-M designed the experiment and supervised the curation of field data. MdM cleaned and formatted the phenotypic data in the common gardens. MM extracted the climatic values at the location of the populations and the National Forest Inventory plots, and provided the raster files to project the genomic offset predictions across the species range. F Bagnoli, SCG-M and GGV did the DNA extractions, production, cleaning and checking of the genomic data. AC cleaned and formatted mortality data from the National Forest Inventory plots. SCG-M, JA and MBG conceived the paper methodology. JA and F Barraquand built the Bayesian models and codes in the validation steps. JA conducted the data analyses and wrote the code available in the present repository.
Data
Genomic data
For genomic data filtering and formatting, see 2_FormattingGenomicData.qmd.
Population genetic structure
PopulationStructureCorrea2015.csv
Proportion of assignment to the six gene pools identified in Jaramillo-Correa et al. (2015). This dataset contains 523 genotypes but only 454 genotypes were used in the present study because the other genotypes had no genomic data or too much missing data. Meaning of the columns:
- clon: clone (i.e., genotype).
- pop: population (i.e., provenance).
- Q1: proportion of assignment to the northern African (NA) gene pool for each clone.
- Q2: proportion of assignment to the Corsican (C) gene pool for each clone.
- Q3: proportion of assignment to the central Spain (CS) gene pool for each clone.
- Q4: proportion of assignment to the French Atlantic (FA) gene pool for each clone.
- Q5: proportion of assignment to the Iberian Atlantic (IA) gene pool for each clone.
- Q6: proportion of assignment to the south-eastern Spain (SES) gene pool for each clone.
- max.Q: ID of the main gene pool of each clone.
- main_gp: full name of the main gene pool of each clone.
- color_main_gp: color attributed to the main gene pool of each clone in the figures.
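As an illustration of how the main gene pool relates to the Q1 to Q6 columns, here is a minimal Python sketch (the repository scripts themselves are in R). It assumes that the main gene pool is simply the one with the largest assignment proportion and that it can be identified by its 1-based position among Q1 to Q6; the function name and the example values are hypothetical.

```python
# Gene pool codes in the column order Q1..Q6 described above.
GENE_POOLS = ["NA", "C", "CS", "FA", "IA", "SES"]


def main_gene_pool(q_values):
    """Return (1-based index, code) of the gene pool with the largest proportion.

    ASSUMPTION: the main gene pool is the arg-max of Q1..Q6; ties are resolved
    in favour of the first gene pool in column order.
    """
    if len(q_values) != len(GENE_POOLS):
        raise ValueError("expected one proportion per gene pool (Q1..Q6)")
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best + 1, GENE_POOLS[best]


# Hypothetical clone assigned mostly to the French Atlantic gene pool.
example_q = [0.02, 0.01, 0.05, 0.80, 0.10, 0.02]
print(main_gene_pool(example_q))  # (4, 'FA')
```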
Raw genomic data
RawGenomicData.csv
Genomic data before data filtering and formatting. NAs are indicated with ---. Meaning of the columns:
- clone: clone ID.
- assay: assay in which the clone was genotyped, either the Infinium assay (only_Inf), the Axiom assay (only_Affx) or both assays (both_Inf_Affx).
- snp_1 to snp_14016: genotype for each of the 14,016 SNPs.
Filtered and formatted allele counts
FormattedFilteredGenomicData_AlleleCounts_withmaf.csv and FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv
FormattedFilteredGenomicData_AlleleCounts_withmaf.csv contains allele counts not filtered for minor allele frequencies (MAF), and FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv contains allele counts filtered for MAF < 1%.
These datasets contain the allele counts of each genotype (coded as 0, 1 or 2). SNPs in rows and genotypes in columns. Meaning of the columns:
- snp_ID: SNP ID.
- ALT10 to VER9 (columns 2 to 455): clone (i.e., genotype) ID.
Filtered and formatted allele frequencies
FormattedFilteredGenomicData_AlleleFrequencies_withoutmaf.csv
Allele frequencies of the populations filtered for MAF < 1%. SNPs in columns and populations in rows. Meaning of the columns:
- pop: population.
- Columns 2 to 9818: SNP IDs.
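To make the link between the allele count and allele frequency files concrete, the following Python sketch computes population allele frequencies from per-clone allele counts. It is an illustration only: it assumes diploid genotypes, that missing counts are skipped, and that a clone-to-population mapping is available; the function and variable names are hypothetical and do not come from the repository scripts.

```python
def population_allele_frequencies(counts_by_clone, pop_of_clone):
    """Compute allele frequencies per population and SNP.

    counts_by_clone: {clone ID: [allele count (0, 1 or 2) or None per SNP]}
    pop_of_clone:    {clone ID: population ID}
    ASSUMPTION: diploid genotypes, so each observed count covers 2 alleles.
    """
    sums, n_alleles = {}, {}
    for clone, counts in counts_by_clone.items():
        pop = pop_of_clone[clone]
        pop_sums = sums.setdefault(pop, [0] * len(counts))
        pop_n = n_alleles.setdefault(pop, [0] * len(counts))
        for j, count in enumerate(counts):
            if count is not None:  # skip missing genotypes
                pop_sums[j] += count
                pop_n[j] += 2
    return {
        pop: [s / n if n else None for s, n in zip(sums[pop], n_alleles[pop])]
        for pop in sums
    }


# Hypothetical toy data: three clones, two SNPs.
toy_counts = {"ALT10": [0, 1], "ALT11": [2, None], "VER9": [2, 0]}
toy_pops = {"ALT10": "ALT", "ALT11": "ALT", "VER9": "VER"}
toy_freqs = population_allele_frequencies(toy_counts, toy_pops)
print(toy_freqs)
```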
Imputed allele counts
Missing values were imputed based on the most common allele within the main gene pool of the genotype of concern (although we acknowledge that some genotypes had high admixture rates).
ImputedGenomicData_AlleleCounts_withmaf.csv and ImputedGenomicData_AlleleCounts_withoutmaf.csv
ImputedGenomicData_AlleleCounts_withmaf.csv is not filtered for MAF. ImputedGenomicData_AlleleCounts_withoutmaf.csv is filtered for MAF < 1%.
These datasets contain imputed allele counts for each genotype (coded as 0, 1 or 2). SNPs in rows and genotypes in columns. Meaning of the columns:
- snp_ID: SNP ID.
- ALT10 to VER9 (columns 2 to 455): clone (i.e., genotype) ID.
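The imputation rule described above can be sketched as follows in Python. This is a simplified illustration, not the code of 2_FormattingGenomicData: it interprets "most common allele" as the most frequent allele count (0, 1 or 2) among the clones sharing the same main gene pool, which is an assumption, and all names are hypothetical.

```python
from collections import Counter


def impute_snp(counts, gene_pools):
    """Impute missing allele counts (None) for a single SNP.

    counts:     allele count (0, 1, 2) or None for each clone
    gene_pools: main gene pool of each clone, in the same order
    ASSUMPTION: a missing value is replaced by the most frequent observed
    count within the clone's main gene pool; it stays None if that gene
    pool has no observed value at this SNP.
    """
    modes = {}
    for gp in set(gene_pools):
        observed = [c for c, g in zip(counts, gene_pools) if g == gp and c is not None]
        if observed:
            modes[gp] = Counter(observed).most_common(1)[0][0]
    return [modes.get(g) if c is None else c for c, g in zip(counts, gene_pools)]


# Hypothetical SNP genotyped in seven clones from two gene pools.
snp_counts = [0, 0, None, 2, 2, None, 1]
snp_pools = ["FA", "FA", "FA", "C", "C", "C", "C"]
print(impute_snp(snp_counts, snp_pools))  # [0, 0, 0, 2, 2, 2, 1]
```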
Imputed allele frequencies
ImputedGenomicData_AlleleFrequencies_withmaf.csv and ImputedGenomicData_AlleleFrequencies_withoutmaf.csv
ImputedGenomicData_AlleleFrequencies_withmaf.csv is not filtered for MAF. ImputedGenomicData_AlleleFrequencies_withoutmaf.csv is filtered for MAF < 1%.
These datasets contain imputed allele frequencies of the populations. SNPs in columns and populations in rows. Meaning of the columns:
- pop: population.
- Columns 2 to 9818: SNP IDs.
SNP information
SNPsInformation.csv
Information on the different SNP IDs used across studies/assays, SNP position on the genome, SNP annotation and whether the SNPs were removed by the minor allele frequency (MAF) filter. SNPs are in rows. Meaning of the columns:
- snp_ID: SNP ID used in the present study.
- original_ID: original ID of the SNP.
- affx_ID: SNP ID in the Axiom assay.
- infinium_ID: SNP ID in the Infinium assay.
- scaffold/contig: contig on which the SNP is located. The term scaffold is also used as some SNPs were obtained from the alignment of NGS short-reads from a pseudoreference genome in Pinus pinaster that were called scaffolds (even though they are not really scaffolds).
- genome_position: position of the SNP on the scaffold/contig (in bp).
- annotation: SNP name (including possible alternative names).
- MAF_filtering: REMOVED for SNPs with low minor allele frequencies, which are removed for GEA analyses but not for estimating the neutral population genetic structure; KEPT otherwise.
SNPsInformation_WithOutliers.csv
This dataset is similar to SNPsInformation.csv (the first eight columns are identical) but includes additional information about the identification of SNPs by the different gene-environment association (GEA) methods (GF, RDA, pRDA, LFMM and BayPass), and about the inclusion of the SNPs in the final sets of control and candidate SNPs (see the 7_GeneratingSNPsets.qmd report). Meaning of the columns:
- RDA_outliers: TRUE if the SNP was identified as an outlier in the RDA; FALSE otherwise.
- pRDA_outliers: TRUE if the SNP was identified as an outlier in the pRDA; FALSE otherwise.
- GF_outliers: TRUE if the SNP was identified as an outlier with the GF algorithm; FALSE otherwise.
- LFMM_outliers: TRUE if the SNP was identified as an outlier with LFMM; FALSE otherwise.
- BayPass_outliers: TRUE if the SNP was identified as an outlier with BayPass; FALSE otherwise.
- all_candidate_SNPs: TRUE if the SNP was an outlier identified by at least one GEA method among RDA, pRDA, LFMM, BayPass and GF; FALSE otherwise.
- common_candidate_SNPs: TRUE if the SNP was an outlier identified by at least two GEA methods among RDA, pRDA, LFMM, BayPass and GF; FALSE otherwise.
- candidate_SNPs_corrected_pop_structure: TRUE if the SNP was an outlier identified by at least one GEA method correcting for population genetic structure (i.e., pRDA, LFMM or BayPass); FALSE otherwise.
- random_control_SNPs: TRUE if the SNP was a control SNP randomly sampled among the SNPs not identified by any GEA method; FALSE otherwise.
- control_SNPs_matching_allele_freq: TRUE if the SNP was a control SNP sampled among the SNPs not identified by any GEA method, and has a similar allele frequency to a candidate SNP; FALSE otherwise.
- control_SNPs_without_NAs: TRUE if the SNP had no missing data; FALSE otherwise.
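The three candidate-set columns follow directly from the per-method outlier columns, which the Python sketch below illustrates. It only encodes the "at least one", "at least two" and "corrected for population structure" rules stated above; it deliberately ignores the later step in which only one SNP per scaffold/contig is kept (see 7_GeneratingSNPsets.qmd), so its output may differ from the flags stored in the file. The function name is hypothetical.

```python
GEA_METHODS = ["RDA", "pRDA", "GF", "LFMM", "BayPass"]
# Methods that correct for population genetic structure.
CORRECTED_METHODS = ["pRDA", "LFMM", "BayPass"]


def candidate_flags(outliers):
    """Derive candidate-set membership for one SNP.

    outliers: {method name: True/False}, one entry per GEA method.
    NOTE: the filtering of SNPs located on the same scaffold/contig is
    not reproduced here.
    """
    n_hits = sum(bool(outliers[m]) for m in GEA_METHODS)
    return {
        "all_candidate_SNPs": n_hits >= 1,
        "common_candidate_SNPs": n_hits >= 2,
        "candidate_SNPs_corrected_pop_structure": any(
            outliers[m] for m in CORRECTED_METHODS
        ),
    }


# Hypothetical SNP detected by RDA and GF only.
snp_outliers = {"RDA": True, "pRDA": False, "GF": True, "LFMM": False, "BayPass": False}
print(candidate_flags(snp_outliers))
```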
maritime_pine_coord.txt
SNP positions on the chromosome-scale reference genome of a related species (i.e., Pinus tabuliformis; Niu et al. 2022), as such a reference is not yet available for maritime pine. Meaning of the columns:
1. AxiomID: SNP ID in the Axiom assay.
2. CHROM: assigned chromosome of the SNP on the reference genome. Contig was indicated when no chromosome was assigned.
3. Absolute_POS: position of the SNP on the reference genome.
Climatic data
Climatic data were extracted with the Climate Downscaling Tool (ClimateDT). The full name, units and calculation method of the climatic variables can be found on the ClimateDT website: https://www.ibbr.cnr.it/climate-dt/?action=fldlist. Note that the variable called mean summer precipitation (MSP) in ClimateDT was renamed summer precipitation (SP) in the present study.
- tmn01 to tmn12: monthly minimum temperature (°C).
- tmx01 to tmx12: monthly maximum temperature (°C).
- prc01 to prc12: monthly total precipitation (mm).
- bio1: mean annual temperature (°C).
- bio2: mean diurnal range (mean of monthly (maximum temperature - minimum temperature)) (°C).
- bio3: isothermality (bio2/bio7) (×100) (index).
- bio4: temperature seasonality (standard deviation ×100) (°C).
- bio5: maximum temperature of the warmest month (°C).
- bio6: minimum temperature of the coldest month (°C).
- bio7: temperature annual range (bio5-bio6) (°C).
- bio8: mean temperature of the wettest quarter (°C).
- bio9: mean temperature of the driest quarter (°C).
- bio10: mean temperature of the warmest quarter (°C).
- bio11: mean temperature of the coldest quarter (°C).
- bio12: annual precipitation (mm).
- bio13: precipitation of the wettest month (mm).
- bio14: precipitation of the driest month (mm).
- bio15: precipitation seasonality (coefficient of variation) (index).
- bio16: precipitation of the wettest quarter (mm).
- bio17: precipitation of the driest quarter (mm).
- bio18: precipitation of the warmest quarter (mm).
- bio19: precipitation of the coldest quarter (mm).
- Eref: Hargreaves reference evaporation (mm).
- CMD: Hargreaves climatic moisture deficit (mm).
- MCMT: mean coldest month temperature (°C).
- SP: summer (May to September) precipitation (mm).
- MWMT: mean warmest month temperature (°C).
- GDD0: degree-days above 0°C, chilling degree-days (°C * days).
- GDD5: degree-days above 5°C, chilling degree-days (°C * days).
- GDD18: degree-days above 18°C, chilling degree-days (°C * days).
- NFFD: number of frost-free days (day).
- DMA: De Martonne aridity index (index).
- EPQ: Emberger pluviothermic quotient (index).
- RMT: Rivas-Martínez thermic index (index).
- AHM: annual heat moisture index, (bio1+10)/(bio12/1000) (°C/mm).
- SHM: summer heat moisture index, MWMT/(MSP/1000) (°C/mm).
- TD: temperature difference, MWMT-MCMT (°C).
- EMT: extreme minimum temperature over 30 years (°C).
- bFFP: Julian date on which FFP begins (day).
- eFFP: Julian date on which FFP ends (day).
- FFP: frost-free period (day).
- PAS: precipitation as snow (mm).
- spi01 to spi12: monthly standardized precipitation index (index).
- spei01 to spei12: monthly standardized precipitation-evapotranspiration index (index).
This information on the climatic variables used in the present study is also available in the ClimateDT_Metadata.csv dataset. In this dataset, the first column contains the full names of the climatic variables, the second column lists their units, and the third column provides their codes.
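Several of the variables listed above are defined as simple functions of others (bio3, bio7, AHM, SHM and TD). The Python sketch below applies those formulas as written in the list, with SP standing for the ClimateDT variable MSP. It is only meant to clarify the definitions: the values distributed in the datasets were computed by ClimateDT, and the function name and example values are hypothetical.

```python
def derived_indices(bio1, bio2, bio5, bio6, bio12, mwmt, mcmt, sp):
    """Recompute the derived indices from their component variables.

    Temperatures in °C and precipitation in mm; sp is the summer
    precipitation (called MSP in ClimateDT).
    """
    bio7 = bio5 - bio6  # temperature annual range
    return {
        "bio7": bio7,
        "bio3": 100 * bio2 / bio7,           # isothermality
        "AHM": (bio1 + 10) / (bio12 / 1000),  # annual heat moisture index
        "SHM": mwmt / (sp / 1000),            # summer heat moisture index
        "TD": mwmt - mcmt,                    # temperature difference
    }


# Hypothetical climate at one location.
indices = derived_indices(
    bio1=12.0, bio2=10.0, bio5=28.0, bio6=3.0,
    bio12=800.0, mwmt=21.0, mcmt=5.0, sp=250.0,
)
for name, value in indices.items():
    print(name, round(value, 2))
```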
Climate in the common gardens
ClimateDT_CommonGardens.csv
Annual climatic data (from 2010 to 2018) at the location of the five common gardens: Asturias (Spain), Bordeaux (France), Cáceres (Spain), Madrid (Spain) and Fundão (Portugal).
- cg: common garden.
- longitude: longitude of the common garden.
- latitude: latitude of the common garden.
- elevation: elevation of the common garden.
- year: year, between 2010 and 2018.
- Columns 6 to 104: climatic variables (see names above).
Climate at the location of the populations
AnnualClimaticData_FromClimateDT_PopulationLocations.csv
Annual climatic data (from 1901 to 2021) at the location of the 34 CLONAPIN populations.
- pop: population.
- longitude: longitude of the population.
- latitude: latitude of the population.
- elevation: elevation of the population.
- year: year, between 1901 and 2021.
- Columns 6 to 103: climatic variables (see names above).
AveragedClimaticVariables_ReferencePeriod_PopulationLocations.csv
Averages of the climatic variables over the reference period 1901-1950 at the location of the 34 CLONAPIN populations.
- pop: population.
- longitude: longitude of the population.
- latitude: latitude of the population.
- elevation: elevation of the population.
- Columns 5 to 64: climatic variables (see names above).
AveragedClimaticVariables_FuturePeriod_PopulationLocations.csv
Predicted averages of the climatic variables at the location of the 34 CLONAPIN populations over the period 2041-2070, under the scenario SSP3-7.0 and for five general circulation models (GCMs): GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0 and UKESM1-0-LL (column gcm).
- pop: population.
- gcm: general circulation model.
- Columns 3 to 26: climatic variables (see names above).
Climate in the inventory plots
ClimateDT_NFIPlots_PastClimates.csv
Averaged climatic data at the location of the NFI plots for the reference period 1901-1950.
- plotcode: code of the inventory plot.
- longitude: longitude of the plot.
- latitude: latitude of the plot.
- elevation: elevation of the plot.
- Columns 5 to 64: climatic variables (see names above).
ClimateDT_NFIPlots_SurveyClimates.csv
Averaged climatic data at the location of the NFI plots for the survey periods specific to each plot.
- plotcode: code of the inventory plot.
- longitude: longitude of the plot.
- latitude: latitude of the plot.
- elevation: elevation of the plot.
- Columns 5 to 64: climatic variables (see names above).
National Forest Inventory (NFI) data
NFIdata_cleaned.csv
This dataset contains mortality data from the natural populations of the Spanish and French NFI plots in which maritime pines were recorded. Meaning of the columns:
- plotcode: code of the NFI plot.
- longitude: longitude of the NFI plot.
- latitude: latitude of the NFI plot.
- country: country of the NFI plot (ES = Spain, FR = France).
- nb_years: number of years between surveys. It corresponds to the number of years between the first and second inventory in Spain and is equal to 5 in the French inventory, as mortality was estimated in the five years before the survey date.
- first_survey: year of the first survey in the Spanish NFI (NA for the French NFI).
- second_survey: year of the second survey in the Spanish NFI and of the single survey in the French NFI.
- nb_dead: number of dead maritime pines in the NFI plot.
- nb_tot: total number of maritime pines in the NFI plot (including the dead ones).
- basal_area: basal area of all tree species in the NFI plot (proxy of the competition among trees).
- mean_DBH: mean DBH of the maritime pines in the NFI plot including dead trees, adults and saplings/seedlings (recruitment).
- prop_dead: proportion of dead maritime pines in the NFI plot.
- annual_prop_dead: annual proportion of maritime pine mortality in the NFI plot.
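The relationship between the count columns and the two proportion columns can be sketched in Python as follows. The proportion of dead trees follows from the column definitions (nb_dead divided by nb_tot). The annualisation shown here, a simple division by nb_years, is an assumption made for illustration; the formula actually used to produce annual_prop_dead is not stated in this README and may differ. The function name is hypothetical.

```python
def mortality_proportions(nb_dead, nb_tot, nb_years):
    """Return (prop_dead, annual_prop_dead) for one NFI plot.

    ASSUMPTION: the annual proportion is obtained by dividing the
    proportion of dead trees by the number of years between surveys.
    """
    if nb_tot <= 0 or nb_years <= 0:
        raise ValueError("nb_tot and nb_years must be positive")
    if not 0 <= nb_dead <= nb_tot:
        raise ValueError("nb_dead must lie between 0 and nb_tot")
    prop_dead = nb_dead / nb_tot
    return prop_dead, prop_dead / nb_years


# Hypothetical Spanish plot: 3 dead pines out of 20, surveys 10 years apart.
prop, annual_prop = mortality_proportions(nb_dead=3, nb_tot=20, nb_years=10)
print(round(prop, 3), round(annual_prop, 4))
```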
Common garden data
CommonGardenData_cleaned.csv
This dataset contains mortality and height data from five clonal common gardens. Meaning of the columns:
- cg: name of the common garden.
- block: block in which the tree was planted.
- pop: population of the tree.
- clon: genotype of the tree.
- tree: tree ID.
- AST_htmar14: height measurements in March 2014 in Asturias (Spain) when the trees were 37 months old (trees planted in February 2011).
- BDX_htnov2018: height measurements in November 2018 in Bordeaux (France) when the trees were 85 months old (trees planted in October 2011).
- CAC_htdec11: height measurements in December 2011 in Cáceres (Spain) when the trees were 8 months old (trees planted in April 2011).
- MAD_htdec11: height measurements in December 2011 in Madrid (Spain) when the trees were 13 months old (trees planted in November 2010).
- POR_htmay13: height measurements in May 2013 in Fundão (Portugal) when the trees were 27 months old (trees planted in February 2011).
- AST_survmar14: survival data (0 for dead trees and 1 for survivors) in March 2014 in Asturias.
- BDX_surv18: survival data in November 2018 in Bordeaux.
- CAC_survdec11: survival data in December 2011 in Cáceres.
- MAD_survdec11: survival data in December 2011 in Madrid.
- POR_survmay13: survival data in May 2013 in Fundão.
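The tree ages given above follow from the planting and measurement months. The short Python check below recomputes them; since only months and years are reported, the day of the month is set to 1, which is an arbitrary choice that does not affect the month count. The dictionary and function names are hypothetical.

```python
from datetime import date


def months_between(planted, measured):
    """Number of whole calendar months between two dates (days ignored)."""
    return (measured.year - planted.year) * 12 + (measured.month - planted.month)


# (planting date, measurement date) for each common garden, as listed above.
GARDENS = {
    "AST": (date(2011, 2, 1), date(2014, 3, 1)),
    "BDX": (date(2011, 10, 1), date(2018, 11, 1)),
    "CAC": (date(2011, 4, 1), date(2011, 12, 1)),
    "MAD": (date(2010, 11, 1), date(2011, 12, 1)),
    "POR": (date(2011, 2, 1), date(2013, 5, 1)),
}

ages = {cg: months_between(*dates) for cg, dates in GARDENS.items()}
print(ages)  # {'AST': 37, 'BDX': 85, 'CAC': 8, 'MAD': 13, 'POR': 27}
```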
HeightIntercepts_Archambeauetal2022.csv
This dataset contains the population height intercepts calculated across the five common gardens in model 1 of Archambeau et al. (2022). Meaning of the columns:
- pop: population ID.
- height: mean of the posterior distributions of the population varying intercepts.
- std.error: standard error of the mean of the posterior distributions of the population varying intercepts.
- conf.low and conf.high (columns 4 and 5): credible intervals of the posterior distributions of the population varying intercepts.
Results
Genomic offset predictions and climatic distances at the location of the populations
go_predictions_climdist.csv
Dataset generated in 11_ComparingGenomicOffsetPredictions.qmd. Meaning of the columns:
- var_name: name of the set of SNPs or climatic variable.
- method_name: name of the genomic offset method or CD for climatic distance.
- var_code: code of the set of SNPs or climatic variable.
- pop: population.
- val: value of the genomic offset or the climatic distance.
- gcm: general circulation model used for future climate (2041-2060; SSP3-7.0).
- main_gp_pop: main gene pool of the population.
- color_main_gene_pool: color used in the figures for the main gene pool of each population.
Genomic offset predictions and climatic distances at the location of the inventory plots
go_nfi_predictions.csv
Dataset generated in 12_ValidationNFI.qmd. Meaning of the columns:
- snp_set_code: code of the set of SNPs (or code of the climatic variables for the climatic distances).
- snp_set_name: name of the set of SNPs (or name of the climatic variables for the climatic distances).
- plotcode: code of the inventory plot.
- method: name of the genomic offset method or CD for climatic distances.
- GO: value of the genomic offset or the climatic distance.
- method_snp_set_code: code of the combination of the genomic offset method (or climatic distance) and the set of SNPs (or climatic variable).
- method_snp_set_name: full name of the combination of the genomic offset method (or climatic distance) and the set of SNPs (or climatic variable).
go_cg_predictions.csv
Dataset generated in 13_ValidationCommonGardens.qmd. Meaning of the columns:
- cg: name of the common garden.
- pop: population.
- input_code: code of the set of SNPs (or code of the climatic variables for the climatic transfer distances).
- varX: value of the genomic offset or the climatic transfer distance.
- method: name of the genomic offset method or CTD for climatic transfer distances.
- method_input_code: code of the combination of the genomic offset method (or climatic transfer distance) and the set of SNPs (or climatic variable).
- input_name: name of the set of SNPs (or code of the climatic variables for the climatic transfer distances).
- method_input_name: full name of the combination of the genomic offset method (or climatic transfer distance) and the set of SNPs (or climatic variable).
Scripts
Main reports
Quarto (or rmarkdown) and html versions of these scripts are available in the DRYAD repository.
All data used in these scripts are available in the DRYAD repository, except for some large rasters used to project genomic offset predictions across the range of maritime pine. These climatic data are available on the ClimateDT website: https://www.ibbr.cnr.it/climate-dt/.
The objects computed in these scripts are stored in an /outputs folder that users will need to create on their own computers to run the scripts successfully.
1_MainPopulationGenePool_CommonGardenClimates.qmd
- Attributing to each clone and population its main gene pool based on the ancestry coefficients estimated with the STRUCTURE software in Jaramillo-Correa et al. (2015).
- Calculating the average of the climatic variables between the planting and the measurement dates in the common gardens.
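The second step can be pictured with the Python sketch below, which averages an annual climatic variable over the years spanning planting and measurement. It is a simplification made for illustration: it works on whole calendar years with both end years included, whereas the repository function calc_avg_clim_var.R is described as working from monthly ClimateDT data. The function name and the example values are hypothetical.

```python
def average_over_years(annual_values, first_year, last_year):
    """Average an annual variable over [first_year, last_year], inclusive.

    annual_values: {year: value of the climatic variable}
    ASSUMPTION: whole calendar years are averaged with equal weights.
    """
    selected = [v for year, v in annual_values.items() if first_year <= year <= last_year]
    if not selected:
        raise ValueError("no data in the requested period")
    return sum(selected) / len(selected)


# Hypothetical mean annual temperature (bio1) in one common garden.
bio1_by_year = {2010: 11.0, 2011: 12.0, 2012: 13.0, 2013: 12.0, 2014: 14.0}
mean_bio1 = average_over_years(bio1_by_year, first_year=2011, last_year=2014)
print(mean_bio1)  # 12.75
```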
2_FormattingGenomicData.Rmd
- Formatting genomic data: converting letters (e.g. A/A, A/G) to numbers (0, 1 or 2), and --- to NA.
- Filtering genomic data for monomorphic SNPs, minor allele counts (MAC), proportion of missing data per clone and per SNP, minor allele frequencies (MAF).
- Determining SNPs position on the genome.
- Exploring genomic data, e.g., number of SNPs/clones genotyped in each assay, average and maximum number of missing values per clone.
- Imputation of missing data.
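The formatting and MAF steps listed above can be illustrated with the following Python sketch (the actual script is written in R). It converts letter genotypes to allele counts, maps --- to a missing value, and computes the minor allele frequency of a SNP. Which allele is counted is not specified in this README, so the counted_allele argument is an assumption, as are the function names.

```python
def genotype_to_count(genotype, counted_allele):
    """Convert a genotype such as 'A/G' to an allele count (0, 1 or 2).

    '---' (missing genotype) is returned as None.
    ASSUMPTION: the caller chooses which allele is counted.
    """
    if genotype == "---":
        return None
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"unexpected genotype format: {genotype!r}")
    return sum(allele == counted_allele for allele in alleles)


def minor_allele_frequency(counts):
    """Minor allele frequency of one SNP, ignoring missing counts."""
    observed = [c for c in counts if c is not None]
    if not observed:
        return None
    freq = sum(observed) / (2 * len(observed))  # diploid genotypes
    return min(freq, 1 - freq)


# Hypothetical SNP genotyped in five clones, counting allele G.
raw = ["A/A", "A/G", "G/G", "---", "A/A"]
counts = [genotype_to_count(g, "G") for g in raw]
print(counts)                          # [0, 1, 2, None, 0]
print(minor_allele_frequency(counts))  # 0.375
```

A SNP would then be dropped from the "withoutmaf" datasets when its minor allele frequency falls below the 1% threshold mentioned in the Data section.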
3_ReduncancyAnalysis_VariancePartionning_IdentificationCandidateSNPs.Rmd
- Selection of the climatic variables based on their biological relevance for maritime pine, their contribution to the genetic variance using a RDA-based stepwise selection (Capblancq and Forester 2021) and the magnitude of their exposure to climate change.
- Partitioning genomic variation among climate, neutral population genetic structure (accounted for with the main axes of a PCA) and geography (accounted for with population coordinates or distance-based Moran’s eigenvector maps).
- Identification of the outlier SNPs using Redundancy analysis (RDA) and partial RDA; approach developed in Capblancq et al. (2018) and Capblancq and Forester (2021).
4_GradientForest_IdentificationCandidateSNPs.qmd
- Identification of outlier SNPs with the Gradient Forest (GF) algorithm as described in Fitzpatrick et al. (2021) and Capblancq et al. (2023).
5_BaypassAnalysis_IdentificationCandidateSNPs.qmd
- Identification of outlier SNPs with BayPass; an approach described in Gautier (2015).
6_LFMM_IdentificationCandidateSNPs_GenomicOffsetPredictions.qmd
- Identification of outlier SNPs and estimation of the genetic gap (i.e., genomic offset) with the LEA R package (Gain & Francois 2021), which uses the latent factor mixed model (LFMM) approach (Frichot et al. 2013, Caye et al. 2019). Mathematical and theoretical details of the approach are given in Gain et al. (2023).
7_GeneratingSNPsets.qmd
- Identifying common outlier SNPs across the different gene-environment association (GEA) methods.
- Checking the genome position of the outlier SNPs; when several of them were located on the same scaffold/contig, only the SNP with the lowest p-value in the RDA was kept in the final sets of candidate SNPs.
- Mapping SNP position on the reference genome of Pinus tabuliformis (Niu et al. 2022).
- Generating three sets of candidate SNPs and two sets of control SNPs (with the same number of SNPs as in the set with all candidate SNPs).
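The rule of keeping a single outlier SNP per scaffold/contig can be sketched in Python as below. This is an illustration of the rule as described in the second step, not the repository code: the dictionary keys (snp_ID, contig, p_value) and the function name are hypothetical, and the example p-values are invented.

```python
def one_snp_per_contig(snps):
    """Keep, for each scaffold/contig, the SNP with the lowest RDA p-value.

    snps: list of dicts with the hypothetical keys 'snp_ID', 'contig'
    and 'p_value'. Returns the sorted IDs of the retained SNPs.
    """
    best = {}
    for snp in snps:
        current = best.get(snp["contig"])
        if current is None or snp["p_value"] < current["p_value"]:
            best[snp["contig"]] = snp
    return sorted(snp["snp_ID"] for snp in best.values())


# Hypothetical outliers: two SNPs share contig_A.
outlier_snps = [
    {"snp_ID": "snp_12", "contig": "contig_A", "p_value": 1e-4},
    {"snp_ID": "snp_13", "contig": "contig_A", "p_value": 1e-6},
    {"snp_ID": "snp_40", "contig": "contig_B", "p_value": 1e-3},
]
print(one_snp_per_contig(outlier_snps))  # ['snp_13', 'snp_40']
```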
8_RedundancyAnalysis_GenomicOffsetPredictions.qmd
- Genomic offset predictions with Redundancy Analysis (RDA) and partial RDA; approach described in Capblancq and Forester (2021).
- Projections of the genomic offset predictions across the maritime pine range. Note that the rasters used for these projections (i.e., one for each climatic variable, and each general circulation model, so 6 rasters for the 1901-1950 reference climate; and 6*5=30 rasters for the 2041-2060 future climates) are not available in the DRYAD repository but are available on the ClimateDT website.
9_GeneralizedDissimilarityModelling_GenomicOffsetPredictions.qmd
- Genomic offset predictions with the Generalized Dissimilarity Modelling (GDM) approach, as described in Fitzpatrick & Keller (2015), Mokany et al. (2022) and the GDM website.
10_GradientForest_GenomicOffsetPredictions.qmd
- Genomic offset predictions with the Gradient Forest (GF) algorithm, approach described in Fitzpatrick & Keller (2015) and Gougherty et al. (2021).
11_ComparingGenomicOffsetPredictions.qmd
- Comparing genomic offset predictions across the different methods (GF, GDM, LFMM, RDA and pRDA), SNPs sets (control and candidate SNPs) and GCMs.
12_ValidationNFI.qmd
- Exploring mortality data from the National Forest Inventory (NFI) plots of France and Spain; see Changenet et al. (2021).
- Calculating the climatic distances between the reference climate (1901-1950) and the climate during inventories at the location of the inventory plots.
- Estimating the association between the genomic offset predictions and mortality rates in the NFI plots (with regression coefficients or correlations).
13_ValidationCommonGardens.qmd
- Calculating the climate transfer distances (CTDs), i.e., absolute climatic difference between the climates at the location of the populations and the common gardens.
- Estimating the association between genomic offset predictions / CTDs and mortality and height data from the five clonal common gardens (CLONAPIN network).
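As described in the first step, a climate transfer distance is the absolute difference between the climate at the location of a population and the climate in a common garden. A minimal Python sketch, computed one climatic variable at a time, is given below; the function name and the example values are hypothetical, and any scaling of the variables applied in the actual script is not reproduced.

```python
def climate_transfer_distance(clim_pop, clim_garden):
    """Absolute climatic difference per variable between a population
    location and a common garden.

    clim_pop, clim_garden: {climatic variable code: value}
    Only variables present in both dictionaries are returned.
    """
    return {
        var: abs(clim_garden[var] - clim_pop[var])
        for var in clim_pop
        if var in clim_garden
    }


# Hypothetical climates (bio1 in °C, bio12 in mm).
population_climate = {"bio1": 11.5, "bio12": 900.0}
garden_climate = {"bio1": 14.0, "bio12": 650.0}
ctd = climate_transfer_distance(population_climate, garden_climate)
print(ctd)  # {'bio1': 2.5, 'bio12': 250.0}
```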
RPackageCitations.qmd
- Citations of the R packages used in the present study.
Additional functions
Additional functions used in the main scripts:
- calc_avg_clim_var.R: function to calculate average climatic variables over a specific period, based on the monthly climatic data from ClimateDT. In this function, the window function comes from the dismo R package. Under the GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.
- corpmat.R: to compute a matrix of p-values; from the Statistical Tools for High-throughput Data Analysis & Visualization (STHDA) website: http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram.
- detectoutliers.R: to identify outliers based on their RDA loadings; from Forester et al. (2018): https://popgen.nescent.org/2018-03-27_RDA_GEA.html.
- kable_mydf.R: my own function to print tables using the kableExtra R package.
- make_eucli_plot.R: my own function to plot genomic offset predictions against Euclidean climatic distances, using R base plots.
- make_go_map.R: my own function to plot genomic offset predictions at specific locations.
- rdadapt.R: to conduct an RDA-based genome scan; from Capblancq & Forester (2021): https://github.com/Capblancq/RDA-landscape-genomics/blob/main/src/rdadapt.R.
Stan code of the models
Stan code of the models used in the validation steps:
ValidationNFI_stancode.stan
- model used to estimate the association between mortality rates and genomic offset predictions/climatic distances in the inventory plots.
ValidationNFI_stancode_with_predictions.stan
- same model as ValidationNFI_stancode.stan but including predictions of mortality rates in the model outputs.
ValidationNFI_stancode_with_predictions_withoutGO.stan
- same model as ValidationNFI_stancode_with_predictions.stan but without genomic offset predictions or climatic distances as predictors (model used to estimate correlations between the model residuals and genomic offset predictions/climatic distances).
ValidationCommonGarden_BinomialMortalityModel.stan
- model used to estimate the association between mortality rates and genomic offset predictions/CTDs in the common gardens.
ValidationCommonGarden_BinomialMortalityModel_WithoutGO.stan
- same model as ValidationCommonGarden_BinomialMortalityModel.stan but without genomic offset predictions or climatic distances as predictors (model used to estimate correlations between the model residuals and genomic offset predictions/climatic distances).
ValidationCommonGarden_HeightModel.stan
- model used to estimate the association between height and genomic offset predictions/CTDs in the common gardens.
ValidationCommonGarden_HeightModel_WithoutPredictor.stan
- same model as ValidationCommonGarden_HeightModel.stan but without genomic offset predictions or climatic distances as predictors (model used to estimate the proportion of variance explained in a model that does not include the predictor of interest).
ValidationCommonGarden_HeightModel_WithoutPredictor_WithPredictions.stan
- same model as ValidationCommonGarden_HeightModel_WithoutPredictor.stan but including height predictions in the model outputs.
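As a rough illustration of what the model variants without genomic offset are for, the sketch below fits a binomial mortality model with and without a genomic offset predictor, then correlates the residuals of the reduced model with the genomic offset. It uses simulated data and an ordinary binomial GLM from base R as a stand-in; it is not the Bayesian Stan models listed above. All variable names (`covariate`, `go`, `n_trees`, `n_dead`) and effect sizes are assumptions made for the example.

```r
# Toy illustration with simulated data; NOT the Stan models of this repository.
set.seed(1)
n_plots   <- 200
n_trees   <- rep(30, n_plots)     # trees per plot (hypothetical)
covariate <- rnorm(n_plots)       # hypothetical scaled plot-level covariate
go        <- rnorm(n_plots)       # hypothetical scaled genomic offset

# Simulate mortality counts with a positive effect of the genomic offset
p_death <- plogis(-2 + 0.3 * covariate + 0.5 * go)
n_dead  <- rbinom(n_plots, size = n_trees, prob = p_death)

# Model including the genomic offset as a predictor
fit_with_go <- glm(cbind(n_dead, n_trees - n_dead) ~ covariate + go,
                   family = binomial)

# Same model without the genomic offset
fit_without_go <- glm(cbind(n_dead, n_trees - n_dead) ~ covariate,
                      family = binomial)

# Correlation between the residuals of the reduced model and the genomic offset
res     <- residuals(fit_without_go, type = "pearson")
res_cor <- cor(res, go)
```

In this toy setting, a positive `res_cor` indicates that the genomic offset is associated with the mortality variation left unexplained by the reduced model.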
License
The code of this repository is under the MIT license.
MIT license
Copyright (c) 2024 Authors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Software versions
Analyses were undertaken with R version 4.3.3. R package versions are shown at the end of each script and in RPackageCitations.html.
References
Archambeau J, Benito Garzón M, Barraquand F, de Miguel M, Plomion C and González-Martínez SC (2022). Combining climatic and genomic data improves range-wide tree height growth prediction in a forest tree. The American Naturalist 200(4):E141–E159.
Capblancq T, Luu K, Blum MGB and Bazin E (2018). Evaluation of redundancy analysis to identify signatures of local adaptation. Molecular Ecology Resources 18(6):1223–1233.
Capblancq T and Forester BR (2021). Redundancy analysis: A Swiss army knife for landscape genomics. Methods in Ecology and Evolution 12(12):2298–2309.
Capblancq T, Lachmuth S, Fitzpatrick MC and Keller SR (2023). From common gardens to candidate genes: exploring local adaptation to climate in red spruce. New Phytologist 237(5):1590–1605.
Caye K, Jumentier B, Lepeule J and François O (2019). LFMM 2: Fast and accurate inference of gene-environment associations in genome-wide studies. Molecular Biology and Evolution 36(4):852–860.
Changenet A, Ruiz-Benito P, Ratcliffe S, Fréjaville T, Archambeau J, Porte AJ et al. (2021). Occurrence but not intensity of mortality rises towards the climatic trailing edge of tree species ranges in European forests. Global Ecology and Biogeography 30(7):1356–1374.
Fitzpatrick MC and Keller SR (2015). Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecology Letters 18(1):1–16.
Fitzpatrick MC, Chhatre VE, Soolanayakanahally RY and Keller SR (2021). Experimental support for genomic prediction of climate maladaptation using the machine learning approach Gradient Forests. Molecular Ecology Resources 21(8):2749–2765.
Forester BR, Lasky JR, Wagner HH and Urban DL (2018). Comparing methods for detecting multilocus adaptation with multivariate genotype-environment associations. Molecular Ecology 27(9):2215–2233.
Frichot E, Schoville SD, Bouchard G and François O (2013). Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular Biology and Evolution 30(7):1687–1699.
Gain C and François O (2021). LEA 3: Factor models in population genetics and ecological genomics with R. Molecular Ecology Resources 21(8):2738–2748.
Gain C, Rhoné B, Cubry P, Salazar I, Forbes F, Vigouroux Y et al. (2023). A quantitative theory for genomic offset statistics. Molecular Biology and Evolution 40(6):msad140.
Gautier M (2015). Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201(4):1555–1579.
Gougherty AV, Keller SR and Fitzpatrick MC (2021). Maladaptation, migration and extirpation fuel climate change risk in a forest tree species. Nature Climate Change 11:166–171.
Jaramillo-Correa JP, Rodríguez-Quilón I, Grivet D, Lepoittevin C, Sebastiani F, Heuertz M et al. (2015). Molecular proxies for climate maladaptation in a long-lived tree (Pinus pinaster Aiton, Pinaceae). Genetics 199(3):793–807.
Mokany K, Ware C, Woolley SN, Ferrier S and Fitzpatrick MC (2022). A working guide to harnessing generalized dissimilarity modelling for biodiversity analysis and conservation assessment. Global Ecology and Biogeography 31(4):802–821.
Niu S, Li J, Bo W, Yang W, Zuccolo A, Giacomello S et al. (2022). The Chinese pine genome and methylome unveil key features of conifer evolution. Cell 185(1):204–217.
Version changes
29-jan-2025.
These updates were made in response to revisions following the first round of review in The American Naturalist.
The abstract was modified.
The dataset SNPsInformation_WithOutliers.csv was also updated; here are the columns that were modified or added:
- all_candidate_SNPs: TRUE if the SNP was an outlier identified by at least one GEA method among RDA, pRDA, LFMM, BayPass and GF; FALSE otherwise.
- common_candidate_SNPs: TRUE if the SNP was an outlier identified by at least two GEA methods among RDA, pRDA, LFMM, BayPass and GF; FALSE otherwise.
- candidate_SNPs_corrected_pop_structure: TRUE if the SNP was an outlier identified by at least one GEA method correcting for population genetic structure (i.e., pRDA, LFMM or BayPass); FALSE otherwise.
- random_control_SNPs: TRUE if the SNP was a control SNP randomly sampled among the SNPs not identified by any GEA method; FALSE otherwise.
- control_SNPs_matching_allele_freq: TRUE if the SNP was a control SNP sampled among the SNPs not identified by any GEA method, and has a similar allele frequency to a candidate SNP; FALSE otherwise.
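The logical columns described above can be used to extract each SNP set. The sketch below shows one possible way to do this in R on a small made-up table. The five set column names come from the list above; the `snp_id` column name and all values are assumptions made for the example, so adapt them to the actual file (for instance after loading it with `read.csv()`).

```r
# Toy table mimicking the set columns of SNPsInformation_WithOutliers.csv.
# `snp_id` is a hypothetical name for the SNP identifier column.
snp_info <- data.frame(
  snp_id                                 = paste0("snp_", 1:6),
  all_candidate_SNPs                     = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  common_candidate_SNPs                  = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  candidate_SNPs_corrected_pop_structure = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  random_control_SNPs                    = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  control_SNPs_matching_allele_freq      = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# One vector of SNP IDs per set, named after the corresponding column
set_columns <- setdiff(names(snp_info), "snp_id")
snp_sets <- lapply(set_columns, function(col) snp_info$snp_id[snp_info[[col]]])
names(snp_sets) <- set_columns

# Number of SNPs in each set
set_sizes <- lengths(snp_sets)

# Sanity checks implied by the column definitions
stopifnot(all(snp_sets$common_candidate_SNPs %in% snp_sets$all_candidate_SNPs))
stopifnot(length(intersect(snp_sets$random_control_SNPs,
                           snp_sets$all_candidate_SNPs)) == 0)
```

The two checks follow from the definitions: SNPs identified by at least two methods are also identified by at least one, and control SNPs are sampled among SNPs not identified by any method.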
Reports and scripts were updated to estimate the genomic offset for six sets of SNPs: three sets of candidate SNPs and three sets of control SNPs. Additionally, the estimation of genomic offset using pRDA was incorporated. Following these new analyses, the scripts for generating tables and figures were also updated.
10-apr-2025.
These updates were made to format the data and code according to the standards of The American Naturalist. They are specified in the Version changes of the README.
Modifications of the datasets:
- SNPsInformation_WithOutliers.csv was divided into two datasets: SNPsInformation.csv, which contains information only about SNP IDs, genomic positions, annotations, and MAF; and SNPsInformation_WithOutliers.csv, which is the same dataset but includes additional information about outlier identification and the inclusion of SNPs in the sets of candidate and control SNPs.
- Three datasets containing the genomic offset predictions and climatic distances estimated in the present study were uploaded: go_predictions_climdist.csv, go_nfi_predictions.csv and go_cg_predictions.csv.
- In the dataset ClimateDT_NFIPlots_SurveyClimates.csv, the plots have been reordered to match the order in the dataset ClimateDT_NFIPlots_PastClimates.csv.
- The ClimateDT_Metadata.csv dataset was added to provide information on the climatic variables from ClimateDT.
- The maritime_pine_coord.txt dataset was added to provide the SNP positions on a reference genome (i.e., the Pinus tabuliformis genome).
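For readers who want to check or reproduce the reordering of plots between the two NFI climate datasets, a minimal sketch using `match()` is given below. The data are made up, and the column names `plotcode` and `tmean` are assumptions, not necessarily those used in the actual files.

```r
# Toy stand-ins for the two NFI climate datasets (hypothetical columns and values)
past   <- data.frame(plotcode = c("P3", "P1", "P2"), tmean = c(10.1, 12.4, 11.0))
survey <- data.frame(plotcode = c("P1", "P2", "P3"), tmean = c(12.9, 11.6, 10.8))

# Reorder the survey rows so that plots appear in the same order as in `past`
survey_reordered <- survey[match(past$plotcode, survey$plotcode), ]
rownames(survey_reordered) <- NULL

# Both tables now list the plots in the same order
stopifnot(identical(survey_reordered$plotcode, past$plotcode))
```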
Modifications of the scripts:
- All reports have been revised between versions V2 and V3 to ensure the code is reproducible and includes only the sections of code relevant to the paper.
- 0_FormattingPopulationCoordinatesElevationClimateData.qmd and 2_CommonGardenData.qmd were merged into 1_MainPopulationGenePool_CommonGardenClimates.qmd.
- 1_FormattingGenomicData.Rmd was renamed to 2_FormattingGenomicData.Rmd.
- 3_CheckingPastFutureClimatesPopulationLocations.qmd, 14_ValidationNFI_ModelComparison.qmd and 16_PCAplotIDpops.qmd were removed because they did not contain code that was directly used in the paper.
- 4_ReduncancyAnalysis_VariancePartionning_IdentificationCandidateSNPs.Rmd was renamed to 3_ReduncancyAnalysis_VariancePartionning_IdentificationCandidateSNPs.Rmd.
- 5_GradientForest_IdentificationCandidateSNPs.qmd was renamed to 4_GradientForest_IdentificationCandidateSNPs.qmd.
- 6_BaypassAnalysis_IdentificationCandidateSNPs.qmd was renamed to 5_BaypassAnalysis_IdentificationCandidateSNPs.qmd.
- 7_LFMM_IdentificationCandidateSNPs_GenomicOffsetPredictions.qmd was renamed to 6_LFMM_IdentificationCandidateSNPs_GenomicOffsetPredictions.qmd.
- 8_GeneratingSNPsets.qmd was renamed to 7_GeneratingSNPsets.qmd.
- 11_RedundancyAnalysis_GenomicOffsetPredictions.qmd was renamed to 8_RedundancyAnalysis_GenomicOffsetPredictions.qmd.
- 12_ComparingGenomicOffsetPredictions.qmd was renamed to 11_ComparingGenomicOffsetPredictions.qmd.
- 13_ValidationNFI.qmd was renamed to 12_ValidationNFI.qmd.
- 15_ValidationCommonGardens.qmd was renamed to 13_ValidationCommonGardens.qmd.
- The names of the files containing the Stan code for the models have been updated for clarity, and the content of each file is now explicitly described in the README.
- Some additional scripts were removed as they were not relevant for the paper: make_high_go_pop_maps.R, project_adaptive_index.R, generate_scaled_nfi_clim_datasets.R, generate_scaled_clim_datasets.R, extract_clim_from_rasters.R, extract_climatedt_metadata.R and baypass_utils.R