Genomic signatures of adaptation across a precipitation gradient from niche center to niche edge
Data files
Mar 11, 2025 version files 105.74 MB
- 108_SNP_GDM_orthologs.csv (22.79 KB)
- Allele_Frequencies_Van_Deurs_Genomic_signatures.csv (86.55 MB)
- CandidatesOrdered_env1_K3_q0.05.csv (38.96 KB)
- GDM_Genomic_signatures.R (42.16 KB)
- LD_Genomic_signatures.R (7.86 KB)
- LFMM_Van_Deurs_Genomic_Signatures.R (58.28 KB)
- mart_export.csv (9.49 KB)
- ne_50m_admin_0_countries.zip (799.73 KB)
- ngsadmix_samples_ordered.csv (3.47 KB)
- Population_data.csv (1.28 KB)
- Random__snps.vcf.gz (17.25 MB)
- RDA_Van_Deurs_Genomic_Signatures.R (12.59 KB)
- README.md (13.57 KB)
- Significant_LFMM_SNPs.vcf.gz (902.46 KB)
- Structure_map_Genomic_Signatures.R (8.02 KB)
- Structure_pie_chart_locations.csv (547 B)
- Unlinked_SNPs.csv (14.20 KB)
- Used_EAA_Samples.csv (3.49 KB)
Abstract
Evaluating the potential for species to adapt to changing climates relies on understanding current patterns of adaptive variation and selection, which might vary in intensity across a species’ niche, hence affecting our inference of where adaptation might be most important in the future. Here, we investigate the genetic basis of adaptation in Lactuca serriola along a steep precipitation gradient in Israel approaching the species’ arid niche limit and use candidate loci to inform predictions of its past and future adaptive evolution. Environmental association analyses combined with generalized dissimilarity models revealed 108 candidate genes showing non-linear shifts in allele frequencies across the gradient, with 66% of these genes under strong selection near the dry niche edge. We detected selection acting on genes with separate suites of biological functions, specifically related to phenology and responses to environmental stressors, including osmotic stress, at the dry niche edge, and related to biotic interactions and defense closer to the niche center. The adaptive genetic composition of populations, as inferred through polygenic risk scores, points to intensified selection operating towards the dry niche edge. However, inference of past and future evolutionary change predicts larger adaptive shifts occurring in the mesic part of the range, which is most affected by climate change. Our study reveals that adaptive shifts in response to climate change can be heterogeneous across a species' range and not necessarily strongest near its niche edge.
https://doi.org/10.5061/dryad.v6wwpzh68
Usage notes
Below is a list of file types, with examples of how to open or work with them outside of R.
.csv files can be opened and manipulated with Microsoft Excel, Google Sheets, or a plain text editor.
.vcf.gz files (compressed Variant Call Format, VCF) can be read into R using the read.vcfR function from the vcfR package, or opened with programs such as BamSeek or BCFtools.
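As one way to inspect a .vcf.gz file outside of R, the following minimal Python sketch uses only the standard library to list the sample names and count the variant records. The function name `vcf_summary` is hypothetical, and the sketch assumes a standard tab-delimited VCF with a `#CHROM` header line followed by nine fixed columns.

```python
import gzip


def vcf_summary(path):
    """Return (sample_names, n_records) for a gzip-compressed VCF file."""
    samples, n_records = [], 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Header line: nine fixed columns, then one column per sample
                samples = line.rstrip("\n").split("\t")[9:]
            elif line.strip():
                n_records += 1  # every remaining non-blank line is one variant
    return samples, n_records
```

For example, `samples, n = vcf_summary("Significant_LFMM_SNPs.vcf.gz")` would return the sample names and the number of SNP records in that file.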
Description of the data and file structure
Low-coverage whole genome sequencing was conducted on leaf samples of 21 Lactuca serriola populations sampled in 2020, spanning Israel's steep precipitation gradient.
Files and variables
File: 108_SNP_GDM_orthologs.csv
Description: List of 108 SNPs retained after the application of GDM filters. The file includes the names of the Arabidopsis thaliana (AT) orthologs obtained by BLASTing coding regions of Lactuca sativa genes, together with their corresponding Score and E-value. Group indicates the part of Israel's gradient in precipitation of the wettest month where the candidate gene shows the largest turnover. GDM_explained acts as a proxy for model fit.
Variables
- SNP_ID: The name given to each SNP. The chromosome number is indicated by the two digits following "CM0225", minus 17 so that a SNP starting with "CM022518.1" lies on chromosome 1. This is followed by a string identifying each SNP's location on that chromosome so that "CM022518.1_142723134" is found at the locus 142723134 on chromosome 1.
- SNP_name: The name of the SNP generated by the GDM.
- SNP_number: The number of the SNP generated by the GDM, corresponds to the number in the SNP_name.
- GDM_explained: The percentage of deviance explained by the model.
- Geographic: The absolute importance of geographic distance as a predictor in the GDM.
- Bio7: The absolute importance of the BIO7 (temperature annual range) bioclimatic variable in the GDM.
- Bio13: The absolute importance of the BIO13 (precipitation of the wettest month) bioclimatic variable in the GDM.
- Geographic_rel: The relative (rescaled) importance of geographic distance compared to other predictors.
- Bio7_rel: The relative (rescaled) importance of BIO7 among all predictors.
- Bio13_rel: The relative (rescaled) importance of BIO13 among all predictors.
- min_Fst: The minimum pairwise Fst value, indicating the lowest genetic differentiation between populations.
- max_Fst: The maximum pairwise Fst value, indicating the highest genetic differentiation between populations.
- Fst_range: The range of genetic differentiation (max_Fst - min_Fst) across populations.
- Modtest_geo_imp: The estimated importance of geographic distance from the permutation test (gdm.varImp).
- Modtest_bio7_imp: The estimated importance of BIO7 from the permutation test.
- Modtest_bio13_imp: The estimated importance of BIO13 from the permutation test.
- Modtest_geo_sig: The p-value for the significance of geographic distance as a predictor.
- Modtest_bio7_sig: The p-value for the significance of BIO7 as a predictor.
- Modtest_bio13_sig: The p-value for the significance of BIO13 as a predictor.
- CHROM: The chromosome (1-9) containing the SNP.
- POS: The locus of the SNP on the chromosome.
- SNP_number.y: The number of the SNP generated by the GDM, corresponds to the number in the SNP_name.
- biomart: The location of the gene containing the SNP, in the format *Chromosome: Gene Start-Gene End*.
- Gene.stable.ID: The name of the *Lactuca serriola* gene containing the identified SNP.
- group: The portion of the precipitation gradient where each SNP shows the greatest turnover, either 37-120 mm or 120-202 mm of precipitation in the wettest month.
- Ortholog: The name of the corresponding *Arabidopsis thaliana* ortholog.
- Symbol: The symbol associated with the *Arabidopsis thaliana* ortholog.
- Score: Score in Bits for the BLAST of each *Lactuca serriola* gene and the *Arabidopsis thaliana* ortholog with the largest Score (Bits) or largest E-value based on significant alignments. Orthologs determined using BLASTN 2.9.0+, Araport11 coding sequences (DNA).
- Evalue: The E-value of the best *Arabidopsis thaliana* ortholog for the given *Lactuca serriola* gene.
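The SNP_ID naming rule described above (the two digits after "CM0225", minus 17, give the chromosome; the number after the underscore gives the position) can be sketched in Python as follows. The function name `parse_snp_id` and the regular expression are illustrative assumptions based only on the example identifier given in this README.

```python
import re

# "CM0225" + two digits + "." + version + "_" + position
_SNP_ID = re.compile(r"^CM0225(\d{2})\.\d+_(\d+)$")


def parse_snp_id(snp_id):
    """Split a SNP_ID into (chromosome, position)."""
    match = _SNP_ID.match(snp_id)
    if match is None:
        raise ValueError(f"unexpected SNP_ID format: {snp_id!r}")
    chromosome = int(match.group(1)) - 17  # e.g. 18 - 17 = chromosome 1
    position = int(match.group(2))
    return chromosome, position


print(parse_snp_id("CM022518.1_142723134"))  # (1, 142723134)
```

The result should agree with the CHROM and POS columns of the same row, which makes this a simple consistency check.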
File: Allele_Frequencies_Van_Deurs_Genomic_signatures.csv
Description: File containing allele frequencies for 329 Lactuca serriola sequences. This .csv file can be opened in Microsoft Excel or a text editor, though opening this large file in Excel is not recommended.
Variables
- Columns: Each column refers to one of 586,638 SNPs, and the column name is the SNP's identifier. The chromosome number is indicated by the two digits following "CM0225", minus 17, so that a SNP starting with "CM022518.1" lies on chromosome 1. This is followed by a string identifying each SNP's location on that chromosome, so that "CM022518.1_142723134" is found at the locus 142723134 on chromosome 1.
- Rows: Each row refers to one of the 21 sampled populations.
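Because this table is very wide, it can be convenient to pull out only the SNP columns of interest rather than loading everything into a spreadsheet. The sketch below does this with Python's standard `csv` module. The function name `select_columns` is hypothetical, and the sketch assumes only that the first line of the file holds the column names.

```python
import csv


def select_columns(handle, wanted):
    """Yield, for each data row, the values of the requested columns in order."""
    reader = csv.reader(handle)
    header = next(reader)
    index = {name: i for i, name in enumerate(header)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    cols = [index[name] for name in wanted]
    for row in reader:
        yield [row[i] for i in cols]
```

A call might look like `with open("Allele_Frequencies_Van_Deurs_Genomic_signatures.csv", newline="") as fh: rows = list(select_columns(fh, ["CM022518.1_142723134"]))`, which would give one value per population row. Values are returned as text and can be converted with `float` as needed.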
File: CandidatesOrdered_env1_K3_q0.05.csv
Description: 500 significant SNPs identified by LFMM and their p-values, q-values, and z-scores.
Variables
- SNPid: Name of the SNP identified by LFMM as significantly associated with precipitation of the wettest month (mm) from WorldClim 2.1. The chromosome number is indicated by the two digits following "CM0225", minus 17, so that a SNP starting with "CM022518.1" lies on chromosome 1. This is followed by a string identifying each SNP's location on that chromosome, so that "CM022518.1_142723134" is found at the locus 142723134 on chromosome 1.
- zscore: The z-score of the indicated SNP as related to precipitation of the wettest month (mm).
- pvalue: The p-value of the association of the given SNP and precipitation of the wettest month (mm).
- qvalue: The significance of the association of the given SNP and precipitation of the wettest month (mm) after correcting for multiple testing.
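To work with this table outside of R, for instance to apply a stricter q-value cutoff than the one used to build the file, a minimal Python sketch could look like the following. The function name `significant_snps` and the default threshold of 0.05 are assumptions (the threshold is inferred from the file name). The column names are taken from the variable list above.

```python
import csv


def significant_snps(handle, q_max=0.05):
    """Return (SNPid, qvalue) pairs with qvalue < q_max, smallest q-value first."""
    hits = []
    for row in csv.DictReader(handle):
        q = float(row["qvalue"])
        if q < q_max:
            hits.append((row["SNPid"], q))
    return sorted(hits, key=lambda pair: pair[1])
```

For example, `with open("CandidatesOrdered_env1_K3_q0.05.csv", newline="") as fh: top = significant_snps(fh, q_max=0.01)` would keep only the SNPs with q-values below 0.01.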
File: mart_export.csv
Description: Lactuca sativa gene name, chromosome and position for 500 significant LFMM SNPs.
Variables
- Gene Stable ID: The name of the *Lactuca serriola* gene containing the identified SNP.
- Chromosome: The chromosome (1-9) containing the SNP.
- Gene start (bp): The locus where the gene starts on a given chromosome.
- Gene end (bp): The locus where the gene ends on a given chromosome.
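The gene coordinates in this file can be used to look up which gene spans a given SNP position. The following Python sketch illustrates one way to do so. The function names `load_genes` and `genes_at` are hypothetical, the column names are assumed to match the variable list above exactly, and the Chromosome column is assumed to hold the plain chromosome number (1-9).

```python
import csv


def load_genes(handle):
    """Read gene intervals as (chromosome, start, end, gene_id) tuples."""
    genes = []
    for row in csv.DictReader(handle):
        genes.append((
            row["Chromosome"].strip(),
            int(row["Gene start (bp)"]),
            int(row["Gene end (bp)"]),
            row["Gene Stable ID"],
        ))
    return genes


def genes_at(genes, chromosome, position):
    """Return the IDs of all genes whose interval contains the position."""
    return [
        gene_id
        for chrom, start, end, gene_id in genes
        if chrom == str(chromosome) and start <= position <= end
    ]
```

A linear scan is sufficient here because the file covers only the genes around the 500 significant SNPs.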
File: ngsadmix_samples_ordered.csv
Description: EAA samples ordered to match those in the .qopt files used to generate the STRUCTURE plot using the pophelper package in R.
Variables
- sample: List of 329 samples used in the study, in the order in which they appear in the STRUCTURE plot.
- population: Characters indicating the population identity of each sample (A-U).
File: Population_data.csv
Description: Contains information on the location and identity of populations.
Variables
- Generation: The year in which the seeds were collected (2020)
- Population: Assigned population identifier character (A-U)
- Access.number: A combination of generation and population
- Collection.date: Date of seed collection in field (DD.MM.YY)
- Collection.site: Name of collection site
- y: Latitude of collection site (decimal latitude)
- x: Longitude of collection site (decimal longitude)
- Precipitation.group: Assigned precipitation group based on mean annual precipitation (1-7)
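Spreadsheet programs may misread the DD.MM.YY format of Collection.date. As a small sketch, the date can be parsed explicitly in Python. The function name `parse_collection_date` is hypothetical, and the date in the example is made up for illustration rather than taken from the data.

```python
from datetime import datetime


def parse_collection_date(text):
    """Convert a DD.MM.YY string into a datetime.date."""
    return datetime.strptime(text.strip(), "%d.%m.%y").date()


print(parse_collection_date("15.06.20"))  # 2020-06-15 (illustrative date)
```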
File: Random__snps.vcf.gz
Description: VCF of random SNPs used for null GDM analysis if desired. VCF contains genotypes for 11,078 SNPs for 537 samples, which could then be subset to contain only those samples of interest in this study.
File: Significant_LFMM_SNPs.vcf.gz
Description: VCF of 500 significant SNPs identified using LFMM. VCF contains genotypes for 500 SNPs for 534 samples, then subset to contain only those samples of interest in this study.
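Subsetting a VCF to the samples of interest can be done with dedicated tools such as BCFtools. As a plain-Python illustration of the idea, the sketch below keeps the nine fixed VCF columns plus the chosen sample columns. The function name `subset_vcf_lines` is hypothetical, and it is an assumption that the sample names in the VCF header match the names listed in Used_EAA_Samples.csv.

```python
def subset_vcf_lines(lines, keep):
    """Yield VCF lines restricted to the sample columns named in `keep`."""
    keep = set(keep)
    cols = None
    for line in lines:
        if line.startswith("##") or not line.strip():
            yield line  # meta-information and blank lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Nine fixed columns, then only the requested samples
            cols = list(range(9)) + [
                i for i, name in enumerate(fields) if i >= 9 and name in keep
            ]
        elif cols is None:
            raise ValueError("data line found before the #CHROM header")
        yield "\t".join(fields[i] for i in cols) + "\n"
```

The lines can come from `gzip.open(path, "rt")`. This sketch does not recompute INFO fields such as allele counts, so those would still describe the full sample set.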
File: Structure_pie_chart_locations.csv
Description: Longitude and latitude for the placement of pie charts showing STRUCTURE data.
Variables
- new_x: longitude of pie chart in decimal degrees
- new_y: latitude of pie chart in decimal degrees
- Packages_symbol: Population identifier
File: Unlinked_SNPs.csv
Description: List of 66 unlinked SNPs retained after LFMM, GDM, and removal of linked SNPs.
Variables
- SNP_ID: The name given to each SNP. The chromosome number is indicated by the two digits following "CM0225", minus 17 so that a SNP starting with "CM022518.1" lies on chromosome 1. This is followed by a string identifying each SNP's location on that chromosome so that "CM022518.1_142723134" is found at the locus 142723134 on chromosome 1.
- SNP_name: The name of the SNP generated by the GDM.
- SNP_number: The number of the SNP generated by the GDM, corresponds to the number in the SNP_name.
- GDM_explained: The percentage of deviance explained by the model.
- Geographic: The absolute importance of geographic distance as a predictor in the GDM.
- Bio7: The absolute importance of the BIO7 (temperature annual range) bioclimatic variable in the GDM.
- Bio13: The absolute importance of the BIO13 (precipitation of the wettest month) bioclimatic variable in the GDM.
- Geographic_rel: The relative (rescaled) importance of geographic distance compared to other predictors.
- Bio7_rel: The relative (rescaled) importance of BIO7 among all predictors.
- Bio13_rel: The relative (rescaled) importance of BIO13 among all predictors.
- min_Fst: The minimum pairwise Fst value, indicating the lowest genetic differentiation between populations.
- max_Fst: The maximum pairwise Fst value, indicating the highest genetic differentiation between populations.
- Fst_range: The range of genetic differentiation (max_Fst - min_Fst) across populations.
- Modtest_geo_imp: The estimated importance of geographic distance from the permutation test (gdm.varImp).
- Modtest_bio7_imp: The estimated importance of BIO7 from the permutation test.
- Modtest_bio13_imp: The estimated importance of BIO13 from the permutation test.
- Modtest_geo_sig: The p-value for the significance of geographic distance as a predictor.
- Modtest_bio7_sig: The p-value for the significance of BIO7 as a predictor.
- Modtest_bio13_sig: The p-value for the significance of BIO13 as a predictor.
- CHROM: The chromosome (1-9) containing the SNP.
- POS: The locus of the SNP on the chromosome.
- SNP_number.y: The number of the SNP generated by the GDM, corresponds to the number in the SNP_name.
- biomart: The location of the gene containing the SNP, in the format *Chromosome: Gene Start-Gene End*.
- Gene.stable.ID: The name of the *Lactuca serriola* gene containing the identified SNP.
- group: The portion of the precipitation gradient where each SNP shows the greatest turnover, either 37-120 mm or 120-202 mm of precipitation in the wettest month.
- Ortholog: The name of the corresponding *Arabidopsis thaliana* ortholog.
- Symbol: The symbol associated with the *Arabidopsis thaliana* ortholog.
- Score: Score in Bits for the BLAST of each *Lactuca serriola* gene and the *Arabidopsis thaliana* ortholog with the largest Score (Bits) or largest E-value based on significant alignments. Orthologs determined using BLASTN 2.9.0+, Araport11 coding sequences (DNA).
- Evalue: The E-value of the best *Arabidopsis thaliana* ortholog for the given *Lactuca serriola* gene.
File: Used_EAA_Samples.csv
Description: List of samples used and their population identifier.
Variables
- Prev_name: list of named samples
- Population: Assigned population identifier
Code/software
Code running
R is required to run all scripts. The scripts below were created using R version 4.4.2.
- The code is intended to be run in the following order: RDA, STRUCTURE map, LFMM, GDM, LD, GDM, and LFMM again. GDM is initially calculated using linked SNPs and can be repeated after the removal of linked SNPs. PRS calculation and map projections of PRS are included in the LFMM code, as the effect sizes from LFMM are required for PRS calculation.
- Structure_map_Genomic_Signatures.R - Generates the map seen in Figure 1A.
- RDA_Van_Deurs_Genomic_Signatures.R - Used for RDA analysis and the generation of Figure 1C.
- LFMM_Van_Deurs_Genomic_Signatures.R - Used to identify outlier SNPs and to generate the Manhattan plot (Figure 2A), the PRS calculation (Figure 4), and the temporal projections of changes in PRS (Figure 5).
- GDM_Genomic_signatures.R - Performs GDM on SNPs identified by LFMM (Figure 2B,C)
- LD_Genomic_signatures.R - Removes SNPs that show a high degree of linkage within the same chromosome.
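The run order described above can be written down as a short Python sketch that lists one `Rscript` command per step. This is only an illustration: the function name `run_all` is hypothetical, and it is an assumption that the scripts can be run non-interactively from the directory holding the data files (they may need paths edited or be intended for stepwise use in an R session). By default the sketch only returns the commands without executing them.

```python
import subprocess

# Order stated in this README: RDA, STRUCTURE map, LFMM, GDM, LD, GDM, LFMM
RUN_ORDER = [
    "RDA_Van_Deurs_Genomic_Signatures.R",
    "Structure_map_Genomic_Signatures.R",
    "LFMM_Van_Deurs_Genomic_Signatures.R",
    "GDM_Genomic_signatures.R",
    "LD_Genomic_signatures.R",
    "GDM_Genomic_signatures.R",             # repeated after removing linked SNPs
    "LFMM_Van_Deurs_Genomic_Signatures.R",  # PRS steps need the LFMM effect sizes
]


def run_all(dry_run=True):
    """Build the Rscript commands in order; execute them only if dry_run is False."""
    commands = [["Rscript", script] for script in RUN_ORDER]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failing script
    return commands
```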
Sequence data
- Sequence data for individual samples can be obtained from the ENA repository under project PRJEB85796.
