Data from: Variation in genomic vulnerability to climate change across temperate populations of eelgrass (Zostera marina)
Cite this dataset
Jeffery, Nicholas et al. (2024). Data from: Variation in genomic vulnerability to climate change across temperate populations of eelgrass (Zostera marina) [Dataset]. Dryad. https://doi.org/10.5061/dryad.xpnvx0kp2
Abstract
A global decline in seagrass populations has led to renewed calls for their conservation as important providers of biogenic and foraging habitat, shoreline stabilisation, and carbon storage. Eelgrass (Zostera marina) occupies the largest geographic range among seagrass species spanning a commensurately broad spectrum of environmental conditions. In Canada, eelgrass is managed as a single phylogroup despite occurring across three oceans and a range of ocean temperatures and salinity gradients. Previous research has focused on applying relatively few markers to reveal population structure of eelgrass, whereas a whole genome approach is warranted to investigate cryptic structure among populations inhabiting different ocean basins and localized environmental conditions. We used a pooled whole-genome re-sequencing approach to characterise population structure, gene flow, and environmental associations of 23 eelgrass populations ranging from the Northeast United States, to Atlantic, subarctic, and Pacific Canada. We identified over 500,000 SNPs, which when mapped to a chromosome-level genome assembly revealed six broad clades of eelgrass across the study area, with pairwise FST ranging from 0 among neighbouring populations to 0.54 between Pacific and Atlantic coasts. Genetic diversity was highest in the Pacific and lowest in the subarctic, consistent with colonisation of the Arctic and Atlantic oceans from the Pacific less than 300 kya. Using redundancy analyses and two climate change projection scenarios, we found that subarctic populations are predicted to be more vulnerable to climate change through genomic offset predictions. Conservation planning in Canada should thus ensure that representative populations from each identified clade are included within a national network so that latent genetic diversity is protected, and gene flow is maintained. 
Northern populations, in particular, may require additional mitigation measures given their potential susceptibility to a rapidly changing climate.
README: Variation in genomic vulnerability to climate change across temperate populations of eelgrass (Zostera marina)
https://doi.org/10.5061/dryad.xpnvx0kp2
The data herein include sample site metadata (GPS coordinates, collection dates, personnel, and site names and codes), environmental data used for genomic-environment association analyses (redundancy analyses), and pairwise Fst matrices for all sampling sites. Raw DNA sequences for all 23 sampling locations are deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/PRJNA891275).
Description of the data and file structure
The primary data in this project are the raw fastq DNA files deposited in NCBI for each of the 23 populations sampled. Because these fastq files represent pooled genomic DNA sequences per population, they contain no individual genotypes; instead, they can be processed into population-level allele frequencies. Population metadata is deposited here for processing the raw reads into SNPs. The code available on GitHub can be used to process the raw fastq files in NCBI and, together with the files contained here, reproduce our analyses.
Environmental data based on in situ measurements (for some coastal Nova Scotia sites), as well as modeled environmental data (sea surface and bottom temperatures and salinities) is available for the study range (except the Pacific). These modeled data are based on the Bedford Institute of Oceanography North Atlantic Model (BNAM; Wang et al. 2018). The modeled data are available for the present climate, as well as modeled future climate under different carbon emissions scenarios (RCP 4.5 and 8.5). These projected model values were used to estimate genomic offset to the year 2075. Relative wave exposure index (REI) was extracted for all sites from:
O'Brien, John M; Wong, Melisa C.; Stanley, Ryan R.E. (2022). A relative wave exposure index for the coastal zone of the Scotian Shelf-Bay of Fundy Bioregion. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.5433567
The data files uploaded here include:
1. eelgrass_zone.zip
* A zipped shapefile (.shp) that extends 20 km from the coast of eastern North America to mask genomic offset predictions to the coastal area where eelgrass is most likely to occur
2. ReorderedSiteCoords.csv
* Contains GPS coordinates and site codes for each sample location in the study
* "Grouping" is the broad ocean basin these sites are located in (Atlantic, Arctic, and Pacific)
* Abbreviations:
* Region: NS = Nova Scotia, PEI = Prince Edward Island, QB = Quebec, JB = James Bay, NB = New Brunswick, BC = British Columbia
3. GDD_max_Jun15_to_Sept15_Apr30_2022.csv
* Contains growing degree day (GDD) data for seven locations across Nova Scotia. This is in situ temperature data recorded by the authors, as opposed to modeled data
* GDD5 = cumulative growing degree days above a base temperature of 5 degrees Celsius
* GDD11 = cumulative growing degree days above a base temperature of 11 degrees Celsius
4. EelgrassFutureClimateRAW.csv
* Provides the environmental data for sites across the Atlantic and subarctic, used for redundancy analyses and for estimating genomic offset in the present study. Included are annual and seasonal maxima, minima, and means per environmental variable for each site, for both present-day and projected future conditions to 2075
* the "value" column includes the actual value for that variable, measured in parts per thousand (salinity) or degrees Celsius (temperature)
* Abbreviations:
* Code: the site codes provided in the ReorderedSiteCoords.csv file
* rcp = Representative Concentration Pathway, with values of RCP 4.5 and RCP 8.5. The former represents a scenario in which carbon emissions are curtailed by 2045, while RCP 8.5 is a "business as usual" carbon emissions scenario and is thus more extreme.
* Parameters: Sbtm = Bottom salinity, SSS = sea surface salinity, SST = sea surface temperature, Tbtm = bottom temperature
5. Eelgrass_FutureEnv_PerSite.csv
* A summarized version of the EelgrassFutureClimateRAW.csv file
* includes site coordinates, and selected environmental variables to be used in redundancy analyses and genomic offset
* Abbreviations:
* TBTM_WinterMin = minimum seasonal winter bottom temperature
* SST_WinterMin = minimum seasonal winter surface temperature
* SBTM_AnnMean = annual mean bottom salinity
* SSS_AnnMean = annual mean surface salinity
* SST_SpringMean = mean seasonal spring sea surface temperature
* SST_SummerMax = maximum seasonal summer sea surface temperature
* TBTM_FallMin = minimum seasonal fall bottom temperature
* "45" and "85" in the column headers refer to the projected values to 2075 under RCP 4.5 and RCP 8.5 respectively
6. PairwiseFST.NoTSW.csv
* Pairwise Fst values among all populations (a 22 x 22 matrix) with the exception of TSW (Tsawwassen, British Columbia) due to its extreme genetic differentiation from all other sites
* Site codes are the same as ReorderedSiteCoords.csv
7. Zostera_PairwisePoolsFST.csv
* Pairwise Fst values among all populations in the study (a 23 by 23 matrix)
* Site codes are the same as ReorderedSiteCoords.csv
8. Temperature_summary_Jun15_to_Sept15_2017-2021.csv
* Temperature data measured in situ using Hobo temperature loggers between June 15 and September 15 in 2017 through 2021. This time period represents the optimal growth period for Zostera marina in Nova Scotia.
* Abbreviations:
* Meadian_temp: the median recorded temperature in degrees Celsius for the time period
* Mean_temp: the mean recorded temperature in degrees Celsius for the time period
* SD_temp: the standard deviation of temperature in degrees Celsius for the time period
* Max_temp: the maximum recorded temperature in degrees Celsius for the time period
* Min_temp: the minimum recorded temperature in degrees Celsius for the time period
* ninetyp: the 90th percentile temperature in degrees Celsius for the time period, indicative of high temperatures, but not the maximum
* prop5.23: the proportion of time recorded between 5 and 23 degrees Celsius, which represents the temperature range where Zostera marina typically grows in Nova Scotia
* meansumDTR95per: mean daily temperature range calculated for the summer months, using the 5th and 95th percentiles for data smoothing
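The in situ temperature metrics described for files 3 and 8 can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the common definition of growing degree days (the sum of daily mean temperature above a base value) and assumes the daily temperature range is taken between each day's 5th and 95th percentiles. The function name `summarize_logger`, its arguments, and the dictionary keys are hypothetical and only loosely mirror the README abbreviations.

```python
import pandas as pd


def summarize_logger(temps: pd.Series, low: float = 5.0, high: float = 23.0) -> dict:
    """Summarize a logger record (degrees Celsius, indexed by timestamp)."""
    # Group readings by calendar day to obtain daily statistics.
    by_day = temps.groupby(temps.index.normalize())
    daily_mean = by_day.mean()
    # Assumed smoothing: daily range between the 5th and 95th percentiles.
    daily_range = by_day.quantile(0.95) - by_day.quantile(0.05)

    def gdd(base: float) -> float:
        # Assumed definition: sum of daily mean degrees above the base temperature.
        return float((daily_mean - base).clip(lower=0).sum())

    return {
        "Mean_temp": float(temps.mean()),
        "Median_temp": float(temps.median()),
        "SD_temp": float(temps.std()),
        "Max_temp": float(temps.max()),
        "Min_temp": float(temps.min()),
        "ninetyp": float(temps.quantile(0.90)),
        # Proportion of readings within [low, high], bounds inclusive.
        "prop5.23": float(temps.between(low, high).mean()),
        "meansumDTR95per": float(daily_range.mean()),
        "GDD5": gdd(5.0),
        "GDD11": gdd(11.0),
    }


# Synthetic hourly record: one day at 10 degrees, one day at 20 degrees.
index = pd.date_range("2021-06-15", periods=48, freq=pd.Timedelta(hours=1))
temps = pd.Series([10.0] * 24 + [20.0] * 24, index=index)
summary = summarize_logger(temps)
print(summary["GDD5"], summary["GDD11"], summary["prop5.23"])
```

With real logger data, the series would first be restricted to the June 15 to September 15 window of each year before calling the function.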
Sharing/Access information
All data is deposited in the Dryad Digital Repository, except in cases where the files are too large (see below for raw DNA fastq files). All other data is available from the corresponding author upon request.
- Raw DNA sequences are deposited at https://www.ncbi.nlm.nih.gov/sra/PRJNA891275
Data was derived from the following sources:
- Eelgrass shoots collected from the northeastern United States, Atlantic Canada, James Bay, and British Columbia, Canada
- Environmental data is derived from in situ temperature measurements and relative wave exposure indices from M. Wong (DFO), as well as modeled current and future projected seasonal temperature and annual salinity values for the study area from the Bedford Institute of Oceanography North Atlantic Model (BNAM)
Code/Software
All bioinformatics and analytical code was run using freely available software, including the following:
- FastQC - Quality assessment of raw fastq files
- Bwa-mem - Aligning reads to the reference genome for Zostera marina
- Picard Tools - attach read group headers to bam files generated from bwa and samtools, and mark duplicate reads for removal
- Genome Analysis Tool Kit (GATK) - remove duplicate reads and realign indels to call SNPs
- Popoolation2 - analyze a 'sync' file (generated from a samtools mpileup file) to conduct basic statistics on poolseq data, including Fst and nucleotide diversity
- BayPass - generate a covariance matrix and clustering tree from SNP allele frequencies per population to show relationships based on genetic dissimilarity
- Treemix - test migration events and the influence of genetic drift on generating a maximum likelihood tree of the eelgrass populations
- R and its associated packages (poolfstat, pcadapt, vegan, and others) - generate a PCA biplot of allele frequencies, and conduct redundancy analyses of allele frequencies and environmental covariates
All code for bioinformatics processing and analyses is included in the open GitHub repository https://github.com/NickJeff13/Eelgrass_Poolseq
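To illustrate how population-level allele frequencies translate into a pairwise Fst matrix like those in PairwiseFST.NoTSW.csv and Zostera_PairwisePoolsFST.csv, the sketch below uses a textbook heterozygosity-based estimator, (H_T - H_S) / H_T, summed across SNPs. This is an assumption made for illustration: it is not the Popoolation2 or poolfstat estimator, and it omits the pool-size and coverage corrections that poolseq data require. The function name, the toy frequencies, and the site codes other than TSW are hypothetical.

```python
import numpy as np
import pandas as pd


def pairwise_fst(freqs: np.ndarray) -> np.ndarray:
    """Uncorrected pairwise Fst from allele frequencies.

    freqs has shape (n_populations, n_snps), with values in [0, 1].
    """
    n_pops = freqs.shape[0]
    fst = np.zeros((n_pops, n_pops))
    for i in range(n_pops):
        for j in range(i + 1, n_pops):
            p1, p2 = freqs[i], freqs[j]
            p_bar = (p1 + p2) / 2
            # Expected heterozygosity of the two populations combined.
            h_total = (2 * p_bar * (1 - p_bar)).sum()
            # Mean expected heterozygosity within each population.
            h_within = ((2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2).sum()
            value = (h_total - h_within) / h_total if h_total > 0 else 0.0
            fst[i, j] = fst[j, i] = value
    return fst


# Toy frequencies for three hypothetical sites and four SNPs.
sites = ["SITE_A", "SITE_B", "TSW"]
freqs = np.array([
    [0.20, 0.50, 0.10, 0.90],
    [0.25, 0.45, 0.15, 0.85],
    [0.90, 0.05, 0.80, 0.10],
])
matrix = pd.DataFrame(pairwise_fst(freqs), index=sites, columns=sites)

# Dropping one site from both axes mirrors going from the 23 x 23 to the 22 x 22 matrix.
no_tsw = matrix.drop(index="TSW", columns="TSW")
print(matrix.round(3))
print(no_tsw.shape)
```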
Methods
We generated allele frequencies for 23 Zostera marina populations across North America using a pooled whole-genome sequencing approach (poolseq). Individual shoots of eelgrass were collected from plants at least 2 metres apart in the field to minimize the potential presence of clones in the data. Genomic DNA was extracted from all individuals and pooled at the population level for sequencing on an Illumina NovaSeq platform at Genome Quebec (Canada), and SNPs were called following the GATK pipeline. Analyses were conducted with Popoolation2 and RStudio. RStudio was used for all genomic-environmental association analyses, including redundancy analyses and the calculation of genomic offset sensu Capblancq and Forester (2021).
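The idea behind an RDA-based genomic offset can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' R implementation and not the exact procedure of Capblancq and Forester (2021): allele frequencies are regressed on standardized environmental predictors, the fitted values are decomposed into constrained axes, and the offset for each site is the Euclidean distance between its present and projected future positions on those axes. The function name `genomic_offset`, its arguments, and the synthetic data are hypothetical.

```python
import numpy as np


def genomic_offset(freqs, env_now, env_future, n_axes=2):
    """Simplified RDA-style genomic offset per site.

    freqs:      (n_sites, n_snps) population allele frequencies
    env_now:    (n_sites, n_vars) present-day environmental predictors
    env_future: (n_sites, n_vars) projected predictors, same columns
    """
    # Standardize both periods with present-day means and standard deviations.
    mu = env_now.mean(axis=0)
    sd = env_now.std(axis=0, ddof=1)
    x_now = (env_now - mu) / sd
    x_fut = (env_future - mu) / sd

    # Multivariate linear regression of centered allele frequencies on environment.
    y = freqs - freqs.mean(axis=0)
    coef, *_ = np.linalg.lstsq(x_now, y, rcond=None)

    # Constrained axes: principal directions of the fitted values.
    _, _, vt = np.linalg.svd(x_now @ coef, full_matrices=False)
    axes = vt[:n_axes].T

    # Position of each site on the constrained axes, now and in the future.
    score_now = x_now @ coef @ axes
    score_fut = x_fut @ coef @ axes
    return np.linalg.norm(score_fut - score_now, axis=1)


# Synthetic example: 6 sites, 2 predictors, 40 SNPs (random placeholder values).
rng = np.random.default_rng(0)
env_now = rng.normal(size=(6, 2))
freqs = rng.uniform(size=(6, 40))
env_future = env_now + rng.normal(scale=0.5, size=(6, 2))
print(genomic_offset(freqs, env_now, env_future).round(3))
```

In this simplified form a site whose projected environment does not change has an offset of zero, and larger projected changes along environmentally associated axes give larger offsets.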
Funding
Fisheries and Oceans Canada