Adaptation in a keystone grazer under novel predation pressure
Data files
Dec 06, 2024 version files, 442.94 MB
- Admixture_data.zip (100.54 MB)
- Core_profile_data.zip (80.94 KB)
- Coverage_model_data.zip (7.56 KB)
- Diversity_metrics_data.zip (2.11 MB)
- Lake_chemistry_data.zip (6.94 KB)
- Outliers_heatmap_data.zip (1.34 KB)
- PCA_Mantel_files.zip (43.28 KB)
- Population_Genetics_Code.txt (25.01 KB)
- R_Code.txt (34.48 KB)
- README.md (15.84 KB)
- Size_model_data.zip (7.12 KB)
- SNP_data.zip (340.07 MB)
Abstract
Understanding how species adapt to environmental change is necessary to protect biodiversity and ecosystem services. Growing evidence suggests species can adapt rapidly to novel selection pressures like predation from invasive species, but the repeatability and predictability of selection remain poorly understood in wild populations. We tested how a keystone aquatic herbivore, Daphnia pulicaria, evolved in response to predation pressure by the introduced zooplanktivore Bythotrephes longimanus. Using high-resolution 210Pb-dated sediment cores from 12 lakes in Ontario (Canada), which primarily differed in invasion status by Bythotrephes, we compared Daphnia population genetic structure over time using whole-genome sequencing of individual resting embryos. We found strong genetic differentiation between populations ca. 70 years before versus 30 years after Bythotrephes invasion, with no difference over this period in uninvaded lakes. Compared with uninvaded lakes, we identified, on average, 64 times more loci putatively under selection in the invaded lakes. Differentiated loci were mainly associated with known reproductive and stress responses, and mean body size consistently increased by 14.1% over time in invaded lakes. These results suggest Daphnia populations were repeatedly acquiring heritable genetic adaptations to escape gape-limited predation. More generally, our results suggest some aspects of environmental change predictably shape genome evolution.
README: Adaptation in a keystone grazer under novel predation pressure
https://doi.org/10.5061/dryad.0k6djhb82
Data Summary
This dataset contains data files and code to replicate and visualise the results of the population genetic structure (principal coordinate analysis; genetic admixture) and genetic diversity (pairwise genetic differentiation - FST; nucleotide diversity) analyses, based on whole-genome data of 97 individual Daphnia pulicaria resting embryos sequenced on an Illumina NovaSeq platform. It also contains data and code for the resting embryo size (ephippial size) and genomic sequencing coverage models. Additional data files and code used to generate supplementary figures and tables associated with the same manuscript include lake water chemistry survey data used to select sampling sites (2006), updated water chemistry profiles from the time of sampling (2021), plankton sample counts, and sediment core isotope profiles used in dating with 210Pb, as well as sediment accumulation rates, calendar date ranges and error estimates.
All specimens were extracted from dated lake sediment cores across 12 lakes in Ontario, Canada (Bonnie, Buck, Crown, Fletcher, Grandview, Leech, Longline, Otter, Walker, Solitaire, Clinto, Bigwind). Of these, 9 lakes contained material of sufficient quality for DNA extraction and sequencing (Bonnie, Buck, Crown, Fletcher, Grandview, Leech, Longline, Otter, Walker) and are therefore included in the majority of the analyses described in this dataset. Solitaire, Clinto and Bigwind are not included in any population genetic analyses as there are no genomic data available. Lakes differed in whether they had been invaded by the predator Bythotrephes longimanus (Crown, Fletcher, Grandview, Otter, Leech, Walker, Clinto) or remained uninvaded controls (Bonnie, Buck, Bigwind, Longline, Solitaire). We compared resting embryo genotypes among the 9 lakes, between invaded and control lakes, and between sub-populations within each lake, where sub-populations are embryos extracted from the bottoms and tops of dated sediment cores, corresponding to time periods before and after the introduction of Bythotrephes, respectively.
List of Data Folders and File Descriptions
SNP_data.zip Folder contains 10 files in standard Variant Call Format (VCF), a commonly used text file type for storing genetic sequence variation data.
Individuals within VCF files are named as follows: (first 3 letters of lake name)(core depth range in cm)(unique letter). For example: 'Bon0204A' corresponds to individual embryo A, extracted from the 02-04 cm core section in Bonnie Lake.
One file (All_lakes_TB_F2snps.vcf) contains the combined and complete genomic dataset across all 9 lakes and temporal sub-populations (top and bottom – after and before the introduction of the invasive predator). The other 9 files are the individual datasets for each of the 9 study lakes (LakeName_filtered_snps.vcf) containing normalised, quality filtered Single Nucleotide Polymorphism (SNP) datasets used in all analyses.
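The sample naming scheme described above can be split back into its parts when handling the VCF files. The following is a minimal Python sketch, not part of the submitted code; it assumes every name follows the pattern of the 'Bon0204A' example exactly (three letters, two 2-digit depths, one letter), and the `SampleID` type and `parse_sample_id` function are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple


class SampleID(NamedTuple):
    lake: str       # three-letter lake prefix, e.g. "Bon" for Bonnie Lake
    top_cm: int     # upper bound of the core section in cm
    bottom_cm: int  # lower bound of the core section in cm
    embryo: str     # unique letter identifying the embryo


# Assumed pattern, inferred only from the 'Bon0204A' example in this README.
_SAMPLE_RE = re.compile(r"^([A-Za-z]{3})(\d{2})(\d{2})([A-Za-z])$")


def parse_sample_id(name: str) -> SampleID:
    """Split a sample name into lake prefix, depth range and embryo letter."""
    match = _SAMPLE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    lake, top, bottom, embryo = match.groups()
    return SampleID(lake, int(top), int(bottom), embryo)


print(parse_sample_id("Bon0204A"))
```

Names that deviate from the assumed pattern raise a `ValueError` rather than being parsed silently, so any samples with a different depth encoding would be flagged.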
Diversity_metrics_data.zip Folder contains the following subfolders and files:
- Diversity_metrics_data Pairwise genetic differentiation (FST) and nucleotide diversity (pi) across the genome, calculated over 5 kb sliding windows at 1 kb steps with individuals grouped per lake (Section 5 in Population_Genetics_Code.txt) using the vcf files from SNP_data.zip.
Scaffold names for sliding windows in FST and nucleotide diversity files correspond to those of the latest Daphnia pulicaria annotated genome (Jan 2022) used to align all sequences (Section 1 in Population_Genetics_Code.txt).
Folder includes the following files for each lake:
o Lakename_X_5kb.fst (A text file list of available window positions and FST values – used as input for further analyses - Sections 1 and 7 in R_Code.txt)
o Lakename_5kb.windowed.pi (A text file list of available window positions and pi values - used as input for further analyses – Section 6 in R_Code.txt)
o TopBottom_5kb_pi_all.csv (A comma-delimited variable file used as input to create a boxplot comparing nucleotide diversity between embryos from lake core tops and bottoms – Section 6 in R_Code.txt)
o Pi_model_TB_windows.csv (A comma-delimited variable file used as input to fit a linear mixed effects model comparing nucleotide diversity between populations from lakes invaded by Bythotrephes and uninvaded controls. – Section 6 in R_Code.txt)
- FST_pop_assignment_files Sub-folder with two text files per lake (Bon, Buc, Cro, Fle, Gra, Lee, Lon, Ott, Wal). Simple lists to assign individual genomes to top and bottom sub-populations for pairwise genetic differentiation (FST) calculations (Section 5 in Population_Genetics_Code.txt).
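To make the window layout concrete, the sketch below enumerates 5 kb windows at 1 kb steps along a scaffold. It is an illustration in Python rather than the command-line tooling used in Population_Genetics_Code.txt; the function name `sliding_windows`, the 1-based inclusive coordinates and the choice to drop a trailing partial window are all assumptions, and individual tools may treat the last window differently.

```python
from typing import Iterator, Tuple


def sliding_windows(seq_len: int, size: int = 5000,
                    step: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows that fit fully in the scaffold."""
    start = 1
    while start + size - 1 <= seq_len:
        yield start, start + size - 1
        start += step


# Hypothetical 8 kb scaffold: four complete 5 kb windows fit at 1 kb steps.
windows = list(sliding_windows(8000))
print(windows)
```

Because the step is smaller than the window, neighbouring windows overlap, so a single site can contribute to up to five windows (size divided by step).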
Admixture_data.zip Folder includes the following file types for each study lake:
o Lakename_admix.beagle.gz (A zipped file containing genotype likelihoods estimated from the variant calling files in beagle format, used as input for the genetic admixture analysis – Sections 3-4 in Population_Genetics_Code.txt)
o Lakename_clumppK3.qopt (A text file containing the final genetic admixture matrix of individuals used as input to create a genetic structure plot between embryos from lake core tops and bottoms – Section 4 in R_Code.txt)
o Lakename_Core.tsv (Core section metadata in tab-separated value (tsv) text format used together with the qopt matrix for visualisation– Section 4 in R_Code.txt)
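As a rough illustration of how an admixture matrix of this kind can be inspected, the Python sketch below reads rows of cluster proportions and reports the dominant cluster per individual. It assumes the .qopt files are plain whitespace-delimited text with one row per individual and one column per cluster (K = 3 here, as the file name suggests); the example values are made up, and `read_qopt` and `dominant_cluster` are hypothetical helper names.

```python
from typing import List


def read_qopt(text: str) -> List[List[float]]:
    """Parse whitespace-delimited admixture proportions, one individual per row."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def dominant_cluster(row: List[float]) -> int:
    """Return the 0-based index of the cluster with the largest proportion."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up example matrix for three individuals and K = 3 clusters.
example = """0.90 0.05 0.05
0.10 0.80 0.10
0.20 0.30 0.50
"""

q = read_qopt(example)
assignments = [dominant_cluster(row) for row in q]
print(assignments)
```

Each row of proportions is expected to sum to approximately 1, which is a quick sanity check before pairing the matrix with the core section metadata.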
PCA_Mantel_files.zip Folder includes the following input files used in Principal Coordinate Analysis and visualisation and the Mantel test of isolation by distance (Sections 2-3 in R_Code.txt):
o Samplelist.csv (A list of individual sample IDs and their lake of origin in a comma-separated value file)
o LakeDataForPCA.mdist (A text file containing a genetic distance matrix of all individual embryos used as initial input for the Principal Coordinate Analysis – Section 2 in R_Code.txt)
o eig.final.TB.csv (A comma-separated value file with the results of the PCA for visualisation - Section 2 in R_Code.txt. Sections (sec) refer to core sections where T=Top and B=Bottom; Groups refer to Lake+Section (e.g. BoT is Bonnie Lake Top), followed by lake ID, individual ID (IID) and eigenvectors x, y)
o Genetic_Distance.csv (A comma-separated value file with a simplified genetic distance matrix used as partial input for a Mantel test - Section 3 in R_Code.txt)
o Lake_Coordinates.csv (A comma-separated value file with lake coordinates to calculate the physical geographical distance between study sites, used as partial input for a Mantel test - Section 3 in R_Code.txt)
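The logic of a Mantel test of isolation by distance can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the R implementation in Section 3 of R_Code.txt: geographic distances are computed with the haversine formula on a spherical Earth, the statistic is a Pearson correlation between the upper triangles of the two matrices, and significance comes from a one-sided permutation of the rows and columns of one matrix. The site coordinates and the genetic matrix are made up, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Flatten the upper triangle after reordering rows and columns together."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 1) -> Tuple[float, float]:
    """Return (Mantel r, one-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(a)))
    b_flat = upper_triangle(b, identity)
    observed = pearson(upper_triangle(a, identity), b_flat)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute sites, keeping rows and columns aligned
        if pearson(upper_triangle(a, order), b_flat) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up coordinates (latitude, longitude) for four hypothetical sites.
sites = [(45.0, -79.0), (45.2, -79.3), (45.5, -78.8), (44.9, -79.6)]
geo = [[haversine_km(*s, *t) for t in sites] for s in sites]
# Made-up genetic distances that scale perfectly with geographic distance.
gen = [[0.01 * d for d in row] for row in geo]

r, p = mantel(gen, geo)
print(round(r, 6), p)
```

Permuting whole sites, rather than individual matrix cells, preserves the dependence among distances that share a site, which is the reason a Mantel test is used instead of an ordinary correlation test.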
Size_model_data.zip Folder containing the following data files in comma-separated value format used as input for Section 9 in R_Code.txt
o Size_TB_model.csv (Linear mixed effects model input data with information on lake of origin (lake), invasion status (0 = uninvaded control, 1 = invaded), core section (depth = top/bottom) and individual measurements of ephippial length (length). We only measured the full length of intact ephippia along the dorsal ridge as a proxy for maternal body size)
o Size_plot_data.csv (Input data to generate figure with ephippial size changes per lake and model predictions. lake, invasion status and core depth as above, length = mean ephippial length for the corresponding section followed by standard deviation (sd), sample size (N) and standard error (se). Modeli = Model prediction for invaded lakes and Modelc= Model prediction for control lakes)
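The summary columns in Size_plot_data.csv (mean length, sd, N, se) are related in the usual way, with the standard error equal to the standard deviation divided by the square root of the sample size. The Python sketch below shows that calculation on made-up measurements; it assumes the sample (n - 1) standard deviation was used, which this README does not state, and `summarise_lengths` is a hypothetical helper name.

```python
import math
import statistics
from typing import Dict, Sequence


def summarise_lengths(lengths: Sequence[float]) -> Dict[str, float]:
    """Return mean, sample standard deviation, sample size and standard error."""
    n = len(lengths)
    sd = statistics.stdev(lengths)  # sample standard deviation (n - 1 denominator)
    return {
        "length": statistics.mean(lengths),
        "sd": sd,
        "N": n,
        "se": sd / math.sqrt(n),
    }


# Made-up ephippial lengths for one core section (units not specified here).
summary = summarise_lengths([1.0, 1.2, 1.1, 1.3])
print(summary)
```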
Coverage_model_data.zip Folder containing the following data files in comma-separated value format used as input for Section 5 in R_Code.txt
o coverage_model.csv (Linear mixed effects model input data with information on lake of origin (lake), invasion status where 0 = uninvaded control, 1 = invaded, core section where 1 = top, 2 = bottom, chrom = chromosome number 1 to 12 and cover = mean sequencing coverage for each chromosome)
o coverage_barplot.csv (Input data for stacked bar plot figures to visualise sequencing coverage per chromosome for each lake and core section (T=top and B=bottom). Sequencing coverage statistics were obtained directly during the DNA sequence alignment stage – Section 1 in Population_Genetics_Code.txt)
Outliers_heatmap_data.zip Folder containing the following data files in comma-separated value format used as input for the heatmap figures (Section 8 in R_Code.txt)
o BayS_TB_outliers.csv (Number of outlier loci per chromosome (chrom) in each lake identified with BayeScan v.2.1 (Foll and Gaggiotti, 2008), which is freely available software)
o heatmap_outliers.csv (Number of windows with FST outliers per chromosome (chrom) in each lake – Section 7 in R_Code.txt)
o heatmap_chroms.csv (Chromosome length data)
Core_profile_data.zip Folder contains the following subfolders and files, providing input data for supplementary figures (Sections 12-13 in R_Code.txt)
- Isotope_profiles Sub-folder with an individual csv file for each lake (Lakename.csv) listing core depth at the point of measurement, isotope activity in Bq per kg and standard error (err).
- Sedimentation_profiles Sub-folder with an individual csv file for each lake (Lake profile.csv) listing core depth at the point of measurement, calendar date based on 210Pb isotope activity and assuming constant rate of sedimentation, age in years and model error (age.er), sedimentation (sed) and standard error (sed.er), accumulation rate (acc) and standard error (acc.er)
o 210Pb_dates_all_cores.xlsx (Spreadsheet summary of dating data with 210Pb, including estimated calendar date ranges for each sediment core section)
o Spine_density_calculation.xlsx (Spreadsheet summary and breakdown of density calculation of Bythotrephes longimanus spines extracted from the sediment of invaded study lakes)
o Spine_densities.csv (Densities of Bythotrephes longimanus spines expressed in number of spines per gram of dried sediment)
o Ephippia_densities.csv (Densities of Daphnia pulicaria ephippia extracted from the sediment, expressed in number of ephippia per gram of dried sediment)
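For readers unfamiliar with how 210Pb activity translates into calendar dates, the sketch below shows one textbook simplification in Python: under a constant sedimentation rate, unsupported 210Pb activity declines exponentially with depth, so the slope of ln(activity) against depth gives the sedimentation rate, and depth divided by that rate gives age. This is only an illustration; the dating model actually applied to these cores may differ, and the sketch assumes depths in cm, activities already corrected for supported 210Pb, and a half-life of 22.3 years. The data are synthetic and the function names are hypothetical.

```python
import math
from typing import Sequence

HALF_LIFE_YR = 22.3                      # assumed 210Pb half-life in years
DECAY = math.log(2) / HALF_LIFE_YR       # decay constant per year


def fit_sedimentation_rate(depths_cm: Sequence[float],
                           activities: Sequence[float]) -> float:
    """Estimate sedimentation rate (cm per year) from a log-linear activity profile."""
    logs = [math.log(a) for a in activities]
    n = len(depths_cm)
    mean_x = sum(depths_cm) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths_cm, logs))
    sxx = sum((x - mean_x) ** 2 for x in depths_cm)
    slope = sxy / sxx                    # equals -DECAY / rate under the assumption
    return -DECAY / slope


def section_year(depth_cm: float, rate_cm_per_yr: float, sampling_year: float) -> float:
    """Calendar year for a depth, counting back from the year the core was taken."""
    return sampling_year - depth_cm / rate_cm_per_yr


# Synthetic profile generated with a true rate of 0.1 cm per year.
depths = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
activities = [100.0 * math.exp(-DECAY * d / 0.1) for d in depths]

rate = fit_sedimentation_rate(depths, activities)
print(round(rate, 4), round(section_year(5.0, rate, 2021)))
```

With real profiles the fit would be noisy, which is why the sedimentation profile files report error estimates alongside ages and rates.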
Lake_chemistry_data.zip
- This folder contains a csv file 2006_lake_chem.csv with monitoring information from a 2006 survey (lake maximum depth, presence/absence of the invasive Bythotrephes, water pH, water calcium-Ca, dissolved organic carbon-DOC and total phosphorus-TP) used in sampling site selection and provided by the Ontario Ministry of the Environment, Conservation and Parks (Section 10 in R_Code.txt)
- The sub-folder 2021 contains lake water chemistry data collected at the time of sampling in October-November 2021. There is an individual csv file for each chemical parameter comparing lakes and lake groups (control/invaded) (Section 11 in R_Code.txt). pH is reported on the numerical scale 0-14; all other parameters are measured in mg per litre of water: Dissolved Organic Carbon - DOC, Dissolved Inorganic Carbon - DIC, Calcium - Ca, Sodium - Na, Chlorine - Cl, Iron - Fe, Potassium - K, Zinc - Zn, Magnesium - Mg, Sulfate - SO4, Silicon dioxide - SiO2, Total Nitrogen - TN, Total Phosphorus - TP, Ammonium - NH4 and Nitrite nitrate - NO2 NO3
Code/Software
All code associated with the datasets described above is submitted in two separate text files. Annotations are available throughout the scripts regarding 1) library and tool loading, 2) dataset loading and preparation, 3) analyses, and 4) base figure creation.
For analyses performed directly through the command line, annotated code is provided in the (1) Population_Genetics_Code.txt file with a complete list of necessary tools and versions used in this publication. For analyses and base plot visualization performed in R v.4.2.1 (R Core Team 2022), annotated code is provided in the (2) R_Code.txt file, with a list of necessary R packages in each section.
Population_Genetics_Code.txt Annotated code and complete list of tools covering the following sections:
1) processing and cleaning raw Illumina sequencing data of individual Daphnia pulicaria embryos, alignment and mapping of sequencing to the reference genome
2) identifying and quality filtering of genetic variants (Single Nucleotide Polymorphisms - SNPs)
3) calculating genotype likelihoods for low coverage individual resting embryo genomes
4) analysis of population genetic admixture based on genotype likelihoods
5) calculation of common genetic diversity metrics: fixation index (FST – measure of genetic differentiation) and nucleotide diversity (π – measure of genetic polymorphism).
R_Code.txt Annotated R code including lists of all necessary packages covering the following sections:
1) calculate pairwise genetic differentiation (FSTp) between core top and bottom sub-populations
2) perform and plot a Principal Component Analysis (PCA) to visualise and compare the genetic distances between different lake populations and sub-populations within each lake
3) perform a Mantel test of isolation by distance using both genetic and geographical distance matrices to check for differentiation of populations based on geographical distance between the study lakes
4) visualise and plot the results of the genetic admixture analysis for each lake
5) fit the linear mixed effects model to check for sequencing coverage bias and plot the model predictions. Create a bar plot of coverage statistics per chromosome for each lake population
6) fit the linear mixed effects model to check for nucleotide diversity differences between control lakes and lakes invaded by Bythotrephes. Generate a boxplot comparing core top and bottom sub-populations within each lake.
7) estimate the genome-wide threshold of elevated FST values and plot the distribution of highly differentiated genetic loci across the genome with separate panels per chromosome
8) generate a heatmap figure of differentiated genetic loci with a separate bar for chromosome length
9) fit the linear mixed effects model investigating changes in ephippial size in lakes invaded by Bythotrephes and uninvaded controls. Plot the model predictions together with the actual changes in mean size.
10) perform and plot a Principal Component Analysis (PCA) to visualise and compare key parameters between different lakes with available monitoring data (lake maximum depth, presence/absence of the invasive Bythotrephes, water pH, water calcium-Ca, dissolved organic carbon-DOC and total phosphorus-TP).
11) compare water chemistry data collected at the time of sampling (2021) using nonparametric Wilcoxon rank-sum tests to check for statistically significant differences in water chemistry variables between invaded and control lakes
12) plot the isotope activities with associated error and the inferred sediment accumulation rates from the analysis of lake sediment cores during dating with 210Pb
13) plot the plankton sample densities extracted from the sediment cores (number of Daphnia pulicaria ephippia/resting embryos and number of Bythotrephes remains)
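The Wilcoxon rank-sum comparison in step 11 can be illustrated with a small exact test, which is feasible here because only a handful of lakes fall in each group. The Python sketch below is not the R code from R_Code.txt: it enumerates every way of assigning the pooled ranks to the first group and counts assignments at least as far from the expected rank sum as the observed one (a two-sided exact p-value). The pH values are made up, and `average_ranks` and `rank_sum_test` are hypothetical names.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (rank sum of x, exact two-sided p-value) by full enumeration."""
    ranks = average_ranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    sums = [sum(c) for c in combinations(ranks, n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-12)
    return observed, extreme / len(sums)


# Made-up pH values for hypothetical invaded and control lakes.
invaded = [6.1, 6.3, 6.5, 6.8]
control = [7.0, 7.2, 7.4]

w, p_value = rank_sum_test(invaded, control)
print(w, round(p_value, 4))
```

The statistic returned here is the raw rank sum of the first group; R's `wilcox.test` reports the same quantity shifted by n(n + 1)/2, so the two differ by a constant while the exact p-value without ties is the same.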
Other links
- Daphnia pulicaria reference genome: Daphnia pulicaria genome assembly SC_F0-13Bv2 - NCBI - NLM (nih.gov)
- Daphnia pulicaria Ensembl Genome Browser: Chr1: 1-43.50M - Genome Data Viewer - NCBI (nih.gov)