Data from: Intraspecific diversity is critical to population-level risk assessments
Data files
Feb 26, 2025 version files (98.74 KB total)
- DaphBenchtopSNPmatrixDifferences.csv (2.09 KB)
- Daphnia_Bootstrap_230826.R (25.60 KB)
- DaphniaBenchtop_FunctionSNPDifferences.csv (1.52 KB)
- DaphniaBenchtop_MCGeneVariantCounts_230817.csv (930 B)
- DaphniaGrow_ResponseMeans_210407.csv (3.61 KB)
- DaphniaRiskAssessment_RCode_240606.R (22.48 KB)
- DMOrderLetters.csv (3.58 KB)
- GenomicPCAOutput.csv (5.10 KB)
- Neonates_210226.csv (14.34 KB)
- README.md (5.32 KB)
- Survival_210129.csv (14.17 KB)
Apr 22, 2025 version files (1.38 GB total)
- DaphBenchtopSNPmatrixDifferences.csv (2.09 KB)
- Daphnia_Bootstrap_230826.R (25.60 KB)
- DaphniaBenchtop_FunctionSNPDifferences.csv (1.52 KB)
- DaphniaBenchtop_MCGeneVariantCounts_230817.csv (930 B)
- DaphniaGrow_ResponseMeans_210407.csv (3.61 KB)
- DaphniaRiskAssessment_RCode_240606.R (22.48 KB)
- DMOrderLetters.csv (3.58 KB)
- GenomicPCAOutput.csv (5.10 KB)
- Neonates_210226.csv (14.34 KB)
- README.md (6.66 KB)
- Shahmohamadlooetal_2024_daphnia_geneticvariance_index (109.38 KB)
- Shahmohamadlooetal_2024_daphnia_geneticvariance.vcf.gz (1.38 GB)
- Survival_210129.csv (14.17 KB)
Abstract
Environmental risk assessment (ERA) is a critical tool for protecting life and its effectiveness is predicated on predicting how natural populations respond to contaminants. Yet, routine toxicity testing typically examines only one genotype from surrogate species, which may render risk assessments inaccurate as populations are most often composed of genetically distinct individuals. To determine the importance of intraspecific variation in the translation of toxicity testing to populations, we quantified the magnitude of phenotypic variation within 20 Daphnia magna clones derived from one lake. We repeated these assays across two exposure levels of microcystins, a cosmopolitan and lethal aquatic contaminant produced by harmful algal blooms. We found considerable intraspecific genetic variation in survival, growth, and reproduction, which was amplified by microcystins exposure. Using simulations, we demonstrate that the common practice of employing a single genotype to calculate toxicity tolerance failed to produce an estimate within the 95% confidence interval over half of the time. Finally, we conducted whole genome sequencing of all 20 clones to test whether differences in toxicological responses were associated with overall genomic divergence or divergence at candidate loci based on prior gene expression work. We find no overall correlations, suggesting that clonal variation, but not variation at candidate genes, is an important predictor of population-level responses to toxic insults. These results illuminate the importance of incorporating overall intraspecific genetic variation, without focusing specifically on variation in candidate genes, into ERAs to reliably predict how natural populations will respond to contaminants.
This dataset is associated with the manuscript: Intraspecific diversity is critical to population-level risk assessments
DOI: https://doi.org/10.5061/dryad.5tb2rbpck
Change log
April 22, 2025: In this version, the authors have uploaded the variant call format (VCF) file along with its associated index file that allows users to download the full set of variant sites identified across the whole genomes of all 20 Daphnia magna clones used in the study.
General Description
This dataset contains the raw and processed data, the variant call format (VCF) file, as well as the R scripts necessary to reproduce the statistical analyses and figures presented in the manuscript. The data pertain to phenotypic and genomic variation in Daphnia, analyzed under different experimental treatments.
File Structure
The dataset includes:
- Eight CSV files containing phenotypic and genomic data
- Two R script files used for statistical analyses and figure generation
- One VCF file storing the full set of variant sites identified across the whole genomes of all 20 Daphnia magna clones used in the study
- One VCF index file included alongside the compressed VCF file to enable rapid access to variant data by genomic position, facilitating efficient downstream analyses
- This README file describing the dataset and its contents
Data Files and Descriptions
Each CSV file is structured in a tabular format with proper headings, including relevant variables such as clone ID, treatment conditions, and measured phenotypes. Below is a detailed breakdown of each file:
Phenotypic Data
- Survival_210129.csv
- Contains survival data across experimental treatments.
- Variables:
  - Clone: Unique identifier for each clone
  - Treatment: Experimental condition applied
  - Replicate: Replicate number
  - Day X: Survival on Day 0, 7, and 14 (1 = alive, 0 = dead)
- Neonates_210226.csv
- Contains reproductive output data.
- Variables:
  - Clone: Unique identifier for each clone
  - Treatment: Experimental condition applied
  - Replicate: Replicate number
  - Neonates: Number of offspring produced (unit: count)
  - Fbrood: Number of neonates produced in the first brood recorded per replicate
  - Tbroods: Total number of broods recorded throughout the entire experiment per replicate
- Missing Values: Some values in the Fbrood, Tbroods, and Neonates columns are missing. These reflect cases where Daphnia mothers never reproduced or other constraints (e.g., the Daphnia replicate died).
- DaphniaGrow_ResponseMeans_210407.csv
- Summarized response means for various phenotypic traits.
- Variables:
  - Clone: Unique identifier for each clone
  - Treatment: Experimental condition applied
  - Trait: Phenotypic trait measured
  - Mean_Response: Mean value of the measured trait (e.g., mean survival on Day 7, mean length on Day 14, etc.)
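As an illustration of how the survival coding above (1 = alive, 0 = dead) can be summarized, here is a minimal Python sketch that computes the fraction of replicates alive per clone and treatment on a chosen day. It is not taken from the dataset's R scripts. The exact column header (`Day 14`), the treatment labels, and the demo rows are assumptions made for the example; check the real headers in Survival_210129.csv before adapting it.

```python
import csv
import io
from collections import defaultdict


def survival_fraction(rows, day_column):
    """Return {(clone, treatment): fraction of replicates alive} for one day column."""
    alive = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        value = (row.get(day_column) or "").strip()
        if value == "":
            continue  # empty cell: no observation for this replicate
        key = (row["Clone"], row["Treatment"])
        total[key] += 1
        alive[key] += int(value)  # 1 = alive, 0 = dead
    return {key: alive[key] / total[key] for key in total}


# Made-up rows standing in for Survival_210129.csv (headers are assumed).
demo_csv = """Clone,Treatment,Replicate,Day 0,Day 7,Day 14
A,Control,1,1,1,1
A,Control,2,1,1,0
A,Microcystin,1,1,0,0
A,Microcystin,2,1,1,0
"""
demo_rows = list(csv.DictReader(io.StringIO(demo_csv)))
demo_result = survival_fraction(demo_rows, "Day 14")
print(demo_result)
# {('A', 'Control'): 0.5, ('A', 'Microcystin'): 0.0}
```

To run it on the real file, replace the `io.StringIO(demo_csv)` handle with `open("Survival_210129.csv", newline="")`.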
Genomic Data
- DaphniaBenchtop_FunctionSNPDifferences.csv
- Contains SNP functional annotations and their differences among groups.
- Variables:
  - Clone: Unique identifier for each clone
- Missing Values: Some SNPs have missing effect annotations, indicating regions where functional effects were not determined.
- DaphniaBenchtop_MCGeneVariantCounts_230817.csv
- Contains gene-level variant counts from sequencing data.
- Variables:
  - ID: Unique sample identifier
  - TotalVariant: Total number of variants detected in the gene
  - nRefHom: Number of homozygous reference alleles in the sample
  - nNonRefHom: Number of homozygous non-reference alleles in the sample
  - nHets: Number of heterozygous alleles in the sample
  - nIndels: Number of insertions and deletions detected in the sample
- DaphBenchtopSNPmatrixDifferences.csv
- SNP matrix indicating presence/absence or frequency differences across experimental groups.
- Variables:
  - Clone: Identifier for each sample
- Missing Values: Some SNPs have missing allele counts, likely due to sequencing depth limitations.
- GenomicPCAOutput.csv
- Principal component analysis (PCA) results from genomic data.
- Variables:
  - Clone: Unique identifier for each clone
  - PC1, PC2, PC3: Principal component scores
- DMOrderLetters.csv
- Statistical comparison output, containing grouping information from post-hoc tests.
- Variables:
  - Comparison: Pairwise comparison identifier
  - Order_Letter: Letter groupings from statistical analysis
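One possible use of the principal component scores in GenomicPCAOutput.csv is to compare clones by their distance in PC space. The Python sketch below is only an illustration and does not reproduce how the manuscript quantified genomic divergence. The column names follow the variable list above, and the demo values are made up.

```python
import csv
import io
import math
from itertools import combinations


def pairwise_pc_distances(rows, pcs=("PC1", "PC2", "PC3")):
    """Return {(clone_a, clone_b): Euclidean distance between the clones' PC scores}."""
    coords = {row["Clone"]: [float(row[pc]) for pc in pcs] for row in rows}
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


# Made-up scores standing in for GenomicPCAOutput.csv.
demo_csv = """Clone,PC1,PC2,PC3
A,0.0,0.0,0.0
B,3.0,4.0,0.0
C,0.0,0.0,2.0
"""
demo_rows = list(csv.DictReader(io.StringIO(demo_csv)))
demo_distances = pairwise_pc_distances(demo_rows)
for pair, distance in demo_distances.items():
    print(pair, round(distance, 3))
```

Each pair is keyed in sorted clone order, so `("A", "B")` is present but `("B", "A")` is not.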
Code Files
- DaphniaRiskAssessment_RCode_240606.R
- This script performs all statistical analyses and generates figures found in the manuscript.
- Dependencies: Requires all CSV files for data input.
- Daphnia_Bootstrap_230826.R
- This script conducts bootstrap resampling analyses for certain statistical comparisons.
- Dependencies: Uses phenotypic data CSV files.
Variant Call Format Files
- Shahmohamadlooetal_2024_daphnia_geneticvariance.vcf.gz
- This file contains the full set of variant sites identified across the whole genomes of all 20 Daphnia magna clones used in the study, enabling population-level comparisons of genomic variation and facilitating genotype–phenotype association analyses under toxin exposure.
- Dependencies: All VCF files for data input.
- Shahmohamadlooetal_2024_daphnia_geneticvariance_index
- This file is included alongside the compressed VCF file to enable rapid access to variant data by genomic position, facilitating efficient downstream analyses.
- Dependencies: All VCF files for data input.
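Because the compressed VCF is large (1.38 GB), it is usually easier to stream it than to load it whole. The Python sketch below reads a gzip-compressed VCF line by line and reports the sample names and the number of variant records. It assumes the file follows the standard VCF layout, with a `#CHROM` header line followed by one record per line. The demo content and sample names are made up, and the sketch does not use the index file; position-based queries through the index would need a dedicated tool such as tabix or bcftools.

```python
import gzip
import io


def summarize_vcf(handle):
    """Return (sample_names, n_records) from a text-mode VCF handle."""
    samples = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Sample names follow the nine fixed VCF columns.
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            n_records += 1
    return samples, n_records


# Made-up two-sample VCF, compressed in memory to mimic a .vcf.gz file.
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tclone1\tclone2\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n"
    "chr1\t250\t.\tC\tT\t60\tPASS\t.\tGT\t0/0\t0/1\n"
)
compressed = gzip.compress(demo_vcf.encode("utf-8"))
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    demo_samples, demo_count = summarize_vcf(handle)
print(demo_samples, demo_count)
```

For the real data, pass the file name instead of the in-memory buffer: `gzip.open("Shahmohamadlooetal_2024_daphnia_geneticvariance.vcf.gz", "rt")`.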
Software and Reproducibility
- All statistical analyses were performed using R version 4.2.2.
- Required R packages: (Ensure these are installed before running scripts)
  - ggplot2 for visualization
  - lme4 for mixed models
  - dplyr for data manipulation
  - vegan for multivariate analyses
- Ensure all CSV files are in the working directory before running the scripts.
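A quick way to confirm that the working directory is complete before running the R scripts is to check for each CSV file by name. The Python sketch below uses the eight CSV file names from the file listing; the helper name `missing_files` is hypothetical, and the temporary directory only stands in for a real working directory.

```python
import tempfile
from pathlib import Path

# The eight CSV files named in this README.
CSV_FILES = [
    "DaphBenchtopSNPmatrixDifferences.csv",
    "DaphniaBenchtop_FunctionSNPDifferences.csv",
    "DaphniaBenchtop_MCGeneVariantCounts_230817.csv",
    "DaphniaGrow_ResponseMeans_210407.csv",
    "DMOrderLetters.csv",
    "GenomicPCAOutput.csv",
    "Neonates_210226.csv",
    "Survival_210129.csv",
]


def missing_files(directory, names=CSV_FILES):
    """Return the expected file names that are not present in the directory."""
    folder = Path(directory)
    return [name for name in names if not (folder / name).is_file()]


# Demo: a temporary directory that contains only one of the eight files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Survival_210129.csv").write_text("Clone\n")
    demo_missing = missing_files(tmp)
print(demo_missing)
```

Calling `missing_files(".")` from the directory that holds the downloaded files should return an empty list when everything is in place.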
Handling of Missing Data
Some variables in the dataset contain missing values due to experimental constraints or data collection limitations. These missing values have been left as empty cells to ensure compatibility with analysis scripts. Users should handle these appropriately during their analyses. The specific files and columns with missing values are documented in the file descriptions above.
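Since missing values are stored as empty cells, a reader outside R needs to turn them into an explicit missing marker before doing arithmetic. The Python sketch below converts empty cells to `None` and skips them when averaging. The column names follow the description of Neonates_210226.csv above, the demo rows are made up, and the helper names are hypothetical.

```python
import csv
import io


def read_with_missing(handle, numeric_columns):
    """Read CSV rows, converting numeric columns to float and empty cells to None."""
    rows = []
    for row in csv.DictReader(handle):
        for column in numeric_columns:
            raw = (row.get(column) or "").strip()
            row[column] = float(raw) if raw else None
        rows.append(row)
    return rows


def mean_ignoring_missing(rows, column):
    """Mean of a column over rows where it is present; None if all are missing."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values) if values else None


# Made-up rows; the second replicate never reproduced, so its cells are empty.
demo_csv = """Clone,Treatment,Replicate,Neonates,Fbrood,Tbroods
A,Control,1,12,4,3
A,Control,2,,,
A,Control,3,8,2,2
"""
demo_rows = read_with_missing(
    io.StringIO(demo_csv), ("Neonates", "Fbrood", "Tbroods")
)
demo_mean = mean_ignoring_missing(demo_rows, "Neonates")
print(demo_mean)
# 10.0
```

Whether to drop missing replicates or count them as zero offspring depends on the question being asked; this sketch drops them.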
- Shahmohamadloo, René S.; Rudman, Seth M.; Clare, Catherine I. et al. (2023). Intraspecific genetic variation is critical to robust toxicological predictions of aquatic contaminants [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2023.06.06.543817
- Shahmohamadloo, René S.; Rudman, Seth M.; Clare, Catherine I. et al. (2024). Intraspecific diversity is critical to population-level risk assessments. Scientific Reports. https://doi.org/10.1038/s41598-024-76734-x
