Data from: Comparative analysis of environmental DNA metabarcoding and spectro-fluorescence for phytoplankton community assessments
Data files
Version of Oct 28, 2025; files total 39.08 MB
- ALA1_averages.csv (75.31 KB)
- asv_table_eDNA1_16S_noNEG_vs.csv (23.01 MB)
- asv_table_eDNA1_18S_pr2_mesocosms_vs.csv (13.39 MB)
- data_analysis_ALAmethods_eDNA.R (64.75 KB)
- mapping_eDNA1_16S.csv (5.14 KB)
- mapping_eDNA1_18S.csv (5.83 KB)
- README.md (6.56 KB)
- sample_data_eDNA1_16S_noNEG_vs.csv (5.81 KB)
- sample_data_eDNA1_18S_pr2_mesocosms_vs.csv (7.62 KB)
- sequence_processing_16S.R (9.75 KB)
- sequence_processing_18S.R (16.30 KB)
- tax_table_eDNA1_16S_noNEG_vs.csv (1.43 MB)
- tax_table_eDNA1_18S_pr2_mesocosms_vs.csv (1.04 MB)
- temp_weeks_39.csv (13.58 KB)
Abstract
Quantifications of phytoplankton biomass and species composition are crucial for monitoring biodiversity and population dynamics in aquatic environments, and both direct microscopic counts and fluorescence-based methods have been widely used for monitoring. Recent advancements in DNA metabarcoding offer an alternative way of easily assessing diversity and species composition. However, a comprehensive comparison of the relative merits and limitations of DNA- and fluorescence-based methods is currently lacking. Here, we compare phytoplankton community composition measured via fluorescence and DNA metabarcoding in an outdoor, replicated mesocosm experiment. We show that there is a positive correlation between fluorescence-measured biomass and DNA read and amplicon sequence variant (ASV) numbers for cyanobacteria, but either weak or no correlation for the other phytoplankton groups assessed (cryptophytes, chromophytes, green algae). In addition, DNA metabarcoding was systematically better at detecting cryptophytes, which were rarely detected via fluorescence. Hence, while DNA metabarcoding may not provide reliable biomass estimates for the majority of phytoplankton groups, metabarcoding analysis offers higher taxonomic resolution and the capability to detect rare phytoplankton groups. Overall, our findings provide new insights into the strengths and limitations of each method and highlight the considerable potential and importance of including DNA barcoding in freshwater ecosystem assessment and biomonitoring programmes with a focus on biodiversity assessments.
Dataset DOI: 10.5061/dryad.jwstqjqn1
Description of the data and file structure
Datasets and R code for the manuscript titled "Comparative analysis of environmental DNA metabarcoding and spectro-fluorescence for phytoplankton community assessments" by Romana Salis and Lars-Anders Hansson. DOI: 10.1002/edn3.70097
We conducted a comparative analysis of phytoplankton community composition using fluorescence-based measurements and DNA metabarcoding in a controlled mesocosm experiment.
Files and variables
File: mapping_eDNA1_16S.csv and mapping_eDNA1_18S.csv
Description: sample data for pre-processing of the 16S and 18S sequence data.
Variables
- Sample date: Date the sample was taken
- sampling week: Week of the experiment the sample was taken
- treatment: Experimental treatment (heated and/or invasion)
- Heated: Y/N if the mesocosm was exposed to the heat treatment
- Invasion: Y/N if the mesocosm was exposed to the invasion treatment
- Mesocosm: Identity of the mesocosm the sample was taken from
- Sample: Sample identity (sampling week and mesocosm number)
- rep: replicate (1-6)
File: sample_data_eDNA1_16S_noNEG_vs.csv and sample_data_eDNA1_18S_pr2_mesocosms_vs.csv
Description: sample data for the analyses of the 16S and 18S data sets, with samples as rows.
Variables
- X: Sample ID
- Sample.date: Date the sample was taken
- sampling.week: Week of the experiment the sample was taken
- treatment: Experimental treatment (heated and/or invasion)
- Heated: Y/N if the mesocosm was exposed to the heat treatment
- Invasion: Y/N if the mesocosm was exposed to the invasion treatment
- Mesocosm: Identity of the mesocosm the sample was taken from
- Sample: Sample ID
- rep: Replicate 1-6
- sampling.occasion: Sampling occasion 1-5
File: tax_table_eDNA1_18S_pr2_mesocosms_vs.csv
Description: Taxonomy table for the 18S dataset, with the taxonomy (columns) for each ASV (rows).
Variables
- ASV: ASV identity / number
- Columns: Kingdom, Supergroup, Division, Class, Order, Family, Genus, Species
File: tax_table_eDNA1_16S_noNEG_vs.csv
Description: Taxonomy table for the 16S dataset, with the taxonomy (columns) for each ASV (rows).
Variables
- ASV: ASV identity / number
- Columns: Kingdom, Phylum, Class, Order, Family, Genus, Species
File: asv_table_eDNA1_16S_noNEG_vs.csv
Description: ASV table for the 16S dataset, with normalised counts of ASVs (rows) in the samples (columns).
Variables
- ASV: ASV identity / number
- Columns: Samples wk0-m10 - wk22-m9
File: asv_table_eDNA1_18S_pr2_mesocosms_vs.csv
Description: ASV table for the 18S dataset, with normalised counts of ASVs (rows) in the samples (columns).
Variables
- ASV: ASV identity / number
- Columns: Samples E-wk0-m10 - E-wk22-m9
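As a rough illustration of the table layout described above (ASVs as rows, samples as columns), the following Python sketch reads a small in-memory stand-in for an ASV table and sums each sample column. The authors' own code is in R; Python is used here only to keep the example self-contained. The values, the `wk0-m11` sample name, and the assumption that the first column is headed `ASV` are illustrative and should be checked against the actual files.

```python
import csv
import io

# Hypothetical miniature of an ASV table: rows are ASVs, columns are samples.
# All values are invented for illustration only.
ASV_CSV = """ASV,wk0-m10,wk0-m11,wk22-m9
ASV1,5.2,0.0,3.1
ASV2,1.0,2.5,0.0
ASV3,0.0,4.0,6.4
"""

def column_totals(text, id_column="ASV"):
    """Sum each sample column over all ASV rows."""
    reader = csv.DictReader(io.StringIO(text))
    samples = [name for name in reader.fieldnames if name != id_column]
    totals = {name: 0.0 for name in samples}
    for row in reader:
        for name in samples:
            totals[name] += float(row[name])
    return totals

totals = column_totals(ASV_CSV)
```

For the real files, the same function could be applied to the file contents, bearing in mind that the tables hold normalised rather than raw counts.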
File: temp_weeks_39.csv
Description: Weekly average temperature measurement data from the mesocosms.
Variables
- sampling.week: Week of the experiment the measurement was taken
- AveTempC: Average temperature in the non-heated mesocosms
- AveTempH: Average temperature in the heated mesocosms
- Ave_tempdiff: Average temperature difference between heated and non-heated mesocosms
- DegreeWeeks: Number of degree weeks
- Ave_tempdiff_R: Average temperature difference between heated and non-heated mesocosms
- M2 - M27: Temperatures in the mesocosms (°C)
File: ALA1_averages.csv
Description: Algal Lab Analyser data (averages of technical triplicates).
Variables
- Sample.date: Date the sample was taken
- sampling.week: Week of the experiment the measurement was taken
- treatment: Experimental treatment (heated and/or invasion)
- Heated: Y/N if the mesocosm was exposed to the heat treatment
- Invasion: Y/N if the mesocosm was exposed to the invasion treatment
- Mesocosm: Identity of the mesocosm the sample was taken from
- rep: Replicate (1-6)
- A_TotalAlgae: Total algal chlorophyll a concentration (μg/L)
- A_Green: Green algae concentration (μg/L)
- A_Cyanobacteria: Cyanobacteria concentration (μg/L)
- A_Chromophytes: Chromophyte concentration (μg/L)
- A_Cryptophytes: Cryptophyte concentration (μg/L)
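The relative ALA biomass used in the analyses can be derived from these columns. The Python sketch below (the authors' analysis itself is in R) divides each group concentration by `A_TotalAlgae`. Whether the authors used `A_TotalAlgae` or the sum of the four groups as the denominator is an assumption here, and the example row is invented.

```python
# Group-specific chlorophyll a columns named in the file description.
GROUP_COLUMNS = ["A_Green", "A_Cyanobacteria", "A_Chromophytes", "A_Cryptophytes"]

def relative_biomass(row):
    """Return each group's share of total chlorophyll a, or None when the total is zero."""
    total = float(row["A_TotalAlgae"])
    if total <= 0:
        return None
    return {col: float(row[col]) / total for col in GROUP_COLUMNS}

# Invented example row (µg/L); not taken from ALA1_averages.csv.
example = {
    "A_TotalAlgae": "20.0",
    "A_Green": "10.0",
    "A_Cyanobacteria": "5.0",
    "A_Chromophytes": "5.0",
    "A_Cryptophytes": "0.0",
}
shares = relative_biomass(example)
```

Returning `None` for a zero total avoids a division by zero for samples in which no chlorophyll a was detected.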
Code/software
R code for pre-processing of sequence data: sequence_processing_16S.R and sequence_processing_18S.R
Sequence processing was performed using R 4.0.3.
The DADA2 pipeline (v 1.22.0) was used to infer amplicon sequence variants (ASVs) based on error models to correct sequencing errors while accounting for abundance and sequence similarity.
For 18S, taxonomy was assigned using the Protist Ribosomal Reference database (PR2) version 4.13.0.
For 16S, taxonomy was assigned using the Silva database version 138.1.
The data were then combined into a phyloseq object using phyloseq version 1.44.0 for subsequent processing.
Sequence counts were normalized using the varianceStabilizingTransformation function in the DESeq2 R package version 1.30.1 to account for differences in sample sequencing depth.
R code for data analyses: data_analysis_ALAmethods_eDNA.R
All data and statistical analyses were performed using R 4.0.3.
First, in order to compare the two methods, ASVs belonging to the four phytoplankton groups were extracted. For cyanobacteria, this meant all 16S ASVs from the phylum Cyanobacteria. For green algae, this included ASVs from the division Chlorophyta, and the classes Mesostigmatophyceae, Zygnemophyceae, and Coleochaetophyceae. For the chromophytes, this included ASVs of the classes Bacillariophyta, Dinoflagellata and Chrysophyceae. Finally, for the cryptophytes, this included ASVs assigned to the division Cryptophyta as well as Rhodophyta.
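The grouping rule above can be sketched as a simple lookup. This Python example is not the authors' R implementation; it only illustrates the mapping from taxonomy to the four groups. It assumes each taxonomy-table row is available as a dictionary keyed by rank name, and it checks the Phylum, Division, and Class columns for any of the listed names, which is looser than the rank-specific wording in the text.

```python
# Taxon names per phytoplankton group, as listed in the text above.
GROUP_TAXA = {
    "green algae": {"Chlorophyta", "Mesostigmatophyceae", "Zygnemophyceae", "Coleochaetophyceae"},
    "chromophytes": {"Bacillariophyta", "Dinoflagellata", "Chrysophyceae"},
    "cryptophytes": {"Cryptophyta", "Rhodophyta"},
    "cyanobacteria": {"Cyanobacteria"},
}

# Assumption: a listed name may sit at any of these ranks, depending on the table.
RANKS_TO_CHECK = ("Phylum", "Division", "Class")

def assign_group(tax_row):
    """Return the phytoplankton group for one taxonomy row, or None if nothing matches."""
    names = {tax_row.get(rank) for rank in RANKS_TO_CHECK}
    for group, taxa in GROUP_TAXA.items():
        if names & taxa:
            return group
    return None

# Invented example rows for illustration only.
rows = [
    {"Kingdom": "Eukaryota", "Division": "Chlorophyta", "Class": "Chlorophyceae"},
    {"Kingdom": "Bacteria", "Phylum": "Cyanobacteria"},
    {"Kingdom": "Eukaryota", "Division": "Ochrophyta", "Class": "Bacillariophyta"},
    {"Kingdom": "Eukaryota", "Division": "Ciliophora"},
]
groups = [assign_group(row) for row in rows]
```

ASVs that return `None` belong to none of the four groups and would be left out of the method comparison.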
Diversity measures were calculated using the R package vegan version 2.6.4.
Plots were created using the ggplot2 package (v3.4.2) and ggeffects (v1.3.1).
Generalized linear mixed effect models (GLMMs) were performed using glmmTMB (v1.1.10). The performance package (v0.13.0) was used to calculate marginal and conditional R² values as well as test for singularity, while model diagnostics were performed using the package DHARMa version 0.4.7. Model significance was assessed using the Anova() function in the car package (v1.1-3).
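For orientation, marginal and conditional R² for mixed models are commonly defined by partitioning the variance into fixed-effect, random-effect, and residual components. The symbols below are notation introduced here, not taken from the authors' scripts: the fixed-effect variance is written as \sigma^2_f, the summed random-effect variance as \sigma^2_\alpha, and the residual variance as \sigma^2_\varepsilon. For non-Gaussian families the residual term is distribution-specific.

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

The marginal value describes the variance explained by the fixed effects alone, while the conditional value also includes the random effects (here, mesocosm and sampling date where retained).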
Access information
Other publicly accessible locations of the data:
Experimental setup
To compare DNA metabarcoding and a traditional fluorescence method (ALA), we performed a biomonitoring study of the phytoplankton community composition and biomass in an outdoor, large-scale, replicated experiment. The experiment was conducted from April 7th to October 20th, 2020, in a setup consisting of 24 cylindrical polyethylene mesocosms (diameter = 0.7 m, height = 1 m) situated at Lund University (55.1° N, 13.2° E), in southern Sweden. The sides and base of each mesocosm were insulated, and each mesocosm was filled with 400 L of unfiltered lake water, which contained the organisms naturally occurring in the nearby eutrophic Lake Ringsjön (55.9° N, 13.5° E). Lake sediment was also added to the mesocosms to facilitate natural succession and the recruitment of resting stages and to help stabilise water chemistry. The sediment, also collected from Lake Ringsjön, was drained and homogenised before 500 g was placed into a plastic tray (14 × 14 × 10 cm) at the bottom of each mesocosm.
Filtered deionized water was added to compensate for evaporation losses, and the walls of the mesocosms were scrubbed to minimise growth of periphytic algae. In addition, 0.5 mL of commercially available plant nutrients (Blomstra växtnäring, Cederroth, Upplands Väsby, Sweden; Nitrogen: Phosphorus = 100:13) was added to each mesocosm every two weeks. The experiment consisted of four treatments with six replicate mesocosms per combination (randomly assigned): control ambient conditions mimicking the present climate state in a temperate shallow lake; temperature warming via heatwaves; multispecies invasion; and combined warming and invasion. For details about the treatment and experimental system, see Salis et al. (2023) and Hansson et al. (2013). However, we observed no significant effects of temperature warming or invasion treatments on the absolute and relative biomass concentrations (ALA) and normalised read counts (DNA metabarcoding) of the four phytoplankton groups (see Statistical Analysis section below and Supporting Information Table S1). Therefore, this paper focuses on the comparisons between the ALA and DNA methods.
Sample collection
Water samples were collected from each mesocosm for spectral fluorometry analysis and environmental DNA metabarcoding analysis on 5 sampling dates in 2020: July 28th, August 11th, August 25th, September 22nd, and October 20th. On each sampling date, 24 samples were taken to be analysed by each of the two sampling methods.
For ALA analysis, a 100 mL water sample was taken from each mesocosm. The samples were immediately taken to the lab, where phytoplankton group-specific biomass was estimated using a spectral fluorometer (Algae Lab Analyser; bbe Moldaenke GmbH, Germany). The Algae Lab Analyser (ALA) measures in vivo chlorophyll a fluorescence to infer the relative and absolute biomass of major phytoplankton groups. The instrument detects fluorescence excited by five high-intensity light-emitting diodes (LEDs) with specific wavelengths (λ = 450, 525, 570, 590, and 610 nm), enabling the discrimination of four groups: green algae, chromophytes (including diatoms, chrysophytes, and dinoflagellates), cryptophytes, and cyanobacteria. Group-level discrimination is based on the linear independence of their norm spectra, as described in Beutler et al. (2002). The device provides quantitative estimates of total and group-specific chlorophyll a concentrations (µg/L) in the range of 0–500 µg/L with a resolution of 0.01 µg/L and a lower detection limit of 0.05 μg/L. For each of the 24 mesocosms, 25 mL water samples were measured in triplicate using a glass cuvette (25 mm × 25 mm × 70 mm). Measurements were performed at room temperature, and the mean of the three technical replicates was used for further analysis.
For eDNA analysis, water was filtered through two consecutive filters: a 1.2 μm cellulose nitrate filter (Whatman), followed by a 0.22 μm Sterivex filter (Millipore), until the Sterivex filter was clogged (10-3000 mL). Both filters were then stored at -20°C. Three negative controls (filtered deionized water) were also taken on each sampling date. Genomic DNA was extracted using the Qiagen Blood and Tissue Kit (see other paper for more details) under sterile lab conditions (UV hood) and amplified using standard 18S (chromophytes, cryptophytes, and green algae) and 16S (cyanobacteria) markers. Amplification and sequencing were performed at The Integrated Microbiome Resource (IMR), Canada (Comeau et al. 2017). A single round of PCR was done using "fusion primers" (Illumina adaptors + indices + specific regions). For chromophytes, cryptophytes, and green algae, the E572F (5’-CYGCGGTAATTCCAGCTC-3’) and E1009R (5’-AYGGTATCTRATCRTCTTYG-3’) primer pair was used, while for cyanobacteria, the B989F (5’-ACGCGHNRAACCTTACC-3’) and BA1406R (5’-ACGGGCRGTGWGTRCAA-3’) primer pair was used (Comeau et al. 2011). DNA samples were amplified in duplicate using 1:1 and 1:10 template dilutions in 20 µL reactions containing: 2 or 0.2 μL of DNA, 4 μL of each 1 μM primer, 0.4 μL of 40 mM deoxyribonucleotide triphosphates (dNTPs), 4 µL of 5X Phusion Plus Buffer, and 0.2 μL of Phusion Plus DNA polymerase (Thermo Scientific™). PCR negative controls (RNase/DNase-free water) were included to check for contamination. PCR conditions were initial denaturation at 98°C for 30 s, followed by 25 cycles of denaturation at 98°C for 10 s, annealing at 55°C for 30 s, and extension at 72°C for 30 s, with a final elongation step of 72°C for 4 min 30 s. Duplicate PCRs were pooled and checked visually on a high-throughput Hamilton Nimbus Select robot using Coastal Genomics Analytical Gels. PCR negative controls were verified to be free from any visible bands.
The PCR reactions (including 4 PCR negative controls) were then cleaned up and normalized using the high-throughput Just-a-Plate 96-well Normalization Kit (Charm Biotech) and sequenced with 10% PhiX on an Illumina MiSeq V3-600 2x300 bp (IMR, Canada; Comeau and Kwawukume 2023).
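The primers listed above contain degenerate IUPAC bases (for example Y, R, H, N, and W), so each primer name stands for a pool of related oligonucleotides. As a small illustrative sketch in Python (not part of the authors' R workflow), the number of distinct sequences each primer represents can be counted by multiplying the number of bases each IUPAC code allows. The function and variable names are introduced here for illustration.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "E572F": "CYGCGGTAATTCCAGCTC",
    "E1009R": "AYGGTATCTRATCRTCTTYG",
    "B989F": "ACGCGHNRAACCTTACC",
    "BA1406R": "ACGGGCRGTGWGTRCAA",
}

def degeneracy(primer):
    """Count the distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= IUPAC_CHOICES[base]
    return count

degeneracies = {name: degeneracy(seq) for name, seq in PRIMERS.items()}
# E572F -> 2, E1009R -> 16, B989F -> 24, BA1406R -> 8
```

Higher degeneracy broadens the range of template sequences a primer can match, at the cost of a lower effective concentration of each individual variant.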
Sequence Analysis
Sequence processing was performed using R 4.0.3 (R Core Team 2023). The DADA2 pipeline (Callahan et al. 2016) was used to infer amplicon sequence variants (ASVs) based on error models to correct sequencing errors while accounting for abundance and sequence similarity (Callahan et al. 2017). The resulting ASVs are biologically meaningful sequence variants comparable to haplotypes (Callahan et al. 2017; Elbrecht et al. 2018; Porter and Hajibabaei 2020). For 18S, taxonomy was assigned using the Protist Ribosomal Reference database (PR2) version 4.13.0 (Guillou et al. 2013). For 16S, taxonomy was assigned using the Silva database version 138.1 (McLaren and Callahan 2021). After bioinformatic processing, 1,387,818 18S and 1,625,749 16S reads were retained, assigned to 12,013 and 22,143 ASVs, respectively. No reads were retained in the PCR negative control samples. The data were then combined into a phyloseq object using phyloseq version 1.44.0 (McMurdie and Holmes 2013) for subsequent processing. Sequence counts were normalised using the varianceStabilizingTransformation function in the DESeq2 R package version 1.30.1 (Love et al. 2014) to account for differences in sample sequencing depth (McMurdie and Holmes 2014). In order to compare the two methods, ASVs belonging to the four phytoplankton groups were then extracted. For cyanobacteria, this meant all ASVs from the phylum Cyanobacteria. For green algae, this included ASVs from the division Chlorophyta, and the classes Mesostigmatophyceae, Zygnemophyceae, and Coleochaetophyceae. For the chromophytes, this included ASVs of the classes Bacillariophyta, Dinoflagellata, and Chrysophyceae. Finally, for the cryptophytes, this included ASVs assigned to the division Cryptophyta as well as Rhodophyta. Diversity measures were calculated using the R package vegan version 2.6.4.
Statistical Analysis
All statistical analyses were performed using R 4.0.3 (R Core Team 2023), and plots were created using the ggplot2 package (v3.4.2) and ggeffects (v1.3.1). Generalised linear mixed effect models (GLMMs) were performed using glmmTMB (v1.1.10). The performance package (v0.13.0) was used to calculate marginal and conditional R² values as well as test for singularity, while model diagnostics were performed using the package DHARMa version 0.4.7. Distributions were selected based on the nature of the response variable (Supporting Information Fig. S1). Negative binomial (link = “log”) was used for overdispersed count data (normalised read counts and number of ASVs). Beta regression (link = “logit”) was used for proportion data bounded between 0 and 1 (relative number of reads, relative number of ASVs, relative ALA biomass). Lognormal was used for the absolute ALA biomass concentrations (continuous, positively skewed). First, GLMMs were used to test the effects of the temperature and invasion treatments on the absolute and relative ALA biomass concentrations, and eDNA normalised read counts and number of ASVs, of the four phytoplankton groups separately. Temperature treatment, invasion treatment, and their interaction were included as fixed factors, while mesocosm ID and sampling date were included as random factors in the model. Where the inclusion of random effects resulted in singular models (i.e., near-zero variance estimates), the structure was simplified accordingly. Model significance was assessed using Type III Wald chi-square tests via the Anova() function in the car package (v1.1-3). No significant effects of temperature or invasion treatments were observed on the absolute and relative ALA biomass concentrations, the relative number of reads and ASVs for the four phytoplankton groups, nor on the normalised read counts or number of ASVs of chromophytes, cyanobacteria, or green algae (Supporting Information Table S1).
However, the normalised read counts and number of ASVs of cryptophytes showed a significant relationship with invasion treatment, consistent with previous findings in the same system (Salis et al. 2023). Due to the general lack of treatment effects, temperature and invasion treatments were not included in the comparisons between the ALA and eDNA methods. GLMMs were then used to compare the absolute values (eDNA read counts vs. log-transformed ALA biomass, and number of ASVs vs. log-transformed ALA biomass) and relative abundances (relative read abundance vs. proportion of total chlorophyll a, and relative number of ASVs vs. proportion of total chlorophyll a). These were modelled both across all phytoplankton groups (including phytoplankton group as a random factor) and separately for cryptophytes, chromophytes, cyanobacteria, and green algae. For the absolute models, the predictor (ALA biomass concentration) was log-transformed to reflect its multiplicative structure and to improve model fit. Mesocosm and sampling date were included as random factors in all models, unless singularity required their exclusion. Only samples where the algal groups were detected by both methods were included, and model significance was assessed using Type II Wald chi-square tests.
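The data preparation for the absolute comparisons (keeping only samples detected by both methods, then log-transforming ALA biomass) could be sketched as follows. This is a Python illustration rather than the authors' R code. It assumes that "detected" means a value strictly greater than zero and that the natural logarithm is used; neither detail is stated explicitly in the text, and the example values are invented.

```python
import math

def paired_log_biomass(records):
    """Keep observations detected by both methods and log-transform ALA biomass.

    `records` is an iterable of (ala_biomass, edna_reads) pairs for one group.
    Assumption: "detected" means a value strictly greater than zero.
    """
    kept = []
    for ala, reads in records:
        if ala > 0 and reads > 0:
            kept.append((math.log(ala), reads))
    return kept

# Invented values for illustration only: (ALA biomass in µg/L, eDNA read count).
example = [(1.0, 120.0), (0.0, 45.0), (2.5, 0.0), (math.e, 300.0)]
pairs = paired_log_biomass(example)
```

In this example, the second and third observations are dropped because one of the two methods did not detect the group, leaving two paired observations for modelling.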
