Data and code for: Accounting for temporal variation and correlation in environmental DNA sampling can improve ecological inferences
Data files
Apr 22, 2026 version files 6.67 GB
- 22READINET_Summary_(gBlock_Standards)_240925.xlsx (65.68 KB)
- Data_Repository.zip (6.67 GB)
- Metadata.xlsx (275.49 KB)
- NOROCK_detectiondata.xlsx (81.93 KB)
- READI_NET_temporal_eDNA_results_2022_UMESC_New.xlsx (53.64 KB)
- README.md (6.27 KB)
- WARC_Temporal_Sampling_Data_Grass_Carp_Nov2022.xlsx (47.75 KB)
Abstract
Environmental DNA (eDNA) concentration varies through space and time, and measurements collected close together are often correlated. Ignoring this dependence can inflate the rate of incorrect ecological inferences (Type I error rate). Although spatial correlation in eDNA has received considerable attention, temporal correlation has been less well studied. Statistical models and study designs that account for temporal correlation are increasingly important to understand time-dependent effects in complex systems.
We developed a hierarchical model that separates temporal ecological variation from variability stemming from sampling and laboratory processes and applied it to four single-site eDNA time series collected over 17–24 days, three of which provided sufficient information for parameter estimation. We then used the empirically estimated parameter magnitudes in a simulation study to evaluate alternative temporal sampling designs that considered 1) how a fixed number of samples is allocated across different numbers of sampling times with different levels of temporal replication and 2) equally spaced versus cluster-spaced sampling (short bursts separated by longer gaps).
Across the three time series sufficient for analysis, we observed substantial sampling variability, temporal variability, and temporal correlation, although correlation was estimated imprecisely (large coefficient of variation). Simulations showed that when sampling intervals were shorter than the effective temporal correlation range, models that ignored temporal dependence produced inflated Type I error rates and frequently detected spurious temporal trends. Accounting for temporal correlation substantially reduced this inflated Type I error rate.
Optimal sampling strategies depended on study objectives. Clustered sampling most effectively estimated temporal correlation. When temporal dependence was negligible, evenly spaced sampling maximized power to detect trends. Estimating sampling variability required concentrating effort into fewer sampling times with more replicates per time, whereas estimating temporal variance was most precise with intermediate levels of replication.
Together, these results indicate that temporal dependence can strongly affect inference from quantitative eDNA time series when sampling intervals approach the correlation timescale. Designs that ignore this dependence risk inferring ecological change or difference where none exists. Our framework provides practical guidance for allocating sampling effort in temporally intensive eDNA monitoring and for interpreting trends from short time series.
https://doi.org/10.5061/dryad.g4f4qrg19
Description of the data and file structure
Files and variables
File: 22READINET_Summary_(gBlock_Standards)_240925.xlsx
Description: FRESC data set
Variables
- See Metadata.xlsx
File: Metadata.xlsx
Description: Metadata for the 4 data sets
File: NOROCK_detectiondata.xlsx
Description: NOROCK data set
Variables
- See Metadata.xlsx
File: READI_NET_temporal_eDNA_results_2022_UMESC_New.xlsx
Description: UMESC data set
Variables
- See Metadata.xlsx
File: WARC_Temporal_Sampling_Data_Grass_Carp_Nov2022.xlsx
Description: WARC data set
Variables
- See Metadata.xlsx
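The raw workbooks can be inspected before running any processing script. The following is a minimal sketch, assuming the files have been downloaded into the working directory and that the `readxl` package is installed; `preview_workbook` is a hypothetical helper name, and no tab names are assumed because they are read from the file itself.

```r
# Hypothetical helper: list the tabs of a workbook and read the first rows of each.
# Assumes the readxl package is installed and the file was downloaded locally.
preview_workbook <- function(path, n_rows = 5) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(NULL))
  }
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Install the 'readxl' package to read .xlsx files.")
  }
  # Tab names come from the workbook; nothing is hard-coded here.
  sheets <- readxl::excel_sheets(path)
  previews <- lapply(sheets, function(s) {
    readxl::read_excel(path, sheet = s, n_max = n_rows)
  })
  names(previews) <- sheets
  previews
}

# Metadata.xlsx has one tab per data set, so this returns one preview per tab.
meta <- preview_workbook("Metadata.xlsx")
if (!is.null(meta)) print(names(meta))
```

The helper returns `NULL` with a message when the file is absent, so the snippet can be sourced safely before the download is complete.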
Code/software
In the code repository (Data_Repository.zip), there are two folders, one for the empirical analyses and a second for the simulation analyses. Below, in some places, I reference the data set with "X", where X is NOROCK, UMESC, WARC, or FRESC.
Empirical Analyses:
1. The raw data for NOROCK, UMESC, WARC, and FRESC are in NOROCK_detectiondata.xlsx, READI_NET temporal eDNA results 2022 UMESC New.xlsx, WARC Temporal Sampling Data_Grass Carp_Nov2022.xlsx, and 22READINET Summary (gBlock Standards) 240925.xlsx, respectively. "Metadata.xlsx" contains the metadata for each data set, one tab per data set.
2. Files to process the NOROCK, UMESC, WARC, and FRESC data sets for analysis are "Process Data NOROCK.R", "Process Data UMESC.R", "Process Data WARC.R", and "Process Data FRESC.R", respectively.
3. The processed data sets for NOROCK, UMESC, WARC, and FRESC are "NOROCK_data.RData", "UMESC_data.RData", "WARC_data.RData", and "FRESC_ANCA_data.RData", respectively.
4. "NimModel Temporal spExp.R" is the nimble model file used for all sites except WARC, and "NimModel Temporal spExp WARC.R" is the one used for WARC.
5. "NimModel Temporal spExp undermeasure X.R" is the nimble model file for the undermeasurement model at each site (excluding FRESC).
6. Files to run a single chain of the null model (no undermeasurement) for each site are "X spExp.R".
7. Files to run multiple chains of the null model (no undermeasurement) and compute the WAIC for each site are "X spExp WAIC.R".
8. Files to run a single chain of the undermeasurement model for each site are "X spExp undermeasure.R".
9. Files to run multiple chains of the undermeasurement model and compute the WAIC for each site are "X spExp undermeasure WAIC.R". (The undermeasurement model was not run for FRESC, so there are no such files for that site.)
10. The multi-chain output for each site from the null model (no undermeasurement) is in "output_WAIC_X.RData".
11. The multi-chain output for each site from the undermeasurement model is in "output_WAIC_X_undermeasure.RData".
12. The file to process model fits to produce posterior summaries and tables is "Process Temporal Fits.R".
13. The file to do posterior predictive checks is "PPcheck.R".
14. The file to make plots of the empirical data set results is "Plot Temporal Fits.R".
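The processed data and model output listed above are stored as .RData files, whose object names are not documented here. A minimal sketch for inspecting one without overwriting anything in the current workspace is shown below; `load_rdata` is a hypothetical helper, and the file is assumed to sit in the working directory.

```r
# Hypothetical helper: load an .RData file into its own environment and
# return its contents as a named list, leaving the global workspace untouched.
load_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of restored objects
  mget(loaded, envir = env)
}

# Example with one of the processed data sets; skipped if the file is absent.
path <- "NOROCK_data.RData"
if (file.exists(path)) {
  objs <- load_rdata(path)
  str(objs, max.level = 1)  # show object names and types only
}
```

The same helper can be pointed at "output_WAIC_X.RData" or "output_WAIC_X_undermeasure.RData" to see what each multi-chain output contains.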
Simulation Analyses:
1. The nimble model files used have file names beginning with "NimModel Temporal Trend". Files with "spExp" in their names account for temporal correlation, while those without "spExp" assume temporal independence. "centered" and "noncentered" indicate centered and noncentered parameterizations, respectively. "1Samp" indicates there is only 1 sample per sample time, so there is no sampling variation process. "Binary" indicates a model for binary data (this material is only in the Supplementary Material). Files without "binary" are for quantitative data.
2. "Simulate Datasets.R" and "Simulate Datasets Burst.R" simulate equally spaced and cluster sampling scenarios, respectively. These files call the data simulators in "simSpExp.R" and "simSpExp.burst.R", respectively. Simulated datasets are put in folders with the scenario names described in 5 below.
3. "Summarize Datasets.R" and "Summarize Scenarios.R" produce summary statistics for the simulated data sets and their scenarios.
4. "Plot Clusters.R" makes Figure 1 in the manuscript showing the cluster configurations for different scenarios. "Plot Scenarios.R" plots simulated data sets showing concentration through time with detection and nondetection events (Figures B1-B4 in Supplementary Material).
5. The files to fit all simulated data sets in a particular scenario are "Fit_Phi.R", "Fit_noPhi.R", "Fit_Phi_Burst.R", "Fit_noPhi_binary.R", and "Fit_Phi_Burst_binary.R". Each of these files runs multiple scenarios. "Fit_Phi.R" runs quantitative scenarios with equal spacing and estimates the temporal correlation. "Fit_noPhi.R" runs quantitative scenarios with equal spacing and does not estimate the temporal correlation. "Fit_Phi_Burst.R" runs quantitative scenarios with cluster spacing and estimates the temporal correlation. "Fit_noPhi_binary.R" runs binary scenarios with equal spacing and does not estimate the temporal correlation. "Fit_Phi_Burst_binary.R" runs binary scenarios with cluster spacing and estimates the temporal correlation. These files call other files starting with "Fit" and ending with "single", e.g., "Fit_S1H_noPhi_binary_single.R" fits the model to one data set for binary scenarios where we do not model the temporal correlation. These files are described in the main scripts listed above.
6. The simulation results are saved in RData files starting with a scenario indicator (e.g., S1H_noPhi) and ending with .results. They are in folders with scenario names as described in 5 above. These scenarios are described in the first 5 files listed in 5 above.
7. The simulation results are processed in "Process Phi Fits All Scenarios.R", "Process noPhi Fits All Scenarios.R", "Process Phi Binary Fits All Scenarios.R", "Process noPhi Binary Fits All Scenarios.R", and "Process FP Power.R".
8. Tables are made using the simulation results in "Phi Simulation Tables.R" and "noPhi Simulation Tables.R". Plots are made from the simulation results in "Plot Simulations Phi.R", "Plot Simulations noPhi.R", and "Plot Simulations noPhi Quantitative Only.R".
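To make the difference between the equally spaced and cluster-spaced designs concrete, the sketch below builds both sets of sampling times and simulates one temporally correlated series. It is an illustration only and is not the code in "simSpExp.R" or "simSpExp.burst.R". All function names, the 22-day duration, the burst spacing, and the parameter values are hypothetical, and the correlation is assumed to decay exponentially with time lag, which the file names suggest but this description does not confirm.

```r
# Illustrative sketch only; not the repository's simulators.
# Assumption: correlation between two sampling times decays as exp(-lag / range).

# Equally spaced sampling times over the study period.
equal_times <- function(n_times, duration) {
  seq(0, duration, length.out = n_times)
}

# Cluster-spaced times: short bursts separated by longer gaps.
cluster_times <- function(n_clusters, per_cluster, duration, within_gap) {
  burst_length <- (per_cluster - 1) * within_gap
  starts <- seq(0, duration - burst_length, length.out = n_clusters)
  offsets <- (seq_len(per_cluster) - 1) * within_gap
  sort(as.vector(outer(starts, offsets, `+`)))
}

# Exponential correlation matrix for a vector of sampling times.
exp_corr <- function(times, range) {
  exp(-abs(outer(times, times, `-`)) / range)
}

# Simulate one mean-zero, temporally correlated series at the given times.
simulate_series <- function(times, sd_time, range) {
  Sigma <- sd_time^2 * exp_corr(times, range)
  U <- chol(Sigma)  # upper-triangular factor with Sigma = t(U) %*% U
  as.vector(t(U) %*% rnorm(length(times)))
}

set.seed(1)
t_equal <- equal_times(n_times = 12, duration = 22)
t_clust <- cluster_times(n_clusters = 4, per_cluster = 3,
                         duration = 22, within_gap = 0.5)
y_clust <- simulate_series(t_clust, sd_time = 1, range = 3)
```

Both designs use 12 sampling times here. The clustered design contains many short lags (0.5 days within a burst), which is the kind of information that helps estimate the correlation parameter, while the equally spaced design spreads the same effort over 2-day lags.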
