Environmental DNA data of aquatic insects for habitat suitability models
Data files
Version: Jun 05, 2025 (total size of files: 738.75 MB)
- arthropodes_species_matrix_jeanine_2022_12_16_1.5.rds (588.49 KB)
- MS4_Supplement_IndividualSpecies_copy.pdf (723.05 MB)
- p677_JB_Leese_201019_MapFile.txt (93.91 KB)
- p677_run200221_COI_ZOTU_Count_Sintax_ReferenceG.txt (12.71 MB)
- p677_run201020_COI_ZOTU_Count_Sintax.txt (2.30 MB)
- README.md (5.64 KB)
Abstract
The rapid loss of biodiversity in freshwater systems calls for a robust and spatially explicit understanding of species’ occurrences. As two complementary approaches, habitat suitability models provide information about species’ potential occurrence, while environmental DNA (eDNA)-based assessments provide indications of species’ actual occurrence. Individually, both approaches are used in ecological studies to characterize biodiversity, yet they are rarely combined. Here, we integrated high-resolution habitat suitability models with eDNA-based assessments of aquatic invertebrates in riverine networks to understand their individual and combined capacity to inform on species’ occurrence. We used eDNA sampling data from 172 river sites and combined the detection of taxa from three insect orders (Ephemeroptera, Plecoptera, Trichoptera; hereafter EPT) with suitable habitat predictions at a subcatchment level (2 km²). Overall, we find congruence of habitat suitability and eDNA-based detections. Yet, the models predicted suitable habitats beyond the observed number of detections by eDNA sampling, congruent with the suitable niche being larger than the realized niche. For local mismatches, where eDNA detected a species but the habitat was not predicted suitable, we calculated the minimal distance to upstream suitable habitat patches, which indicate possible sources of eDNA signals that were subsequently transported downstream with the water flow. Based on upstream habitat suitability, we estimated a median DNA transport distance of 1.06 km (range 0.2–42 km), and this distance was significantly smaller than expected under null model predictions. This estimated transport distance is in the range of previously reported values and allows extrapolations of transport distances across many taxa and riverine systems.
Together, the combination of eDNA and habitat suitability models allows larger scale and spatially integrative inferences about biodiversity, ultimately needed for the management and protection of biodiversity.
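To make the upstream-distance idea in the abstract more concrete, here is a minimal Python sketch; it is not the authors' GIS workflow. It assumes a hypothetical encoding of the river network in which every subcatchment maps to its single downstream neighbor plus the along-channel length (km) between them, and a dict of binary habitat suitability per subcatchment. A shortest-path (Dijkstra) search walks upstream from a detection and stops at the first suitable subcatchment. The names `build_upstream_graph`, `distance_to_upstream_suitable` and `mismatch_distances` are illustrative only, and the null-model comparison is not reproduced.

```python
import heapq
from collections import defaultdict


def build_upstream_graph(downstream):
    """Invert {node: (downstream_node, length_km)} into {node: [(upstream_node, length_km), ...]}."""
    upstream = defaultdict(list)
    for node, (down, length_km) in downstream.items():
        if down is not None:  # the outlet has no downstream neighbor
            upstream[down].append((node, length_km))
    return upstream


def distance_to_upstream_suitable(start, upstream, suitable):
    """Along-channel distance (km) from `start` to the nearest suitable subcatchment upstream.

    Returns 0.0 if `start` itself is suitable and None if nothing upstream is suitable.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        if suitable.get(node, False):
            return dist
        for up, length_km in upstream.get(node, []):
            new_dist = dist + length_km
            if new_dist < best.get(up, float("inf")):
                best[up] = new_dist
                heapq.heappush(queue, (new_dist, up))
    return None


def mismatch_distances(detected_sites, upstream, suitable):
    """Distances only for local mismatches: eDNA detections in subcatchments not predicted suitable."""
    return {
        site: distance_to_upstream_suitable(site, upstream, suitable)
        for site in detected_sites
        if not suitable.get(site, False)
    }


if __name__ == "__main__":
    # Hypothetical toy network: A and B drain into C, which drains into the outlet D.
    downstream = {"A": ("C", 1.5), "B": ("C", 0.8), "C": ("D", 2.0), "D": (None, 0.0)}
    suitable = {"A": True, "B": False, "C": False, "D": False}
    print(mismatch_distances(["A", "B", "D"], build_upstream_graph(downstream), suitable))
```

Because river networks converge downstream, each subcatchment has at most one downstream neighbor, so inverting that map yields the upstream tree the search needs; a median over the non-`None` distances would then give a summary comparable to the transport distance reported above.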
This dataset belongs to the article: Habitat suitability models reveal the spatial signal of environmental DNA in riverine networks
submitted by: Jeanine Brantschen, jea.brantschen@gmail.com
Description of the Data and file structure
The folder contains the raw data of two eDNA sampling campaigns across Switzerland. For these campaigns, which were part of the NAWA surface water monitoring program of the Federal Office for the Environment (FOEN), eDNA metabarcoding was performed using two different primer pairs targeting the COI barcode region. This study focused on 127 insect species and their distribution patterns based on the two sampling campaigns. The patterns from the field eDNA campaigns were then compared to state-of-the-art species distribution models (SDMs) built with the pipeline described in another article (cited as Adde et al., 2023 in the paper). All data associated with the SDMs are already publicly available with that article and are therefore not included in this repository. The sequencing data were generated by paired-end sequencing on an Illumina MiSeq at the Genetic Diversity Centre (GDC), ETH Zurich. The raw read files (R1: forward reads, R2: reverse reads) are in FASTQ format and can be opened with a text editor or fed directly into a bioinformatic pipeline.
Additionally, the file MS4_Supplement_IndividualSpecies_copy.pdf shows a species-by-species comparison of eDNA detections and SDM predictions for each of the 127 species. This file is referred to in the article as Supplementary 2.
Explanation of data files:
- p677_JB_Leese_201019_MapFile.txt is the metadata (mapping) file required to associate the read samples with environmental parameters. Each row describes one sample and its sampling site. The file can be read into R to build a phyloseq object. The columns are:
  - SampleID: unique name of the sample
  - identity: sample type; negative control (blank), positive control (known composition) or sample (eDNA sample)
  - bank: left or right river bank
  - site: sampling site
  - canton: canton (Swiss administrative unit)
  - filter: filter replicate (1–4) from one site
  - catchment: river catchment based on swisstopo
  - BarcodeSequence i5: unique barcode sequence, forward direction
  - BarcodeSequence i7: unique barcode sequence, reverse direction
  - LinkerSequenceForward: linker sequence in the amplicon between barcode and fragment, forward direction
  - PrimerSequenceForward: sequencing primer sequence in the amplicon between barcode and fragment, forward direction
  - LinkerSequenceReverse: linker sequence in the amplicon between barcode and fragment, reverse direction
  - PrimerSequenceReverse: sequencing primer sequence in the amplicon between barcode and fragment, reverse direction
  - Experiment: sequencing run identifier
  - Platform: sequencing platform
  - Cyclekite: Illumina sequencing chemistry
  - RunDate: date of the sequencing run
  - I5_Index_ID index: Nextera index kit number and index sequence, forward (i5)
  - I7_Index_ID index: Nextera index kit number and index sequence, reverse (i7)
  - PCR_plate: PCR plate (1–4) the sample was on
  - PCR_position: well position of the sample (1–96 per plate)
  - description: custom identifier of the data set and run
- p677_run201020_COI_ZOTU_Count_Sintax.txt is the processed OTU table, including a taxonomic assignment made with SINTAX. The table contains samples as columns and ESVs as rows.
  - The rows are the bioinformatically generated exact sequence variants (ESVs) for the COI barcode. The first column (#OTU ID) contains the names of the ESVs; the following columns correspond to the SampleIDs in the map file and contain the read abundance of each ESV in a sample. The last column (consensus lineage) holds a taxonomic assignment from phylum to species level.
- p677_run200221_COI_ZOTU_Count_Sintax_ReferenceG.txt is a copy of the processed OTU table annotated with a customized reference database to improve the taxonomic assignment.
  - The rows are the bioinformatically generated exact sequence variants (ESVs) for the COI barcode. The first column (#OTU ID) contains the names of the ESVs; the following columns correspond to the SampleIDs in the map file and contain the read abundance of each ESV in a sample. The last column (consensus lineage) holds a taxonomic assignment from phylum to species level based on the curated database (Reference G).
- arthropodes_species_matrix_jeanine_2022_12_16_1.5.rds contains the species matrix from the species distribution modeling that was used as raw input. It can be loaded in R with readRDS().
  - This file is the species-by-catchment matrix for the aquatic insect larvae. Its structure is similar to that of the two OTU tables, with the Latin names of the aquatic insect species aligned with the river catchments of Switzerland based on swisstopo. It is the output of the habitat suitability models; the numeric values give the probability of a species occurring in a catchment.
- MS4_Supplement_IndividualSpecies_copy.pdf lists the species detected with either method.
  - It lists the observed species in the orders Ephemeroptera, Trichoptera and Plecoptera, with one column per method, showing which method detected which species.
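The map file and OTU tables are meant to be loaded into R (phyloseq). As a language-neutral illustration of how the files relate, here is a minimal Python sketch using only the standard library: it links the map file to an OTU table, collapses ESVs into species-level detections per site, and cross-tabulates those detections against habitat suitability for one species. Several points are assumptions, not the authors' processing: tab-separated files; SINTAX lineage strings of the form `o:Ephemeroptera,s:Baetis_rhodani(0.98)`; the `identity` value `sample` marking eDNA samples; a taxon counting as detected at a site if at least `min_reads` reads occur in any filter replicate; and the 0.5 suitability threshold. The suitability dict must already be keyed by the same unit as the detections (the .rds matrix would first need exporting from R). All function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed SINTAX-style token, e.g. "s:Baetis_rhodani(0.98)" -> rank "s", name "Baetis_rhodani".
RANK_RE = re.compile(r"^([a-z]):([^()]+)")


def parse_sintax(lineage):
    """Split an assumed SINTAX-style lineage string into {rank_letter: name}."""
    ranks = {}
    for token in lineage.split(","):
        match = RANK_RE.match(token.strip())
        if match:
            # Underscores become spaces so names match "Genus species" labels.
            ranks[match.group(1)] = match.group(2).strip().replace("_", " ")
    return ranks


def read_map(handle):
    """Return {SampleID: row} for eDNA samples only; control rows are dropped."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Tolerate a QIIME-style "#SampleID" header.
    reader.fieldnames = [name.lstrip("#").strip() for name in reader.fieldnames]
    return {row["SampleID"]: row for row in reader if row.get("identity") == "sample"}


def unit_detections(otu_handle, samples, unit="site", rank="s", min_reads=1):
    """Collapse ESV counts to {unit: set of taxa} detected in at least one replicate."""
    reader = csv.reader(otu_handle, delimiter="\t")
    for header in reader:  # skip any comment lines before the header row
        if header and header[0].startswith("#OTU ID"):
            break
    else:
        raise ValueError("no '#OTU ID' header row found")
    sample_cols = header[1:-1]  # first column: ESV name, last column: lineage
    detected = defaultdict(set)
    for row in reader:
        taxon = parse_sintax(row[-1]).get(rank)
        if taxon is None:
            continue  # ESV not assigned down to the requested rank
        for sample_id, count in zip(sample_cols, row[1:-1]):
            if sample_id in samples and float(count) >= min_reads:
                detected[samples[sample_id][unit]].add(taxon)
    return dict(detected)


def congruence(detected, suitability, species, threshold=0.5):
    """Cross-tabulate eDNA detection against thresholded habitat suitability for one species."""
    counts = {"both": 0, "eDNA_only": 0, "SDM_only": 0, "neither": 0}
    for unit, probs in suitability.items():
        hit = species in detected.get(unit, set())
        suitable = probs.get(species, 0.0) >= threshold
        if hit and suitable:
            counts["both"] += 1
        elif hit:
            counts["eDNA_only"] += 1
        elif suitable:
            counts["SDM_only"] += 1
        else:
            counts["neither"] += 1
    return counts


if __name__ == "__main__":
    # Tiny synthetic files; NC1 is a negative control and is therefore ignored.
    map_txt = "#SampleID\tidentity\tsite\nS1\tsample\tsiteA\nNC1\tnegative\tNA\n"
    otu_txt = "#OTU ID\tS1\tNC1\tConsensusLineage\nZotu1\t12\t3\to:Ephemeroptera,s:Baetis_rhodani\n"
    samples = read_map(io.StringIO(map_txt))
    print(unit_detections(io.StringIO(otu_txt), samples))
```

The "eDNA_only" cell corresponds to the local mismatches discussed in the abstract, i.e. the cases for which an upstream source of the eDNA signal would be sought.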
Sharing/Access information
The sequencing data, together with additional supplementary files, are publicly available only on Dryad.
For further information on the R code or the raw data, please contact the corresponding author (jea.brantschen@gmail.com).
- Brantschen, Jeanine (2025). Environmental DNA data of aquatic insects for habitat suitability models. Zenodo. https://doi.org/10.5281/zenodo.11188909
- Brantschen, Jeanine (2025). Environmental DNA data of aquatic insects for habitat suitability models. Zenodo. https://doi.org/10.5281/zenodo.11188908
- Adde, Antoine; Rey, Pierre‐Louis; Brun, Philipp et al. (2023). N‐SDM: a high‐performance computing pipeline for Nested Species Distribution Modelling. Ecography. https://doi.org/10.1111/ecog.06540
- Brantschen, Jeanine; Fopp, Fabian; Adde, Antoine et al. (2024). Habitat suitability models reveal the spatial signal of environmental DNA in riverine networks. Ecography. https://doi.org/10.1111/ecog.07267
