Unveiling structure of tropical estuarine communities through eDNA and implications for biomonitoring

Data files

Aug 26, 2025 version files (39.99 GB in total):

- Datasets_and_R_script_for_PostOBITools_filtering.zip (105.43 MB)
- Environmental_data.zip (39.76 KB)
- FilteredDatasets.zip (2.56 MB)
- R_Script_for_statistical_analyses.zip (11.33 KB)
- RawData.zip (39.88 GB)
- README.md (23.12 KB)

Abstract

Tropical estuaries are hyper-diverse ecosystems, hosting essential habitats for freshwater, euryhaline and marine life. Understanding how biological communities are distributed in these systems has long been a challenge because of their inherently dynamic nature and the diversity of interacting natural pressures and anthropogenic stressors they are subjected to. In this study, we used environmental DNA (eDNA) metabarcoding to examine the structure of multi-taxonomic communities (diatoms, crustaceans, fish and eukaryotes as a whole) in estuarine ecosystems and their relationships with environmental drivers in three differentially impacted locations facing the Great Barrier Reef in Central Queensland (Australia). We first demonstrated that eDNA signals from sediment and water matrices provide complementary information, and that both should be monitored for a more holistic understanding of community trajectories in anthropogenically-impacted aquatic environments. We also observed that, independently of the taxonomic group considered, communities were primarily structured by the ecological conditions of the estuary. A within-estuary differentiation along an upstream-downstream gradient was detected but only for small-bodied organisms, which lends further credence to eDNA approaches as an ecologically relevant tool for monitoring fine-scale biodiversity patterns even in profoundly dynamic environments. Finally, the different communities exhibited contrasting response patterns, in terms of diversity, composition and uniqueness, to the anthropogenic gradient. Hence, our findings emphasize the need for multi-taxonomic assessments, for which eDNA is well-suited, to better understand the impacts of multiple stressors on biodiversity, and thereby assist decision makers in the protection and management of tropical estuaries.

Dataset DOI: 10.5061/dryad.qnk98sfvp

Description of the data and file structure

Description of the data

This dataset contains environmental DNA metabarcoding data used to describe the composition and structure of estuarine communities from Queensland, Australia. Four taxonomic groups were targeted (diatoms, crustaceans, fish and eukaryotes). Environmental data and metadata, raw and filtered sequencing data, as well as the scripts used to filter the sequencing data and run the statistical analyses described in Pansu et al. (2025, Environmental DNA) are provided.

Several folders have been deposited in this repository:

'Raw data' contains the raw paired-end sequencing data (fastq files) generated by Illumina sequencing along with the scripts and associated files needed to conduct the first steps of the bioinformatic analysis. These scripts (to be run under Linux) require the OBITools v1.2 (Boyer et al. 2016) and Sumaclust software (Mercier et al. 2013). The additional files encompass files for demultiplexing sequences (i.e. assigning each sequence to its sample of origin) as well as the DNA reference databases used for the taxonomic assignment of sequences. A specific README file for this folder is provided.
'Datasets_and_Rscript_for_PostOBITools_filtering' contains the output files of the first steps of the bioinformatic process conducted with the OBITools software along with an R script to run subsequent filtering steps in R (analyses conducted in R v4.0.3). A specific README file for this folder is provided.
'Filtered_datasets' contains filtered datasets. These filtered datasets contain relative read abundance (RRA) data of each mOTU per sample. One table per taxonomic group (i.e. per metabarcode) is provided (eukaryote, diatom, fish and crustaceans). They are directly usable for statistical analyses described in the file 'R_Script_Analyses_Pansu_et_al_2025_eDNA.Rmd'. A specific README file for this folder is provided.
'R_Script_for_statistical_analyses' contains the R script used for statistical analyses. A specific README file for this folder is provided.
'Environmental_data' contains the environmental datasets used in statistical analyses along with metadata for the collected samples (also provided as Supplementary Data in Pansu et al. 2025).

References:

Boyer, F., C. Mercier, A. Bonin, Y. Le Bras, P. Taberlet and E. Coissac. 2016. “obitools: a unix-inspired software package for DNA metabarcoding”. Molecular Ecology Resources 16: 176–182. https://doi.org/10.1111/1755-0998.12428

Mercier, C., F. Boyer, A. Bonin and E. Coissac. 2013. “SUMATRA and SUMACLUST: fast and exact comparison and clustering of sequences.” In Programs and Abstracts of the SeqBio 2013 workshop. Abstract (pp. 27-29). Available at: https://git.metabarcoding.org/obitools

Contact author: Johan Pansu (johan.pansu@univ-lyon1.fr)

Files and variables

File: FilteredDatasets.zip

Description: These filtered datasets contain relative read abundance (RRA) data of each mOTU per sample. They are directly usable for statistical analyses described in the file 'R_Script_Analyses_Pansu_et_al_2025_eDNA.Rmd'. One spreadsheet per taxonomic group (i.e. per metabarcode) is provided: eukaryote, diatom, fish and crustaceans.

Metabarcoding data were produced by PCR amplification of DNA from both water and sediment samples, using the following primer pairs:

for eukaryote => F:5'-TGGTGCATGGCCGTTCTTAGT-3' & R:5'-CATCTAAGGGCATCACAGACC-3' (Hardy et al. 2010)
for diatom => F:5'-TCCAGCTCCAATAGCGTA-3' & R:5'-AACACTCTAATTTTTTCACAGTA-3' (this study)
for fish => F:5'-ACACCGCCCGTCACTCT-3' & R:5'-CTTCCGGTACACTTACCATG-3' (Valentini et al. 2016)
for crustacea => F:5'-GGGACGATAAGACCCTATA-3' & R:5'-ATTACGCTGTTATCCCTAAAG-3' (Berry et al. 2017)

Sequencing was performed on Illumina platforms, and the resulting reads were filtered using the OBITools software v.1.2 (see associated file 'Script_bioinformatics_obitools_Pansu_et_al_2025') and custom-made R scripts (see associated file 'Script_R_PostObitools_Filtering_Pansu_et_al_2025.Rmd') following the procedure described in Pansu et al. (2025): 'Unveiling structure of tropical estuarine communities through eDNA and implications for biomonitoring'.

Data are in the form of a 'sample x motu' matrix with the mean number of reads per sample (averaged among technical replicates).

Pansu_et_al_2025_Fish_filtered_dataset : Fish dataset
Pansu_et_al_2025_Eukaryote_filtered_dataset : Eukaryote dataset
Pansu_et_al_2025_Diatom_filtered_dataset : Diatom dataset
Pansu_et_al_2025_Crustacea_filtered_dataset : Crustacean dataset

Column headings:

1. id: Unique sequence identifier

2. best_identity: Best identity score with the closest sequence in the reference database

3. scientific_name: Binomial of the taxon to which the sequence has been assigned

4. rank: taxonomic rank of the scientific name

5. taxid: European Molecular Biology Laboratory (EMBL) TaxID of the scientific name

6. species_name: species name of the taxon if the assignment went down to this taxonomic level

7. genus_name: genus name of the taxon if the assignment went down to this taxonomic level

8. family_name: family name of the taxon if the assignment went down to this taxonomic level

9. order_name: order name of the taxon if the assignment went down to this taxonomic level

10. class_name: class name of the taxon if the assignment went down to this taxonomic level

11. phylum_name: phylum name of the taxon if the assignment went down to this taxonomic level

12. kingdom_name: kingdom name of the taxon if the assignment went down to this taxonomic level

13. sequence: DNA sequence

14. sample:XX_SYY_ZZ: Relative read abundance (RRA) of each mOTU per sample.

=> XX corresponds to code of the estuary (GR: Gregory River, MA: Sandy Creek, SH: Saint Helen River / Murray Creek)

=> SYY corresponds to sampling site within estuary (from 01 to 10)

=> ZZ corresponds to the type of sample ('s' for sediment and 'w' for water) and the biological replicate (from 1 to 3)

=> e.g. GR_S01_w1 indicates replicate 1 of the water samples ('w1') from site 01 ('S01') located in the estuary of the Gregory River ('GR')
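
The naming scheme above can be decoded programmatically. The following is a minimal Python sketch, not part of the deposited scripts: the function name `parse_sample_code`, the regular expression, and the handling of an optional `sample:` prefix are illustrative assumptions based only on the description given here.

```python
import re

# Estuary codes as listed in this README.
ESTUARIES = {
    "GR": "Gregory River",
    "MA": "Sandy Creek",
    "SH": "Saint Helen River / Murray Creek",
}

# Assumed pattern for 'XX_SYY_ZZ' codes, e.g. 'GR_S01_w1'.
SAMPLE_RE = re.compile(r"^(GR|MA|SH)_S(\d{2})_([sw])([1-3])$")


def parse_sample_code(code):
    """Split a filtered-dataset sample code into its components.

    Hypothetical helper: accepts 'GR_S01_w1' or 'sample:GR_S01_w1'.
    """
    if code.startswith("sample:"):
        code = code[len("sample:"):]
    match = SAMPLE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized sample code: {code!r}")
    estuary, site, matrix, replicate = match.groups()
    return {
        "estuary": ESTUARIES[estuary],
        "site": int(site),
        "matrix": "sediment" if matrix == "s" else "water",
        "replicate": int(replicate),
    }


print(parse_sample_code("GR_S01_w1"))
```

Such a helper can be used, for instance, to group columns by estuary or by matrix (water versus sediment) before analysis.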

File: Environmental_data.zip

Description: This folder contains environmental data from Pansu et al. 2025 (Environmental DNA). Missing values are indicated by 'NA'. Five tables are provided:

1/ 'EnvironmentalData_Pansu_et_al_2025_eDNA.txt' contains the restricted set of environmental variables used in statistical analyses. Column headers are described below:

Site: site name where the sample has been collected
Position_in_estuary : position of the site along an upstream-downstream gradient in each estuary from L01 (the most upstream) to L10 (the most downstream)
Chlorophyll: chlorophyll concentration in the water (in µg/L)
Salinity: salinity of the water (in psu)
DO.sat : dissolved oxygen in water (in % of saturation)
Turbidity: turbidity of the water (in FNU)
pH: pH of the water
Temperature: temperature of the water (in °C)
Total.N: total nitrogen concentration in the water (in mg/L)
Total.P: total phosphorus concentration in the water (in mg/L)
TOC.water: total organic carbon in the water (in mg/L)
Phosphate.phosphorus: phosphate phosphorus concentration in the water (in mg/L)
TOC.sediment: total organic carbon in the sediment (in %)
CaCO3.sediment: calcium carbonate in the sediment (in %)
Percent.silt.clay: silt/clay fraction in the sediment (in %), a proxy for the granulometry
DIN: dissolved inorganic nitrogen in water (in mg/L)

2/ 'Geo_dist_SandyCreek.txt' contains geographic distance between sampling sites within Sandy Creek estuary (in km)

3/ 'Geo_dist_StHelensMurray.txt' contains geographic distance between sampling sites within St Helens/Murray River estuary (in km)

4/ 'Geo_dist_Gregory_river.txt' contains geographic distance between sampling sites within Gregory River estuary (in km)

5/ 'Supplementary_Data_Pansu_et_al_eDNA.xlsx' from Pansu et al. 2025 (Environmental DNA) contains metadata. It is divided into three sheets:

one containing GPS data, physico-chemical characteristics of water and sediment, and nutrient concentrations measured in the water. The units for each variable are indicated in the column headers
one containing pesticide concentrations in the water column, expressed in ul/L
one containing concentrations of metals and other extractable elements in sediments, expressed in mg/kg
The code for identifying sites is described below:
- XX corresponds to code of the estuary (GR: Gregory River, MA: Sandy Creek, SH: Saint Helen River / Murray Creek)
- SYY corresponds to sampling site within estuary (from 01 to 10)

File: R_Script_for_statistical_analyses.zip

Description: This R Markdown script reproduces in R all figures and analyses from Pansu et al. 2025 (Environmental DNA). The datasets to use with this script (with names ending in "Filtered_dataset.txt") are located in the folder called "Filtered_datasets". They contain the mean number of reads observed per sample for each mOTU (see 'README_FilteredDatasets.txt' for details). One dataset per taxonomic group is provided.

File: Datasets_and_R_script_for_PostOBITools_filtering.zip

Description: This folder contains files pre-filtered with the OBITools software, in table format. The R Markdown script provided conducts additional filtering steps. One additional folder containing the EMBL taxonomy (release R140) is provided to standardize the taxonomic information (see the R Markdown script).

The datasets to use with this script (with names ending in "ToCleanInR.txt") are in table format and contain, for each sequence, several pieces of information along with its number of reads per PCR replicate. The headers are defined below:

id: Unique sequence identifier
best_identity: Best identity score with the closest sequence in the reference database
best_match: Name of the sequence with the best match in the reference database
cluster: cluster to which the sequence has been assigned with sumaclust
cluster_center: is the sequence a center of the cluster (i.e. the most abundant sequence)? Yes or no
cluster_score: similarity of the sequence with the center of the cluster it belongs to
cluster_weight: total number of reads belonging to the cluster to which the sequence belongs
count: number of reads for the sequence
family: taxid of the family to which the sequence has been assigned
family_name: family name to which the sequence has been assigned
genus: taxid of the genus to which the sequence has been assigned
genus_name: genus name to which the sequence has been assigned
match_count: number of matches in the reference database
order: taxid of the order to which the sequence has been assigned
order_name: order name to which the sequence has been assigned
rank: best taxonomic rank to which the sequence has been assigned
scientific_name: binomial of the taxon at the best taxonomic rank to which the sequence has been assigned
species: taxid of the species to which the sequence has been assigned
species_list: list of potential species
species_name: species name to which the sequence has been assigned
taxid: European Molecular Biology Laboratory (EMBL) TaxID of the scientific name
sequence: DNA sequence
sample: number of reads of each sequence per sample.
The code for identifying samples is described below:
- samples starting with 'EMPTY' are empty wells used to identify tag-jumps
- samples starting with 'POS' are positive controls
- samples starting with 'PCR' are PCR controls
- samples starting with 'Fld' are field controls
- samples starting with 'Ext' are extraction controls
- all the others are true samples and are named following the structure 'WW_SXX_YY_Z':
  - WW corresponds to code of the estuary (GR: Gregory River, MA: Sandy Creek, SH: Saint Helen River / Murray Creek)
  - SXX corresponds to sampling site within estuary (from 01 to 10)
  - YY corresponds to the type of sample ('s' for sediment and 'w' for water) and the biological replicate (from 1 to 3)
  - Z corresponds to the PCR replicate ('a', 'b' or 'c')
  - e.g. GR_S01_w1_a indicates the PCR replicate 'a' of water sample 1 ('w1') from the site 01 ('S01') located in the estuary of the Gregory river ('GR')
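
As an illustration of how these column names might be handled, here is a small Python sketch that separates controls from true samples and recovers the biological sample to which a PCR replicate belongs. It is an assumption-laden example rather than the logic of the deposited R script: the function names and the example control name `EMPTY_12` are hypothetical.

```python
# Prefixes of control columns, as described in this README.
CONTROL_PREFIXES = {
    "EMPTY": "empty well (tag-jump control)",
    "POS": "positive control",
    "PCR": "PCR control",
    "Fld": "field control",
    "Ext": "extraction control",
}


def classify_column(name):
    """Return the control type of a column, or 'true sample' otherwise."""
    for prefix, label in CONTROL_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "true sample"


def biological_sample(name):
    """Drop the trailing PCR replicate letter: 'GR_S01_w1_a' -> 'GR_S01_w1'."""
    base, _, replicate = name.rpartition("_")
    if not base or replicate not in ("a", "b", "c"):
        raise ValueError(f"no PCR replicate suffix in {name!r}")
    return base


# 'EMPTY_12' is a made-up control name used only for illustration.
for column in ["EMPTY_12", "GR_S01_w1_a", "GR_S01_w1_b"]:
    print(column, "->", classify_column(column))
```

Grouping true-sample columns by the value returned by `biological_sample` gives the sets of PCR replicates that belong to the same sample.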

File: RawData.zip

Description: This folder contains both the raw data as well as scripts and associated files to run the first steps of the bioinformatic filtering.

1/ The raw data are in a fastq format and come from two different sequencing runs:

Amplicons generated by the fish, diatom and crustacean primers were sequenced together in a single Illumina NextSeq 100 (2 × 150bp paired-end) run. Four sequencing lanes were used and therefore 2 × 4 fastq files were produced:
- PAN7075_S1_L001_R1_001.fastq & PAN7075_S1_L001_R2_001.fastq
- PAN7075_S1_L002_R1_001.fastq & PAN7075_S1_L002_R2_001.fastq
- PAN7075_S1_L003_R1_001.fastq & PAN7075_S1_L003_R2_001.fastq
- PAN7075_S1_L004_R1_001.fastq & PAN7075_S1_L004_R2_001.fastq
Amplicons generated by the eukaryote primers were sequenced on an Illumina MiSeq platform (2 × 150bp paired-end). Two fastq files were produced:
- TCP-18S-JP_S1_L001_R1_001.fastq & TCP-18S-JP_S1_L001_R2_001.fastq

2/ These fastq files have first been analyzed with OBITools v.1.2 (https://forge.metabarcoding.org/obitools/obitools) and Sumaclust (https://forge.metabarcoding.org/obitools/sumaclust) software (see http://metabarcoding.org/) following the procedure described in the shell scripts entitled: 'script_bioinformatics_obitools_Eukaryote_Pansu_et_al_2025.sh' and 'script_bioinformatics_obitools_Diatom_Fish_Crustacean_Pansu_et_al_2025.sh'

3/ Eight-base-pair (bp) tags, each differing from the others by at least 5 nucleotides, were added to the 5’ end of each primer to enable the multiplexing of multiple PCR products into the same library before high-throughput sequencing. The demultiplexing files are used for assigning each sequence to its original sample using the combinations of 8-bp tags attached to the primers (see shell scripts). The columns of those files are described below:

column 1: Name of the primer
column 2: Name of the sample
column 3: 8-bp tags used in forward and reverse primers (separated by ':')
column 4: Forward primer
column 5: Reverse primer
column 6: Location of the PCR product in the PCR plates
The code for identifying samples is described below:
- samples starting with 'EMPTY' are empty wells used to identify tag-jumps
- samples starting with 'POS' are positive controls
- samples starting with 'PCR' are PCR controls
- samples starting with 'Fld' are field controls
- samples starting with 'Ext' are extraction controls
- all the others are true samples and are named following the structure 'WW_SXX_YY_Z':
  - WW corresponds to code of the estuary (GR: Gregory River, MA: Sandy Creek, SH: Saint Helen River / Murray Creek)
  - SXX corresponds to sampling site within estuary (from 01 to 10)
  - YY corresponds to the type of sample ('s' for sediment and 'w' for water) and the biological replicate (from 1 to 3)
  - Z corresponds to the PCR replicate ('a', 'b' or 'c')
  - e.g. GR_S01_w1_a indicates the PCR replicate 'a' of water sample 1 ('w1') from the site 01 ('S01') located in the estuary of the Gregory river ('GR')
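
To make the six-column layout concrete, the sketch below parses one line of a demultiplexing file in Python. It assumes that columns are separated by whitespace and that anything after the fifth column describes the plate location; the deposited files should be checked before relying on this. The tags and the plate location in the example line are invented for illustration (only the fish primer sequences come from this README), and `parse_demux_line` is a hypothetical helper.

```python
from typing import NamedTuple


class DemuxEntry(NamedTuple):
    primer_name: str
    sample: str
    tag_forward: str
    tag_reverse: str
    primer_forward: str
    primer_reverse: str
    plate_location: str


def parse_demux_line(line):
    """Parse one whitespace-separated line of a demultiplexing file."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    # Column 3 holds the forward and reverse tags separated by ':'.
    tag_forward, tag_reverse = fields[2].split(":")
    return DemuxEntry(
        primer_name=fields[0],
        sample=fields[1],
        tag_forward=tag_forward,
        tag_reverse=tag_reverse,
        primer_forward=fields[3],
        primer_reverse=fields[4],
        plate_location=" ".join(fields[5:]),
    )


# Invented example line: the tags and plate position are not from the dataset.
example = "Tele01 GR_S01_w1_a acacacac:gtgtgtgt ACACCGCCCGTCACTCT CTTCCGGTACACTTACCATG 01_A1"
print(parse_demux_line(example))
```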

4/ The reference databases used for taxonomic assignment are provided along with the corresponding taxonomy (contained in the folder called 'EMBL_R140'). They were produced with the ecoPCR program (Ficetola et al. 2010) from the ENA (European Nucleotide Archive) database (release r140), hosted at EMBL-EBI, and are directly usable with OBITools. These reference databases are listed below:

Euka01.DB.fasta for eukaryotes
Crust16S.DB.fasta for crustaceans
Diatom.DB.fasta for diatoms
Tele01.DB.fasta for fish

Code/software

The code for filtering raw data comes in two forms:

Unix shell scripts that perform the first steps of the raw data filtering. These scripts were run on Unix and use the OBITools v1.2 (https://forge.metabarcoding.org/obitools/obitools; Boyer et al. 2016) and Sumaclust software (https://forge.metabarcoding.org/obitools/sumaclust; Mercier et al. 2013). Paired-end reads were merged using the illuminapairedend command and sequences with a low alignment-quality score (<40, the value corresponding to perfect alignment between the last 10 bases of each read) were discarded. Consensus sequences were then assigned to their original samples from the 8-bp tags attached to the primers using the ngsfilter command (with default parameters allowing zero errors on tags and a maximum of two errors on primers). We then discarded sequences with ambiguous nucleotides and those with a size outside the expected length range (i.e., 40-440 bp for eukaryotes, 100-240 bp for diatoms, 100-600 bp for crustaceans, 40-100 bp for fish) using obigrep. Identical sequences were merged with the obiuniq command, which retains information about their occurrence in each sample, and sequences with <10 reads over the entire dataset were removed. Taxonomic assignment was performed using the ecotag command against in silico reference databases specific to each marker, generated with the ecoPCR program (Ficetola et al. 2010) from the ENA (European Nucleotide Archive) database (release r140), hosted at EMBL-EBI. Sequences not assigned to the expected taxonomic group (i.e., either eukaryote, Bacillariophyta, crustacea or teleost depending on the marker) were discarded using obigrep. Molecular Operational Taxonomic Units (mOTUs) were created using the Sumaclust program (Mercier et al. 2013) with a 97% similarity threshold (and the -R parameter set to 0.25). Fasta files were then converted into a sequence-by-sample matrix using the obitab command.
This procedure is described in the shell scripts entitled 'script_bioinformatics_obitools_Eukaryote_Pansu_et_al_2025.sh' and 'script_bioinformatics_obitools_Diatom_Fish_Crustacean_Pansu_et_al_2025.sh'
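
For readers who want to see the sequence-level criteria in one place, the following Python sketch restates three of the rules described above (no ambiguous nucleotides, marker-specific length range, at least 10 reads over the whole dataset). It is only an illustration of the criteria and not a substitute for the obigrep and obiuniq commands in the shell scripts; in particular, treating the length bounds as inclusive and the name `keep_sequence` are assumptions.

```python
# Expected amplicon length ranges (in bp) as stated in this README.
LENGTH_RANGES = {
    "eukaryote": (40, 440),
    "diatom": (100, 240),
    "crustacean": (100, 600),
    "fish": (40, 100),
}

# Sequences with fewer reads than this over the entire dataset are removed.
MIN_TOTAL_READS = 10


def keep_sequence(sequence, total_reads, marker):
    """Return True if a sequence passes the three criteria sketched here.

    Assumption: length bounds are inclusive.
    """
    shortest, longest = LENGTH_RANGES[marker]
    unambiguous = set(sequence.upper()) <= set("ACGT")
    right_length = shortest <= len(sequence) <= longest
    return unambiguous and right_length and total_reads >= MIN_TOTAL_READS


# An 80-bp unambiguous sequence with 25 reads passes the fish criteria.
print(keep_sequence("ACGT" * 20, 25, "fish"))
```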
Additional filtering steps were performed in R version 4.0.3 using custom-made scripts ('Script_R_PostObitools_Filtering_Pansu_et_al_2025.Rmd'). All required libraries are available on CRAN, except ROBITools (available here: https://forge.metabarcoding.org/obitools/ROBITools) and ROBITaxonomy (available here: https://forge.metabarcoding.org/obitools/ROBITaxonomy). We first discarded PCR products with low numbers of reads. For this, we compared the density distribution of the log-transformed number of reads in negative controls and in true samples within each library, using the intersection of the two distributions as a threshold. Because of differences in sequencing depth among libraries, this procedure led to the removal of samples with fewer than 13324, 32220, 741 and 3354 reads for eukaryote, diatom, crustacea and fish, respectively. We also removed putative contaminants by discarding any sequence that had its maximal average relative read abundance (RRA) over the whole dataset in negative controls rather than in true samples (Zinger et al. 2021). Sequences with a taxonomic assignment score <70%, which were considered likely to be chimaeras and/or highly degraded sequences, were discarded as well. We then merged sequences into mOTUs based on Sumaclust results and gave each mOTU the taxonomic assignment of the most common sequence composing it. Next, we discarded PCR replicates with non-reproducible results (i.e. with too divergent a composition in terms of mOTUs). For each library, we iteratively determined the density distributions of within- and between-sample Bray-Curtis distances (based on their read composition) and discarded replicates that fell within the distribution of between-sample distances, the threshold being defined as the intersection of the two density distributions (Pansu et al. 2022). This process was iterated until no further replicate was removed. Then, we averaged the number of reads among replicates from the same sample.
This procedure is described in the file entitled "Script_R_PostObitools_Filtering_Pansu_et_al_2025.Rmd"
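
The contaminant rule described above can be illustrated with a short Python sketch. It implements one possible reading of the rule, in which a sequence is flagged when its highest relative read abundance across all PCR products is found in a negative control rather than in a true sample; the deposited R script remains the reference for the exact computation. The function names, the toy read counts and the control name `PCR_1` are hypothetical.

```python
def relative_abundance(counts):
    """Convert {column: {sequence: reads}} into per-column proportions."""
    rra = {}
    for column, reads in counts.items():
        total = sum(reads.values())
        rra[column] = {
            seq: (n / total if total else 0.0) for seq, n in reads.items()
        }
    return rra


def putative_contaminants(counts, negative_controls):
    """Flag sequences whose maximal RRA is reached in a negative control.

    Assumption: ties are resolved in favour of true samples.
    """
    rra = relative_abundance(counts)
    sequences = {seq for reads in counts.values() for seq in reads}
    flagged = set()
    for seq in sequences:
        max_control = max(
            (rra[c].get(seq, 0.0) for c in counts if c in negative_controls),
            default=0.0,
        )
        max_sample = max(
            (rra[c].get(seq, 0.0) for c in counts if c not in negative_controls),
            default=0.0,
        )
        if max_control > max_sample:
            flagged.add(seq)
    return flagged


# Toy read counts, invented for illustration only.
toy_counts = {
    "GR_S01_w1_a": {"seq1": 90, "seq2": 10},
    "GR_S01_w1_b": {"seq1": 80, "seq2": 20},
    "PCR_1": {"seq1": 1, "seq2": 9},
}
print(putative_contaminants(toy_counts, {"PCR_1"}))
```

In this toy example, seq2 makes up 90% of the reads in the PCR control but at most 20% in a true sample, so it is the only sequence flagged.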

The code for running statistical analyses ('R_Script_Analyses_Pansu_et_al_2025_eDNA.Rmd') is an R Markdown script that reproduces all figures and analyses from Pansu et al. 2025 (Environmental DNA). Analyses were conducted in R version 4.0.3. All required libraries are available on CRAN, except ROBITools (available here: https://forge.metabarcoding.org/obitools/ROBITools) and ROBITaxonomy (available here: https://forge.metabarcoding.org/obitools/ROBITaxonomy). The datasets to use with this script (with names ending in "Filtered_dataset.txt") are located in the folder called "Filtered_datasets". They contain the mean number of reads observed per sample for each mOTU (see 'README_FilteredDatasets.txt' for details). One dataset per taxonomic group is provided. Four additional tables containing environmental data are provided in the "environmental_data" folder. Before analyses, only sequences that represented at least 0.1% of the reads in at least one sample were retained. In addition, to reduce the impact of low-abundance false positives that can arise from tag-jumps during Illumina sequencing, we removed sequences representing <0.1% of reads in each sample. Finally, abundance data (i.e., read counts) were converted into presence/absence.
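
The final thresholding and conversion to presence/absence can be sketched as follows in Python. This is a simplified illustration and not the code of the R Markdown script: it assumes that the 0.1% share is computed relative to the total number of reads of each sample in the table, and the function name and toy values are hypothetical.

```python
# Minimum share of a sample's reads for a detection to be kept (0.1%).
THRESHOLD = 0.001


def to_presence_absence(table):
    """Convert {motu: {sample: reads}} into {motu: {sample: 0 or 1}}.

    Detections below THRESHOLD within a sample are set to 0, and mOTUs
    that never reach THRESHOLD in any sample are dropped.
    """
    samples = sorted({s for row in table.values() for s in row})
    totals = {s: sum(row.get(s, 0) for row in table.values()) for s in samples}
    result = {}
    for motu, row in table.items():
        presence = {}
        for s in samples:
            share = row.get(s, 0) / totals[s] if totals[s] else 0.0
            presence[s] = 1 if share >= THRESHOLD else 0
        if any(presence.values()):
            result[motu] = presence
    return result


# Toy read counts, invented for illustration only.
toy_table = {
    "motuA": {"s1": 9990, "s2": 5000},
    "motuB": {"s1": 9, "s2": 5000},
    "motuC": {"s1": 1, "s2": 0},
}
print(to_presence_absence(toy_table))
```

Here motuB is kept but recorded as absent from s1, where it represents only 0.09% of reads, while motuC never reaches the threshold and is dropped.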

References:

Ficetola, G. F., E. Coissac, S. Zundel, T. Riaz, W. Shehzad, J. Bessière, P. Taberlet, et al. 2010. “An in silico approach for the evaluation of DNA barcodes.” BMC Genomics 11: 434. https://doi.org/10.1186/1471-2164-11-434

This dataset comprises environmental DNA datasets (both raw and filtered, along with associated scripts) and physico-chemical data obtained from water and sediment collected in three different estuaries in Queensland, Australia, in 2018. Ten sites were selected in each estuary, at approximately 1 km intervals from the mouth of the river going upstream.

Environmental parameters of the water column (including salinity, pH, dissolved oxygen, turbidity, chlorophyll-a and temperature) were measured using a calibrated EXO2 YSI multiparameter sonde (YSI, Yellow Springs, OH, USA). Standard sampling protocols from the Queensland Government Department of Environment and Science were followed for determining contaminant (nutrient and pesticide) concentrations in the water column. Analysis of nutrients in water was carried out by the DES Chemistry Centre laboratory. Total and dissolved organic carbon were measured using a non-dispersive infrared sensor (NDIS). Total Kjeldahl values for nitrogen and phosphorus were measured through low-level atomic absorption (AA) analyses. For sediments, total organic carbon analyses were performed using a high temperature total organic carbon analyzer (Dohrmann DC-190, Teledyne Tekmar, Mason, OH, USA; Chariton et al. 2010) and particle size analysis was conducted by successive sieving through 500-μm, 180-μm, and 63-μm meshes and gravimetry. Metal concentrations in sediments were determined at CSIRO using a dilute-acid (1 M HCl) extractable metals method. Pesticide analyses in both water and sediment samples were performed by the Queensland Government Forensic and Scientific Services laboratory. Pesticides in water samples were analyzed by direct injection via liquid chromatography with tandem mass spectrometry (LC-MS-MS), while pesticides in sediment samples underwent a solvent extraction followed by QuEChERS solid phase clean-up prior to the LC-MS-MS analysis.

For eDNA analyses, water and sediment samples were collected in triplicate. On the same day as sampling, each sediment sample to be used for eDNA analyses was homogenized and sub-sampled into 5-mL tubes. One liter of each water sample (i.e., a total of 3 L per site) was filtered on two 0.45 µm pore-size cellulose nitrate membranes using a peristaltic pump within 24 h of collection. All samples were kept frozen until DNA extraction. Environmental DNA analyses were performed in a dedicated facility at Macquarie University, Sydney. Water eDNA was extracted from filter papers using PowerWater kits while sediment DNA was extracted from ~0.5 g of sediment using DNeasy PowerSoil kits.

The following primer pairs were used to target four different taxonomic groups. Eight-base-pair (bp) tags, each differing from the others by at least 5 nucleotides, were added to the 5’ end of each primer to enable the multiplexing of multiple PCR products into the same library before high-throughput sequencing. PCR amplifications were performed in triplicate. The primers used are listed below:
- Euka01_F:5'-TGGTGCATGGCCGTTCTTAGT-3' & Euka01_R:5'-CATCTAAGGGCATCACAGACC-3' (18S rDNA V7 region; Hardy et al. 2010) for eukaryotes
- Baci01_F:5'-TCCAGCTCCAATAGCGTA-3' & Baci01_R:5'-AACACTCTAATTTTTTCACAGTA-3' (18S nuclear rDNA V4 region; this study) for diatoms
- Tele01_F:5'-ACACCGCCCGTCACTCT-3' & Tele01_R:5'-CTTCCGGTACACTTACCATG-3' (12S mitochondrial rDNA; Valentini et al. 2016) for fish
- Crust16S_F:5'-GGGACGATAAGACCCTATA-3' & Crust16S_R:5'-ATTACGCTGTTATCCCTAAAG-3' (16S rRNA; Berry et al. 2017) for crustacea
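
The tag design rule mentioned above (8-bp tags that differ from one another by at least 5 nucleotides) corresponds to a minimum pairwise Hamming distance of 5. The Python sketch below shows how such a property could be checked for a set of tags. The example tags are invented and are not those used in this study, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def tags_are_valid(tags, min_distance=5):
    """True if every pair of tags differs by at least min_distance bases."""
    return all(hamming(a, b) >= min_distance for a, b in combinations(tags, 2))


# Invented tags: the first two differ at all 8 positions, but the third
# differs from the first at only 4 positions, so the set fails the rule.
toy_tags = ["aaaaaaaa", "cccccccc", "ggggaaaa"]
print(tags_are_valid(toy_tags))
```

A larger distance between tags makes it less likely that sequencing errors turn one valid tag into another, which is what would cause reads to be assigned to the wrong sample.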

After PCR, amplicons were pooled and purified to generate one library per marker (i.e. per primer pair). Library preparation and paired-end sequencing were performed at the Ramaciotti Centre for Genomics (University of New South Wales, Sydney, Australia). The eukaryote library was sequenced separately on an Illumina MiSeq platform (2 × 250bp paired-end) while those for other groups were sequenced collectively on an Illumina NextSeq 100 (2 × 150bp paired-end).

Raw sequencing data were first curated using OBITools v.1.2 (Boyer et al. 2016) and Sumaclust software (Mercier et al. 2013). Additional filtering steps were performed in R version 4.0.3 using custom-made scripts. The scripts are provided in this repository and the procedure is described in detail in Pansu et al. 2025 (Environmental DNA). Filtered datasets along with scripts for statistical analyses are also provided.