Data from: Unravelling large-scale patterns and drivers of biodiversity in dry rivers
Data files
Jul 05, 2024 version files 1.56 GB
- bact_OTU_097_DRIME_agg_ORDER.txt (732.82 KB)
- CLASSIFICATION_clean_curated.txt (1.44 MB)
- CLASSIFICATION_clean_trimmed.txt (185.33 KB)
- DRIME_bact02.filtered.uniq.fasta (1.27 GB)
- DRIME_euka2.filtered.uniq.fasta (281.64 MB)
- environment_DRIME_ORDER_Bact02_NC.txt (11.83 KB)
- environment_DRIME_ORDER_Euka02_NC.txt (10.81 KB)
- environmental_variables_description.xlsx (10.20 KB)
- euka_OTU_097_DRIME_agg_ORDER.txt (843.31 KB)
- Metabarcoding_DRIME_workflow_Bact02_NC.Rmd (17.37 KB)
- Metabarcoding_DRIME_workflow_Euka02_NC.Rmd (27.50 KB)
- README.md (2.01 KB)
Abstract
We combined a coordinated sampling experiment with environmental DNA metabarcoding targeting multiple taxa (i.e. Archaea, Bacteria, Fungi, Algae, Protozoa, Nematoda, Arthropoda, and Streptophyta). Dry sediments were collected from 84 non-perennial rivers across 19 countries on four continents to investigate biodiversity patterns and their drivers.
https://doi.org/10.5061/dryad.v6wwpzh2j
Sediment samples were collected by an international consortium (http://1000_intermittent_rivers_project.irstea.fr) following a standardized protocol during dry phases in the years 2015-2016. We conducted a metabarcoding approach on environmental DNA targeting multiple taxa (i.e. Archaea, Bacteria, Fungi, Algae, Protozoa, Nematoda, Arthropoda and Streptophyta).
Description of the data and file structure
- DRIME_bact02.filtered.uniq.fasta : de-replicated and de-multiplexed sequencing data for the barcode Bact02 targeting Bacteria and Archaea
- Metabarcoding_DRIME_workflow_Bact02_NC.Rmd : R Markdown file describing the bioinformatic processing and the statistical analyses conducted on the Bact02 barcode
- bact_OTU_097_DRIME_agg_ORDER.txt : curated OTU table obtained for the Bact02 barcode
- CLASSIFICATION_clean_trimmed.txt : taxonomic assignment of Bact02 OTUs
- environment_DRIME_ORDER_Bact02_NC.txt : environmental data used in statistical analyses for the Bact02 barcode
Euka02 : folder for the barcode Euka02 targeting Eukaryotes
- DRIME_euka2.filtered.uniq.fasta : de-replicated and de-multiplexed sequencing data for the barcode Euka02 targeting Eukaryotes
- Metabarcoding_DRIME_workflow_Euka02_NC.Rmd : R Markdown file describing the bioinformatic processing and the statistical analyses conducted on the Euka02 barcode
- euka_OTU_097_DRIME_agg_ORDER.txt : curated OTU table obtained for the Euka02 barcode
- CLASSIFICATION_clean_curated.txt : taxonomic assignment of Euka02 OTUs
- environment_DRIME_ORDER_Euka02_NC.txt : environmental data used in statistical analyses for the Euka02 barcode
- environmental_variables_description.xlsx : names, descriptions and units of the environmental variables
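Because the two FASTA files are large (1.27 GB and 281.64 MB), it may be convenient to stream them record by record rather than load them whole. The following Python sketch is one possible reader, not part of the deposited workflow. It assumes the headers follow the usual OBITools convention of `key=value;` attributes after the sequence identifier (for example a `count` attribute); the actual attribute names in these files should be checked before relying on it.

```python
import io
import re
from typing import Dict, Iterator, List, TextIO, Tuple

# Assumed OBITools-style header: ">identifier key=value; key=value; free text"
_ATTRIBUTE = re.compile(r"(\w+)=([^;]*);")


def _parse(header: str, chunks: List[str]) -> Tuple[str, Dict[str, str], str]:
    """Split one header into identifier and attributes; join sequence lines."""
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    attributes = {key: value.strip() for key, value in _ATTRIBUTE.findall(rest)}
    return seq_id, attributes, "".join(chunks)


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, Dict[str, str], str]]:
    """Yield (identifier, attributes, sequence) one record at a time."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _parse(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _parse(header, chunks)


# Toy input standing in for a real file handle (hypothetical records).
toy = io.StringIO(">seq1 count=12; example\nacgt\nttga\n>seq2 count=1;\nggcc\n")
for seq_id, attributes, sequence in read_fasta(toy):
    print(seq_id, attributes.get("count"), len(sequence))
```

With a real file, `open("DRIME_euka2.filtered.uniq.fasta")` would replace the toy `StringIO` object; attribute values are returned as strings and left unconverted.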
Code/Software
Code can be run using the OBITools software package and R.
Sample and data collection
Sediment samples were collected by an international consortium (http://1000_intermittent_rivers_project.irstea.fr) following a standardized protocol during dry phases in the years 2015-2016. Specifically, a total of 84 samples were retained in the statistical analyses (see information below), and were collected in 19 countries spanning the five main Köppen climate classes (A: Tropical n=2, B: Dry n=14, C: Temperate n=66, D: Continental n=1, E: Polar n=1) (FIG. 1). The length of the reaches sampled was defined as 10 times the average active channel width to cover a representative area and to ensure consistent sampling effort. The active channel was defined as the area of inundated and exposed riverbed sediments between clearly delineated edges of perennial terrestrial vegetation and/or abrupt changes in bank slope. Within each reach, 5% of the riverbed was randomly sampled with 1 m² quadrats to collect a total of 3 L of sediment. Riverbed sediment samples were collected from each quadrat (sediment depth: 0-10 cm) and pooled into a single composite sample per site. In the laboratory, the sediments were sieved (2 mm) and air-dried for one week. For physicochemical analyses, a homogenized subsample of ~160 g was packed air-tight in plastic containers and shipped to one of two laboratories where the analyses were performed. Upon reception, the samples were stored in a dry and dark room until later processing and analysis. For eDNA metabarcoding analyses, a homogenized subsample of ~40 g was packed air-tight in plastic containers and sent to the Laboratoire d’Écologie Alpine (University Grenoble Alpes, France), where the samples were immediately stored at -20 °C before further processing.
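As a rough illustration of the sampling effort implied by this protocol, the Python sketch below derives the reach length and the number of 1 m² quadrats from the mean active channel width. It assumes a rectangular reach (area = length × width) and rounds the quadrat count up; both are simplifications introduced here, and the function name is hypothetical.

```python
import math


def reach_sampling_plan(mean_channel_width_m: float,
                        sampled_fraction: float = 0.05,
                        quadrat_area_m2: float = 1.0):
    """Return (reach length in m, riverbed area in m², number of quadrats)."""
    if mean_channel_width_m <= 0:
        raise ValueError("channel width must be positive")
    # Reach length is 10 times the average active channel width.
    reach_length_m = 10.0 * mean_channel_width_m
    # Assumption: the reach is treated as a rectangle.
    riverbed_area_m2 = reach_length_m * mean_channel_width_m
    # 5% of the riverbed is covered by quadrats; rounding guards against
    # floating-point noise before taking the ceiling.
    exact = sampled_fraction * riverbed_area_m2 / quadrat_area_m2
    n_quadrats = math.ceil(round(exact, 9))
    return reach_length_m, riverbed_area_m2, n_quadrats


# A 10 m wide channel gives a 100 m reach, 1000 m² of riverbed, 50 quadrats.
print(reach_sampling_plan(10.0))
```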
Latitude and longitude (WGS84 datum) of the sampling sites were determined in the field with a global positioning system (GPS) or later in the laboratory using a geographic information system (GIS). Precipitation (mm) and mean annual temperature (°C) were estimated based on the WorldClim 1.4 database (http://www.worldclim.org/current), which provides 1-km spatial resolution climate surfaces for global land areas over the period 1970-2000. Mean annual potential evapotranspiration (PET) and mean annual aridity were determined using the Global Aridity and PET database published by the Consortium for Spatial Information (CGIAR-CSI, www.cgiar-csi.org/), which is based on the WorldClim database. PET is a measure of the ability of the atmosphere to remove water through evapotranspiration and was calculated as a function of mean annual temperature, daily temperature range, and extra-terrestrial radiation over the years 1950-2000. Mean annual aridity was assessed using the aridity index, expressed as precipitation/PET over the years 1950-2000 and multiplied by 10,000 to convert the decimal figures into integers. Aridity index values are high in humid conditions and low in arid conditions. Dry-period duration was estimated either based on logger data or repeated observations every two weeks. River width, riparian canopy cover (visual estimates of the proportion of river reach covered by vegetation), and forest cover within the catchment (%) were estimated in the field during sampling. These local-scale variables (apart from land cover) were recorded in situ by participants of the consortium using a standardized protocol. Land cover was derived using GIS.
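The scaling of the aridity index can be made concrete with a short Python sketch. The function below is illustrative only (its name and the rounding to the nearest integer are assumptions); it simply applies precipitation/PET × 10,000 as described above.

```python
def aridity_index(mean_annual_precip_mm: float,
                  mean_annual_pet_mm: float,
                  scale: int = 10_000) -> int:
    """Scaled aridity index: precipitation / PET multiplied by `scale`."""
    if mean_annual_pet_mm <= 0:
        raise ValueError("PET must be positive")
    # Higher values indicate more humid conditions, lower values more arid ones.
    return round(scale * mean_annual_precip_mm / mean_annual_pet_mm)


# 600 mm of precipitation against 1200 mm of PET: ratio 0.5, stored as 5000.
print(aridity_index(600, 1200))
```

Dividing a stored value by 10,000 recovers the original decimal ratio.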
Organic carbon (C) and total nitrogen (N) contents of sediments (%C and %N, respectively) were determined using elemental analyzers, and sediment texture (% sand, silt and clay, as well as mean and median particle size) with a laser diffractometer. Dissolved organic carbon (DOC), soluble reactive phosphorus (SRP), and dissolved inorganic nitrogen (DIN, calculated as the sum of extractable ammonium (NH4+) and nitrate (NO3-)) were determined using standard analytical methods.
eDNA metabarcoding
Sediment biodiversity was estimated using markers amplifying both Bacteria and Archaea (16S rDNA, Bact02, forward primer: GCCAGCMGCCGCGGTAA, reverse primer: GGACTACCMGGGTATCTAA) and all Eukaryotes (18S nuclear rDNA, Euka02, forward primer: TTTGTCTGSTTAATTSCG, reverse primer: CACAGACCTGTTATTGC). Extracellular DNA was extracted from 10 g of sediment by adding an equivalent volume of saturated phosphate buffer (Na2HPO4; 0.12 M; pH ≈ 8) before agitation for 15 min on an orbital shaker. A 2-mL volume of the resulting suspension was centrifuged at 10,000 g for 5 min, and a 400-µL aliquot of the supernatant was then used as starting material for eDNA extraction using the NucleoSpin Soil extraction kit (Macherey-Nagel) following the manufacturer’s instructions except for the cell lysis step. Blank extraction controls using only phosphate buffer were included in the extraction protocol. PCRs were run in triplicate for each DNA extract and each marker, along with PCR negative controls where the DNA extract was replaced by molecular grade water. The amplification mix consisted of 2x Applied Biosystems™ MasterMix AmpliTaq Gold™ 360, 0.5 µM of each tagged forward and reverse primer, and 3.2 µg/mL of bovine serum albumin in a final reaction volume of 20 µL, including 2 µL of extracted DNA. PCR conditions for the amplification of the Bact02 (16S) marker were as follows: 48 cycles of 30 s at 95 °C, 30 s at 53 °C and 90 s at 72 °C. The Euka02 (18S) marker was amplified using the following conditions: 45 cycles of 30 s at 95 °C, 30 s at 45 °C and 1 min at 72 °C. For each marker, PCR products were visualized by capillary electrophoresis on a QIAxcel (Qiagen). PCR products (including extraction and PCR negative controls) were pooled for each marker and 8 aliquots of 200 µL were purified using the MinElute PCR purification kit (Qiagen). Purified products were then pooled before sequencing.
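The primers above contain IUPAC degenerate bases (M = A or C, S = C or G). As a small aid for anyone locating primer sites in sequences, the Python sketch below turns a degenerate primer into a regular expression and counts the concrete sequences it stands for. It is an illustration written for this description, not the matching procedure used by OBITools, and it ignores mismatches and sample tags.

```python
import math
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text.
PRIMERS = {
    "Bact02_forward": "GCCAGCMGCCGCGGTAA",
    "Bact02_reverse": "GGACTACCMGGGTATCTAA",
    "Euka02_forward": "TTTGTCTGSTTAATTSCG",
    "Euka02_reverse": "CACAGACCTGTTATTGC",
}


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a pattern that matches every expansion of a degenerate primer."""
    return re.compile("".join("[" + IUPAC[base] + "]" for base in primer.upper()))


def n_variants(primer: str) -> int:
    """Number of non-degenerate sequences represented by the primer."""
    return math.prod(len(IUPAC[base]) for base in primer.upper())


for name, primer in PRIMERS.items():
    print(name, n_variants(primer))
```

Matching is case-sensitive, so lowercase reads would need to be upper-cased first or the pattern compiled with `re.IGNORECASE`.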
Library preparation and sequencing were performed at Fasteris (Geneva, Switzerland) using the Metafast PCR-free protocol (www.fasteris.com/en-us/NGS/DNA-sequencing/Metabarcoding/Metagenomics-16S-18S-ITS-or-custom-PCR-amplicons). High-throughput sequencing of the 18S marker was performed on an Illumina HiSeq 2500 platform (2 x 150 bp paired-end reads), while 16S amplicons were sequenced on an Illumina MiSeq platform (2 x 250 bp paired-end reads).
The sequencing data were curated using the OBITools software package together with custom R scripts. Paired-end reads were assembled based on overlapping 3’-end sequences, assigned to the respective sample/marker and dereplicated. Singletons, sequences shorter than the expected amplicon size and sequences occurring in only 1 PCR replicate were removed before using the obiclean command to remove PCR errors. We formed operational taxonomic units (OTUs) by clustering sequences at 97% similarity using the Sumaclust algorithm. OTU abundance was defined as the sum of reads sharing these similar sequences. In subsequent analyses, each OTU was represented by its most abundant sequence. Each OTU was assigned a taxonomic clade with the ecotag command, using a set of reference databases built with the ecoPCR software from the EMBL database version 136 to refine taxonomic annotations. Taxonomic annotations with >80% identity were retained. OTUs peaking in abundance in blank extractions or PCR negative controls were considered contaminants and removed from the analysis. PCR replicates with a number of reads and OTUs lower than or similar to those of PCR negative controls were considered dysfunctional PCRs and also removed from the analysis. Finally, we removed PCR replicates that had a high Bray-Curtis dissimilarity compared to other PCR replicates from the same sample. At the end of this process, PCR replicates were pooled for each site.
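The last curation steps (contaminant removal, replicate screening and pooling) can be sketched in plain Python as follows. This is a simplified illustration, not the deposited R code: the function names are hypothetical, ties between controls and samples are treated as contamination by assumption, replicates are compared through their mean Bray-Curtis dissimilarity to the other replicates, and the cut-off is a free parameter because the text does not state one.

```python
from typing import Dict, List, Set

Counts = Dict[str, int]  # OTU identifier -> read count in one PCR replicate


def bray_curtis(a: Counts, b: Counts) -> float:
    """Bray-Curtis dissimilarity between two read-count profiles."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # convention used here: two empty profiles are identical
    shared = sum(min(a.get(otu, 0), b.get(otu, 0)) for otu in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


def contaminant_otus(samples: List[Counts], controls: List[Counts]) -> Set[str]:
    """OTUs whose highest read count is reached in a negative control."""
    flagged = set()
    for otu in set().union(*samples, *controls):
        max_control = max((c.get(otu, 0) for c in controls), default=0)
        max_sample = max((s.get(otu, 0) for s in samples), default=0)
        # Assumption: a tie between control and sample counts as contamination.
        if max_control > 0 and max_control >= max_sample:
            flagged.add(otu)
    return flagged


def outlier_replicates(replicates: Dict[str, Counts], threshold: float) -> Set[str]:
    """Replicates whose mean dissimilarity to the others exceeds `threshold`."""
    names = list(replicates)
    if len(names) < 2:
        return set()
    flagged = set()
    for name in names:
        distances = [bray_curtis(replicates[name], replicates[other])
                     for other in names if other != name]
        if sum(distances) / len(distances) > threshold:
            flagged.add(name)
    return flagged


def pool(replicates: List[Counts]) -> Counts:
    """Sum read counts over the retained PCR replicates of one site."""
    pooled: Counts = {}
    for replicate in replicates:
        for otu, reads in replicate.items():
            pooled[otu] = pooled.get(otu, 0) + reads
    return pooled
```

With only three replicates per sample, a single aberrant replicate also raises the mean dissimilarity of the other two, so the threshold has to be chosen with that in mind.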