Data from: Detection of vertebrates from natural and artificial inland water bodies in a semi-arid habitat using eDNA from filtered, swept and sediment samples
Data files
Apr 06, 2023 version files 5.31 GB
Abstract
Climate warming will impact the sustainability of arid and semi-arid zone environments so we need to understand the influence of changes in arid lands on vertebrate populations. However, biomonitoring and biodiversity assessment in arid environments can be prohibitively time-consuming, expensive, and logistically challenging due to their often remote and inhospitable nature. Sampling of environmental DNA (eDNA) coupled with high-throughput sequencing is an emerging biodiversity assessment method. Here we explore the application of eDNA metabarcoding and various sampling approaches to estimate vertebrate richness and assemblage at human-constructed and natural water sources in a semi-arid region of Western Australia. Three sampling methods: sediment samples, filtering through a membrane with a pump, and membrane sweeping in the water body, were compared using two eDNA metabarcoding assays, 12S-V5 and 16smam, for 120 eDNA samples collected from four gnammas (gnamma: Australian Indigenous Noongar language term – granite rock pools) and four cattle troughs in the Great Western Woodlands, Western Australia. We detected higher vertebrate richness in samples from cattle troughs and found differences between assemblages detected in gnammas (more birds and amphibians) and cattle troughs (more mammals, including feral taxa). Total vertebrate richness was not different between swept and filtered samples, but all sampling methods yielded different assemblages. Our findings indicate that eDNA surveys in arid lands will benefit from collecting multiple samples at multiple water sources to avoid underestimating vertebrate richness. The high concentration of eDNA in small, isolated water bodies permits the use of sweep sampling which simplifies sample collection, processing, and storage, particularly when assessing vertebrate biodiversity across large spatial scales.
Methods
Study sites
Fifteen samples were collected from each of four gnammas and four cattle troughs (n = 120; see below), located in a 96 km2 area of the Fraser Range, 380 km NNE of Esperance, Western Australia. The study site has a semi-arid climate with 268 mm of annual rainfall occurring year-round. Vegetation communities are dominated by Eucalyptus woodlands and Acacia shrublands and herb lands. The Fraser Range is within the Great Western Woodlands (GWW), the world’s largest semi-arid to arid woodland (Newbey et al., 1984), which covers 16 million hectares and supports an exceptional native flora (>3,300 plant species) and fauna (49 native mammal, 11 feral mammal, 138 reptile, 14 frog, and 215 bird species; Department of Parks and Wildlife, 2013; Fox et al., 2016).
Sample collection
Water samples, sediment samples, and samples from membranes swept through the water body were collected in May 2021 (late Autumn) from the four gnammas and four cattle troughs. Five replicate samples of each type were taken from each water source. Gnammas ranged from 30 cm to 300 cm in width, and 20 cm to 100 cm in depth. Cattle troughs were 130 cm wide and 60 cm deep. Five 50 mL water samples were collected into falcon tubes at five random locations in each water source, at a depth of 5-15 cm from the surface. Five Supor 47 mm 0.45 µm pore-size filter membranes (Pall Corporation, Port Washington, USA) were submerged at a depth of 5-15 cm at five random locations from each water source, and ‘swished’ around for 15 seconds before being placed immediately into individual zip-lock bags. Finally, five randomly chosen sediment samples were scooped from the bottom of each water source into a 50 mL falcon tube. Disposable gloves were worn during sampling and were changed between every sample at each location. Samples were placed on ice immediately after collection. Water samples (50 mL) were filtered within 24 hours through a Supor 47 mm 0.45 µm pore-size filter membrane (Pall Corporation) using a Sentino Microbiology Pump (Pall Corporation). Equipment was decontaminated between samples in a 10% bleach solution for 10 minutes and rinsed thoroughly before subsequent use. Two filtering control samples were taken by pumping 500 mL of local tap water through filter membranes. All samples were kept on ice from the time of collection or processing and were frozen at -20°C at Curtin University within 72 hours of collection.
Sample processing and DNA extraction
Prior to DNA extraction, samples were defrosted in a refrigerator (4°C) overnight. DNA was extracted from filter membranes using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Valencia, California). Filter membranes were cut up into 2 mm wide strips and incubated in 540 mL buffer ATL and 60 µL proteinase K at 56°C for 3 hours. Sediment samples were homogenised using a TissueLyser (Qiagen) and DNA was extracted from 250 mg of sediment using the DNeasy PowerLyser Powersoil Kit (Qiagen), which contains steps to remove inhibitors from the extracts. Samples were then extracted using the QIAcube automated platform (Qiagen) and eluted to 100µl using the manufacturer's protocol. DNA extracts were immediately frozen at -20℃. DNA extraction blanks (negative controls) were processed with each batch of 30 samples (n = 4) using the extraction reagents only.
Assessment of DNA extracts
Due to the degraded nature of eDNA, metabarcoding primers typically target short barcode regions to improve amplification success (Yu et al., 2012). The primers used were the 12Sv5-F/R targeting the mitochondrial 12S gene (F: 5’-TAGAACAGGCTCCTCTAG-3′; R: 5’-TTAGATACCCCACTATGC-3′, ~98bp (Riaz et al., 2011) and the mammal specific primers 16Smam1/2 targeted the mitochondrial 16S ribosomal gene (F: 5’-CGGTTGGGGTGACCTCGGA-3′; R: 5’-GCTGTTATCCCTAGGGTAACT-3′, ~135bp (Taylor, 1996). Both assays target regions conserved across vertebrates (12S-V5) or mammals (16Smamm) and can be recovered from degraded DNA (Kitano et al., 2007; Sarri et al., 2014; Staats et al., 2016). Quantitative polymerase chain reaction (qPCR) was used to detect the quality and quantity of DNA in each extract, and extract and verify the optimum DNA input for metabarcoding (Murray, Coghlan, & Bunce, 2015). Here, qPCR assays were run on all samples using the 12S-V5 F/R primers on the neat extract, 1/5 and 1/10 dilution to screen for PCR inhibitors in the reaction. The extraction of sediment samples included inhibitor removal steps because those tend to be more abundant in sediments but all sample types were screened. The majority of water/passive samples were found to have an optimum dilution in the realm of neat, 1/5 or 1/10, whereas the majority of sediment samples had to be diluted to 1/100. Using the optimum dilutions determined by qPCR with 12S-V5 primers, a supplementary qPCR assay was run on a subset of 20 samples to determine if the 16smam forward/reverse primers were as effective at amplifying DNA at these dilutions. In some instances, primer bias can differentially amplify eDNA at sites with different community composition (Aird et al., 2011). To mitigate these effects the combination of a vertebrate primer and a specific mammal primer were used. In addition, 12S and 16S are broadly used for mammal detection in eDNA metabarcoding studies and there is a greater availability of reference sequences available online for taxonomic assignment (Deagle et al., 2014 and Valentini et al., 2016).
The polymerase chain reaction (PCR) mix for quantification contained: 2.5 mM MgCl2 (Applied Biosystems), 10 × PCR Gold buffer (Applied Biosystems), 0.25 mM dNTPs (Astral Scientific), 0.4 mg/mL bovine serum albumin (Fisher Biotec), 0.4 μmol/L forward and reverse primer, 1 U AmpliTaq Gold DNA polymerase (Applied Biosystems) and 0.6 μL of a 1:10,000 solution of SYBR Green dye (Life Technologies). All PCR amplification was conducted on a StepOne Plus (Applied BioSystems) real-time qPCR instrument with the following conditions: 5 min at 95°C, 50 cycles of 95°C for 30 s, 30 s at the annealing temperature (58°C) and 45 s at 72°C, followed by 15 s at 95°C, 1 min at 60°C, and 15 s at 95°C during the melt curve stage, ending with 10 min of elongation at 72°C. Contamination was minimised by preparing the PCR mixes in a dedicated clean lab, and then adding DNA extract in a separate lab, inside specialised UV hoods.
DNA amplification and sequencing
Based on qPCR results, fusion tagging was performed on samples that contained adequate amplifiable DNA by assigning each sample a unique combination of fusion tag primers. Each fusion tag primer combination contained a unique multiplex identifier between 6 and 8-bp in length, Illumina’s sequencing adaptors (i.e. P5 and P7) and the gene specific primer (described above). A single-step fusion protocol was carried out with unique index combinations before qPCR was used to generate amplicons of each fusion-tagged sample using the same reagents and cycling conditions as described above. The fusion-tagged amplicons were generated in duplicates for each biological replicate to maximise amplicon numbers for sequencing and reduce the chances of non-detections and the effects of PCR stochasticity (Murray, Coghlan, & Bunce, 2015). PCR replicates were then pooled, amplicons cleaned using the QIAquick PCR Purification Kit (Qiagen) and then quantified using the QIAxcel Advanced System (Qiagen). Based on this quantification, the DNA library for sequencing was made from pools, combined in approximately equal concentrations. A Pippin Prep (Sage Science) was used to size-select the amplicons in this library, and the library was then cleaned using the QIAquick PCR Purification Kit (Qiagen). A Qubit fluorometer (Thermo Fisher Scientific) was used to quantify the final DNA library, before sequencing as per Illumina’s sequencing protocols for single-end sequencing, using Illumina’s single direction MiSeq 300 V2 Reagent Kit on the Illumina MiSeq platform (Illumina, USA).
Sequence analysis and taxonomic assignment
Using a high-performance computing cluster (Pawsey supercomputing centre; Perth, WA, Australia), sequences were analysed with the eDNAFlow automated pipeline (Mousavi-Derazmahalleh et al., 2021), which performed the following tasks: sequence quality was checked with FASTQC (Andrews, 2010) and filtered with AdapterRemoval v2 (Schubert, Lindgreen, & Orlando, 2016) for Phred quality score lower than 20 and trimming sequences with Ns as enforced in eDNAFlow by --trimns and --trimqualities parameters. Remaining trimmed sequences were demultiplexed and sequences smaller than expected minimum amplicon length were trimmed (12S-V5 minimum length 50-bp; 16smam minimum length 25-bp) using OBITools’ ngsfilter and obigrep tools respectively (Boyer et al., 2016). Unique sequences, zero-radius operational taxonomic units (ZOTUs-denoised sequences) and an abundance table were generated using the USEARCH (Edgar, 2010) commands fastx-uniques, unoise3, (minsize 8) and otutab, respectively. The ZOTUs generated from both assays were queried against the nucleotide database Genbank (https://www.ncbi.nlm.nih.gov/genbank/) in October 2021 using the following parameters in Basic Local Alignment Search Tool (BLASTN) (Altschul et al., 1990): perc_identity ≥ 90, evalue ≤ 1e−3, best_hit_score_edge 0.05, best_hit_overhang 0.25, qcov_hsp_perc 100, max_target_seqs = 10. Taxonomic identification was then assigned with more strict parameters using the eDNAFlow Lowest Common Ancestor (LCA) script (Mousavi-Derazmahalleh et al., 2021) with a minimum percentage identity ( %identity) of ≥95, and if a ZOTU had multiple blast assignments where the difference between their %identity was equal to or smaller than 1, then that ZOTU was assigned to the nearest common taxonomic level otherwise a species level assignment was returned.
The results of the LCA script were compared against existing taxonomic data for the sampling area (Department of Parks and Wildlife, 2013), and if necessary, manually curated to ensure that the taxa assigned are known to occur in the sampling area. Whilst the majority of ZOTUs were assigned to species level, the LCA script dropped a few ZOTUs to the nearest common taxonomic level (e.g. Meliphagidae sp.). Where ZOTUs were assigned to taxa that are not local to the sampling area, the ZOTUs were either reassigned to sister species, e.g. black-flanked rock-wallaby (Petrogale lateralis) and Australian shelduck (Tadorna tadornoides) known to occur in the area, or dropped to the nearest common taxonomic level that currently exists in the sampling area, e.g. vesper bats, (Vespertilionidae sp.) i.e. ZOTUs were manually assigned to locally known species if all other species for that ZOTU were exotic to the sampling area: Australian magpie (Gymnorhina tibicen), yellow-throated miner (Manorina flavigula), Australian magpie-lark (Grallina cyanoleuca), euro (Osphranter robustus) and sheep (Ovis aries). In cases where ZOTUs were assigned to family level and multiple exotic species were attributed to a ZOTU with 100% identity, the ZOTUs were reassigned to generic rank of species known to be native to the sampling area (e.g. Corvus sp.). All ZOTUs assigned to “chordata environmental samples” were removed from the data set, and all ZOTUs assigned to exotic canid spp. (i.e. Nyctereutes viverrinus, Canis lupus rufus, Canis lupus)” were reassigned to Canis lupus familiaris. ZOTUs assigned to exotic “Artiodactyla spp.” (i.e. Bison bonasus, Bos javanicus, Bos mutus, Cephalophus dorsalis, Muntiacus sp., Pudu puda, Syncerus caffer, Tragelaphus eurycerus) were reassigned to domestic cow (Bos taurus).
Statistical analysis
We generated a presence-absence matrix for the two assays (12S-V5 and 16smam) for both sources (gnammas and cattle troughs) and all sampling methods (swept, filtered and sediments). This matrix was used to calculate taxon richness for each sample. The effect of source and sampling method was evaluated using a two-way analysis of variance (ANOVA) achieved using StatistiXL v 2.0 (www.statistiXL.com). The presence-absence matrix was also used to examine differences in biological assemblages identified from the various sources and sampling methods, using the PERMANOVA+ software add-on in PRIMER7 (Clarke & Gorley, 2015). To avoid unassigned resemblance values, two gnamma-sediment samples (GS1_3 and GS2_4) and one gnamma-filter sample (GW2_2) for which no taxa were detected were removed from the matrix, before a Jaccard distance matrix was generated between samples. Using these data a PERMANOVA was run using Type III sums of squares, unrestricted permutation of raw data and significance determined by 9999 permutations of the pseudo-F statistic (Clarke & Gorley, 2015). PRIMER7 was used to visualise and estimate pair-wise tests to determine how the sources and sampling methods compared using non-metric multidimensional scaling (nMDS).
We used the “BiodiversityR” (Kindt & Coe, 2005) and “drc” packages (Knezevic, Streibig, & Ritz, 2017) with R 3.5.1 (R Core Team, 2018) to generate accumulation curves for taxa at each site and with each sampling method. Asymptotic regression rarefaction curves were generated for each sampling method and models were visualised using the package “ggplot2” (Wickham & Sievert, 2016; Chiarucci, Bacaro, Rocchini, & Fattorini, 2008). Curves represent the order-free accumulation of mean taxa detections calculated from random permutations of all possible orderings of taxa detections. We then used an EcoTest (Cayuela, Gotelli, & Colwell, 2015) to statistically compare rarefaction curves representing each sampling method for each source respectively, testing the null hypothesis that three samples were drawn from a single assemblage and any differences in their rarefaction curves reflect only sampling effects. Using the package “rareNMtests” (Cayuela, Gotelli, & Colwell, 2015) the abundances of all taxa detections were summed to generate a pooled composite curve. The test statistic (Z) was calculated from the cumulative summed areas of difference between the three individual curves representing each sampling method and the composite curve. The observed value of this Z-statistic was compared to a null model distribution constructed from rarefaction curves generated from 200 random permutations of all possible orderings of taxa detections in each set of replicates across the three sampling methods and significance assessed at α = 0.05. A “leave one out” analysis was used to subsequently determine how z-values and significance changed when sampling methods were individually omitted from the analysis, after Cayuela, Gotelli, & Colwell (2015).