eDNA metabarcoding of avocado flowers: ‘Hass’ it got potential to survey arthropods in food production systems?
Data files
Feb 13, 2024 version files 2.74 GB
-
bc1_fwh.txt
2.67 KB
-
bc2_fwh.txt
5.04 KB
-
bc3_ins16s_redo.txt
7.36 KB
-
MSRun460-FTP348_S1_L001_R1_001.fastq.gz
2.74 GB
-
README.md
2.97 KB
Abstract
In the face of global biodiversity declines, surveys of beneficial and antagonistic arthropod diversity as well as the ecological services that they provide are increasingly important in both natural and agro-ecosystems. Conventional survey methods used to monitor these communities often require extensive taxonomic expertise and are time-intensive, potentially limiting their application in industries such as agriculture, where arthropods often play a critical role in productivity (e.g. pollinators, pests and predators). Environmental DNA (eDNA) metabarcoding of a novel substrate, crop flowers, may offer an accurate and high throughput alternative to aid in the detection managed and unmanaged arthropod taxa (e.g. flower-visiting insects and potential pollinators). Here, we compared the arthropod communities detected with eDNA metabarcoding of flowers, from an agricultural species (Persea americana - ‘Hass’ avocado), with two conventional survey techniques; Digital Video Recording (DVR) devices and pan traps. In total, 80 eDNA flower samples, 96 hours of DVRs and 48 pan trap samples were collected. Across the three methods, 49 arthropod families were identified, of which 12 were unique to the eDNA dataset. Alpha diversity levels did not differ across the three survey methods although taxonomic composition varied significantly, with only 12% of arthropod families found to be common across all three methods. This study demonstrates that eDNA metabarcoding of flowers to detect visiting arthropods, although in a developmental stage, can complement traditional survey methods and increase the diversity of taxa detected with implications for both natural and agro-ecosystems.
README
eDNA metabarcoding of avocado flowers: Hass it got potential to survey arthropods in food production systems?
Dataset contains 80 inflorescence samples collected from Marron Brook Farm in south west Western Australia. These inflorescences were extracted using the DNeasy Blood and Tissue Kit to collect insect eDNA and targeted using two PCR assays: fwhF2/fwhR2n (Vamos et al., 2017), targeting the CO1 gene, herein referred to as CO1, and assay Ins_16S_shortF/Ins_16S_shortR (Clarke et al., 2014), targeting the 16S ribosomal subunit gene, herein referred to as 16S. The forward primer sequence for CO1 was 5-GGDACWGGWTGAACWGTWTAYCCHCC-3 and reverse primer sequence 5-GTRATWGCHCCDGCTARWACWGG-3. The forward sequence for 16S was 5-TRRGACGAGAAGACCCTATA-3 and reverse sequence 5- ACGCTGTTATCCCTAAGGTA-3. Amplicons for each assay were ~205 bp and ~167 bp for CO1 and 16S, respectively.
Description of the data and file structure
Three barcode files; two for CO1 and one for 16S.
NTC = Non-Template Control.
Samples starting with M are the inflorescence samples (N = 80).
HYB and EXT both equal extraction controls.
Sharing/Access information
NA
Code/Software
Sequenced multiplex identifier-tagged amplicons were inputted to a containerised workflow (eDNAFlow; Mousavi-Derazmahalleh et al., 2021) and run through the Pawsey Supercomputing Centre in Kensington, Western Australia. Here, the sequences were filtered, formed into Zero-radius Operational Taxonomic Units (ZOTUs) and assigned taxonomic identifications. Sequences were quality checked using FASTQC (Andrews, 2010) and quality filtered (Phred quality score < 20), before the multiplex identifiers were trimmed from the sequence reads using AdapterRemoval v2 (Schubert, Lindgreen, & Orlando, 2016). Subsequently, the filtered reads were demultiplexed using OBITOOLS (Boyer et al., 2016) and sequences shorter than the minimum length of 120 bp were filtered out. Sequences were then dereplicated into ZOTUs with a minimum sequence abundance of 5 (see van der Heyde et al., 2020) using the USEARCH Unoise3 algorithm (Edgar, 2016). A database of ZOTUs was then generated and queried against the GenBank (NCBI) nucleotide database with 100% query coverage and 95% identity using BLASTN (Altschul, Gish, Miller, Myers, & Lipman, 1990). Erroneous ZOTUs with a sequence similarity below the 95% threshold were removed using the LULU post clustering curation method (Frslev et al., 2017). Finally, a custom Python script (eDNAFlow; Mousavi-Derazmahalleh et al., 2021) was used to assign taxonomic identifications to the curated ZOTUs using the Lowest Common Ancestor (LCA) approach. Taxonomic identification was assigned to a ZOTU when the percentage identity of two or more queried sequences with 1% difference had 100% query coverage and 97% sequence similarity. For the purposes of this study, we set the minimum threshold count of 5 reads for ZOTUs to classify a taxa as present within a sample.
Methods
For this study, inflorescences were collected from a Persea americana (‘Hass’ Avocado) orchard, Marron Brook Farm (34°18′52 S, 116°08′36 E), located in the avocado production region of Manjimup-Pemberton in south-west Western Australia (SWWA) (Mccarthy & McCauley, 2020). DVRs and pan trap sampling were carried out at the same time that inflorescence were collected from the study orchard. For each sample tree, ten P. americana inflorescences were removed for eDNA analysis during the peak P. americana flowering season in 2020 (October 30th and 31st). For eDNA analysis, five inflorescences were collected from both the upper (> 2 m) and lower canopy (< 2 m) of each P. americana tree (N = 10 inflorescences per tree, N = 80 inflorescence total) between the 30th and 31st of October 2020, both days were sunny with low winds and no rain. Six inflorescences were collected from each tree on the 30th of October (three upper canopy, three lower canopy) and four inflorescences (two upper canopy, and two lower canopy) were collected from each tree on the 31st of October. To minimise sampling bias, inflorescences were sampled randomly from both the upper and lower canopy while walking around the full circumference of each P. americana study tree (collection method adapted from Howlett et al. 2018). Inflorescences were removed from the lower canopy using sterilised hand secateurs, and inflorescences in the upper canopy were sampled using a net covered with a clean plastic bag replaced after each sample, and sterilised extended secateurs. To minimise cross-contamination, all equipment was sprayed with 10% bleach solution and wiped down after each inflorescence was collected. Once removed, each inflorescence was placed into a thick plastic bag, zip-tied, and kept on ice until the samples could be stored at – 20°C. Frozen inflorescences were processed in the TrEnD laboratory at Curtin University. For inflorescences processing, open florets of each inflorescences were removed with doubled-gloves (changed after every inflorescence) and placed into a mortar and pestle where the plant material was ground into a fine paste. Mortars and pestles were soaked in 10% bleach solution, rinsed with reverse osmosis water, and placed in a UV oven for 15 mins to prevent cross-contamination between samples. In total, 140 –190 mg of ground material was weighed out and transferred into a 2 ml safe-lock Eppendorf tube with 540 μl ATL buffer and 60 μl Proteinase K (QIAGEN, Venio, Limburg, Netherlands). Samples were digested in a slow rotating hybridisation oven at 56°C overnight (~12 hrs). Following digestion, DNA was extracted using the DNeasy Blood and Tissue Kit (QIAGEN, Venio, Limburg, Netherlands) using a QIAcube Connect automated DNA extraction platform (QIAGEN, Venio, Limburg, Netherlands). The final elution volume was 100 μl, and extraction controls (blanks) were carried out for every batch of DNA extractions.
Quantitative Polymerase Chain Reaction (qPCR) (Applied Biosystems, USA) was used to assess the quality of each eDNA sample targeting the Cytochrome Oxidase 1 (CO1) and 16S ribosomal subunit genes. Inhibitors in the PCR reactions and low copy number can impact metabarcoding data (Murray, Coghlan, & Bunce, 2015; Murray, Bunce, Cannell, Oliver, & Houston, 2011), therefore each eDNA extract was assessed with a qPCR dilution series (neat, 1/10, 1/100) under the following conditions: 25 μl reaction volumes containing 2.5 μl of 10 × PCR Gold Buffer, 2 μl of 2.5 mM MgCl2, 1 μl of 0.4 mg/ml BSA, 0.25 μl of dNTPs, 0.5 μl of each primer, 0.2 μl AmpliTaq Gold, 2 μl of DNA and the remaining volume supplemented with DNase/RNase-Free Distilled Water. Two PCR assays were used: assay fwhF2/fwhR2n (Vamos et al., 2017), targeting the CO1 gene, herein referred to as CO1, and assay Ins_16S_shortF/Ins_16S_shortR (Clarke et al., 2014), targeting the 16S ribosomal subunit gene, herein referred to as 16S. The forward primer sequence for CO1 was 5’-GGDACWGGWTGAACWGTWTAYCCHCC-3’ and reverse primer sequence 5’-GTRATWGCHCCDGCTARWACWGG-3’. The forward sequence for 16S was 5’-TRRGACGAGAAGACCCTATA-3’ and reverse sequence 5’- ACGCTGTTATCCCTAAGGTA-3’. Amplicons for each assay were ~205 bp and ~167 bp for CO1 and 16S, respectively. Extracts were amplified on a StepOnePlus Real-Time PCR System (Applied Biosystems, Massachusetts, USA) under the following conditions for CO1: initial denaturation at 95°C for 5 min, followed by 50 cycles of 30s at 95°C, 50°C for 30s and 2 min at 72°C, with a final extension for 10 min at 72°C. For 16S the conditions were as follows: initial denaturation at 95°C for 5 min, followed by 50 cycles of 30s at 95°C, 51°C for 30s, and 45s at 72°C, with a final extension for 10 min at 72°C. Extraction and non-template controls were included in each qPCR assay. DNA extracts that showed inhibition were diluted using MilliQ water and the optimum quantity of DNA input was determined for fusion tagging.
Environmental DNA that were of sufficient quality and free of inhibition, as determined from the initial qPCR screen (qPCR dilution series), were assigned a unique (6 – 8 bp in length) multiplex identifier tag (MID-tag) for both the CO1 and 16S assays. To reduce the likelihood of contamination, chimera production, and MID-tag jumping (Esling, Lejzerowicz, & Pawlowski, 2015), DNA was amplified in a single round of qPCR for each assay using MID-tag primers consisting of either the CO1 or 16S primers coupled to Illumina flow cell adaptors, custom sequencing primers and MID-tag combinations unique to this study. All fusion-tagged qPCR reactions were prepared in dedicated clean room facilities at the TrEnD Laboratory, Curtin University designed for ancient DNA work using an automated QIAgility robotics platform (QIAGEN, Venio, Limburg, Netherlands) and were carried out in 25 μl reactions containing 2.5 μl of 10 × PCR Gold Buffer, 2 μl of 2.5 mM MgCl2, 1 μl of 0.4 mg/ml BSA, 0.25 μl of dNTPs, 0.5 μl of each primer, 0.2 μl AmpliTaq Gold, 4 – 8 μl of DNA and the remaining volume supplemented with DNase/RNase-Free Distilled Water. MID-tag PCR amplicons were carried out in duplicate reactions to control for PCR stochasticity and all fusion-tagged qPCRs were processed using the same parameters as the initial qPCR screens described above.
Replicate MID-tag amplicons were pooled at approximately equimolar concentrations (e.g. minipool) based on their respective qPCR DRn values and were measured under a high-resolution capillary electrophoresis system (QIAxcel; QIAGEN Venio, Limburg, Netherlands) and the final library was size-selected (160 – 425 bp) using a PippinPrep (Millennium Science Pty Ltd, Australia) with a 2% ethidium bromide cassette (Sage Science, Beverly, USA) to remove any off-target amplicons and primer dimer. The final library was purified using the QIAquick PCR Purification Kit (QIAGEN Venio, Limburg, Netherlands), quantified using a Qubit 4.0 Fluorometer (Invitrogen, Carlsbad, USA), and diluted to 2 nM prior to sequencing. Sequencing by synthesis was performed on an Illumina MiSeq platform (Illumina, San Diego, USA) located in the Trace and Environmental DNA lab at Curtin University and as per Illumina’s protocol for single-end sequencing with a 300 cycle MiSeq®V2 reagent kit and standard flow cell for environmental metabarcoding.
Sequenced multiplex identifier-tagged amplicons were inputted to a containerised workflow (eDNAFlow; Mousavi-Derazmahalleh et al., 2021) and run through the Pawsey Supercomputing Centre in Kensington, Western Australia. Here, the sequences were filtered, formed into Zero-radius Operational Taxonomic Units (ZOTUs), and assigned taxonomic identifications. Sequences were quality checked using FASTQC (Andrews, 2010) and quality filtered (Phred quality score < 20) before the multiplex identifiers were trimmed from the sequence reads using AdapterRemoval v2 (Schubert, Lindgreen, & Orlando, 2016). Subsequently, the filtered reads were demultiplexed using OBITOOLS (Boyer et al., 2016) and sequences shorter than the minimum length of 120 bp were filtered out. Sequences were then dereplicated into ZOTUs with a minimum sequence abundance of 5 (see van der Heyde et al., 2020) using the USEARCH Unoise3 algorithm (Edgar, 2016). A database of ZOTUs was then generated and queried against the GenBank (NCBI) nucleotide database with 100% query coverage and 95% identity using BLASTN (Altschul, Gish, Miller, Myers, & Lipman, 1990). Erroneous ZOTUs with a sequence similarity below the 95% threshold were removed using the LULU post clustering curation method (Frøslev et al., 2017). Finally, a custom Python script (eDNAFlow; Mousavi-Derazmahalleh et al., 2021) was used to assign taxonomic identifications to the curated ZOTUs using the Lowest Common Ancestor (LCA) approach. Taxonomic identification was assigned to a ZOTU when the percentage identity of two or more queried sequences with ≤ 1% difference had 100% query coverage and 97% sequence similarity. For the purposes of this study, we set the minimum threshold count of 5 reads for ZOTUs to classify a taxa as present within a sample.