OTUs Table and fastq sequences from environmental DNA applied to trematode communities
Douchet, Philippe (2021), OTUs Table and fastq sequences from environmental DNA applied to trematode communities, Dryad, Dataset, https://doi.org/10.5061/dryad.f7m0cfxxz
This OTU table and these FASTQ sequences underlie the main results of the study "Make visible the invisible: Optimized development of an environmental DNA metabarcoding tool for the characterization of trematode parasitic communities".
In this study, our aim was to develop an optimized eDNA-based metabarcoding approach to detect trematodes and characterize their communities, most of which are associated with aquatic environments. We then assessed how well this approach reconstructs trematode communities, compared with a classical trematode monitoring method, across four freshwater ecosystems.
We focused on four natural sites in the Occitanie Region (southern France) that differ in habitat and whose trematode communities had previously been at least partially characterized. At each site, we sampled the water-sediment interface; eDNA was extracted from these samples and sequenced with a MiSeq amplicon sequencing approach.
Across the four natural ecosystems screened, the eDNA-based approach yielded 33 OTUs, from which 11 trematode species were identified. By comparison, the classical monitoring method identified five trematode species, three of which were also detected by the eDNA-based approach.
A total of 50 individual metabarcoding libraries were prepared following the Illumina two-step PCR protocol. These 50 libraries include triplicates of each water-sediment interface sample collected (i.e., 3 × 5 = 15), four field negative controls (one per sampling site) and six positive controls, with the PCR of each of these 25 samples duplicated (i.e., 25 × 2 = 50). The positive controls consisted of two categories of mock community. The first category consisted of equimolar pools of 28 DNA extracts (set at a final concentration of 3.5 × 10⁻³ ng/µL) from different trematode species held in internal collections. The second category consisted of equimolar pools of PCR products obtained independently from the same 28 trematode species.
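The library count above can be checked with a few lines of Python. This is only an illustrative sketch: the five field samples (Salses in May and in March, St Génis, Villelongue, Néfiach) and the three technical replicates per mock category are inferred from the FASTQ naming scheme of this dataset, not stated in the paragraph itself, and the function name `count_libraries` is hypothetical.

```python
def count_libraries(field_samples: int = 5, subsamples: int = 3,
                    negative_controls: int = 4, mock_categories: int = 2,
                    mock_tech_replicates: int = 3, pcr_replicates: int = 2) -> dict:
    """Reproduce the 3 x 5 = 15 and 25 x 2 = 50 arithmetic (defaults inferred from file names)."""
    field = field_samples * subsamples                # triplicated water-sediment samples
    positives = mock_categories * mock_tech_replicates  # DNA-pool and PCR-pool mocks
    samples = field + negative_controls + positives     # samples before PCR duplication
    return {"field": field, "positives": positives,
            "samples": samples, "libraries": samples * pcr_replicates}


print(count_libraries())
# {'field': 15, 'positives': 6, 'samples': 25, 'libraries': 50}
```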
For the OTU table, the resulting amplicon sequence dataset was processed with the Find Rapidly OTUs with Galaxy Solution (FROGS) pipeline implemented in Galaxy (Escudié et al., 2018) and available from the Genotoul platform (Toulouse, France). (i) The amplicon dataset was first pre-processed by filtering the sequences to keep amplicons from 150 to 400 nucleotides long. (ii) The retained sequences were then clustered into operational taxonomic units (OTUs) using the swarm algorithm with denoising and an aggregation distance of three nucleotides (Mahé et al., 2014). (iii) Chimeras were removed using VSEARCH (Rognes et al., 2016). (iv) Singletons and underrepresented clusters (i.e., clusters containing <0.1% of the total number of sequences) were removed.

Each OTU was then assigned to a species through a two-step BLAST affiliation procedure. The first BLAST analysis was run with the standalone blastn program from the BLAST+ package against a custom trematode database of 88 sequences: the amplicon sequences generated by the in silico ecoPCR (50 species; see Section 2.1 of the article; Table S1), the sequences generated by in vitro Sanger sequencing (26 of the 34 species sequenced; see Section 2.2; Table S3), and 12 sequences retrieved from GenBank (Table S4). The second BLAST analysis, run with the online MEGABLAST tool without restricting parameters, aimed to affiliate the OTUs that could not be assigned in the first analysis. Affiliations were retained only for OTUs with a BLAST coverage of at least 97% and a pairwise identity above 97% with the affiliated sequence; the remaining OTUs were considered “unassigned”.
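The thresholds in steps (i) and (iv) and the affiliation criteria can be illustrated with the minimal Python sketch below. It is not FROGS code: swarm clustering and VSEARCH chimera removal are not reimplemented, BLAST hits are represented as plain dicts with hypothetical keys (`species`, `coverage`, `identity`, the last two in percent), and we assume that an OTU counts as "not assigned" in the first BLAST step when its best hit fails the 97% thresholds.

```python
from typing import Dict, Iterable, List, Optional

MIN_LEN, MAX_LEN = 150, 400   # step (i): accepted amplicon length, in nucleotides
MIN_OTU_FRACTION = 0.001      # step (iv): OTUs below 0.1 % of all sequences are dropped
MIN_COVERAGE = 97.0           # BLAST coverage, percent (at least)
MIN_IDENTITY = 97.0           # pairwise identity, percent (strictly above)
UNASSIGNED = "unassigned"


def length_filter(seqs: Iterable[str]) -> List[str]:
    """Step (i): keep amplicons between MIN_LEN and MAX_LEN nucleotides, inclusive."""
    return [s for s in seqs if MIN_LEN <= len(s) <= MAX_LEN]


def abundance_filter(otu_counts: Dict[str, int]) -> Dict[str, int]:
    """Step (iv): drop singletons and OTUs holding < 0.1 % of all sequences."""
    total = sum(otu_counts.values())
    return {otu: n for otu, n in otu_counts.items()
            if n > 1 and n / total >= MIN_OTU_FRACTION}


def affiliate(hit: Optional[dict]) -> str:
    """Return the hit's species if it passes both thresholds, else 'unassigned'."""
    if hit is None:
        return UNASSIGNED
    if hit["coverage"] >= MIN_COVERAGE and hit["identity"] > MIN_IDENTITY:
        return hit["species"]
    return UNASSIGNED


def two_step_affiliation(local_hit: Optional[dict], online_hit: Optional[dict]) -> str:
    """Try the custom-database (blastn) hit first; fall back to the online MEGABLAST hit."""
    species = affiliate(local_hit)
    return species if species != UNASSIGNED else affiliate(online_hit)


print(abundance_filter({"otu1": 5000, "otu2": 3, "otu3": 1}))  # {'otu1': 5000}
```

In the usage line, `otu2` is removed because it holds about 0.06% of the 5,004 sequences, and `otu3` because it is a singleton.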
Lastly, we considered a given OTU to be present in a subsample (i.e., one of the three replicates of a single environmental sample; see Section 2.4 of the article) only if it was present in both libraries assigned to that subsample (i.e., the two PCR replicates performed on it; see Section 2.4) and, in each of these two libraries, accounted for >0.1% of the total number of sequences. This 0.1% threshold was chosen as the most stringent value that still retained enough sequences to detect all 28 species in the control mock communities.
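The presence rule can be written as a small Python function. This is a sketch only, assuming each library is supplied as a dict of OTU read counts whose sum equals that library's total number of sequences; the function names, OTU names and counts are hypothetical.

```python
from typing import Dict, Set

MIN_LIBRARY_FRACTION = 0.001  # 0.1 % of a library's sequences (strictly above)


def present_in_library(otu: str, library: Dict[str, int]) -> bool:
    """True if `otu` makes up more than 0.1 % of this library's sequences."""
    total = sum(library.values())
    return total > 0 and library.get(otu, 0) / total > MIN_LIBRARY_FRACTION


def otus_present(pcr_rep1: Dict[str, int], pcr_rep2: Dict[str, int]) -> Set[str]:
    """OTUs detected in a subsample: above the threshold in BOTH PCR replicates."""
    candidates = set(pcr_rep1) | set(pcr_rep2)
    return {otu for otu in candidates
            if present_in_library(otu, pcr_rep1) and present_in_library(otu, pcr_rep2)}


# Hypothetical read counts for the two PCR replicates of one subsample.
rep1 = {"otu_a": 990, "otu_b": 10}
rep2 = {"otu_a": 1000}
print(otus_present(rep1, rep2))  # {'otu_a'}
```

Here `otu_b` passes the threshold in the first replicate (1%) but is absent from the second, so it is not counted as present in the subsample.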
FASTQ file name definitions:
- Mo111 to Mo132: Mock communities (pool of DNA extracts). The second digit indicates the technical replicate and the last digit the PCR replicate.
- Mo211 to Mo232: Mock communities (pool of PCR products). The second digit indicates the technical replicate and the last digit the PCR replicate.
- NS1 to NN2: Field negative controls (first letter: N) of Salses (S), St Génis (G), Villelongue (V) and Néfiach (N), the second letter indicating the site. The digit indicates the PCR replicate.
- S11 to S32 : Salses in May
- TS11 to TS32 : Salses in March
- G11 to G32 : St Génis
- V11 to V32 : Villelongue
- N11 to N32 : Néfiach
For field samples, the first digit indicates the subsample and the last digit the PCR replicate.
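The naming scheme above is regular enough to decode programmatically. The Python sketch below assumes it receives the bare sample label (e.g. `TS21`); any read or extension suffixes in the archived file names are not described here and would need to be stripped first. The `Library` fields and the label strings are illustrative choices, not part of the dataset.

```python
import re
from typing import NamedTuple, Optional

NEG_SITES = {"S": "Salses", "G": "St Génis", "V": "Villelongue", "N": "Néfiach"}
FIELD = {"S": "Salses (May)", "TS": "Salses (March)", "G": "St Génis",
         "V": "Villelongue", "N": "Néfiach"}
MOCKS = {"1": "pool of DNA extracts", "2": "pool of PCR products"}


class Library(NamedTuple):
    kind: str                 # "mock", "negative" or "field"
    label: str                # mock category or site
    replicate: Optional[int]  # technical replicate (mock) or subsample (field); None for negatives
    pcr: int                  # PCR replicate (1 or 2)


def parse_library(name: str) -> Library:
    """Decode a library label such as 'Mo132', 'NS1' or 'TS21'."""
    m = re.fullmatch(r"Mo([12])([1-3])([12])", name)
    if m:
        return Library("mock", MOCKS[m[1]], int(m[2]), int(m[3]))
    # Negative controls: 'N' followed by a site LETTER (Néfiach field samples use digits).
    m = re.fullmatch(r"N([SGVN])([12])", name)
    if m:
        return Library("negative", NEG_SITES[m[1]], None, int(m[2]))
    # Field samples: site prefix, subsample digit, PCR replicate digit ('TS' tried before 'S').
    m = re.fullmatch(r"(TS|S|G|V|N)([1-3])([12])", name)
    if m:
        return Library("field", FIELD[m[1]], int(m[2]), int(m[3]))
    raise ValueError(f"unrecognised library name: {name!r}")


print(parse_library("Mo212"))
# Library(kind='mock', label='pool of PCR products', replicate=1, pcr=2)
```

The negative-control pattern is tested before the field-sample pattern only for readability; the two cannot collide because negatives carry a letter in second position, whereas Néfiach field samples (e.g. `N11`) carry a digit.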