The influence of intraspecific sequence variation during DNA metabarcoding: A case study of eleven fungal species
Estensmo, Eva Lena et al. (2020), The influence of intraspecific sequence variation during DNA metabarcoding: A case study of eleven fungal species, Dryad, Dataset, https://doi.org/10.5061/dryad.h18931zjq
DNA metabarcoding has become a powerful approach for analyzing complex communities from environmental samples, but there are still methodological challenges limiting its full potential. While conserved DNA markers, like 16S and 18S, often are not able to discriminate among closely related species, other more variable markers, like the fungal ITS region, may include considerable intraspecific variation, which can lead to over-splitting of species during DNA metabarcoding analyses. Here we assess the effects of intraspecific sequence variation in DNA metabarcoding by analyzing local populations of eleven fungal species. We investigated the allelic diversity of ITS2 haplotypes using both Sanger sequencing and high-throughput sequencing (HTS) coupled with error correction with the software DADA2. All focal species, except one, included some level of intraspecific variation in the ITS2 region. Overall, we observed a high correspondence between haplotypes generated by Sanger sequencing and HTS, with the exception of a few additional haplotypes detected using either approach. These extra haplotypes, often occurring in low frequencies, were likely due to PCR and sequencing errors or intragenomic variation in the rDNA region. The presence of intraspecific (and possibly intragenomic) variation in ITS2 suggests that haplotypes (or ASVs) should not be used as basic units in ITS-based fungal community analyses, but that an extra clustering step is needed to reach species-level resolution.
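The "extra clustering step" mentioned above can be illustrated with a minimal sketch. It is not the pipeline used for this dataset: it assumes the haplotypes are already aligned to the same length, it uses a simple per-position identity, and the 0.97 default threshold is only a placeholder. The function names are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(haplotypes: Dict[str, int], threshold: float = 0.97) -> List[List[str]]:
    """Greedy centroid clustering of haplotypes (sequence -> abundance).

    The most abundant haplotypes are processed first and seed new clusters;
    each later haplotype joins the first cluster whose centroid it matches
    at or above the threshold. The threshold value is an assumption.
    """
    # Sort by decreasing abundance, then by sequence to keep the order deterministic.
    ordered = sorted(haplotypes, key=lambda s: (-haplotypes[s], s))
    clusters: List[List[str]] = []  # first element of each cluster is its centroid
    for seq in ordered:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No centroid was similar enough, so this haplotype starts a new cluster.
            clusters.append([seq])
    return clusters
```

With toy data, two haplotypes that differ at a single position collapse into one cluster, while an unrelated sequence stays separate. Real analyses would typically rely on a dedicated clustering tool and pairwise alignment rather than this simplified identity measure.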
DNA was extracted from 11 fungal species and subjected to DNA metabarcoding using the gITS7 and ITS4 primers; amplicons were normalised and paired-end sequenced on Illumina MiSeq (2 × 300 bp). The bioinformatics pipeline is provided in the files, together with the raw data, statistics code and the intermediate files needed to re-run all analyses. The raw data are available in the directory "Issakka-ITS_rawdata".
A Readme file ("Dryad_ReadMe_Issakka_ITSClustering") is provided and is accessible upon download of the data.
Data submission accompanying the article "The influence of intraspecific sequence variation during DNA metabarcoding: A case study of eleven fungal species"
The data package is structured into 2 directories:
1. The raw sequencing data "Issakka-ITS_rawdata".
2. "ITSClustering", which contains 3 folders:
2.1 "Bioinformatics" contains the two map files "map_lib1.txt" and "map_lib2.txt" for demultiplexing the two ITS MiSeq libraries (in Folder_name = "rawdata_metabarcodingITS2"), the file "batchfileDAD2" for running DADA2, the file "bioinformatics_ITS2Clustering" describing the bioinformatics pipeline for generating ASVs, and the file "NamedHost_ASV.xlsx", which is the OTU table, including the mock community and the 12 replicates.
2.2 "Sanger_sequences" contains the forward and reverse raw Sanger sequences in both FASTA and ABI formats, together with the matching file (xls) that links sequence IDs to sample IDs.
2.3 "Final_alignments" contains the combined haplotypes from ASVs and Sanger sequencing. For each species a text file is provided. The zip folder "Haploype_network" contains individual files (nexus format) used to generate the network in PopArt.
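As a rough illustration of how FASTA records such as those in "Sanger_sequences" could be collapsed into haplotypes, the sketch below parses FASTA text and counts identical sequences. It is an assumption-laden example, not part of the deposited pipeline: it presumes the sequences have already been trimmed to the same region, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a mapping of record ID -> sequence."""
    parts: Dict[str, list] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts[name] = []
        elif name is not None:
            # Sequence lines may wrap; collect them and normalise to upper case.
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def haplotype_counts(records: Dict[str, str]) -> Counter:
    """Count how many records share each distinct sequence (haplotype)."""
    return Counter(records.values())
```

Calling `haplotype_counts(parse_fasta(open(path).read()))` on one file would give the number of records per distinct sequence; here `path` stands for whichever FASTA file is being inspected.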
Norges Forskningsråd, Award: 254746
Universitetet i Oslo