The main objective of this work was to develop and validate a robust and reliable ‘from benchtop-to-desktop’ metabarcoding workflow to investigate the diet of invertebrate-eaters. We applied our workflow to fecal DNA samples of an invertebrate-eating fish species. A fragment of the COI gene was amplified by combining two minibarcoding primer sets to maximize the taxonomic coverage. Amplicons were sequenced on an Illumina MiSeq platform. We developed a filtering approach based on a series of non-arbitrary thresholds, established from control samples and from molecular replicates, to eliminate cross-contamination, PCR/sequencing errors and mistagging artifacts. This resulted in a conservative and informative metabarcoding dataset. We developed a taxonomic assignment procedure that combines different approaches and that allowed the identification of ~75% of invertebrate COI variants to the species level. Moreover, we introduced a semi-quantitative statistic in our diet study, the Minimum Number of Individuals (MNI), which is based on the number of distinct variants in each sample. The metabarcoding approach described in this paper may guide future diet studies that aim to produce robust datasets associated with a fine and accurate identification of prey items.
COI-filtering-DB.fas.tar
Custom COI database used during HTS filtering to keep only relevant COI sequences.
Taxassign-DB.fas.tar
Custom COI database used for taxonomic assignment. It contains selected COI sequences from BOLD and sequences produced in the lab (MF458551 - MF458851).
Sequences are in fasta format. ID lines contain taxon names and NCBI TaxIDs. If NCBI TaxID is not available for the taxon, the TaxID of its lowest level parent is given.
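As a rough illustration of how the fasta databases could be read, the sketch below iterates over records and returns each ID line together with its sequence. The function name read_fasta and the example records are hypothetical. Because the exact layout of the taxon name and NCBI TaxID inside the ID line is not specified here, the ID line is returned unparsed.

```python
def read_fasta(lines):
    """Yield (id_line, sequence) pairs from an iterable of fasta lines.

    The ID line is returned without the leading '>' and is NOT split
    into taxon name and TaxID, since that layout is not documented here.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Placeholder records, for illustration only (not taken from the database).
example = [
    ">placeholder_record_1",
    "ACGTAC",
    "GGTTAA",
    ">placeholder_record_2",
    "TTGACA",
]
records = list(read_fasta(example))
```

For a real file, the same function can be fed an open file handle, for example `read_fasta(open(path))`, where `path` points to the fasta file extracted from the archive (the extracted file name is an assumption and should be checked after unpacking).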
Perl_scripts.tar
Perl scripts used for data filtering and taxonomic assignment.
MFZR1_S4_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the first replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
MFZR1_S4_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the first replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
MFZR2_S5_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the second replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
MFZR2_S5_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the second replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
MFZR3_S6_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the third replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
MFZR3_S6_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the third replicate series amplified by MFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR1_S1_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the first replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR1_S1_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the first replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR2_S2_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the second replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR2_S2_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the second replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR3_S3_L001_R1_001.fastq.tar
Fastq file with raw data. Forward sequences of the third replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
ZFZR3_S3_L001_R2_001.fastq.tar
Fastq file with raw data. Reverse sequences of the third replicate series amplified by ZFZR primer pairs. Tag combinations to identify the samples are found in the associated readme file.
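The twelve fastq archives listed above follow a regular naming pattern (primer pair, replicate number, sample-sheet index, lane, read direction). The sketch below shows one way to recover these fields from a file name and to pair forward and reverse files. The regular expression is inferred only from the names in this listing, and the function names parse_name and pair_reads are hypothetical.

```python
import re

# Pattern inferred from the file names in this listing,
# e.g. MFZR1_S4_L001_R1_001.fastq.tar (an assumption, not a documented spec).
NAME_RE = re.compile(
    r"^(?P<primer>MFZR|ZFZR)(?P<replicate>[1-3])"
    r"_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.tar$"
)


def parse_name(filename):
    """Return primer pair, replicate series and read direction of a file."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "primer": match["primer"],
        "replicate": int(match["replicate"]),
        "direction": "forward" if match["read"] == "1" else "reverse",
    }


def pair_reads(filenames):
    """Group forward and reverse files by (primer, replicate)."""
    pairs = {}
    for name in filenames:
        info = parse_name(name)
        key = (info["primer"], info["replicate"])
        pairs.setdefault(key, {})[info["direction"]] = name
    return pairs


files = [
    "MFZR1_S4_L001_R1_001.fastq.tar",
    "MFZR1_S4_L001_R2_001.fastq.tar",
    "ZFZR3_S3_L001_R2_001.fastq.tar",
]
pairs = pair_reads(files)
```

Grouping by primer pair and replicate series mirrors how the files are described in the listing. Assigning reads to individual samples still requires the tag combinations given in the associated readme file.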
alignements_for_phylogeny.tar
Sequence alignments used for constructing phylogenetic trees of variants and their homologues.