Long- and short-read metabarcoding technologies reveal similar spatio-temporal structures in fungal communities

Furneaux, Brendan1; Bahram, Mohammad2; Rosling, Anna1; Yorou, Nourou3; Ryberg, Martin1

Published Mar 22, 2021 on Dryad. https://doi.org/10.5061/dryad.6wwpzgmvf

Abstract

Fungi form diverse communities and play essential roles in many terrestrial ecosystems, yet there are methodological challenges in taxonomic and phylogenetic placement of fungi from environmental sequences. To address such challenges we investigated spatio-temporal structure of a fungal community using soil metabarcoding with four different sequencing strategies: short amplicon sequencing of the ITS2 region (300–400 bp) with Illumina MiSeq, Ion Torrent Ion S5, and PacBio RS II, all from the same PCR library, as well as long amplicon sequencing of the full ITS and partial LSU regions (1200–1600 bp) with PacBio RS II. Resulting community structure and diversity depended more on statistical method than sequencing technology. The use of long-amplicon sequencing enables construction of a phylogenetic tree from metabarcoding reads, which facilitates taxonomic identification of sequences. However, long reads present issues for denoising algorithms in diverse communities. We present a solution that splits the reads into shorter homologous regions prior to denoising, and then reconstructs the full denoised reads. In the choice between short and long amplicons, we suggest a hybrid approach using short amplicons for sampling breadth and depth, and long amplicons to characterize the local species pool for improved identification and phylogenetic analyses.

Methods

Sequence data are derived from metabarcoding of soil samples from the Forêt Classée de l'Ouémé Supérieur in Benin, West Africa. Demultiplexed and trimmed raw reads are deposited in the European Nucleotide Archive under project PRJEB37385.
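As a convenience for locating the deposited reads, the following Python sketch queries the ENA Portal API file report for project PRJEB37385 and lists the read files per run. It is only an illustration, not part of the published pipeline: the function names are hypothetical, and whether the `fastq_ftp` field is populated for every run depends on how the reads were submitted.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API endpoint for per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(project="PRJEB37385"):
    """Build the file-report URL for one ENA project accession."""
    query = urlencode({
        "accession": project,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Return {run_accession: [file URLs]} from a file-report TSV.

    ENA lists several files for one run separated by ';' and omits the
    URL scheme, so 'ftp://' is prepended here.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs


def fetch_runs(project="PRJEB37385"):
    """Download and parse the file report (requires network access)."""
    with urlopen(filereport_url(project)) as response:
        return parse_filereport(response.read().decode("utf-8"))
```

Calling `fetch_runs()` returns a dictionary keyed by run accession; the files themselves can then be downloaded with any FTP-capable tool.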

The full analysis pipeline is published in a public GitHub repository at http://www.github.com/oueme-fungi/oueme-fungi-transect, and a snapshot of that repository is included here (oueme-fungi-transect.tar.gz).

After demultiplexing, the reads were split into the homologous domains ITS1-5.8S-ITS2-LSU1-D1-LSU2-D2-LSU3-D3-LSU4 using the R package LSUx (snapshot linked at Zenodo). Each domain was then denoised independently using the DADA2 package in R, with an error model calibrated on the 5.8S region. Denoised full-length reads were then reassembled from the domains using the new R package tzara (snapshot linked at Zenodo). Full-length amplicon sequence variants (ASVs) were generated by clustering reads by 100% ITS identity and calculating a consensus sequence for all other regions within each cluster. Consensus sequences for each region in each cluster are included in ASVs.zip as .fasta.gz files. Also included are some reconstructed sequences: ITS (ITS1-5.8S-ITS2), LSU (LSU1-D1-LSU2-D2-LSU3-D3-LSU4), 32S (5.8S-ITS2-LSU), long (full long amplicons from ITS1+LR5, i.e. ITS-LSU), short (denoised short amplicons from gITS7+ITS4), full (long if available, otherwise short), and best (the longest possible sequence made by concatenating successfully denoised regions; in most cases equal to long or short). A table of which ASVs were recovered from which samples is included in ASVs.biom (also in ASVs.zip).
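The clustering step described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the code used in the pipeline (which relies on the R packages LSUx, DADA2, and tzara): reads are grouped by exact ITS sequence, and a per-column majority consensus is taken for every other region. The function names and the requirement that sequences within a region have equal length (standing in for an alignment step) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Majority-rule consensus of equal-length (pre-aligned) sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # Take the most common base in each alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def cluster_by_its(reads):
    """Group reads by identical ITS and build one consensus per other region.

    `reads` is an iterable of dicts mapping region name -> sequence;
    every read must carry an "ITS" entry.
    """
    clusters = defaultdict(list)
    for read in reads:
        clusters[read["ITS"]].append(read)

    asvs = []
    for its, members in clusters.items():
        asv = {"ITS": its, "n_reads": len(members)}
        regions = sorted({r for m in members for r in m if r != "ITS"})
        for region in regions:
            # Skip reads in which this region was not recovered.
            seqs = [m[region] for m in members if m.get(region)]
            if seqs:
                asv[region] = consensus(seqs)
        asvs.append(asv)
    return asvs


# Hypothetical toy reads: three share an ITS sequence, one differs.
reads = [
    {"ITS": "ACGT", "LSU1": "GGCA"},
    {"ITS": "ACGT", "LSU1": "GGCA"},
    {"ITS": "ACGT", "LSU1": "GGTA"},
    {"ITS": "ACGA", "LSU1": "TTTT"},
]
asvs = cluster_by_its(reads)
```

In this toy input the three reads with ITS `ACGT` collapse into one ASV whose LSU1 consensus follows the two-to-one majority, while the fourth read forms its own ASV.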

The included alignment (decipher_long_unconst.phy) was generated from the "long" ASV sequences using the R package DECIPHER.

The included tree (RAxML_bipartitions.decipher_unconst_long) was then generated from the alignment using RAxML (parameters in RAxML_info.decipher_unconst_long).

Sequences were identified using the UNITE, Warcup, and RDP-LSU databases, reannotated to use a common classification system. Scripts used to reannotate the databases are at http://github.com/brendanf/reannotate, and the reannotated databases are included here as fasta.gz files formatted for use by SINTAX from USEARCH/VSEARCH or by the R package DADA2.
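For readers who want to inspect the SINTAX-formatted reference files, the Python sketch below parses a SINTAX-style FASTA header, in which the taxonomy follows `;tax=` as comma-separated `rank:name` pairs. The function name and the example header (including its accession) are hypothetical; the ranks actually present in the reannotated databases should be checked against the files themselves.

```python
def parse_sintax_header(header):
    """Split a SINTAX-style FASTA header into (sequence id, {rank: taxon}).

    Expected shape: >ID;tax=k:Fungi,p:Basidiomycota,...;
    Headers without a ';tax=' annotation yield an empty taxonomy.
    """
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(";tax=")
    # Ignore any further ';'-separated annotations after the taxonomy.
    tax_field = rest.split(";", 1)[0]
    taxonomy = {}
    for field in tax_field.split(","):
        rank, sep, name = field.partition(":")
        if sep:
            taxonomy[rank] = name
    return seq_id, taxonomy


# Hypothetical header, used for illustration only.
example = (
    ">EXAMPLE01;tax=k:Fungi,p:Basidiomycota,c:Agaricomycetes,"
    "o:Russulales,f:Russulaceae,g:Russula;"
)
seq_id, taxonomy = parse_sintax_header(example)
```

Here `seq_id` is `"EXAMPLE01"` and `taxonomy["g"]` is `"Russula"`; the same function can be applied to each header line while streaming a decompressed fasta.gz file.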
