Long- and short-read metabarcoding technologies reveal similar spatio-temporal structures in fungal communities
Data files
Mar 22, 2021 version files 46.79 MB
- ASVs.zip (6.19 MB)
- decipher_unconst_long.phy (3.43 MB)
- RAxML_bipartitions.decipher_unconst_long (41.50 KB)
- RAxML_info.decipher_unconst_long (30.62 KB)
- rdp_train.LSU.dada2.fasta.gz (2.43 MB)
- rdp_train.LSU.sintax.fasta.gz (2.44 MB)
- unite.ITS.dada2.fasta.gz (13.95 MB)
- unite.ITS.sintax.fasta.gz (14.02 MB)
- warcup.ITS.dada2.fasta.gz (2.12 MB)
- warcup.ITS.sintax.fasta.gz (2.13 MB)
Methods
Sequence data are derived from metabarcoding of soil samples from the Forêt Classée de l'Ouémé Supérieur in Benin, West Africa. Demultiplexed and trimmed raw reads are deposited in the European Nucleotide Archive under project PRJEB37385.
The full analysis pipeline is published in a public GitHub repository at http://www.github.com/oueme-fungi/oueme-fungi-transect, and a snapshot of that repository is included here (oueme-fungi-transect.tar.gz).
After demultiplexing, the reads were split into the homologous domains ITS1-5.8S-ITS2-LSU1-D1-LSU2-D2-LSU3-D3-LSU4 using the R package LSUx (snapshot linked at Zenodo). Each domain was then denoised independently using the DADA2 package in R, with an error model calibrated on the 5.8S region. Denoised full-length reads were then reassembled from the domains using the new R package tzara (snapshot linked at Zenodo). Full-length amplicon sequence variants (ASVs) were generated by clustering reads at 100% ITS identity and calculating a consensus sequence for every other region within each cluster. Consensus sequences for each region in each cluster are included in ASVs.zip as .fasta.gz files. Several reconstructed sequences are also included: ITS (ITS1-5.8S-ITS2), LSU (LSU1-D1-LSU2-D2-LSU3-D3-LSU4), 32S (5.8S-ITS2-LSU), long (full long amplicons from ITS1+LR5; i.e. ITS-LSU), short (denoised short amplicons from gITS7+ITS4), full (long if available, otherwise short), and best (the longest possible sequence made by concatenating successfully denoised regions; in most cases equal to long or short). A table showing which ASVs were recovered from which samples is included in ASVs.biom (also in ASVs.zip).
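The clustering step described above can be illustrated with a minimal Python sketch. It is not the tzara implementation: it assumes that the non-ITS regions of each read are already aligned to equal length and uses a simple per-column majority vote, which may differ from the consensus method actually used. The function names, the dictionary layout, and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Majority-rule consensus of equal-length sequences.

    Assumption: sequences are pre-aligned; ties go to the base seen first.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def build_asvs(reads):
    """Group reads by identical ITS, then take a consensus of each other region.

    `reads` is an iterable of dicts with an "ITS" key plus other region keys
    (a hypothetical layout chosen for this illustration).
    """
    clusters = defaultdict(list)
    for read in reads:
        clusters[read["ITS"]].append(read)  # 100% ITS identity = same string

    asvs = []
    for its, members in clusters.items():
        asv = {"ITS": its, "n_reads": len(members)}
        for region in members[0]:
            if region != "ITS":
                asv[region] = consensus([m[region] for m in members])
        asvs.append(asv)
    return asvs


# Toy reads (invented for illustration only).
reads = [
    {"ITS": "ACGT", "LSU": "GGCA"},
    {"ITS": "ACGT", "LSU": "GGCA"},
    {"ITS": "ACGT", "LSU": "GGTA"},
    {"ITS": "TTGA", "LSU": "CCAT"},
]
asvs = build_asvs(reads)
```

In this toy input the three reads sharing the ITS sequence ACGT form one cluster, and the single divergent LSU base is outvoted when the consensus is taken.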
The included alignment (decipher_unconst_long.phy) was generated from the "long" ASV sequences using the R package DECIPHER.
The included tree (RAxML_bipartitions.decipher_unconst_long) was then generated from the alignment using RAxML (parameters in RAxML_info.decipher_unconst_long).
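RAxML bipartition trees are written in Newick format with support values as internal node labels. As a rough sketch of how those values might be pulled out without a phylogenetics library, the Python snippet below uses a regular expression on a toy tree. The function name and the toy tree are hypothetical, and the pattern assumes unquoted labels, so a dedicated Newick parser is likely the safer choice for real analyses.

```python
import re


def support_values(newick):
    """Return numeric internal-node labels (those directly after ')') as floats.

    Assumption: labels are unquoted, as in typical RAxML bipartition output.
    """
    return [float(x) for x in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


# Toy tree (invented); the real file would be read with open(...).read().
toy_tree = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.3)60:0.02,E:0.4);"
supports = support_values(toy_tree)
```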
Sequences were identified using the UNITE, Warcup, and RDP-LSU databases, reannotated to use a common classification system. Scripts used to reannotate the databases are at http://github.com/brendanf/reannotate, and the reannotated databases are included here as fasta.gz files formatted for use by SINTAX from USEARCH/VSEARCH or by the R package DADA2.
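The two header conventions differ in how the taxonomy is encoded. SINTAX references generally use headers of the form `>ID;tax=d:...,p:...;`, while DADA2 training files generally use a semicolon-separated lineage such as `>Kingdom;Phylum;...;`. The Python sketch below parses both styles. It assumes the included files follow these general conventions; the exact rank prefixes and identifiers in the reannotated databases have not been checked here, and the function names and example headers are hypothetical.

```python
import gzip


def parse_sintax_header(header):
    """Split '>ID;tax=d:Fungi,p:Ascomycota;' into (ID, {rank_prefix: taxon})."""
    seq_id, _, rest = header.lstrip(">").partition(";tax=")
    ranks = {}
    for field in rest.rstrip(";").split(","):
        if field:
            rank, _, name = field.partition(":")
            ranks[rank] = name
    return seq_id, ranks


def parse_dada2_header(header):
    """Split '>Fungi;Ascomycota;...;' into a list of taxa, highest rank first."""
    return [taxon for taxon in header.lstrip(">").split(";") if taxon]


def read_headers(path):
    """Yield header lines from a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line.strip()


# Example headers (invented for illustration).
sintax = parse_sintax_header(">SEQ1;tax=d:Fungi,p:Ascomycota,g:Aspergillus;")
dada2 = parse_dada2_header(">Fungi;Ascomycota;Eurotiomycetes;")
```

For example, `read_headers("unite.ITS.sintax.fasta.gz")` could be combined with `parse_sintax_header` to tabulate how many reference sequences are annotated at each rank.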
Usage notes
The analysis pipeline can be run on Linux (and possibly macOS, although this has not been tested) using the Snakefile included in oueme-fungi-transect-1.0.0.tar.gz. Snakemake and Anaconda/Miniconda must be installed beforehand; the pipeline installs all other software dependencies via Conda.