Data from: Potential and pitfalls of eukaryotic metagenome skimming: A test case for lichens

Greshake, Bastian, Goethe University Frankfurt

Zehr, Simonida, Goethe University Frankfurt

Dal Grande, Francesco, Goethe University Frankfurt

Meiser, Anjuli, Goethe University Frankfurt

Schmitt, Imke, Goethe University Frankfurt

Ebersberger, Ingo, Goethe University Frankfurt

Published Sep 11, 2015 on Dryad. https://doi.org/10.5061/dryad.8h95q

Cite this dataset

Greshake, Bastian et al. (2015). Data from: Potential and pitfalls of eukaryotic metagenome skimming: A test case for lichens [Dataset]. Dryad. https://doi.org/10.5061/dryad.8h95q

Abstract

Whole-genome shotgun sequencing of multi-species communities using only a single library layout is commonly used to assess the taxonomic and functional diversity of microbial assemblages. Here we investigate to what extent such metagenome skimming approaches are applicable for in-depth genomic characterizations of eukaryotic communities, e.g. lichens. We address how best to assemble a particular eukaryotic metagenome skimming data set, what pitfalls can occur, and what genome quality can be expected from such data. To facilitate project-specific benchmarking, we introduce the concept of twin sets: simulated data resembling the outcome of a particular metagenome sequencing study. We show that the quality of genome reconstructions depends essentially on the choice of assembler. Individual tools, including the metagenome assemblers Omega and MetaVelvet, are surprisingly sensitive to low and uneven coverage. Combined with the routine practice of choosing assembly parameters to optimize the assembly N50 size, these tools can exclude an entire genome from the assembly. In contrast, MIRA, an all-purpose overlap assembler, and SPAdes, a multi-sized de Bruijn graph assembler, provide a comprehensive view of the individual genomes across a wide range of coverage ratios. Testing the assemblers on real-world metagenome skimming data from the lichen Lasallia pustulata demonstrates the applicability of twin sets for guiding method selection. Furthermore, it reveals that the assembly outcome for the photobiont Trebouxia sp. falls short of the a priori expectation derived from the simulations. Although the underlying reasons remain unclear, this highlights that further studies on this organism require special attention during sequence data generation and downstream analysis.
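Twin sets depend on simulating reads for each community member at a chosen coverage, so that coverage ratios resemble the real sequencing study. A minimal sketch, assuming the standard mean-coverage relation C = N * L / G (N reads of length L over a genome of size G) rather than any specific simulator used by the authors. The genome sizes, coverages, and read length below are hypothetical placeholders, not values from this dataset.

```python
import math
from typing import Dict


def reads_for_coverage(genome_size_bp: int, target_coverage: float, read_length_bp: int) -> int:
    """Smallest read count N with N * L / G >= target coverage (mean coverage)."""
    if genome_size_bp <= 0 or read_length_bp <= 0 or target_coverage < 0:
        raise ValueError("genome size and read length must be positive, coverage non-negative")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def twin_set_read_counts(
    genome_sizes_bp: Dict[str, int],
    coverages: Dict[str, float],
    read_length_bp: int,
) -> Dict[str, int]:
    """Read counts per community member for a simulated (twin) data set."""
    return {
        name: reads_for_coverage(size, coverages[name], read_length_bp)
        for name, size in genome_sizes_bp.items()
    }


# Hypothetical example: an uneven 10:1 coverage ratio between two partners.
counts = twin_set_read_counts(
    genome_sizes_bp={"mycobiont": 40_000_000, "photobiont": 60_000_000},
    coverages={"mycobiont": 50.0, "photobiont": 5.0},
    read_length_bp=100,
)
print(counts)
```

Varying the coverage dictionary across runs is one way to generate a series of simulated sets spanning a range of coverage ratios.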

Usage notes

AssemblyResults

The assemblies generated by the 6 different assemblers for the 11 data sets.

AsterochlorisPseudogenome

The pseudogenome used for simulating the Asterochloris reads.

CladoniaPseudogenome

The pseudogenome used for simulating the Cladonia reads.

lasallia_assemblies

The 6 different assemblies generated from the real Lasallia pustulata data set.
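When comparing the assemblies listed above, N50 is the summary statistic the abstract notes is routinely optimized during parameter choice. A minimal sketch for computing contig lengths and N50, assuming the assembly files are plain FASTA (the file format is not stated here); the file name in the usage note is hypothetical.

```python
from typing import Iterable, List


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input
```

Usage: `with open("assembly.fasta") as fh: print(n50(contig_lengths_from_fasta(fh)))`. As the abstract cautions, a high N50 alone does not show that every genome in the community was assembled.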