Dryad

Data from: Pitfalls and pointers: an accessible guide to marker gene amplicon sequencing in ecological applications

Data files

Nov 18, 2021 version: files, 47.36 KB

Abstract

Next-generation sequencing (NGS) is a powerful tool that has been rapidly adopted by many ecologists studying microbial communities. Despite the exciting demonstration of NGS technology as a tool for ecological research, cryptic pitfalls inherent to its use can obscure correct interpretation of NGS data. Here, we provide an accessible overview of an NGS process that uses marker gene amplicon sequences (MGAS). This overview will allow scientists, particularly community ecologists, to make appropriate methodological choices and to understand the limits on inference about community composition and diversity that can be drawn from MGAS data.

We describe the MGAS pipeline, focusing specifically on cryptic sources of variation that have received less emphasis in the ecological literature, but which may substantially impact inference about microbial community diversity and composition. By simulating communities from published microbiome data, we demonstrate how these sources of variation can generate inaccurate or misleading patterns.

We specifically highlight two cryptic sources of variation arising during the MGAS pipeline: sample dilution performed without the researcher's awareness, and lane-to-lane variability. These sources of variation affect estimates of species presence and relative abundance, particularly for species with moderate to low abundances. Each of these sources of bias can lead to errors in the estimation of both absolute and relative abundance within, and turnover among, microbial communities.
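To illustrate why dilution matters most for low-abundance taxa, here is a minimal sketch. It is not the simulation used in the paper. It assumes a hypothetical 50-taxon community with geometrically decaying abundances, models sequencing as random draws weighted by relative abundance, and models dilution as keeping a random 1% of the reads. The function names, the seed, and every parameter value are illustrative assumptions, and lane-to-lane variability is not modeled.

```python
import random
from collections import Counter


def simulate_reads(rel_abundance, depth, rng):
    """Draw `depth` reads; each read is assigned to a taxon with
    probability equal to that taxon's relative abundance."""
    taxa = list(rel_abundance)
    weights = [rel_abundance[t] for t in taxa]
    return rng.choices(taxa, weights=weights, k=depth)


def dilute(reads, fraction, rng):
    """Keep a random subset of the reads (sampling without replacement),
    a simple stand-in for a sample diluted before sequencing."""
    keep = max(1, int(len(reads) * fraction))
    return rng.sample(reads, keep)


def relative_abundance(reads):
    """Convert a list of reads into per-taxon relative abundances."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical community: 50 taxa, each 80% as abundant as the previous one.
raw = [0.8 ** i for i in range(50)]
community = {f"taxon_{i}": w / sum(raw) for i, w in enumerate(raw)}

full_reads = simulate_reads(community, 10_000, rng)
diluted_reads = dilute(full_reads, 0.01, rng)  # assumed 1:100 dilution

full_richness = len(set(full_reads))
diluted_richness = len(set(diluted_reads))

print("taxa in true community:", len(community))
print("taxa detected, undiluted:", full_richness)
print("taxa detected, diluted:", diluted_richness)
```

Because the diluted reads are a subset of the undiluted ones, the diluted sample can never detect more taxa than the undiluted sample. The taxa that drop out are overwhelmingly the rare ones, which changes apparent richness and turnover even though the underlying community is identical.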

Awareness and understanding of what happens and, specifically, why it happens during MGAS generation is key to generating a strong data set and building a robust community matrix. Requesting sample dilution information from the sequencing center, including technical replicates across sequencing lanes, and understanding how sampling intensity and community taxa distribution patterns shape the measurement of community richness, evenness, and diversity are critical for drawing correct ecological inferences using MGAS data.
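The richness, evenness, and diversity measures mentioned above can be computed from one row of a community matrix as sketched below. The abstract does not say which indices the authors use, so this sketch assumes the Shannon index for diversity and Pielou's J for evenness. Returning 0.0 for samples with fewer than two taxa is a convention chosen here for convenience, and the count vectors are made-up examples.

```python
import math


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); set to 0.0 when fewer than two taxa are
    observed (an assumption made here, since J is undefined in that case)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Made-up read counts for two samples with the same richness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), shannon(sample), pielou_evenness(sample))
```

Both samples contain four taxa, yet the skewed sample has much lower diversity and evenness. Its three rare taxa are also the ones most likely to go undetected at reduced sampling intensity, which would lower its measured richness as well.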