Dataset for: How eDNA data filtration, sequence coverage, and primer selection influence assessment of fish communities in northern temperate lakes
Abstract
For nearly 15 years, environmental DNA (eDNA) has proven effective for monitoring biodiversity, and methodological and technical improvements have significantly advanced the field. However, the effects of factors such as sequence coverage, bioinformatic filtration and primer choice have been less explored, or need to be optimized according to specific survey objectives and study site characteristics. We evaluated these factors to help optimize the monitoring of fish biodiversity in North American temperate lakes. We sampled water for fish community eDNA analysis in 12 lakes in southwestern Québec, Canada, selected to encompass a wide range of surface areas and species richness. We sampled water from a total of 520 sites (25 to 50 per lake) and analyzed three mitochondrial DNA regions (12S rRNA, 16S rRNA and cytb) using NovaSeq sequencing. Our results, based on rarefied count matrices (from a sequencing depth of 100,000 down to a minimum depth of 1,000 reads per sample), showed that retaining a species in a sample only if it represented at least one thousandth of that sample's reads (species minimum read proportion threshold = 0.001) was adequate to remove false positives while having a limited negative impact on true positives with low read counts. Sequencing depth had a negligible impact on the accuracy of fish community assessment within a given lake. At the same sequencing depth, and with a complete local reference database for each primer set, a single primer set yielded median species richness similar to that obtained by combining two or three primer sets. Overall, 12S and 16S detected more species and provided more consistent community profiles than cytb. Based on our observations, we suggest using the 12S MiFish-U primer set and applying a minimum read proportion of 0.001 per species and site to monitor north-temperate lentic freshwater fish communities.
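The two processing steps described in the abstract, rarefying each sample to a fixed read depth and then discarding species below a minimum read proportion, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `rarefy` and `filter_by_proportion`, the species labels and the read counts are all hypothetical. Applying rarefaction before the proportion filter is also an assumption, based on the abstract's statement that the results were derived from rarefied count matrices.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample a sample's reads without replacement down to `depth` reads.

    Hypothetical helper: species absent from the subsample are omitted.
    """
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than target depth {depth}")
    # Expand the counts into one label per read, then draw `depth` reads at random.
    pool = [species for species, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def filter_by_proportion(counts: dict[str, int], threshold: float = 0.001) -> dict[str, int]:
    """Keep a species only if it accounts for at least `threshold` of the sample's reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n for species, n in counts.items() if n / total >= threshold}


if __name__ == "__main__":
    # Hypothetical sample: 10,000 reads, one species at 0.05 % of reads.
    sample = {"species_A": 9000, "species_B": 995, "species_C": 5}
    print(filter_by_proportion(sample))  # species_C (5 / 10,000 = 0.0005) is dropped
    for depth in (10_000, 5_000, 1_000):
        print(depth, filter_by_proportion(rarefy(sample, depth)))
```

With a threshold of 0.001, a species must contribute at least 1 read per 1,000 reads in a sample. At the minimum rarefaction depth of 1,000 reads, any single read therefore passes. The filter mainly bites at higher depths, where rare contaminant reads fall below the proportion.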