Data from: Sampling strategy matters: eDNA-based assessment and extrapolation of myxozoan diversity in a model stream system
Data files
Aug 28, 2025 version files 695.35 KB
- Full_phylogenetic_tree.pdf (638.65 KB)
- Lisnerova_et__al_myxozoan_OTUs_sequences.fasta (8.92 KB)
- Lisnerova_et_al_Myxozoan_OTUs_table.xlsx (16.77 KB)
- R-code_Lisnerova_et_al_OIKOS_2.zip (26.19 KB)
- README.md (4.82 KB)
Abstract
Determining true species diversity is particularly challenging for parasitic organisms. Environmental DNA (eDNA) metabarcoding has revolutionized diversity studies by increasing both efficiency and precision. This approach is especially relevant for parasitic species such as myxozoans, which spend part of their life cycle outside their hosts, often during host-switching stages. However, the adequacy of sampling depth and methodology in capturing total diversity often remains uncertain. To address this, we used a small stream as a model system and collected 60 small-volume aquatic sediment samples. Using an eDNA metabarcoding approach with myxozoan-specific primers, including newly designed primers for Malacosporea and the Paramyxidium group, we identified 30 myxozoan OTUs. These were predominantly from the Myxobolus clade but represented a broad spectrum of freshwater myxozoans, including Malacosporea. Rarefaction and extrapolation analyses estimated the potential maximum myxozoan diversity at 40 OTUs, 25% higher than the observed diversity. Although approximately a quarter of OTUs were probably missed, our analysis suggested that additional sampling would not be efficient, as most newly detected OTUs would overlap with those already identified or belong to very rare species. We demonstrated that eDNA-based methods reliably detect myxozoan diversity in small sediment volumes and that statistical approaches can effectively estimate true diversity and assess the need for further sampling at a given locality.
Dataset DOI: 10.5061/dryad.ffbg79d4z
Description of the data and file structure
In this repository, we provide the data and supporting information for our paper titled “Sampling strategy matters: eDNA-based assessment and extrapolation of myxozoan diversity in a model stream system”. We include the resulting myxozoan OTU sequences, the OTU table, and the R code used to produce all analyses in the paper.
Files and variables
File: R-code_Lisnerova_et_al_OIKOS_2.zip
Description: This archive contains the R script and input tables that reproduce the statistical analyses and figures presented in the manuscript Lisnerová, M. et al. (2025), published in Oikos.
Files
- R-code_Lisnerova_et_al_OIKOS.R: main script with all analyses, figure generation, and further details.
- community_presence.csv: presence/absence matrix of species across samples. Each row corresponds to a species, while each column (H1–H60) represents a sample. A value of “1” indicates that the species was detected in the given sample, whereas “0” indicates absence.
- species_list.csv: list of the species detected in each sample. Rows correspond to individual samples (H1–H60), while columns (C1–C14) hold the species recorded in each sample. The column “Order” gives the numerical order of the samples. Empty cells indicate that no additional species were detected for that sample.
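The two tables describe the same detections in different layouts. As a rough illustration of how they relate, the Python sketch below converts a presence/absence matrix in the layout described for community_presence.csv into per-sample species lists. It is not part of the deposited R code: the header name `species`, the OTU labels, and the three-sample excerpt are invented for illustration, and the real file may differ in delimiter or header naming.

```python
import csv
import io

# Hypothetical excerpt in the layout described for community_presence.csv:
# rows are species (OTUs), columns are samples, 1 = detected, 0 = not detected.
# The header name "species" and the OTU labels are assumptions for illustration.
EXAMPLE_CSV = """species,H1,H2,H3
OTU_a,1,0,1
OTU_b,0,0,1
OTU_c,1,0,0
"""


def matrix_to_species_lists(csv_text):
    """Return {sample: [species detected in that sample]} from a presence/absence table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[1:]  # first column holds the species name
    detected = {sample: [] for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        species, flags = row[0], row[1:]
        for sample, flag in zip(samples, flags):
            if flag.strip() == "1":
                detected[sample].append(species)
    return detected


species_lists = matrix_to_species_lists(EXAMPLE_CSV)
for sample, species in species_lists.items():
    print(sample, species)
```

A sample with no detections (H2 in this excerpt) ends up with an empty list, which corresponds to a row of empty cells in the species-list layout.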
File: Lisnerova_et__al_myxozoan_OTUs_sequences.fasta
Description: Resulting sequences of OTUs assigned to myxozoans used in the present study.
File: Lisnerova_et_al_Myxozoan_OTUs_table.xlsx
Description: Resulting OTU table used for the analyses, with read counts for each OTU in each sample. Green text indicates positive samples, i.e., those with more than 20 reads, while a red 0 indicates samples with fewer than 20 reads.
File: Full_phylogenetic_tree.pdf
Description: Full maximum-likelihood (ML) tree with bootstrap support values.
Code/software
Bioinformatic analyses: We merged forward and reverse reads using FastqJoin (Aronesty 2011), checked read quality using FastX Toolkit (Gordon and Hannon 2010), and demultiplexed the data with a dedicated Python script followed by barcode trimming. Sequences were then clustered into operational taxonomic units (OTUs) at an OTU radius of 3% to generate sets of unique sequences, and the OTU table was constructed using USEARCH (Edgar 2010). Only OTUs containing more than 20 reads were included in the analysis to avoid false-positive detections.
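The read threshold can be illustrated with a short Python sketch. It is not the authors' pipeline: the read counts and OTU labels are invented, and it assumes the threshold is applied to each OTU in each sample (as the colour coding of the OTU table suggests), with OTUs dropped when no sample passes.

```python
MIN_READS = 20  # a detection needs more than 20 reads, as stated in the text

# Hypothetical read counts in the form {OTU: {sample: reads}}.
read_counts = {
    "OTU_a": {"H1": 150, "H2": 20, "H3": 0},
    "OTU_b": {"H1": 5, "H2": 21, "H3": 0},
    "OTU_c": {"H1": 3, "H2": 0, "H3": 12},
}


def apply_read_threshold(counts, min_reads=MIN_READS):
    """Zero out cells with min_reads or fewer reads; drop OTUs left with no detection."""
    filtered = {}
    for otu, per_sample in counts.items():
        kept = {s: (n if n > min_reads else 0) for s, n in per_sample.items()}
        if any(kept.values()):
            filtered[otu] = kept
    return filtered


def to_presence(counts):
    """Convert filtered read counts to a 1/0 presence/absence table."""
    return {otu: {s: int(n > 0) for s, n in per_sample.items()}
            for otu, per_sample in counts.items()}


filtered = apply_read_threshold(read_counts)
presence = to_presence(filtered)
print(presence)
```

In this toy table, the 20 reads of OTU_a in H2 do not count as a detection because the threshold is strict, and OTU_c is removed entirely.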
Statistical analyses: All statistical analyses were performed in R (R Development Core Team 2021). To investigate the relationship between the number of OTUs obtained and sample size, we modelled the number of OTUs as a function of the number of samples collected. From the total pool of 60 samples, one sample at a time was drawn at random (with replacement), and the number of newly found OTUs was recorded. This draw was repeated 60 times to match the actual number of collected samples, yielding a cumulative number of OTUs that increases with the number of samples. The whole procedure was repeated 1,000 times to obtain the mean number of OTUs for each sample size. The means were plotted with 90% prediction intervals, which show the range within which alternative observations are expected to fall.
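The resampling procedure can be sketched as follows. This is an illustrative Python version, not the deposited R script: the incidence data are synthetic, the function names are hypothetical, and the 90% interval is assumed here to be the 5th to 95th percentile of the simulated curves, which may not match how the R code computes it.

```python
import random


def accumulation_curve(sample_otus, n_draws, n_repeats, seed=0):
    """Simulate cumulative OTU counts by drawing samples with replacement."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_repeats):
        seen = set()
        curve = []
        for _ in range(n_draws):
            seen |= rng.choice(sample_otus)  # one random sample per draw
            curve.append(len(seen))          # cumulative number of OTUs so far
        curves.append(curve)
    return curves


def summarize(curves, lower=0.05, upper=0.95):
    """Return (mean, lower bound, upper bound) per sample size across repeats."""
    n_repeats = len(curves)
    summary = []
    for step in range(len(curves[0])):
        values = sorted(curve[step] for curve in curves)
        mean = sum(values) / n_repeats
        low = values[int(lower * (n_repeats - 1))]
        high = values[int(upper * (n_repeats - 1))]
        summary.append((mean, low, high))
    return summary


# Hypothetical incidence data: each set holds the OTUs detected in one sample.
sample_otus = [{"OTU_common", f"OTU_{i % 5}"} for i in range(60)]
sample_otus[7].add("OTU_rare")  # one OTU that occurs in a single sample

curves = accumulation_curve(sample_otus, n_draws=60, n_repeats=1000, seed=1)
summary = summarize(curves)
print(summary[0], summary[-1])
```

Because samples are drawn with replacement, a single simulated curve need not reach the full observed richness after 60 draws; the rare OTU in this toy data set is missed whenever its only sample is never drawn.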
To estimate, by extrapolation, the potential maximum number of myxozoan OTUs in the stream and the completeness of the samples (the proportion of the total number of species estimated for the full community that is represented by the field-sampled OTUs) from incidence data, we used Hill numbers (Hill 1973) as implemented in the R package iNEXT (Hsieh et al. 2016). This approach quantifies biodiversity by combining information on the number of species (or OTUs) and their relative frequencies; in our case, these are based on incidence (presence/absence across samples) rather than abundance. It allows us not only to estimate diversity for the observed sample size (interpolation) but also to predict the expected diversity for a larger sampling effort, i.e. higher sampling completeness (extrapolation). We used diversity order q = 0, corresponding to OTU richness, and the Chao2 estimator for incidence data. Note that this estimate can be sensitive to occurrences of very rare OTUs; this uncertainty is reflected in the provided confidence intervals. The potential diversity and completeness of the samples were simulated for 300 samples, at which point the model became asymptotic.
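To make the role of rare OTUs concrete, the Python sketch below computes the commonly cited form of the Chao2 richness estimator from incidence counts. It is a stand-alone illustration, not the iNEXT implementation, whose internal computation and confidence intervals may differ in detail; the incidence counts are invented. Here Q1 and Q2 denote the numbers of OTUs found in exactly one and exactly two samples.

```python
def chao2(incidence_counts, n_samples):
    """Chao2 richness estimate from per-OTU incidence counts.

    incidence_counts: number of samples in which each OTU was detected.
    n_samples: total number of samples.
    """
    s_obs = sum(1 for c in incidence_counts if c > 0)  # observed richness
    q1 = sum(1 for c in incidence_counts if c == 1)    # OTUs found in one sample
    q2 = sum(1 for c in incidence_counts if c == 2)    # OTUs found in two samples
    correction = (n_samples - 1) / n_samples
    if q2 > 0:
        unseen = correction * q1 ** 2 / (2 * q2)
    else:
        # Fallback form used when no OTU occurs in exactly two samples.
        unseen = correction * q1 * (q1 - 1) / 2
    return s_obs + unseen


# Hypothetical data: five OTUs detected in 1, 1, 2, 2 and 4 of 4 samples.
estimate = chao2([1, 1, 2, 2, 4], n_samples=4)
completeness = 5 / estimate  # observed richness divided by estimated richness
print(estimate, completeness)  # 5.75 and about 0.87
```

The estimate depends only on the observed richness and on Q1 and Q2, which is why a handful of very rare OTUs can shift it noticeably. By the same ratio, the 30 observed OTUs against the 40 estimated in the study correspond to a completeness of 75%.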
Access information
Other publicly accessible locations of the data:
- n/a
Data was derived from the following sources:
- n/a
