Estimation of species abundance based on the number of segregating sites using environmental DNA (eDNA)
Data files
Apr 24, 2024 version files 3.38 MB
- README.md
- Simulated_seq__mutation_rate-10-6.fa
- Simulated_seq__mutation_rate-10-7.fa
- Simulated_seq__mutation_rate-10-8.fa
Abstract
The advancement of environmental DNA (eDNA) has enabled rapid and non-invasive species detection in aquatic environments. While most studies focus on detecting species presence or absence, recent research has explored using eDNA data to quantify species abundance. This estimation is usually based on the concentration of targeted eDNA. However, eDNA concentration can be influenced by various factors, both biotic and abiotic, which can obscure the relationship between concentration and species abundance. In this study, we suggest using the number of segregating sites as a proxy for estimating species abundance. We investigated this relationship in silico, in vitro, and in situ (mesocosm experiments) using two brackish goby species, Acanthogobius hasta and Tridentiger bifasciatus. Analysis of simulated and in vitro data, where DNA was mixed from a known number of individuals, revealed a strong correlation between the number of segregating sites and species abundance (R² > 0.9; P < 0.01). Results from the mesocosm experiment confirmed this correlation (R² = 0.70, P < 0.01). This correlation remained consistent despite biotic factors such as body size and feeding behavior of the fish (P > 0.05). Cross-validation tests demonstrated that the number of segregating sites predicts species abundance more accurately and reliably than eDNA concentration. In conclusion, the number of segregating sites is a more precise and robust indicator of species abundance than eDNA concentration, offering a significant enhancement to the quantitative capabilities of eDNA technology.
README: Estimation of Species Abundance Based on the Number of Segregating Sites using Environmental DNA (eDNA)
https://doi.org/10.5061/dryad.w3r2280zz
This is a set of simulated sequences for exploring the relationship between the number of segregating sites and species abundance.
Description of the data and file structure
We generated sequences at three mutation rates: 10⁻⁶ /bp/gen, 10⁻⁷ /bp/gen, and 10⁻⁸ /bp/gen, corresponding to Simulated_seq__mutation_rate-10-6.fa, Simulated_seq__mutation_rate-10-7.fa, and Simulated_seq__mutation_rate-10-8.fa, respectively. For each mutation rate, we generated 1,000 sequences with a length of 17,000 bp. No insertions or deletions were simulated. Only loci with mutations are included.
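As a quick sanity check on the files described above, the following minimal sketch reads a FASTA file and reports the number of sequences and the sequence lengths found. It assumes the `.fa` files are standard FASTA text (a `>` header line followed by one or more sequence lines); the function names `read_fasta` and `summarize` are illustrative and not part of the dataset. Because only loci with mutations are included, the lengths reported for the real files may be shorter than 17,000 bp.

```python
import io

def read_fasta(handle):
    """Parse FASTA text from an open handle into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def summarize(records):
    """Report how many sequences were read and which lengths occur."""
    lengths = sorted({len(seq) for _, seq in records})
    return {"n_sequences": len(records), "lengths": lengths}

# Toy stand-in for a real file (hypothetical content, not taken from the dataset).
toy = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACGTAA\n>seq3\nACCTAC\n")
records = read_fasta(toy)
summary = summarize(records)
print(summary)
```

To apply this to the dataset, replace the toy handle with, for example, `open("Simulated_seq__mutation_rate-10-6.fa")`. A single value in `lengths` indicates that all sequences have the same length, which is expected here since no insertions or deletions were simulated.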
Code/Software
All data were generated using the software Fastsimcoal2 (Excoffier & Foll, 2011).
Methods
We first assessed the relationship between the number of segregating sites and species abundance using entirely simulated sequences. The length of the simulated sequences was set at 17,000 bp, close to the total size of the 11 target segments. The number of simulated sequences/individuals was 1,000, and sequences were generated at a mutation rate of 10⁻⁶ /bp/gen. To account for mutation rate variation among species, we also generated two further datasets at mutation rates of 10⁻⁷ /bp/gen and 10⁻⁸ /bp/gen. All data were generated using the software Fastsimcoal2 (Excoffier & Foll, 2011). Subsets of sequences were randomly chosen from the simulated data, ranging from 20 to 980 sequences at intervals of 20. Selected sequences were aligned using MUSCLE v1.0 (Edgar, 2004), and the number of segregating sites was then counted from the alignments. The simulation process was repeated three times at each specified number of sequences. The correlation between the number of segregating sites and the number of individuals/sequences was estimated by regression analysis in Microsoft Excel, with the number of segregating sites as the dependent variable (y) and the number of individuals/sequences as the independent variable (x). The strength and significance of the correlation were assessed using R² and P values.
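The subsampling and regression steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' workflow: the original analysis used MUSCLE for alignment and Microsoft Excel for regression, whereas this sketch assumes the sequences are already aligned and of equal length (plausible here because no insertions or deletions were simulated), treats the three repeats as independent random draws, and computes an ordinary least-squares fit by hand. All function names and the toy population are hypothetical. The P value is omitted; a statistics library such as SciPy (`scipy.stats.linregress`) could supply it.

```python
import random

def count_segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def subsample_curve(seqs, sizes, replicates=3, seed=0):
    """For each subsample size, draw `replicates` random subsets without
    replacement and record (size, number of segregating sites)."""
    rng = random.Random(seed)
    points = []
    for n in sizes:
        for _ in range(replicates):
            subset = rng.sample(seqs, n)
            points.append((n, count_segregating_sites(subset)))
    return points

def linear_fit(points):
    """Ordinary least squares of y on x; returns (slope, intercept, R^2)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r2

# Hypothetical toy population: 40 sequences of length 40, where sequence i
# carries a single private mutation (A -> C) at position i.
population = ["A" * i + "C" + "A" * (39 - i) for i in range(40)]

# The study used sizes 20 to 980 in steps of 20; the toy uses a smaller range.
points = subsample_curve(population, range(5, 40, 5), replicates=3)
slope, intercept, r2 = linear_fit(points)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

In this toy population every sampled sequence contributes exactly one segregating site, so the fit is a perfect line. With the real files, `population` would be the list of sequences read from one `.fa` file and `sizes` would be `range(20, 1000, 20)`.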