How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design
Naef, Thomas et al. (2022), How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design, Dryad, Dataset, https://doi.org/10.5061/dryad.79cnp5hxn
Extra-organismal DNA (eoDNA) from material left behind by organisms (non-invasive DNA: e.g., faeces, hair) or from environmental samples (eDNA: e.g., water, soil) is a valuable source of genetic information. However, the relatively low quality and quantity of eoDNA, which can be further degraded by environmental factors, results in reduced amplification and sequencing success. This is often compensated for through cost- and time-intensive replication of genotyping/sequencing procedures. Therefore, system- and site-specific quantifications of environmental degradation are needed to maximize sampling efficiency (e.g., fewer replicates, shorter sampling durations) and to improve species detection and abundance estimates. Using ten environmentally diverse bat roosts as a case study, we developed a robust modelling pipeline for quantifying the environmental factors degrading eoDNA, predicting eoDNA quality, and estimating the ideal exposure duration for each sampling site. Maximum humidity was the strongest eoDNA-degrading factor, followed by exposure duration and maximum temperature, whereas a later occurrence of the hottest days had a positive effect. With these predictors and information on the sampling period (before or after offspring were born), we reliably predicted mean eoDNA quality per sampling visit at new sites, with a mean squared error of 0.0349. Site-specific simulations revealed that reducing exposure duration to 2-8 days could substantially improve eoDNA quality for future sampling. Our pipeline identified high humidity and temperature as strong drivers of eoDNA degradation even in the absence of rain and direct sunlight. Furthermore, we outline the pipeline’s utility for other systems and study goals, such as estimating sample age, improving eDNA-based species detection, and increasing the accuracy of abundance estimates.
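The mean squared error reported above is the average squared difference between observed and predicted mean eoDNA quality per sampling visit. A minimal Python sketch of the metric itself, assuming one observed and one predicted value per sampling visit; the published pipeline is written in R, its evaluation at new sites is not reproduced here, and the function name and example values are hypothetical (not taken from the dataset):

```python
from statistics import fmean


def mean_squared_error(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical mean quality values for three sampling visits (not real data).
observed = [0.80, 0.55, 0.30]
predicted = [0.70, 0.60, 0.50]
print(round(mean_squared_error(observed, predicted), 4))  # 0.0175
```

Squaring the errors penalizes large misses (here the 0.2 miss on the third visit) more than small ones, so a low MSE indicates that no visit was predicted far off.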
From the corresponding publication: How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design
"We collected bat droppings at ten lesser horseshoe bat (Rhinolophus hipposideros) maternity roosts in Thuringia (Germany) between 2015-2019 (Jan et al., 2019; Lehnen et al., 2021). We sampled each roost twice a year: once in June and once in August, i.e., before and after offspring were born. We spread sheets of newspaper under the main hanging sites, and returned after 9-13 days to collect newly deposited droppings (Puechmaille & Petit, 2007). Here, we refer to such a sampling event of 9-13 days within a roost as a “roost-visit” (RV). We only retained the 25 RVs where both roost temperature (°C) and relative humidity (%, hereafter “humidity”) were recorded with iButtons (Maxim Integrated Products, 2015) logging every 30 to 180 minutes inside the roost. The microclimate of the roosts varied greatly, including hot and dry attics, cold and humid cellars, and natural semi-open caves. We excluded temperature and humidity measures from the first (newspaper deployment) and the last (sample collection) day, because exact deployment and collection times were not always recorded. Droppings were stored in airtight plastic boxes with silica-gel-beads to dry the droppings immediately upon collection (Lehnen et al., 2018). Boxes with samples were kept at room temperature for 1 to 20 days after collection, and subsequently stored at -20°C prior to extraction to achieve optimal cold and dry storage conditions (Wasser et al., 1997)."
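The logger-trimming rule above can be sketched in Python, together with per-roost-visit summaries of the kind used as predictors (maximum humidity, maximum temperature, exposure duration). This is an illustration under assumptions, not the published R code: the (timestamp, temperature, humidity) reading format, the function name `summarise_roost_visit`, exposure duration counted as calendar days from deployment to collection, and `hottest_day` (days from deployment to the hottest retained reading) are all hypothetical choices, and the example readings are invented.

```python
from datetime import date, datetime


def summarise_roost_visit(readings, deployed, collected):
    """Summarise logger readings for one roost-visit (illustrative sketch).

    readings: iterable of (datetime, temperature in °C, relative humidity in %).
    deployed, collected: dates of newspaper deployment and dropping collection.
    """
    # Drop readings from the deployment and collection days, whose exact
    # times of day were not always recorded.
    kept = [(ts, t, h) for ts, t, h in readings if deployed < ts.date() < collected]
    if not kept:
        raise ValueError("no readings left after trimming")
    hottest = max(kept, key=lambda reading: reading[1])
    return {
        # Assumption: exposure duration = calendar days from deployment to collection.
        "exposure_days": (collected - deployed).days,
        "max_temperature": max(t for _, t, _ in kept),
        "max_humidity": max(h for _, _, h in kept),
        # Hypothetical timing variable: days from deployment to the hottest reading.
        "hottest_day": (hottest[0].date() - deployed).days,
        "n_readings": len(kept),
    }


# Invented example readings (not from the dataset).
readings = [
    (datetime(2018, 6, 1, 18, 0), 30.5, 99.0),  # deployment day: dropped
    (datetime(2018, 6, 2, 12, 0), 24.0, 80.0),
    (datetime(2018, 6, 5, 15, 0), 27.5, 85.5),
    (datetime(2018, 6, 11, 9, 0), 22.0, 97.0),  # collection day: dropped
]
print(summarise_roost_visit(readings, date(2018, 6, 1), date(2018, 6, 11)))
# {'exposure_days': 10, 'max_temperature': 27.5, 'max_humidity': 85.5, 'hottest_day': 4, 'n_readings': 2}
```

With these example readings only the values from 2 and 5 June are retained, so the 30.5 °C reading on the deployment day does not count toward the maximum temperature.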
DNA extraction and genotyping
"Following Jan et al. (2019), we randomly picked 1.1 times as many droppings as adults counted in June, and 2.1 times the number of adults in August for extraction. We co-amplified eight microsatellite loci and a sex marker from the extracts in one multiplex, and scored and genotyped them following established lab protocols (Zarzoso-Lacoste et al., 2018; Zarzoso-Lacoste et al., 2020). To minimize single-well pipetting errors, we used a multichannel pipette, working strip-wise (8 wells) on 96-well plates. For microsatellite amplification, we loaded three replicates of each sample from the 96-well extraction plates onto three different 384-well PCR plates with a pipetting robot, so that every 384-well PCR plate contained one replicate of a total of four 96-well extraction plates. All further processing was robot-assisted, thereby minimizing well-wise laboratory effects. For each sample, we used the three replicates to calculate the quality index (QI), a measure of locus-wise agreement of a consensus genotype to its single replicates (Miquel et al., 2006). QI can range from zero (indicating complete inter-replicate disagreement) to one (agreement of all three replicates with the consensus). The pipeline to build the consensus genotype and calculate the QI was described by Zarzoso-Lacoste et al. (2018) and applied by Jan et al. (2019). However, we deviated from this pipeline in two important steps. First, we kept all multi-locus genotypes (MLGs), irrespective of the number of loci with a consensus, to explore the full spectrum of eoDNA quality in our samples. Second, we skipped the step of manually checking (and, if necessary, correcting) every consensus genotype that differed from others by one or two loci (Puechmaille & Petit, 2007), to keep the consensus building process uniform across samples. We removed one locus (RHC108) in 2018 because its consistently low peak height and amplification failure indicated a lab error for that marker in that year."
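The QI idea can be illustrated with a simplified Python sketch in the spirit of Miquel et al. (2006). It assumes genotypes are already scored as sorted allele tuples (None for a failed PCR), builds a simple majority-rule consensus (a genotype called in at least two of three replicates), and scores each replicate 1 if it matches the consensus and 0 otherwise. These consensus rules are a simplification and are not the Zarzoso-Lacoste et al. (2018) pipeline; the function names and example genotypes are hypothetical.

```python
from collections import Counter


def majority_consensus(replicates, min_agree=2):
    """Simplified consensus: the genotype called in at least `min_agree` replicates.

    Genotypes are sorted allele tuples, e.g. (120, 124); None marks a failed PCR.
    Returns None when no genotype reaches `min_agree` (no consensus).
    """
    calls = Counter(g for g in replicates if g is not None)
    if not calls:
        return None
    genotype, count = calls.most_common(1)[0]
    return genotype if count >= min_agree else None


def quality_index(sample):
    """Share of locus-by-replicate calls that match the locus consensus (0 to 1).

    sample: dict mapping locus name -> list of replicate genotypes.
    Failed PCRs and loci without a consensus score 0.
    """
    scores = []
    for replicates in sample.values():
        consensus = majority_consensus(replicates)
        scores.extend(
            (1 if consensus is not None and g == consensus else 0) for g in replicates
        )
    if not scores:
        raise ValueError("sample has no replicate calls")
    return sum(scores) / len(scores)


# Invented three-replicate example (not from the dataset).
sample = {
    "locusA": [(120, 124), (120, 124), (120, 124)],  # all three match
    "locusB": [(98, 102), (98, 98), (98, 102)],      # one replicate disagrees (e.g. allelic dropout)
    "locusC": [None, (150, 150), None],              # no consensus
}
print(round(quality_index(sample), 3))  # 0.556
```

In the example, locus A contributes three matches, locus B two and locus C none, giving QI = 5/9; a sample whose three replicates agree at every locus scores 1.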
Filtering and statistical analyses
The data table ("quality_condition_tablemaxH100_nocoords.RData") in the folder "data" of the provided zip file documents each dropping from the spreading of newspaper through collection to genotyping; eoDNA degradation is measured as a decrease in the Quality Index (QI). The table contains the collection site ("Roost"), year ("Year"), sampling session (June = 1, August = 2; "Sess"), plate, lane and well information for extraction and PCR, the logged temperature and humidity, the proximity of the logger to the samples (droppings), and the exposure duration ("Days") until collection. The position on the extraction plate also serves as the unique identifier of every analysed sample ("Dropping"). All further filtering, data preparation and statistical analyses of the publication, including figures, are performed by the R script (01_eoDNAQuantificationandSimulation.R) based on this table. The table can also be replaced with your own data to apply the pipeline to other systems (see the readme and usage notes). Note that the logger data were trimmed before being added to the table, to exclude the days on which newspaper was deployed and droppings were collected. This was necessary because the exact time of day of deployment and collection was not always recorded, so temperature and humidity readings on these days could have been affected.
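Predictions in the publication are reported as mean eoDNA quality per sampling visit, whereas this table holds one row per dropping. One way to obtain the observed per-roost-visit means is to group rows by "Roost", "Year" and "Sess" and average their QI. A minimal Python sketch, assuming the table has been exported to a list of records; the QI column name "QI", the example values and the "Dropping" identifiers are hypothetical, and this is not the R script's own code.

```python
from collections import defaultdict
from statistics import fmean


def mean_qi_per_roost_visit(rows, qi_column="QI"):
    """Average per-dropping QI within each roost-visit (Roost, Year, Sess).

    Returns {(roost, year, sess): (mean QI, number of droppings)}.
    The column name for QI is an assumption; adjust it to the real table.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["Roost"], row["Year"], row["Sess"])
        groups[key].append(row[qi_column])
    return {key: (fmean(values), len(values)) for key, values in groups.items()}


# Invented example rows (not from the dataset).
rows = [
    {"Roost": "A", "Year": 2016, "Sess": 1, "Dropping": "P1_A01", "QI": 0.9},
    {"Roost": "A", "Year": 2016, "Sess": 1, "Dropping": "P1_A02", "QI": 0.7},
    {"Roost": "A", "Year": 2016, "Sess": 2, "Dropping": "P1_B01", "QI": 0.4},
]
for key, (mean_qi, n) in mean_qi_per_roost_visit(rows).items():
    print(key, round(mean_qi, 2), n)
# ('A', 2016, 1) 0.8 2
# ('A', 2016, 2) 0.4 1
```

Keeping the number of droppings alongside each mean makes it easy to see which roost-visits rest on few samples.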
Jan, P.L., Lehnen, L., Besnard, A.L., Kerth, G., Biedermann, M., Schorcht, W., Petit, E.J., Le Gouar, P. & Puechmaille, S.J. (2019). Range expansion is associated with increased survival and fecundity in a long-lived bat species. Proceedings of the Royal Society B, 286, 20190384. https://doi.org/10.1098/rspb.2019.0384
Lehnen, L., Jan, P.L., Besnard, A.L., Fourcy, D., Kerth, G., Biedermann, M., Nyssen, P., Schorcht, W., Petit, E.J. & Puechmaille, S.J. (2021). Genetic diversity in a long-lived mammal is explained by the past's demographic shadow and current connectivity. Molecular Ecology, 30, 5048-5063. https://doi.org/10.1111/mec.16123
Lehnen, L., Schorcht, W., Karst, I., Biedermann, M., Kerth, G. & Puechmaille, S.J. (2018). Using approximate Bayesian computation to infer sex ratios from acoustic data. PLoS One, 13, e0199428. https://doi.org/10.1371/journal.pone.0199428
Maxim Integrated Products, Inc. (2015). DS1923 iButton Hygrochron Temperature/Humidity Logger with 8KB Datalog Memory. https://datasheets.maximintegrated.com/en/ds/DS1923.pdf
Miquel, C., Bellemain, E., Poillot, C., Bessière, J., Durand, A. & Taberlet, P. (2006). Quality indexes to assess the reliability of genotypes in studies using noninvasive sampling and multiple-tube approach. Molecular Ecology Notes, 6, 985-988. https://doi.org/10.1111/j.1471-8286.2006.01413.x
Puechmaille, S.J. & Petit, E.J. (2007). Empirical evaluation of non-invasive capture-mark-recapture estimation of population size based on a single sampling session. Journal of Applied Ecology, 44, 843-852. https://doi.org/10.1111/j.1365-2664.2007.01321.x
Wasser, S.K., Houston, C.S., Koehler, G.M., Cadd, G.G. & Fain, S.R. (1997). Techniques for application of faecal DNA methods to field studies of Ursids. Molecular Ecology, 6, 1091-1097. https://doi.org/10.1046/j.1365-294x.1997.00281.x
Zarzoso-Lacoste, D., Jan, P.-L., Lehnen, L., Girard, T., Besnard, A.-L., Puechmaille, S.J. & Petit, E.J. (2018). Combining noninvasive genetics and a new mammalian sex-linked marker provides new tools to investigate population size, structure and individual behaviour: an application to bats. Molecular Ecology Resources, 18, 217-228. https://doi.org/10.1111/1755-0998.12727
Zarzoso-Lacoste, D., Jan, P.-L., Lehnen, L., Girard, T., Besnard, A.-L., Puechmaille, S.J. & Petit, E.J. (2020). Corrigendum. Molecular Ecology Resources, 20, 1787-1787. https://doi.org/10.1111/1755-0998.13254
How to execute the modeling pipeline R-Script introduced in:
"How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design"
1. Before executing, make sure that the following software is available (the code was tested only on three different 64-bit Windows 10 machines):
- RStudio 1.4.1717 or newer
- R 4.1 (cran.r-project.org/bin/windows/base/old/4.1.0/)
2. Make sure you have downloaded and extracted the pipeline onto the machine and location you want to execute the code from.
3. Double-click the R project file (code_publ_4renv.Rproj) to open it in RStudio.
4. RStudio should open and show the loaded project in its top-right corner.
5. The project was created with the renv package to ensure that the correct package versions are loaded in R.
To check that the locked package versions are loaded, click "Packages" in RStudio. All packages are listed with a "Version" and a "Lockfile" column.
All versions should match the lockfile.
6. Make sure that the R script is open. If it is not, click "Files" and then "01_eoDNAQuantificationandSimulation.R".
7. You are now ready to run the R script, which produces all output of the seven steps of the modeling pipeline from the input table.
For more information, see the comments in the R script itself.
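The lockfile comparison in step 5 can also be done outside RStudio. The following Python sketch is not part of the published pipeline and is only an optional cross-check; it assumes the standard renv.lock JSON layout (a "Packages" object whose entries carry a "Version" field) and a project library with one folder per installed package, each containing a DESCRIPTION file with a "Version:" line. The function names are hypothetical.

```python
import json
from pathlib import Path


def locked_versions(lockfile):
    """Return {package: version} from an renv.lock file (a JSON document)."""
    data = json.loads(Path(lockfile).read_text(encoding="utf-8"))
    return {name: entry["Version"] for name, entry in data["Packages"].items()}


def installed_version(library_dir, package):
    """Read the Version field from <library_dir>/<package>/DESCRIPTION, if any."""
    description = Path(library_dir) / package / "DESCRIPTION"
    if not description.is_file():
        return None  # package not installed in this library
    text = description.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None


def version_mismatches(lockfile, library_dir):
    """List (package, locked version, installed version) where they differ."""
    mismatches = []
    for name, locked in sorted(locked_versions(lockfile).items()):
        installed = installed_version(library_dir, name)
        if installed != locked:
            mismatches.append((name, locked, installed))
    return mismatches
```

To use it, pass the path to renv.lock and the project library folder that holds one subfolder per package (inside renv/library; the exact subfolder layout depends on the renv version and platform). An empty list means that all installed versions match the lockfile, and None as the installed version means the package was not found in that folder.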
Deutsche Forschungsgemeinschaft, Award: rtg2010