How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design
Data files
Mar 29, 2023 version, files 9.70 MB
- ADO-FA_ADO_FA_Mod_final_4eoDNA_All_Results_metadata.txt (4.42 KB)
- ADO-FA_ADO_FA_Mod_final_4eoDNA_All_Results.csv (6.89 MB)
- Cons_4eoDNA_MM2_6_s7_corrected_metadata.txt (2.68 KB)
- Cons_4eoDNA_MM2_6_s7_corrected.csv (260.68 KB)
- Model_building_data.csv (1.64 MB)
- Model_building_dataANDTest_data_metadata.txt (1.74 KB)
- quality_condition_tablemaxH100_nocoords_metadata.txt (5.86 KB)
- quality_condition_tablemaxH100_nocoords.RData (688.32 KB)
- README.txt (3.08 KB)
- Test_data.csv (212 KB)
Abstract
Extra-organismal DNA (eoDNA) from material left behind by organisms (non-invasive DNA: e.g., faeces, hair) or from environmental samples (eDNA: e.g., water, soil) is a valuable source of genetic information. However, the relatively low quality and quantity of eoDNA, which can be further degraded by environmental factors, results in reduced amplification and sequencing success. This is often compensated for through cost- and time-intensive replications of genotyping/sequencing procedures. Therefore, system- and site-specific quantifications of environmental degradation are needed to maximize sampling efficiency (e.g., fewer replicates, shorter sampling durations), and to improve species detection and abundance estimates. Using ten environmentally diverse bat roosts as a case study, we developed a robust modelling pipeline to quantify the environmental factors degrading eoDNA, predict eoDNA quality, and estimate sampling-site-specific ideal exposure duration. Maximum humidity was the strongest eoDNA-degrading factor, followed by exposure duration and then maximum temperature. We also found a positive effect when hottest days occurred later. The strength of this effect fell between the strength of the effects of exposure duration and maximum temperature. With those predictors and information on sampling period (before or after offspring were born), we reliably predicted mean eoDNA quality per sampling visit at new sites with a mean squared error of 0.0349. Site-specific simulations revealed that reducing exposure duration to 2-8 days could substantially improve eoDNA quality for future sampling. Our pipeline identified high humidity and temperature as strong drivers of eoDNA degradation even in the absence of rain and direct sunlight. Furthermore, we outline the pipeline’s utility for other systems and study goals, such as estimating sample age, improving eDNA-based species detection, and increasing the accuracy of abundance estimates.
Methods
From the corresponding publication: How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design
Sampling
We collected bat droppings at ten lesser horseshoe bat (Rhinolophus hipposideros) maternity roosts in Thuringia (Germany) between 2015 and 2019 (Jan et al., 2019; Lehnen et al., 2021). We sampled each roost twice a year: once in June and once in August, i.e., before and after offspring were born. We spread sheets of newspaper under the main hanging sites, and returned after 9-13 days to collect newly deposited droppings (Puechmaille & Petit, 2007). Here, we refer to such a sampling event of 9-13 days within a roost as a “roost-visit” (RV). We only retained the 25 RVs where both roost temperature (°C) and relative humidity (%, hereafter “humidity”) were recorded with iButtons (Maxim Integrated Products, 2015) logging every 30 to 180 minutes inside the roost. The microclimate of the roosts varied greatly, including hot and dry attics, cold and humid cellars, and natural semi-open caves. We excluded temperature and humidity measures from the first (newspaper deployment) and the last (sample collection) day, because exact deployment and collection times were not always recorded. We stored all droppings of each RV separately in airtight plastic boxes with silica gel beads to dry the droppings immediately upon collection (Lehnen et al., 2018). Boxes with samples were kept at room temperature for 1 to 20 days after collection, and subsequently stored at -20°C prior to extraction to achieve optimal cold and dry storage conditions (Wasser et al., 1997).
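The exclusion of the first and last day of logger readings can be sketched as follows. This is a minimal illustration in Python, not the published R pipeline: the function name, the layout of the readings, and all example values are hypothetical. Only the rule of dropping readings from the deployment and collection days comes from the description above.

```python
from datetime import date, datetime

def trim_and_summarise(readings, deployed, collected):
    """Drop readings from the deployment and collection days, then summarise.

    readings: list of (timestamp, temperature_C, humidity_pct) tuples.
    Returns (max_temperature, max_humidity) over the retained days.
    """
    # Keep only readings taken strictly between the two dates.
    kept = [r for r in readings if deployed < r[0].date() < collected]
    if not kept:
        raise ValueError("no readings left after trimming")
    return max(r[1] for r in kept), max(r[2] for r in kept)

# Hypothetical readings for one roost-visit with 10 days of exposure.
readings = [
    (datetime(2018, 6, 1, 15, 0), 31.0, 99.0),  # deployment day: dropped
    (datetime(2018, 6, 2, 3, 0), 18.5, 71.0),
    (datetime(2018, 6, 5, 15, 0), 27.5, 64.0),
    (datetime(2018, 6, 9, 3, 0), 19.0, 88.0),
    (datetime(2018, 6, 11, 9, 0), 33.0, 95.0),  # collection day: dropped
]
max_temp, max_hum = trim_and_summarise(
    readings, deployed=date(2018, 6, 1), collected=date(2018, 6, 11)
)
print(max_temp, max_hum)
```

In this made-up example the highest temperature and humidity values fall on the two excluded days, so trimming changes both summaries.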
DNA extraction and genotyping
Following Jan et al. (2019), for the June and August sampling, respectively, we randomly picked 1.1 and 2.1 times as many droppings from the plastic boxes as there were adults counted in June. To reduce potential contamination, we picked droppings and extracted their DNA at two designated benches in a pre-PCR laboratory (i.e., no PCR products are allowed in the lab) dedicated to Rhinolophus hipposideros non-invasive genetics. We amplified eight microsatellite loci and a sex marker from the extracts in one multiplex, and scored and genotyped them following established lab protocols (Zarzoso-Lacoste et al., 2018; Zarzoso-Lacoste et al., 2020). To minimize single-well pipetting errors, we used a multichannel pipette, working strip-wise (8 wells) on 96-well plates. For microsatellite amplification, we loaded three replicates of each sample from the 96-well extraction plates onto three different 384-well PCR plates with a pipetting robot, so that every 384-well PCR plate contained one replicate of each sample from four 96-well extraction plates. All further processing was robot-assisted, thereby minimizing well-wise laboratory effects.
We used a multi-tube approach with three replicates to form reliable consensus genotypes. This replicate number is based on Puechmaille & Petit (2007), where an average of 2.73 replicates (2.54-3.09, depending on colony) resulted in reliable consensus genotypes for the same species and collection protocol. To be included in a consensus genotype, an allele at a locus had to be present at least twice across the replicates (i.e., two or three out of three). We automated this process using the bioinformatics pipeline described in Zarzoso-Lacoste et al. (2018) and applied by Jan et al. (2019). This pipeline recovers weaker peaks within a replicate through hierarchical fallback from more stringent peak detection thresholds, while reducing the scoring of alleles from cross-contamination by combining peak height thresholds, peak height ratios and a multi-tube approach (Mäck et al., 2021). If more than two alleles were detected at a locus in a replicate, only the highest two were kept, while lower ones, potentially introduced by cross-contamination, were discarded. To avoid scoring a true homozygote locus as a heterozygote locus, second-highest alleles were only accepted at a locus of a replicate if they exceeded a certain peak height ratio compared to the higher peak (see further details in Zarzoso-Lacoste et al., 2018; Jan et al., 2019).
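The "present at least twice across the replicates" rule can be illustrated with a short Python sketch. It is not the automated pipeline of Zarzoso-Lacoste et al. (2018): the function name and the allele sizes are hypothetical, peak height filtering is left out, and treating more than two supported alleles as "no consensus" is an assumption made here for the sake of a complete example.

```python
from collections import Counter

def consensus_locus(replicates, min_count=2):
    """Consensus alleles at one locus from replicate genotypes.

    replicates: one collection of alleles per replicate; an empty
    collection stands for a failed amplification.
    Returns a sorted tuple of alleles seen in at least `min_count`
    replicates, or None when no consensus can be formed.
    """
    # Count each allele once per replicate, so a homozygote scored as
    # (120, 120) contributes a single observation of allele 120.
    counts = Counter(a for rep in replicates for a in set(rep))
    kept = sorted(a for a, n in counts.items() if n >= min_count)
    # Assumption: more than two supported alleles is treated as no consensus.
    if not kept or len(kept) > 2:
        return None
    return tuple(kept)

# Allele 120 is seen three times and allele 124 twice: both are kept.
print(consensus_locus([(120, 124), (120, 124), (120,)]))
# No allele is seen twice, so no consensus is formed.
print(consensus_locus([(120,), (124,), ()]))
```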
To facilitate comparison to other eoDNA degradation studies, we calculated the commonly used PCR success rate per sample, defined as the proportion of all loci across all replicates that resulted in a scorable peak. However, to measure actual eoDNA degradation, we used the three replicates and the consensus to calculate the more informative quality index (QI) per sample, a measure of locus-wise agreement of single replicates to their consensus genotype (Miquel et al., 2006). QI can range from zero (indicating amplification failure and/or complete inter-replicate disagreement) to one (amplification and agreement of all three replicates with the consensus). If no consensus could be built due to complete failure of amplification in all replicates of that locus, or due to the inability to form an allele-wise majority, the consensus at that locus was scored as NA and the QI as 0.
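The difference between the two measures can be made concrete with a small Python sketch. It assumes the common reading of Miquel et al. (2006), in which each replicate scores 1 at a locus if its genotype equals the consensus and 0 otherwise, and the sample QI is the mean of these scores. The function names, locus names and genotypes are hypothetical and are not taken from the published R script.

```python
def quality_index(replicates, consensus):
    """Mean agreement of replicate genotypes with the consensus.

    replicates: list of dicts, locus -> tuple of alleles (empty = failure).
    consensus:  dict, locus -> tuple of alleles, or None when no consensus.
    """
    scores = []
    for locus, cons in consensus.items():
        for rep in replicates:
            # A locus without consensus scores 0 in every replicate.
            match = cons is not None and set(rep.get(locus, ())) == set(cons)
            scores.append(1.0 if match else 0.0)
    return sum(scores) / len(scores)

def pcr_success_rate(replicates, loci):
    """Proportion of locus-by-replicate PCRs with at least one scored allele."""
    hits = sum(1 for rep in replicates for locus in loci if rep.get(locus))
    return hits / (len(replicates) * len(loci))

consensus = {"L1": (120, 124), "L2": (98,), "L3": None}
replicates = [
    {"L1": (120, 124), "L2": (98,), "L3": (200,)},
    {"L1": (120,), "L2": (98,), "L3": ()},         # allelic dropout at L1
    {"L1": (120, 124), "L2": (), "L3": (204,)},    # failure at L2
]
print(quality_index(replicates, consensus))      # 4 agreeing scores out of 9
print(pcr_success_rate(replicates, consensus))   # 7 amplifications out of 9
```

In this made-up sample, seven of nine PCRs produced a peak, but only four of nine replicate genotypes agree with the consensus, so QI registers dropout and disagreement that the PCR success rate does not.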
In the calculation of QI, we deviated from the pipeline of Zarzoso-Lacoste et al. (2018) in two important steps. First, we kept all multi-locus genotypes (MLGs), irrespective of the number of loci with a consensus, to explore the full spectrum of eoDNA degradation in our samples. Second, we skipped the step of manually checking (and eventually correcting) every consensus genotype that differed from others by one or two loci (Puechmaille & Petit, 2007), because we were only interested in measuring eoDNA degradation. A lower QI prior to manual correction, caused by disagreement among replicates at a locus or by the inability to form a consensus at a locus, can inform about potential degradation of eoDNA (see Supp_01), and manual correction would likely weaken such a signal. Compared to a manually corrected dataset, only 8.83% of usable samples (> seven consensus loci) in this study would be corrected, with a majority (91.7%) being altered at only one consensus locus, leading to a theoretical maximum QI deviation of 0.11-0.13 in the few affected samples distributed over the RVs (Supp_02). We also removed one locus (RHC108) in 2018 because its consistently low peak height and amplification failure indicated a lab error for that marker in that year (Supp_03).
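One way to read the theoretical maximum QI deviation is as one locus's share of the sample QI. The Python sketch below rests on two assumptions that are not stated in the text: that the sample QI is the unweighted mean of per-locus scores between 0 and 1, and that a sample has nine loci (eight microsatellites plus the sex marker) or eight loci (2018, after removal of RHC108). The function name is hypothetical.

```python
def max_qi_shift(n_loci, n_changed=1):
    """Largest possible change in a sample's QI when `n_changed`
    consensus loci are altered, with QI taken as the mean of
    per-locus scores that each lie between 0 and 1."""
    return n_changed / n_loci

print(f"{max_qi_shift(9):.3f}")  # nine loci
print(f"{max_qi_shift(8):.3f}")  # eight loci
```

Under these assumptions a single altered locus shifts the sample QI by at most 1/9 or 1/8, which is in line with the reported range.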
Filtering and statistical analyses (additional description not included in the publication)
The data table ("quality_condition_tablemaxH100_nocoords.RData") in the folder "data" of the provided zip file documents each dropping from newspaper deployment through collection to genotyping; degradation is measured as a decrease of the quality index (QI). The table contains the collection site ("Roost"), year ("Year"), sampling session ("Sess"; June = 1, August = 2), plate, lane and well information for extraction and PCR, logged temperature and humidity, the proximity of the logger to the samples (Droppings), and the exposure duration ("Days") until collection. The location on the extraction plate is also used as a unique identifier for every sample (“Dropping”) analysed. All further filtering, data preparation and statistical analyses of the publication, including the figures, are produced by the R script from this table. Users can therefore replace the table with their own data to apply the pipeline to other systems. Note that the logger data were trimmed before being added to the table to exclude the deployment and collection dates. This was necessary because the exact time of day at which the newspaper was spread under the hanging sites and at which the droppings were collected was not always recorded, and temperature and humidity readings on these days could have been altered. For more details about the data, check the readme file.
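For readers preparing their own table, the sketch below shows in Python how sample-level records could be summarised to one mean QI per roost-visit, the unit predicted in the publication. The column names "Roost", "Year", "Sess" and "Days" follow the description above; the "QI" column name, the function name and all values are hypothetical, and the published R script may organise this step differently.

```python
from collections import defaultdict
from statistics import mean

def mean_qi_per_roost_visit(rows):
    """Average QI over all samples sharing roost, year and session."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Roost"], row["Year"], row["Sess"])].append(row["QI"])
    return {rv: mean(values) for rv, values in groups.items()}

# Hypothetical sample-level records (one row per dropping).
rows = [
    {"Roost": "A", "Year": 2018, "Sess": 1, "Days": 10, "QI": 0.75},
    {"Roost": "A", "Year": 2018, "Sess": 1, "Days": 10, "QI": 0.25},
    {"Roost": "B", "Year": 2018, "Sess": 2, "Days": 12, "QI": 1.0},
]
print(mean_qi_per_roost_visit(rows))
```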
References:
Jan, P.L., Lehnen, L., Besnard, A.L., Kerth, G., Biedermann, M., Schorcht, W., Petit, E.J., Le Gouar, P. & Puechmaille, S.J. (2019). Range expansion is associated with increased survival and fecundity in a long-lived bat species. Proceedings of the Royal Society B, 286, 20190384. https://doi.org/10.1098/rspb.2019.0384
Lehnen, L., Jan, P.L., Besnard, A.L., Fourcy, D., Kerth, G., Biedermann, M., Nyssen, P., Schorcht, W., Petit, E.J. & Puechmaille, S.J. (2021). Genetic diversity in a long-lived mammal is explained by the past's demographic shadow and current connectivity. Molecular Ecology, 30, 5048-5063. https://doi.org/10.1111/mec.16123
Lehnen, L., Schorcht, W., Karst, I., Biedermann, M., Kerth, G. & Puechmaille, S.J. (2018). Using approximate Bayesian computation to infer sex ratios from acoustic data. PLoS One, 13, e0199428. https://doi.org/10.1371/journal.pone.0199428
Maxim Integrated Products, Inc. (2015). DS1923 iButton Hygrochron Temperature/Humidity Logger with 8KB Datalog Memory. https://datasheets.maximintegrated.com/en/ds/DS1923.pdf
Miquel, C., Bellemain, E., Poillot, C., Bessière, J., Durand, A. & Taberlet, P. (2006). Quality indexes to assess the reliability of genotypes in studies using noninvasive sampling and multiple-tube approach. Molecular Ecology Notes, 6, 985-988. https://doi.org/10.1111/j.1471-8286.2006.01413.x
Puechmaille, S.J. & Petit, E.J. (2007). Empirical evaluation of non-invasive capture-mark-recapture estimation of population size based on a single sampling session. Journal of Applied Ecology, 44, 843-852. https://doi.org/10.1111/j.1365-2664.2007.01321.x
Wasser, S.K., Houston, C.S., Koehler, G.M., Cadd, G.G. & Fain, S.R. (1997). Techniques for application of faecal DNA methods to field studies of Ursids. Molecular Ecology, 6, 1091-1097. https://doi.org/10.1046/j.1365-294x.1997.00281.x
Zarzoso-Lacoste, D., Jan, P.-L., Lehnen, L., Girard, T., Besnard, A.-L., Puechmaille, S.J. & Petit, E.J. (2018). Combining noninvasive genetics and a new mammalian sex-linked marker provides new tools to investigate population size, structure and individual behaviour: an application to bats. Molecular Ecology Resources, 18, 217-228. https://doi.org/10.1111/1755-0998.12727
Zarzoso-Lacoste, D., Jan, P.-L., Lehnen, L., Girard, T., Besnard, A.-L., Puechmaille, S.J. & Petit, E.J. (2020). Corrigendum. Molecular Ecology Resources, 20, 1787-1787. https://doi.org/10.1111/1755-0998.13254
Usage notes
How to execute the R script of the modeling pipeline introduced in: "How to quantify factors degrading DNA in the environment and predict degradation for effective sampling design"
1. Before executing, make sure that the following are available (the code was only tested on a Windows 10 64-bit machine and on Ubuntu 20.04; see also the R_sessionInfo....txt files):
- RStudio 1.4.1717
- R 4.1 (cran.r-project.org/bin/windows/base/old/4.1.0/)
2. Make sure you have downloaded and extracted the pipeline onto the machine and location you want to execute the code from.
3. Double-click on the R project file (eoDNAQuantificationandSimulation.Rproj) to open it with RStudio.
4. RStudio should open and show in its top right corner that the project is loaded.
5. The project was created using the package renv to make sure that the right versions of packages are loaded within R.
You can check that the locked package versions are loaded by clicking on "Packages" in RStudio. You should see all packages with the columns "Version" and "Lockfile".
All versions should be equal to those in the Lockfile column.
6. Make sure that the R script is loaded. If not, click on "Files" and then on "01_eoDNAQuantificationandSimulation.R".
7. You are now ready to run the R script, which produces all output for the seven steps of the modeling pipeline from the input table.
8. The additional script "02_PCRSuccessrate.R" demonstrates that QI is superior to the PCR success rate in measuring eoDNA degradation and shows how the two relate to each other.
9. The additional script "03_MismatchEffect.R" demonstrates the influence of manual mismatch correction on QI and justifies skipping manual mismatch correction if one is only interested in capturing eoDNA quality/degradation.
--> For more information, check the comments in the R script itself.
--> For application to other systems, use the script "01_eoDNAQuantificationandSimulation.R" as a backbone and adjust it to your needs.
--> The 18 custom functions help to perform the 7 steps of the modeling pipeline and also to identify and potentially remove strong lab effects.