Accumulation of airborne, eukaryotic environmental DNA contamination: Raw sequencing data and demultiplexing info
Data files
Aug 26, 2022 version files 223.52 GB
Abstract
Environmental DNA (eDNA) metabarcoding is increasingly being implemented as a non-invasive and efficient approach for biodiversity research and monitoring across ecosystems. However, accurate detection of species with eDNA requires robust experimental designs, as eDNA analysis carries a risk of contamination at every step of the fieldwork and laboratory process. Several studies focus on rigorous laboratory procedures and processing of sequencing data, but surprisingly little research investigates the process of background input of DNA in the field. For example, airborne DNA from localities outside the study area could potentially contaminate eDNA samples. Here, we use an experimental setup and eDNA metabarcoding to study the diversity and accumulation of airborne eukaryotic eDNA on exposed surfaces in the field. At two different natural locations, a coastal marine site, and a terrestrial grassland site, we placed open containers each filled with 0.5 litres of water, which was then sampled at eight successive time points after exposure to the surroundings. We found an accumulation of detected species richness in the samples, which reached its maximum at the end of the experiment, 24 hours after exposure. This result was consistent across both sites and across two markers (COI for eukaryotes and 12S for vertebrates). While many of the detected species were contaminants commonly found in eDNA studies, we also detected several other eukaryotic taxa. Most notable were metazoan species such as birds, fish and insects, likely originating from airborne transport of eDNA. We also found that increasing the number of PCR cycles tended to have a positive impact on richness for the unfiltered reads but a negative impact on the richness after bioinformatic filtering. Our results add to the sparse evidence that metazoan eDNA can be transported by air, which has wide implications for eDNA research and calls for increased implementation of field control samples.
Methods
This dataset represents environmental DNA sequencing data from Mols Bjerge National Park (Denmark) and Aarhus (Denmark), collected on the 23rd-24th of August 2019 and the 23rd-24th of September 2019, respectively. Samples were collected at eight different time points over a 24-hour period, starting from 11 AM (see connected publication for additional details).
DNA was amplified with two different primer sets, namely the Tele02 (for 12S fish data) and the Leray-XT (for CO1 eukaryote data) primers. Tele02 primers consist of the forward primer Tele02_F (5’-AAACTCGTGCCAGCCACC-3’) and the reverse primer Tele02_R (5’-GGGTATCTAATCCCAGTTTG-3’), targeting a 163-185 bp fragment of 12S. Eukaryote amplification was done with forward primer mlCOIintF-XT (5’-GGWACWRGWTGRACWNTNTAYCCYCC-3’) and reverse primer jgHCO2198 (5’-TAIACYTCIGGRTGICCRAARAAYCA-3’), targeting ca. 350 bp of CO1. The libraries were sequenced using paired-end NovaSeq 6000 sequencing (150 bp PE for 12S and 250 bp PE for CO1).
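The CO1 primers contain degenerate (IUPAC) bases, so a plain string comparison will not find them in the reads. The following is a minimal sketch of how the four primer sequences listed above could be turned into regular expressions for locating them at the start of a read. The helper names (`primer_to_regex`, `starts_with_primer`) are hypothetical, and treating inosine (I) as matching any base is an assumption made here for illustration, not something stated in the study.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
# Assumption: inosine (I) is treated like N, i.e. it may pair with any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "I": "ACGT",
}

# Primer sequences as given in the Methods section (5' to 3').
PRIMERS = {
    "Tele02_F": "AAACTCGTGCCAGCCACC",
    "Tele02_R": "GGGTATCTAATCCCAGTTTG",
    "mlCOIintF-XT": "GGWACWRGWTGRACWNTNTAYCCYCC",
    "jgHCO2198": "TAIACYTCIGGRTGICCRAARAAYCA",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Expand each degenerate base into a character class."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def starts_with_primer(read: str, primer: str) -> bool:
    """Return True if the read begins with a sequence compatible with the primer."""
    return primer_to_regex(primer).match(read.upper()) is not None
```

Note that in this dataset each primer is preceded by a sample tag, so a real demultiplexing tool would look for the tag first and the primer directly after it. Dedicated tools also tolerate mismatches, which this exact-match sketch does not.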
Libraries are named M1-M4 and M11-M14 for the CO1 data and M5-M10 for the vertebrate data.
Usage notes
We suggest you put the files in 14 separate folders in order to demultiplex. The files should be placed accordingly:
Folder 1: Aarhus CO1 PCR replicate one
- M1_MD5.txt (to check sums)
- M1_FKDL202559049-1a_HJ77HDRXX_L1_1.fq.gz
- M1_FKDL202559049-1a_HJ77HDRXX_L1_2.fq.gz
- M1_tags.list
Folder 2: Aarhus CO1 PCR replicate two
- M2_MD5.txt (to check sums)
- M2_FKDL202559050-1a_HJ77HDRXX_L1_1.fq.gz
- M2_FKDL202559050-1a_HJ77HDRXX_L1_2.fq.gz
- M2_tags.list
Folder 3: Aarhus CO1 PCR replicate three
- M3_MD5.txt (to check sums)
- M3_FKDL202559051-1a_HJ77HDRXX_L1_1.fq.gz
- M3_FKDL202559051-1a_HJ77HDRXX_L1_2.fq.gz
- M3_tags.list
Folder 4: Aarhus CO1 PCR replicate four
- M4_MD5.txt (to check sums)
- M4_FKDL202559052-1a_HJ77HDRXX_L1_1.fq.gz
- M4_FKDL202559052-1a_HJ77HDRXX_L1_2.fq.gz
- M4_tags.list
Folder 5: Vertebrate 40 cycles PCR replicate one
- M5_MD5.txt (to check sums)
- M5_FKDL202564481-1a_H35WWDSXY_L1_1.fq.gz
- M5_FKDL202564481-1a_H35WWDSXY_L1_2.fq.gz
- M5_tags.list
Folder 6: Vertebrate 40 cycles PCR replicate two
- M6_MD5.txt (to check sums)
- M6_FKDL202564442-1a_H35WWDSXY_L1_1.fq.gz
- M6_FKDL202564442-1a_H35WWDSXY_L1_2.fq.gz
- M6_tags.list
Folder 7: Vertebrate 45 cycles PCR replicate one
- M7_MD5.txt (to check sums)
- M7_FKDL202564482-1a_H35LKDSXY_L4_1.fq.gz
- M7_FKDL202564482-1a_H35LKDSXY_L4_2.fq.gz
- M7_tags.list
Folder 8: Vertebrate 45 cycles PCR replicate two
- M8_MD5.txt (to check sums)
- M8_FKDL202564444-1a_H35WWDSXY_L1_1.fq.gz
- M8_FKDL202564444-1a_H35WWDSXY_L1_2.fq.gz
- M8_tags.list
Folder 9: Vertebrate 50 cycles PCR replicate one
- M9_MD5.txt (to check sums)
- M9_FKDL202564445-1a_H35WWDSXY_L1_1.fq.gz
- M9_FKDL202564445-1a_H35WWDSXY_L1_2.fq.gz
- M9_tags.list
Folder 10: Vertebrate 50 cycles PCR replicate two
- M10_MD5.txt (to check sums)
- M10_FKDL202564483-1a_H35WWDSXY_L1_1.fq.gz
- M10_FKDL202564483-1a_H35WWDSXY_L1_2.fq.gz
- M10_tags.list
Folder 11: Mols CO1 PCR replicate one
- M11_MD5.txt (to check sums)
- M11_FKDL202564451-1a_HJVKFDRXX_L1_1.fq.gz
- M11_FKDL202564451-1a_HJVKFDRXX_L1_2.fq.gz
- M11_tags.list
Folder 12: Mols CO1 PCR replicate two
- M12_MD5.txt (to check sums)
- M12_FKDL202564452-1a_HJVKYDRXX_L1_1.fq.gz
- M12_FKDL202564452-1a_HJVKYDRXX_L1_2.fq.gz
- M12_tags.list
Folder 13: Mols CO1 PCR replicate three
- M13_MD5.txt (to check sums)
- M13_FKDL202564453-1a_HJVKYDRXX_L1_1.fq.gz
- M13_FKDL202564453-1a_HJVKYDRXX_L1_2.fq.gz
- M13_tags.list
Folder 14: Mols CO1 PCR replicate four
- M14_MD5.txt (to check sums)
- M14_FKDL202564454-1a_HJVKYDRXX_L1_1.fq.gz
- M14_FKDL202564454-1a_HJVKYDRXX_L1_2.fq.gz
- M14_tags.list
Following this folder structure, each folder will now contain two sequence data files (paired end sequencing), a barcode/tag file and an MD5 file for checking sums.
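If all files have been downloaded into a single directory, the folder layout above can be created with a short script. The sketch below groups files by their library prefix (M1 to M14) and then checks the sequence files against the MD5 file in each folder. Two assumptions are made here: folders are named after the library (for example "M1") rather than numbered, and each M*_MD5.txt file uses the common md5sum layout of a hex digest followed by a file name. Please adjust the parsing if the MD5 files are laid out differently.

```python
import hashlib
import re
import shutil
from pathlib import Path

# Library prefix at the start of every file name, e.g. "M12" in "M12_tags.list".
LIBRARY_PREFIX = re.compile(r"^(M\d+)_")


def sort_into_library_folders(download_dir: Path, target_dir: Path) -> dict[str, list[str]]:
    """Move every file named 'M<number>_...' into target_dir/M<number>/."""
    moved: dict[str, list[str]] = {}
    for path in sorted(download_dir.iterdir()):
        match = LIBRARY_PREFIX.match(path.name)
        if not path.is_file() or match is None:
            continue  # leave unrelated files and directories untouched
        library = match.group(1)
        folder = target_dir / library
        folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
        moved.setdefault(library, []).append(path.name)
    return moved


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest in chunks so large .fq.gz files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder: Path) -> dict[str, bool]:
    """Compare each file listed in the folder's MD5 file with its actual digest.

    Assumption: each line reads '<hex digest>  <file name>' (md5sum layout).
    """
    results: dict[str, bool] = {}
    for md5_file in folder.glob("*_MD5.txt"):
        for line in md5_file.read_text().splitlines():
            fields = line.split()
            if len(fields) < 2:
                continue
            expected = fields[0].lower()
            name = Path(fields[-1]).name.lstrip("*")
            target = folder / name
            results[name] = target.is_file() and md5_of(target) == expected
    return results
```

Calling `sort_into_library_folders(Path("download"), Path("libraries"))` followed by `verify_folder(Path("libraries/M1"))` would return a mapping from file name to True or False, where False means the file is missing or its checksum does not match.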
If you would like to demultiplex this data, all tag information needed is available in the “list” files. Each tag file contains a number of samples, which are explained below:
P2.X.Y_Z: Pooled samples (see associated article). P2 refers to the project number. X denotes the field replicate number, Y denotes the time the sample was taken (Y=C are control samples taken in the lab) and Z refers to the sequencing library number.
P2.POS(X): Field positive taken at the Mols site (see associated publication for explanation).
Pos_X: Positive PCR control using whale shark DNA. One positive control was run with every PCR setup.
CNE: Extraction blanks (one was included for each round of extraction).
NTC: PCR blanks. Four PCR blanks were run in each PCR setup.
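The naming scheme above can be used to separate real samples from controls before analysis. The sketch below classifies a sample name into one of the five types. The regular expressions are assumptions derived from the descriptions above (for example, that names literally start with "P2.POS", "Pos_", "CNE" or "NTC"), and the function name `classify_sample` is hypothetical. Please check the patterns against the actual names in the tag files before relying on them.

```python
import re

# Patterns are checked in order; the field positive must come before the
# pooled-sample pattern because both start with "P2.".
PATTERNS = [
    ("field_positive", re.compile(r"^P2\.POS")),
    ("pooled_sample", re.compile(
        r"^P2\.(?P<field_replicate>[^._]+)\.(?P<time_point>[^._]+)_(?P<library>[^._]+)$")),
    ("pcr_positive", re.compile(r"^Pos_")),
    ("extraction_blank", re.compile(r"^CNE")),
    ("pcr_blank", re.compile(r"^NTC")),
]


def classify_sample(name: str) -> dict[str, str]:
    """Return the sample type plus any fields encoded in the name."""
    for kind, pattern in PATTERNS:
        match = pattern.match(name)
        if match is None:
            continue
        info = {"type": kind, **match.groupdict()}
        # Y = C marks control samples taken in the lab.
        if kind == "pooled_sample" and info["time_point"] == "C":
            info["type"] = "lab_control"
        return info
    return {"type": "unknown"}
```

For a hypothetical name such as "P2.1.3_5", this returns the field replicate ("1"), the time point ("3") and the library number ("5") as separate fields, which makes it easier to group samples by time point later on.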
The “list” files include the sample name followed by the PCR replicate number. The two following columns contain the tags used for each sample (both the forward and the reverse primer were tagged). Tags are consistent across PCR replicates.
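As a starting point for demultiplexing, a tag file can be read into a simple table. The sketch below assumes that each line holds four whitespace-separated columns (sample name, PCR replicate number, forward tag, reverse tag). This column layout is an assumption based on the description above. If, for instance, the replicate number is attached to the sample name in a single column, the parser needs to be adapted. The names `TagEntry`, `parse_tag_list` and `duplicate_tag_pairs` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEntry:
    sample: str
    pcr_replicate: str
    forward_tag: str
    reverse_tag: str


def parse_tag_list(text: str) -> list[TagEntry]:
    """Parse the contents of an M*_tags.list file.

    Assumption: four whitespace-separated columns per line; blank lines and
    lines starting with '#' are skipped.
    """
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 columns, found {len(fields)}")
        entries.append(TagEntry(*fields))
    return entries


def duplicate_tag_pairs(entries: list[TagEntry]) -> list[tuple[str, str]]:
    """List forward/reverse tag combinations used by more than one entry.

    Within one library each combination should identify a single sample,
    so a non-empty result points to a parsing problem.
    """
    counts = Counter((e.forward_tag, e.reverse_tag) for e in entries)
    return [pair for pair, n in counts.items() if n > 1]
```

A typical call would be `parse_tag_list(Path("M1/M1_tags.list").read_text())` (with `from pathlib import Path`). The resulting entries can then be handed to the demultiplexing tool of your choice in whatever format it expects.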
After demultiplexing, the data can be processed with the pipeline of your choice.
To follow the exact filtering and data analysis used in our study, please see the manuscript for the steps after demultiplexing. If you have any questions, feel free to send an email to Martin Johannesen Klepke.