A national scale BioBlitz using citizen science and eDNA metabarcoding for monitoring coastal marine fish
Data files
Feb 09, 2022 version files 72.99 GB
- B20_L4_2.fq.gz
- B20.txt
- B20L4_1.fq.gz
- B21_L4_1.fq.gz
- B21_L4_2.fq.gz
- B21.txt
- B22_L1_1.fq.gz
- B22_L1_2.fq.gz
- B22.txt
- B23_L4_1.fq.gz
- B23_L4_2.fq.gz
- B23.txt
- Pool1A1_L2_1.fq.gz
- Pool1A1_L2_2.fq.gz
- Pool1A1.txt
- Pool2A2_L2_1.fq.gz
- Pool2A2_L2_2.fq.gz
- Pool2A2.txt
- Pool3A3_L3_1.fq.gz
- Pool3A3_L3_2.fq.gz
- Pool3A3.txt
- Pool4A4_L1_1.fq.gz
- Pool4A4_L1_2.fq.gz
- Pool4A4_L3_1.fq.gz
- Pool4A4_L3_2.fq.gz
- Pool4A4.txt
- Pool5B1_L1_1.fq.gz
- Pool5B1_L1_2.fq.gz
- Pool5B1_L3_1.fq.gz
- Pool5B1_L3_2.fq.gz
- Pool5B1.txt
- Pool6B2_L3_1.fq.gz
- Pool6B2_L3_2.fq.gz
- Pool6B2.txt
- Pool7B3_L1_1.fq.gz
- Pool7B3_L1_2.fq.gz
- Pool7B3_L3_1.fq.gz
- Pool7B3_L3_2.fq.gz
- Pool7B3.txt
- Pool8B4_L3_1.fq.gz
- Pool8B4_L3_2.fq.gz
- Pool8B4.txt
- README.txt
- SL1_1_L2_1.fq.gz
- SL1_1_L2_2.fq.gz
- SL1_1.txt
- SL1_2_L2_1.fq.gz
- SL1_2_L2_2.fq.gz
- SL1_2.txt
- SL1_3_L2_1.fq.gz
- SL1_3_L2_2.fq.gz
- SL1_3.txt
- SL1_4_L2_1.fq.gz
- SL1_4_L2_2.fq.gz
- SL1_4.txt
Abstract
Marine biodiversity is threatened by human activities. To understand the changes happening in aquatic ecosystems and to inform management, detailed, synoptic monitoring of biodiversity across large spatial extents is needed. Such monitoring is challenging due to the time, cost, and specialized skills that this typically requires. In an unprecedented study, we here combined citizen science with eDNA metabarcoding to map coastal fish biodiversity at a national scale. We engaged 360 citizen scientists to collect filtered sea water samples from 100 sites across Denmark over two seasons (1 pm on September 29th 2019 and May 10th 2020), and by sampling at nearly the exact same time across all 100 sites, we obtained an overview of fish biodiversity largely unaffected by temporal variation. This would have been logistically impossible for the involved scientists without the help of volunteer citizens. We obtained a high return rate of 94% of the samples, and a total richness of 52 fish species, representing approximately 80% of coastal Danish fish species and approximately 25% of all Danish marine fish species. We retrieved distribution patterns matching known occurrence for invasive, endangered, and cryptic species, and detected seasonal variation in accordance with known phenology. Dissimilarity of eDNA community compositions increased with distance between sites. Importantly, comparing our eDNA data with National Fish Atlas data (the latter compiled from a century of observations), we found a positive correlation between species richness values and congruent patterns of community composition. These findings support the use of eDNA-based citizen science to detect patterns in biodiversity, and our approach is readily scalable to other countries, or even regional and global scales.
We argue that future large-scale biomonitoring will benefit from using citizen science combined with emerging eDNA technology, and that such an approach will be important for data-driven biodiversity management and conservation.
Methods
This dataset contains environmental DNA sequencing data from samples collected by citizen scientists in autumn 2019 and spring 2020 at one hundred coastal marine sites in Denmark (see the connected publication for details).
DNA was amplified with the Tele02 and Elas01 primers (Taberlet et al. 2018). Elas01 was not used in the analysis of sequencing data in this study, so only the Tele02 primers are relevant here: Tele02-F (5′-AAACTCGTGCCAGCCACC-3′) and Tele02-R (3′-GGGTATCTAATCCCAGTTTG-5′), which target a 167 bp fragment of the 12S rRNA gene.
The libraries were sequenced using paired-end NovaSeq sequencing (150 bp PE).
The dataset consists of sixteen gzip-compressed libraries (two times four PCR replicates of samples for each of the two seasons) and sixteen txt files, one per library, that can be used for demultiplexing.
Libraries are named as described below, and each consists of two sequence data files (1 and 2), one per read direction of the paired-end sequencing.
SL1_1_L2_(1 and 2) --> Autumn 2019 samples PCR replicate one (total 44 samples, 1 CNE and 1 NTC)
SL1_2_L2_(1 and 2) --> Autumn 2019 samples PCR replicate two (total 44 samples, 1 CNE and 1 NTC)
SL1_3_L2_(1 and 2) --> Autumn 2019 samples PCR replicate three (total 44 samples, 1 CNE and 1 NTC)
SL1_4_L2_(1 and 2) --> Autumn 2019 samples PCR replicate four (total 44 samples, 1 CNE and 1 NTC)
B20_L4_(1 and 2) --> Autumn 2019 samples PCR replicate one (total 52 samples, 7 CNE and 2 NTC)
B21_L4_(1 and 2) --> Autumn 2019 samples PCR replicate two (total 52 samples, 7 CNE and 2 NTC)
B22_L1_(1 and 2) --> Autumn 2019 samples PCR replicate three (total 52 samples, 7 CNE and 2 NTC)
B23_L4_(1 and 2) --> Autumn 2019 samples PCR replicate four (total 52 samples, 7 CNE and 2 NTC)
Pool1A1_L2_(1 and 2) --> Spring 2020 samples PCR replicate one (total 49 samples, 6 CNE and 1 NTC)
Pool2A2_L2_(1 and 2) --> Spring 2020 samples PCR replicate two (total 49 samples, 6 CNE and 1 NTC)
Pool3A3_L3_(1 and 2) --> Spring 2020 samples PCR replicate three (total 49 samples, 6 CNE and 1 NTC)
Pool4A4_L1_(1 and 2)* --> Spring 2020 samples PCR replicate four (total 49 samples, 6 CNE and 1 NTC)
Pool5B1_L1_(1 and 2)* --> Spring 2020 samples PCR replicate one (total 49 samples, 6 CNE and 1 NTC)
Pool6B2_L3_(1 and 2) --> Spring 2020 samples PCR replicate two (total 49 samples, 6 CNE and 1 NTC)
Pool7B3_L1_(1 and 2)* --> Spring 2020 samples PCR replicate three (total 49 samples, 6 CNE and 1 NTC)
Pool8B4_L3_(1 and 2) --> Spring 2020 samples PCR replicate four (total 49 samples, 6 CNE and 1 NTC)
Libraries marked with an asterisk (*) contain four files, which must be merged according to their last digit (1 or 2) before demultiplexing. For example, in addition to the two library files "Pool4A4_L1_1" and "Pool4A4_L1_2", there are two further files, "Pool4A4_L3_1" and "Pool4A4_L3_2", that contain additional sequence data. "Pool4A4_L1_1" must be merged with "Pool4A4_L3_1", and "Pool4A4_L1_2" with "Pool4A4_L3_2".
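One possible way to do this merge is sketched below in Python. It relies on the fact that gzip files can be concatenated byte for byte into a valid multi-member gzip file. The function name merge_lanes and the "_merged_" output file names are illustrative assumptions, not part of the dataset or of MetaBarFlow. The lane files are appended in the same order (L1, then L3) for both read directions, so that read pairs stay aligned between the two merged files.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Concatenate gzipped FASTQ files byte for byte into one gzip file.

    Concatenated gzip members form a valid gzip stream, so nothing
    needs to be decompressed or recompressed.
    """
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)


# Merge read 1 with read 1 and read 2 with read 2, always L1 first, then L3.
# The output names below are hypothetical; choose any naming you prefer.
for read in ("1", "2"):
    lanes = [
        Path(f"Pool4A4_L1_{read}.fq.gz"),
        Path(f"Pool4A4_L3_{read}.fq.gz"),
    ]
    if all(p.exists() for p in lanes):  # skip quietly if files are not present
        merge_lanes(lanes, f"Pool4A4_merged_{read}.fq.gz")
```

The same call can be repeated for the Pool5B1 and Pool7B3 libraries by changing the file name prefix.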
If you would like to demultiplex the data, use the sixteen txt files, which are named after their libraries. Each txt file lists the sample name followed by the PCR replicate number. The two following columns give the tags used for the individual samples. Tags are consistent across PCR replicates.
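As a rough illustration of how such a tag file could be loaded, the Python sketch below builds a lookup from tag pair to sample label. It assumes a whitespace-separated layout with three columns (sample label including the replicate number, first tag, second tag); this layout is an assumption, so check it against the actual txt files before use. The function name read_tag_table and the example rows are made up for illustration and are not taken from the dataset.

```python
import io


def read_tag_table(handle):
    """Map (tag 1, tag 2) pairs to sample labels.

    Assumed layout (not confirmed by the dataset description):
    sample label, first tag, second tag, separated by whitespace.
    """
    tags = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        sample, tag_1, tag_2 = fields[0], fields[1], fields[2]
        tags[(tag_1.upper(), tag_2.upper())] = sample
    return tags


# Made-up example rows; real sample names and tags are in the txt files.
example = io.StringIO("siteA_1\tACGTAC\tTGCATG\nCNE1_1\tGATCGA\tCTAGCT\n")
table = read_tag_table(example)
```

For a real file, the same function can be called with an opened file, for example `with open("Pool4A4.txt") as handle: table = read_tag_table(handle)`.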
The data were analyzed using the pipeline MetaBarFlow (Sigsgaard et al. 2022), which uses the Python-based workflow tool gwf and parallel computing to process metabarcoding data. The most up-to-date version of MetaBarFlow can be found at https://github.com/evaegelyng/MetaBarFlow, while the exact pipeline used for the current paper can be found at https://github.com/evaegelyng/Agersnap_et_al_2022.
If you have any questions regarding the data, do not hesitate to contact Sune Agersnap.
citations:
Sigsgaard, Eva Egelyng, Samuele Soraggi, Mads Reinholdt Jensen, Adrián Gómez Repollés, Emil Ellegaard Thomassen, and Philip Francis Thomsen. 2022. MetaBarFlow (version 0.1.0-alpha). Zenodo. https://doi.org/10.5281/zenodo.5898411.
Taberlet, Pierre, Aurélie Bonin, Lucie Zinger, and Eric Coissac. 2018. Environmental DNA: For Biodiversity Research and Monitoring. Environmental DNA. Oxford University Press.
Usage notes
To follow the exact filtering and data analysis used in our study, see the manuscript for details of the steps after demultiplexing. If you have any questions, feel free to email Sune Agersnap.