Plant diversity data from modern sedimentary DNA of lakes in Siberia and China
Stoof-Leichsenring, Kathleen R. et al. (2021), Plant diversity data from modern sedimentary DNA of lakes in Siberia and China, Dryad, Dataset, https://doi.org/10.5061/dryad.k6djh9w4r
Here we provide a large dataset on genetic plant diversity retrieved from surface sedimentary DNA (sedDNA) of lakes in Siberia and China, spanning a large environmental gradient. Our dataset encompasses sedDNA sequence data of 244 surface lake sediment samples and 3 soil samples from Siberia and China. We used a PCR-based metabarcoding approach combined with Next-Generation Sequencing to assess the modern, local plant diversity in and around the analysed lakes. As a plant-specific metabarcode we applied the established chloroplast trnL (P6 loop) marker. PCR products were sequenced on four independent Illumina sequencing runs (ALRK-7, ALRK-3, AGAK-5 and HQD-2).
We extracted sedDNA from lake surface samples using the DNeasy PowerMax Soil Kit and the PowerMax Soil DNA Isolation Kit. Plant DNA was then amplified from the sedDNA extracts with the chloroplast trnL (P6 loop) marker in a PCR-based metabarcoding approach combined with Next-Generation Sequencing. PCRs were replicated for each sample, yielding a total of 688 PCR products, which were sequenced on four independent Illumina sequencing runs (ALRK-7, ALRK-3, AGAK-5 and HQD-2). The underlying dataset consists of the raw R1.fastq and R2.fastq files of the four sequencing runs; two scripts that explain how to run the OBITools pipeline for data analyses and how to prepare taxonomic databases with EcoPCR and OBITools; four tagfiles needed for demultiplexing the raw sequence data into samples; three database files for taxonomic assignment; and eight final data files, two for each sequencing run.
For reanalysis of data we provide the following data files:
1. Illumina sequencing raw data of four sequencing runs (ALRK-7, ALRK-3, AGAK-5, HQD-2). Data files are compressed.
- ALRK-7 (190820_NB501473_A_L1-4_ALRK-7_R1.fastq.gz, 190820_NB501473_A_L1-4_ALRK-7_R2.fastq.gz)
- ALRK-3 (190128_NB501850_A_L1-4_ALRK-3_R1.fastq.gz, 190128_NB501850_A_L1-4_ALRK-3_R2.fastq.gz)
- AGAK-5 (180912_NB501850_A_L1-4_AGAK-5_R1.fastq.gz, 180912_NB501850_A_L1-4_AGAK-5_R2.fastq.gz)
- HQD-2 (151111_SND104_A_L008_HQD-2_R1.fastq.gz, 151111_SND104_A_L008_HQD-2_R2.fastq.gz)
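Before running the pipeline it can help to check that every run has both read files. The sketch below assumes, based only on the file names listed above, that each name follows the pattern `<date>_<instrument>_A_<lanes>_<run>_<R1|R2>.fastq.gz`; the helper names `pair_runs` and `count_reads` are hypothetical and not part of the dataset or of OBITools. The read counter also assumes standard four-line FASTQ records without wrapped sequence lines.

```python
import gzip
import re
from pathlib import Path

# Assumed naming pattern, inferred from the file names listed above:
# <date>_<instrument>_A_<lanes>_<run>_<R1|R2>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<date>\d{6})_(?P<instrument>[^_]+)_A_(?P<lanes>[^_]+)"
    r"_(?P<run>[^_]+)_(?P<read>R[12])\.fastq\.gz$"
)


def pair_runs(filenames):
    """Group file names into {run: {"R1": name, "R2": name}} (hypothetical helper)."""
    runs = {}
    for name in filenames:
        match = FASTQ_NAME.match(Path(name).name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        runs.setdefault(match["run"], {})[match["read"]] = name
    # Paired-end merging needs both mates for every run.
    incomplete = [run for run, reads in runs.items() if set(reads) != {"R1", "R2"}]
    if incomplete:
        raise ValueError(f"missing R1 or R2 for: {incomplete}")
    return runs


def count_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

For example, passing the eight file names listed above to `pair_runs` returns one entry per run (AGAK-5, ALRK-3, ALRK-7, HQD-2), each holding its R1 and R2 file.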
2. Two scripts to run the OBITools pipeline with a short description of each step.
- Data analyses with OBITools (Script_data_analyses_with_OBITools.txt)
- Database creation with EcoPCR and OBITools (Script_Database_creation_for_OBITools.txt)
3. Tagfiles needed for the OBITools pipeline. Sample names in the tagfiles indicate the sequencing run and the sample batch number; each batch includes samples and the corresponding controls (DNA extraction blank (BLANK) and PCR negative control (NTC)).
- ALRK-7 (ALRK-7_tagfile.txt)
- ALRK-3 (ALRK-3_tagfile.txt)
- AGAK-5 (AGAK-5_tagfile.txt)
- HQD-2 (HQD-2_tagfile.txt)
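Because the controls are identified by their sample names, a tagfile can be split into samples, extraction blanks and PCR negative controls by name. This is a minimal sketch that assumes the tagfiles use the usual ngsfilter layout (whitespace-separated columns: experiment, sample, `forward_tag:reverse_tag`, forward primer, reverse primer, optional extra fields, `#` for comments) and that control names contain the strings BLANK or NTC; the class and function names are hypothetical, and the column layout should be checked against the actual files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TagEntry:
    """One line of an ngsfilter-style tagfile (assumed layout)."""
    experiment: str
    sample: str
    forward_tag: str
    reverse_tag: str
    forward_primer: str
    reverse_primer: str

    @property
    def kind(self) -> str:
        # Heuristic based on the naming convention described above.
        name = self.sample.upper()
        if "BLANK" in name:
            return "extraction_blank"
        if "NTC" in name:
            return "pcr_negative"
        return "sample"


def parse_tagfile(lines):
    """Parse tagfile lines, skipping blank lines and '#' comments."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"too few columns: {line!r}")
        experiment, sample, tags, fwd_primer, rev_primer = fields[:5]
        fwd_tag, _, rev_tag = tags.partition(":")
        # Assumption: a tag field without ':' means the same tag on both ends.
        entries.append(TagEntry(experiment, sample, fwd_tag, rev_tag or fwd_tag,
                                fwd_primer, rev_primer))
    return entries


def summarise(entries):
    """Count entries per kind (sample, extraction_blank, pcr_negative)."""
    return Counter(entry.kind for entry in entries)
```

With a real file this would be called as `parse_tagfile(open("ALRK-7_tagfile.txt"))`; the substring test is deliberately simple and would misclassify a regular sample whose name happened to contain BLANK or NTC.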
4. Taxonomic database files needed for the OBITools pipeline (see Script_Database_creation_for_OBITools.txt)
- EMBL database (g_h_embl138_final.uniqIDs.fasta)
- Arctic database (arctborbryo_gh.fasta, ecochange.zip)
5. Final data tables after bioinformatic analyses with OBITools. For each sequencing run we provide two data tables: one with taxonomic assignments against the EMBL database and one with taxonomic assignments against the Arctic database.
- ALRK-7 (assigned_ALRK-7_unique_clean_embl138_anno.txt, assigned_ALRK-7_unique_clean_acrtborbryo_anno.txt)
- ALRK-3 (assigned_ALRK-3_unique_clean_embl138_anno.txt, assigned_ALRK-3_unique_clean_acrtborbryo_anno.txt)
- AGAK-5 (assigned_AGAK-5_unique_clean_embl138_anno.txt, assigned_AGAK-5_unique_clean_acrtborbryo_anno.txt)
- HQD-2 (assigned_HQD-2_unique_clean_embl138_anno.txt, assigned_HQD-2_unique_clean_acrtborbryo_anno.txt)
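Since each run comes with two tables that differ only in the reference database, one natural reanalysis step is to compare the two assignments per sequence. This is a minimal sketch that assumes the tables are tab-separated with a header row and contain a sequence id column, an assigned name column and an identity column; the default column names (`id`, `scientific_name`, `best_identity`) and the 1.0 identity threshold are assumptions, not confirmed properties of the published files, and the helper names are hypothetical.

```python
import csv
import io


def load_assignments(handle, min_identity=1.0, id_col="id",
                     name_col="scientific_name", identity_col="best_identity"):
    """Map sequence id -> assigned name for rows with identity >= min_identity.

    Default column names are assumptions; check the header of the real tables.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    assignments = {}
    for row in reader:
        if float(row[identity_col]) >= min_identity:
            assignments[row[id_col]] = row[name_col]
    return assignments


def compare_assignments(embl, arctic):
    """Summarise how two reference databases agree on the same sequences."""
    shared = embl.keys() & arctic.keys()
    return {
        "shared": len(shared),
        "same_name": sum(1 for seq_id in shared if embl[seq_id] == arctic[seq_id]),
        "embl_only": len(embl.keys() - arctic.keys()),
        "arctic_only": len(arctic.keys() - embl.keys()),
    }


# Hypothetical in-memory tables standing in for an EMBL/Arctic pair of files.
embl_tsv = ("id\tscientific_name\tbest_identity\n"
            "seq1\tSalix\t1.0\nseq2\tBetula\t0.98\nseq3\tLarix\t1.0\n")
arctic_tsv = ("id\tscientific_name\tbest_identity\n"
              "seq1\tSalix\t1.0\nseq3\tLarix gmelinii\t1.0\nseq4\tPoaceae\t1.0\n")
summary = compare_assignments(load_assignments(io.StringIO(embl_tsv)),
                              load_assignments(io.StringIO(arctic_tsv)))
print(summary)  # {'shared': 2, 'same_name': 1, 'embl_only': 0, 'arctic_only': 1}
```

For the real tables, open each file with `open(path, newline="")` and pass the handle to `load_assignments`, adjusting the column-name parameters to match the actual header.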