Diatoms and plants sedimentary ancient DNA from Lake Satagay (Central Yakutia, Siberia) covering the last 10,800 years

Citation

Baisheva, Izabella et al. (2022), Diatoms and plants sedimentary ancient DNA from Lake Satagay (Central Yakutia, Siberia) covering the last 10,800 years, Dryad, Dataset, https://doi.org/10.5061/dryad.vq83bk3w5

Abstract

Our dataset consists of two metabarcoding datasets, (1) targeting diatoms (rbcL) and (2) targeting plant taxa (trnL P6 loop), retrieved from sedimentary ancient DNA (sedaDNA) of Lake Satagay, Central Yakutia, Siberia (N 63.07816°, E 117.99806°). The diatom sedaDNA data comprise 65 samples (sequencing run number APMG-32), whereas 61 samples were analyzed for plants (sequencing run number APMG-33). We used sedaDNA of diatoms (APMG-32) and plants (data from APMG-33 restricted to aquatic plants) to reconstruct aquatic biodiversity and lake development throughout the Holocene. SedaDNA of terrestrial plants was used to investigate past vegetation changes in the catchment of the lake. We provide metadata, raw sequencing data, bioinformatic scripts (Obitools3), the reference databases used, and the final data spreadsheet.

Methods

SedaDNA was extracted using the DNeasy PowerSoil and DNeasy PowerSoil Max DNA Isolation Kits. Extracted DNA was combined and concentrated using a GeneJET PCR Purification Kit. Primers for the amplification of diatoms targeted a short diagnostic diatom metabarcode (primers: diat_rbcL705 and diat_rbcL808, Stoof-Leichsenring et al. 2012). For plant DNA metabarcoding we used standard primers targeting the chloroplast trnL P6 loop (Taberlet et al. 2007). PCRs for diatom and plant metabarcoding were run in three replicates along with No Template Controls (NTCs) to check for contamination of the PCR reagents. PCR products were purified using MinElute. Samples containing diatom and plant DNA were sequenced in paired-end mode (2 × 150 bp) on an Illumina NextSeq 500 platform at an external sequencing service. We followed the Obitools pipeline as described in Boyer et al. 2015, but applied the updated version Obitools3 (see the detailed usage description here: https://git.metabarcoding.org/obitools/obitools3). Diatom and plant EMBL reference databases were built by running in silico PCR (Ficetola et al. 2010) with diatom-specific and plant-specific primers, respectively, on the EMBL Nucleotide Sequence Database (Release 143, April 2020). Plant DNA query sequences were matched against the plant EMBL reference database and the Arctic and Boreal vascular plant and bryophyte database (Willerslev et al. 2014, Soininen et al. 2015, Sønstebø et al. 2010). The taxonomic assignment of diatoms was based on 98–100% similarity to at least one entry of the diatom EMBL reference database, and the taxonomic assignment of plants was based on 100% similarity between query sequences and entries of the Arctic and plant EMBL reference databases.
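The similarity thresholds stated above (98–100% for diatoms, 100% for plants) can be illustrated with a minimal Python sketch. It is not part of the deposited scripts: the record layout, the function name `filter_by_identity`, and the example values are assumptions made purely for illustration, and identities are assumed to be expressed as fractions between 0 and 1.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    """Hypothetical record layout; not the column layout of the deposited files."""
    sequence_id: str
    best_identity: float  # assumed to be a fraction between 0 and 1
    scientific_name: str


# Minimum identity per marker, as stated in the Methods.
MIN_IDENTITY = {"diatoms": 0.98, "plants": 1.00}


def filter_by_identity(records, marker):
    """Keep only records whose best identity reaches the marker's threshold."""
    threshold = MIN_IDENTITY[marker]
    return [r for r in records if r.best_identity >= threshold]


# Illustrative values only, not taken from the dataset.
example = [
    Assignment("seq1", 1.00, "taxon A"),
    Assignment("seq2", 0.985, "taxon B"),
    Assignment("seq3", 0.95, "taxon C"),
]

kept_diatoms = filter_by_identity(example, "diatoms")  # seq1 and seq2 pass
kept_plants = filter_by_identity(example, "plants")    # only seq1 passes
```

With the same input, the stricter plant threshold keeps fewer sequences than the diatom threshold, which is the only behaviour the sketch is meant to show.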

References:

  1. Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E (2015) OBITOOLS: a unix-inspired software package for DNA metabarcoding. Mol Ecol Resour 16: 176–182. https://doi.org/10.1111/1755-0998.12428.
  2. Ficetola GF, Coissac E, Zundel S, Riaz T, Shehzad W, Bessière J, Taberlet P, Pompanon F (2010) An in silico approach for the evaluation of DNA barcodes. BMC Genomics 11: 434. https://doi.org/10.1186/1471-2164-11-434.
  3. Soininen EM, Gauthier G, Bilodeau F, Berteaux D, Gielly L, Taberlet P, Gussarova G, Bellemain E, Hassel K, Stenøien HK, Epp L, Schrøder-Nielsen A, Brochmann C, Yoccoz NG (2015) Highly overlapping winter diet in two sympatric lemming species revealed by DNA metabarcoding. PLoS ONE 10 (1): e0115335. https://doi.org/10.1371/journal.pone.0115335.
  4. Sønstebø JH, Gielly L, Brysting AK, Elven R, Edwards M, Haile J, et al. (2010) Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Mol Ecol Resour 10: 1009–1018. https://doi.org/10.1111/j.1755-0998.2010.02855.x.
  5. Stoof-Leichsenring K, Epp L, Trauth M, Tiedemann R (2012) Hidden diversity in diatoms of Kenyan Lake Naivasha: a genetic approach detects temporal variation. Mol Ecol 21: 1918–1930. https://doi.org/10.1111/j.1365-294X.2011.05412.x.
  6. Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, Vermat T, Corthier G, Brochmann C, Willerslev E (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res 35 (3): e14. https://doi.org/10.1093/nar/gkl938.
  7. Willerslev E, Davison J, Moora M, Zobel M, Coissac E, Edwards ME, Lorenzen ED, Vestergård M, Gussarova G, Haile J, Craine J, Gielly L, Boessenkool S, Epp LS, Pearman PB, Cheddadi R, Murray D, Bråthen KA, Yoccoz N, Binney H, Cruaud C, Wincker P, Goslar T, Alsos IG, Bellemain E, Brysting AK, Elven R, Sønstebø JH, Murton J, Sher A, Rasmussen M, Rønn R, Mourier T, Cooper A, Austin J, Möller P, Froese D, Zazula G, Pompanon F, Rioux D, Niderkorn V, Tikhonov A, Savvinov G, Roberts RG, MacPhee RDE, Gilbert MTP, Kjær KH, Orlando L, Brochmann C, Taberlet P (2014) Fifty thousand years of Arctic vegetation and megafaunal diet. Nature 506 (7486): 47–51. https://doi.org/10.1038/nature12921.

Usage Notes

The datasets* were prepared for the manuscripts Baisheva et al. (2022): "Permafrost-thaw lake development in Central Yakutia – Sedimentary ancient DNA and element analyses from a Holocene sediment record" (submitted) and Glückler et al. (2022): "Holocene wildfire and vegetation dynamics in Central Yakutia, Siberia, reconstructed from lake-sediment proxies" (preprint). The scripts used to process the raw sequencing data with bioinformatics tools are also included.

*The datasets were uploaded into two separate directories containing data and scripts. Each directory contains two main folders, APMG_32 (diatoms) and APMG_33 (plants). After downloading, the files of APMG_32 and APMG_33 have to be merged into the same folders, so that the structure of the datasets looks as given below:

1) APMG_32 contains several folders and files of different formats:

00. APMG-32_Metadata - Metadata including lake geographic coordinates, sample depths and ages, laboratory codes, and the tag combinations of forward and reverse primers needed to demultiplex the sequencing data.

FILE:   APMG-32_Metadata.xlsx contains information on sequencing (run number, type, device, mode, forward and reverse tags, read length). It also includes information on individual samples: name, type, age, depth, extraction number, and PCR number, as well as sediment core name and core section number.

Format: .xlsx

01. Raw_data_APMG-32 – Illumina sequencing raw data.

FILES: 210602_NB501850_A_L1-4_APMG-32_R1.fastq.gz

210602_NB501850_A_L1-4_APMG-32_R2.fastq.gz

Format: Illumina fastq format. The sequence files are compressed as .gz archives. Before the data can be used with the Obitools script (APMG_32_metabarcoding_rbcL_obi3_Dryad.sh), the archives need to be uncompressed into plain .fastq files.
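As a convenience, the decompression step can be sketched in Python using only the standard library. This is a minimal sketch, not part of the deposited scripts: the helper name `gunzip` is hypothetical, and the file names are assumed to sit in the current working directory.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")  # x.fastq.gz -> x.fastq
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dest


if __name__ == "__main__":
    # File names as listed above; assumed to be in the current directory.
    raw_files = [
        "210602_NB501850_A_L1-4_APMG-32_R1.fastq.gz",
        "210602_NB501850_A_L1-4_APMG-32_R2.fastq.gz",
    ]
    for name in raw_files:
        if Path(name).exists():
            print("wrote", gunzip(name))
```

The original .gz archives are left in place, so the step can be repeated safely.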

02. Reference_data_rbcl – Database used for taxonomic assignment of diatoms.

FILES:   rbcl_embl143_db.fasta

Obi3_rbcL_database_build.sh - Script for the conversion step.

Format: .fasta and .sh. To use the rbcL database in the Obitools script (APMG_32_metabarcoding_rbcL_obi3_Dryad.sh), the rbcl_embl143_db.fasta needs to be converted to an obi3 database.

03. OBITools_APMG-32 – The metabarcoding pipeline for analyzing the raw sequencing data using OBITools3. 

FILES:   APMG_32_metabarcoding_rbcL_obi3_Dryad.sh - Script to run OBITools3 pipeline with short descriptions and output data.

APMG-32_embl143_rbcL.csv - Output file.

APMG32_tagfile.txt - File containing the primer tag combinations for demultiplexing with Obitools3 (see script: APMG_32_metabarcoding_rbcL_obi3_Dryad.sh).

Format: .csv, .txt and .sh

04. Final_resampled_data_APMG-32:

FILES:   APMG-32_identitylevel0.98_wideformat.csv - Final count data.

APMG-32_final_resampled_scientific_name.csv - Final dataset with filtering threshold of 98%, resampled to the minimal number of counts (n=2050), including diatoms and Nannochloropsis. 

Format: .csv

    • The file APMG-32_final_resampled_data.csv was used for further statistical analyses in Baisheva et al. (2022): "Permafrost-thaw lake development in Central Yakutia – Sedimentary ancient DNA and element analyses from a Holocene sediment record" (submitted). 
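Resampling to the minimal number of counts can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' procedure: the record does not describe how the resampling was done (for example, whether it was repeated), so the sketch performs a single random draw without replacement per sample. The function names `rarefy` and `rarefy_to_minimum` and the toy counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a {taxon: read_count} mapping."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand counts into one entry per read, then subsample.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def rarefy_to_minimum(samples, seed=0):
    """Resample every sample down to the smallest total read count."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    depth = min(sum(c.values()) for c in samples.values())
    resampled = {name: rarefy(c, depth, rng) for name, c in samples.items()}
    return depth, resampled


# Toy counts for illustration only, not taken from the dataset.
toy_samples = {
    "sample_1": {"taxon A": 6, "taxon B": 4},
    "sample_2": {"taxon A": 3, "taxon C": 1},
}
depth, resampled = rarefy_to_minimum(toy_samples)
```

After this step every sample has the same total count, which makes richness and composition comparable across samples with different sequencing depths.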

2) APMG_33 contains several folders and files of different formats:

00. APMG-33_Metadata - Contains information on sequencing (run number, type, device, mode, forward and reverse tags, read length). It also includes information on individual samples: name, type, age, depth, extraction number, and PCR number, as well as sediment core name and core section number.

FILE:   APMG-33_Satagay2_metadata.xlsx

Format: .xlsx 

01.   Raw_data_APMG_33 – Illumina sequencing raw data.

FILES: 210602_NB501850_A_L1-4_APMG-33_R1.fastq.gz

210602_NB501850_A_L1-4_APMG-33_R2.fastq.gz

Format: Illumina fastq format. The sequence files are compressed as .gz archives. The archives can be uncompressed on Linux using the gzip -d command.

02. Reference_database_plants – Reference databases for running the OBITools pipeline, with short instructions and scripts for the conversion step.

FILES:   arctborbryo_gh.fasta

gh_embl143_db_97.fasta

Obi3_arctborbryo_database_build.sh

Obi3_embl_database_build.sh

Format: .fasta and .sh. To use the arctborbryo and embl143 databases in the Obitools script (APMG-33_obi3_script.sh), the .fasta files need to be converted to obi3 databases.

03. OBITools_APMG-33 – The metabarcoding pipeline for analyzing the raw sequencing data using OBITools3. 

FILES:   APMG33_arc_anno.csv - Output file.

APMG33_embl143_anno.csv - Output file.

APMG-33_obi3_script.sh

APMG-33_tagfile.txt

Format: .csv, .txt and .sh. OBITools_APMG-33 has two output files because taxonomic assignment was performed against both the EMBL and the Arctic databases.

04. Final_datasets_APMG-33 - EMBL and Arctic assignments were merged into one dataset and filtered with a 100% identity threshold. The final datasets were separated into macrophytes and terrestrial plants.

FILES:   APMG-33_identitylevel100_wideformat.csv - Final count data.

APMG-33_macrophytes_resampled_scientific_name.csv - Final dataset of macrophytes only, resampled to the minimal number of counts (n=1653).

APMG-33_terrestrial_families.csv - Final dataset of separated terrestrial plants.

Format: .csv

  • The file “APMG-33_macrophytes_resampled_scientific_name.csv” from output data was used for further statistical analyses in Baisheva et al. (2022): "Permafrost-thaw lake development in Central Yakutia – Sedimentary ancient DNA and element analyses from a Holocene sediment record" (submitted). 
  • The file “APMG-33_terrestrial_families.csv” of separated terrestrial plants data was used for further statistical analyses in Glückler et al. (2022): "Holocene wildfire and vegetation dynamics in Central Yakutia, Siberia, reconstructed from lake-sediment proxies" (preprint).
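The merging of EMBL and Arctic assignments followed by the 100% filter can be sketched as follows. This is only one plausible reading: the record does not state how conflicts between the two databases were resolved, so the tie-break used here (prefer the Arctic entry unless the EMBL identity is strictly higher) is an assumption, as are the function name `merge_assignments`, the input layout, and the example values.

```python
def merge_assignments(embl, arctic, min_identity=1.0):
    """Merge two {sequence_id: (identity, taxon)} mappings into one.

    Assumed rule (not confirmed by the dataset description): per sequence,
    keep the assignment with the higher identity and prefer the Arctic
    database on ties; then drop everything below `min_identity`.
    """
    merged = {}
    for seq_id in sorted(set(embl) | set(arctic)):
        a = arctic.get(seq_id)
        e = embl.get(seq_id)
        if a is not None and (e is None or a[0] >= e[0]):
            best, source = a, "arctic"
        else:
            best, source = e, "embl"
        if best[0] >= min_identity:
            merged[seq_id] = (best[1], source)
    return merged


# Illustrative values only, not taken from the output files.
embl_hits = {"s1": (1.0, "genus X"), "s2": (0.97, "genus Y"), "s3": (1.0, "genus Z")}
arctic_hits = {"s1": (1.0, "species X1"), "s2": (1.0, "genus Y"), "s4": (0.9, "genus W")}

merged = merge_assignments(embl_hits, arctic_hits)
```

Recording the source database next to each kept assignment makes it easy to check later which reference drove a given identification.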

Funding

European Research Council, Award: Glacial Legacy: 772852

Ministry of Education and Science of the Russian Federation, Award: FSRG-2020-0019

Deutscher Akademischer Austauschdienst

AWI INSPIRES (International Science Program for Integrative Research)