Data from msGBS: A new high-throughput approach to quantify the relative species abundance in root samples of multi-species plant communities
Data files
Aug 20, 2020 version files 5.01 GB
- ALL_FIGURES_AND_TABLES_Wagemaker_et_al_2020_MER_FINAL.xlsx
- AM_FUNGI_NAMES.txt
- Archaea_NAMES.txt
- BACTERIA_NAMES.txt
- blastN_parse_ref_msGBS.py
- Delete_species_from_REF.py
- Demultiplex_msGBS.py
- Dutch_barcodes.xlsx
- Dutch_Parse_csv.py
- Dutch_Ref.txt
- Dutch_Species_list.txt
- Dutch_stats.csv
- Dutch_Vegetation_survey.csv
- EUKARYOTA_NAMES.txt
- example_Dutch_barcode_file_msGBS.xlsx
- example_Dutch_example_barcode_file_msGBS.txt
- example_Dutch_subset_R2_Run1.fq.gz
- example_Dutchbarcodes_msGBS_DIJKEN2019_RUN1.txt
- example_Dutchsubset_R1_Run1.fq.gz
- JENA_barcodes_msGBS.txt
- JENA_Filtered_CSV.csv
- JENA_Ref.txt
- Make_correct_barcode_file_instructions.docx
- Make_reference_msGBS_V3.py
- Map_STAR_msGBS_V3.py
- Mark_PCR_duplicates_V2.py
- msGBS_STATS.py
- NG_merge_adapters.txt
- NG_merge_qual_profile.txt
- OTHER_FUNGI_NAMES.txt
- overview_DOCS.pptx
- R_Figure_4AB_calibrated_non_calibrated_weighed_pool_2.R
- R_Figure_4CD_Calibrated_and_non_calibrated_vs_weighed_results_of_Pool_2.R
- R_Figure_5A_All_species_Weighed_vs_QPCR_vs_msGBS_MRT2019_POOL1.R
- R_Figure_5B_All_species_Weighed_vs_QPCR_vs_msGBS_MRT2019_POOL2.R
- R_QPCR_msGBS_weighed_statistics_pool_1.R
- R_QPCR_msGBS_weighed_statistics_pool_2.R
- R_slope_vs_cal.R
- R_standaard_boxplot.R
- RAW_DATA_Wagemaker_et_al_2020_MER_FINAL.xlsx
- rename_fast.py
- Virus_NAMES.txt
Sep 24, 2020 version files 5.01 GB
- Same file list as the Aug 20, 2020 version above.
Abstract
Plant interactions are as important belowground as aboveground. Belowground plant interactions are however inherently difficult to quantify, as roots of different species are difficult to disentangle. Although for a couple of decades molecular techniques have been successfully applied to quantify root abundance, root identification and quantification in multi-species plant communities remains particularly challenging.
Here we present a novel methodology, multi-species Genotyping By Sequencing (msGBS), as a next step to tackle this challenge. First, a multi-species meta-reference database containing thousands of gDNA clusters per species is created from GBS derived High Throughput Sequencing (HTS) reads. Second, GBS derived HTS reads from multi-species root samples are mapped to this meta-reference which, after a filter procedure to increase the taxonomic resolution, allows the parallel quantification of multiple species.
The msGBS signal of 111 mock-mixture root samples, with up to 8 plant species per sample, was used to calculate the within-species abundance. Optional subsequent calibration yielded the across-species abundance. The within- and across-species abundances correlated strongly (R² range 0.72-0.94 and 0.85-0.98, respectively) with the biomass-based species abundance. Compared to a qPCR-based method that was previously used to analyze the same set of samples, msGBS provided similar results. Additional data on 11 congener species groups within 105 natural field root samples showed the high taxonomic resolution of the method.
This dataset belongs to the article "msGBS: A new high-throughput approach to quantify the relative species abundance in root samples of multi-species plant communities". msGBS is a technique that uses Genotyping By Sequencing on mixed plant species root samples which, after a filtering step to increase the taxonomic resolution and calibration, is able to estimate plant species abundances.
The article uses data from two different experiments:
- the Jena field survey (13 plant species) and
- the Dutch field survey (120 plant species).
Methods
The data is based on 3 lanes of high-throughput sequencing data (HiSeq X, PE 2 × 150 bp). The lab protocol is based on Genotyping By Sequencing (Elshire, 2011), but its design is adjusted as described in the manuscript and it is applied to mixed-species samples instead of single-species material. Unlike the original method, downstream analysis does not aim to identify SNPs but to quantify the relative abundance of the species present in the plant root mixture.
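To illustrate what "quantifying relative abundance" means in practice, the following is a minimal Python sketch that turns per-species mapped-read counts for one sample into fractions. It is not the msGBS pipeline's implementation: the function name `relative_abundance`, the species labels, the read counts, and the idea of a single multiplicative calibration factor per species are all assumptions made for illustration. The actual filtering and calibration procedures are described in the article.

```python
def relative_abundance(read_counts, calibration=None):
    """Convert per-species read counts for one sample into fractions.

    read_counts: dict mapping species name -> number of mapped reads.
    calibration: optional dict mapping species name -> multiplicative
                 factor (hypothetical; species not listed use 1.0).
    """
    calibration = calibration or {}
    # Apply the (assumed) per-species calibration factor to each count.
    weighted = {sp: n * calibration.get(sp, 1.0) for sp, n in read_counts.items()}
    total = sum(weighted.values())
    if total == 0:
        # No mapped reads at all: report zero for every species.
        return {sp: 0.0 for sp in weighted}
    return {sp: w / total for sp, w in weighted.items()}


# Invented example counts for a three-species root mixture.
sample = {"species_A": 600, "species_B": 300, "species_C": 100}

uncalibrated = relative_abundance(sample)
calibrated = relative_abundance(sample, {"species_B": 2.0})
```

Without calibration, each fraction only reflects the share of reads per species; the calibration argument shows how a per-species correction could shift those shares while keeping them summing to one.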
The data was processed with the msGBS pipeline, available on GitHub (https://github.com/NielsWagemaker/scripts_msGBS/tree/msGBS-1.0).
A new, easy-to-install Snakemake pipeline is also available, but it is not described in the article (https://github.com/NielsWagemaker/scripts_msGBS/tree/msGBS-snake).
Usage notes
This Dryad dataset includes a barcode file for demultiplexing the raw sequence data available at the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA604964), the meta-reference genome, and a raw .csv data analysis output file. Also included are several intermediate analysis files, as described in the supplemental data of the article.
In the Dryad overview file (overview_DOCS.pptx), the available files are mapped to the bioinformatic steps of the pipeline to help navigate the data files.
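As a rough illustration of what demultiplexing with a barcode file involves, the sketch below assigns a read pair to a sample by matching inline barcodes at the start of both reads. It is a simplified stand-in, not the logic of Demultiplex_msGBS.py: the barcode sequences, sample names, and the three-column layout (sample, R1 barcode, R2 barcode) are hypothetical, and only exact matches are considered. Consult Make_correct_barcode_file_instructions.docx for the real barcode file format.

```python
def build_lookup(barcode_rows):
    """Map (R1 barcode, R2 barcode) pairs to sample names.

    barcode_rows: iterable of (sample, r1_barcode, r2_barcode) tuples
                  (a hypothetical layout, not the real file format).
    """
    return {(bc1, bc2): sample for sample, bc1, bc2 in barcode_rows}


def assign_sample(read1, read2, lookup):
    """Return (sample, trimmed_read1, trimmed_read2) for a read pair.

    Uses exact prefix matching only; returns (None, read1, read2) when
    no barcode pair matches. If one barcode is a prefix of another,
    the first match in the lookup wins, which a real tool must handle.
    """
    for (bc1, bc2), sample in lookup.items():
        if read1.startswith(bc1) and read2.startswith(bc2):
            # Strip the barcodes so only the insert sequence remains.
            return sample, read1[len(bc1):], read2[len(bc2):]
    return None, read1, read2


# Invented barcodes and reads for illustration only.
lookup = build_lookup([
    ("sample_1", "ACGT", "TTGA"),
    ("sample_2", "GGCA", "CATG"),
])

matched = assign_sample("ACGTCCCC", "TTGAGGGG", lookup)
unmatched = assign_sample("AAAACCCC", "TTGAGGGG", lookup)
```

Real sequencing data contains errors, so production demultiplexers typically tolerate a limited number of barcode mismatches; this sketch deliberately omits that.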
Extended descriptions on bioinformatics are available in the supplemental data of the article.
The raw data behind the figures and tables of the manuscript are available (RAW_DATA_Wagemaker_et_al_2020_MER_FINAL.xlsx), as are the meta-reference genomes.