Data from msGBS: A new high-throughput approach to quantify the relative species abundance in root samples of multi-species plant communities
Data files
Aug 20, 2020 version files 5.01 GB
- ALL_FIGURES_AND_TABLES_Wagemaker_et_al_2020_MER_FINAL.xlsx (9.96 MB)
- AM_FUNGI_NAMES.txt (237 B)
- Archaea_NAMES.txt (68 B)
- BACTERIA_NAMES.txt (29.49 KB)
- blastN_parse_ref_msGBS.py (18.87 KB)
- Delete_species_from_REF.py (2.46 KB)
- Demultiplex_msGBS.py (23.58 KB)
- Dutch_barcodes.xlsx (28 KB)
- Dutch_Parse_csv.py (27.14 KB)
- Dutch_Ref.txt (1.51 GB)
- Dutch_Species_list.txt (3.91 KB)
- Dutch_stats.csv (2.33 GB)
- Dutch_Vegetation_survey.csv (32.72 KB)
- EUKARYOTA_NAMES.txt (44 KB)
- example_Dutch_barcode_file_msGBS.xlsx (21.80 KB)
- example_Dutch_example_barcode_file_msGBS.txt (13.49 KB)
- example_Dutch_subset_R2_Run1.fq.gz (226.97 MB)
- example_Dutchbarcodes_msGBS_DIJKEN2019_RUN1.txt (11.23 KB)
- example_Dutchsubset_R1_Run1.fq.gz (216.14 MB)
- JENA_barcodes_msGBS.txt (22.19 KB)
- JENA_Filtered_CSV.csv (378.02 MB)
- JENA_Ref.txt (344.44 MB)
- Make_correct_barcode_file_instructions.docx (13.22 KB)
- Make_reference_msGBS_V3.py (19.72 KB)
- Map_STAR_msGBS_V3.py (17.69 KB)
- Mark_PCR_duplicates_V2.py (9.67 KB)
- msGBS_STATS.py (3.43 KB)
- NG_merge_adapters.txt (120 B)
- NG_merge_qual_profile.txt (13.88 KB)
- OTHER_FUNGI_NAMES.txt (4.53 KB)
- overview_DOCS.pptx (466.36 KB)
- R_Figure_4AB_calibrated_non_calibrated_weighed_pool_2.R (3.24 KB)
- R_Figure_4CD_Calibrated_and_non_calibrated_vs_weighed_results_of_Pool_2.R (3.25 KB)
- R_Figure_5A_All_species_Weighed_vs_QPCR_vs_msGBS_MRT2019_POOL1.R (6.93 KB)
- R_Figure_5B_All_species_Weighed_vs_QPCR_vs_msGBS_MRT2019_POOL2.R (6.80 KB)
- R_QPCR_msGBS_weighed_statistics_pool_1.R (2.19 KB)
- R_QPCR_msGBS_weighed_statistics_pool_2.R (2.59 KB)
- R_slope_vs_cal.R (3.92 KB)
- R_standaard_boxplot.R (1.07 KB)
- RAW_DATA_Wagemaker_et_al_2020_MER_FINAL.xlsx (727.58 KB)
- rename_fast.py (4.99 KB)
- Virus_NAMES.txt (337 B)
Sep 24, 2020 version files 5.01 GB
- identical to the Aug 20, 2020 version (same 42 files, same sizes)
Abstract
Plant interactions are as important belowground as aboveground. Belowground plant interactions are, however, inherently difficult to quantify because the roots of different species are hard to disentangle. Although molecular techniques have been applied successfully to quantify root abundance for a couple of decades, identifying and quantifying roots in multi-species plant communities remains particularly challenging.
Here we present a novel methodology, multi-species Genotyping By Sequencing (msGBS), as a next step in tackling this challenge. First, a multi-species meta-reference database containing thousands of gDNA clusters per species is created from GBS-derived High Throughput Sequencing (HTS) reads. Second, GBS-derived HTS reads from multi-species root samples are mapped to this meta-reference; after a filtering procedure that increases taxonomic resolution, this allows multiple species to be quantified in parallel.
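The counting step can be illustrated with a minimal Python sketch. It assumes that mapping has already produced, for each read, the list of meta-reference cluster IDs it hit, and that each cluster is labeled with the species it was built from; `count_species_reads`, `relative_abundance` and the cluster-to-species lookup are hypothetical names. Keeping only reads whose hits all belong to one species is a simplified stand-in for a taxonomic-resolution filter, not the filter implemented in the msGBS pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_species_reads(
    read_hits: Iterable[List[str]], cluster_to_species: Dict[str, str]
) -> Counter:
    """Count reads per species from per-read lists of meta-reference cluster IDs.

    A read is kept only if all of its known hits belong to a single species;
    reads with no known hit, or with hits in clusters of different species,
    are discarded (a simplified, hypothetical taxonomic-resolution filter).
    """
    counts: Counter = Counter()
    for hits in read_hits:
        species = {cluster_to_species[c] for c in hits if c in cluster_to_species}
        if len(species) == 1:
            counts[species.pop()] += 1
    return counts


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Convert per-species read counts to fractions of all retained reads."""
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}


# Hypothetical example: three clusters from two species, five mapped reads.
clusters = {"c1": "species_A", "c2": "species_A", "c3": "species_B"}
reads = [["c1"], ["c2", "c1"], ["c3"], ["c1", "c3"], []]
counts = count_species_reads(reads, clusters)
print(counts)  # Counter({'species_A': 2, 'species_B': 1})
print(relative_abundance(counts))  # species_A: 2/3, species_B: 1/3
```

The fourth read hits clusters of both species and is dropped as ambiguous; the fifth read has no hit and is dropped as unmapped.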
The msGBS signal of 111 mock-mixture root samples, each containing up to 8 plant species, was used to calculate the within-species abundance. An optional subsequent calibration yielded the across-species abundance. Both the within- and across-species abundances correlated strongly with the biomass-based species abundance (R² ranges 0.72-0.94 and 0.85-0.98, respectively). Compared with a qPCR-based method previously used to analyze the same set of samples, msGBS provided similar results. Additional data on 11 congeneric species groups within 105 natural field root samples showed that the method has high taxonomic resolution.
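Calibration and the comparison with weighed biomass can be sketched as follows. The sketch assumes that calibration divides each species' read fraction by a species-specific factor (for example, its read yield per unit of root biomass relative to other species) and then rescales the result to sum to 1; the factors, function names and example values are hypothetical and do not reproduce the article's calibration procedure. R² is computed as the squared Pearson correlation, which equals the R² of a simple linear regression of one variable on the other.

```python
from typing import Dict, Sequence


def calibrate(read_fractions: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Correct per-species read fractions with species-specific factors.

    Hypothetical scheme: a factor > 1 means the species yields more reads per
    unit of root biomass than average, so its fraction is divided by the factor;
    the corrected values are then rescaled to sum to 1.
    """
    scaled = {sp: f / factors[sp] for sp, f in read_fractions.items()}
    total = sum(scaled.values())
    return {sp: v / total for sp, v in scaled.items()}


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical read fractions and calibration factors for one two-species sample.
calibrated = calibrate({"A": 0.6, "B": 0.4}, {"A": 2.0, "B": 1.0})
print({sp: round(v, 4) for sp, v in calibrated.items()})  # {'A': 0.4286, 'B': 0.5714}

# Hypothetical weighed vs. estimated fractions of one species across four samples.
weighed = [0.10, 0.25, 0.50, 0.80]
estimated = [0.12, 0.22, 0.55, 0.75]
print(round(r_squared(weighed, estimated), 3))
```

In the example, species A yields twice as many reads per unit of biomass as species B, so its raw read fraction of 0.6 is scaled down to 3/7 after calibration.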
This dataset belongs to the article "msGBS: A new high-throughput approach to quantify the relative species abundance in root samples of multi-species plant communities". msGBS applies Genotyping By Sequencing to mixed-species root samples and, after a filtering step that increases taxonomic resolution and a calibration step, estimates plant species abundances.
The article uses data from two different experiments:
- the Jena field survey (13 plant species) and
- the Dutch field survey (120 plant species).
The data are based on three lanes of high-throughput sequencing (HiSeq X, paired-end 2 × 150 bp). The lab protocol is based on Genotyping By Sequencing (Elshire et al., 2011), but its design was adjusted as described in the manuscript, and it was applied to mixed-species samples instead of single-species material. The downstream analysis does not aim to identify SNPs, as in the original method, but to quantify the relative abundance of the species present in the plant root mixture.
The data were processed with the msGBS pipeline (available on GitHub: https://github.com/NielsWagemaker/scripts_msGBS/tree/msGBS-1.0).
A new, easy-to-install Snakemake pipeline is also available but is not described in the article (https://github.com/NielsWagemaker/scripts_msGBS/tree/msGBS-snake).
This Dryad dataset includes a barcode file for demultiplexing the raw sequence data, which are available at the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA604964), the meta-reference genome, and a raw .csv data-analysis output file. It also includes several intermediate analysis files, as described in the supplemental data of the article.
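Demultiplexing assigns each read pair to a sample by its barcode. The Python sketch below assumes inline barcodes at the start of read 1, as in the original GBS design of Elshire et al. (2011), and a simple barcode-to-sample dictionary; the msGBS barcode design, the column layout of the barcode files in this dataset and the behaviour of Demultiplex_msGBS.py may differ, and all function names are hypothetical.

```python
import gzip
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, quality).
Record = Tuple[str, str, str]


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            lines = list(islice(handle, 4))
            if len(lines) < 4:  # end of file (or a truncated last record)
                return
            header, seq, _plus, qual = (line.rstrip("\n") for line in lines)
            yield header, seq, qual


def assign_sample(seq: str, barcodes: Dict[str, str]) -> Tuple[str, str]:
    """Match the read start against known barcodes (hypothetical helper).

    Longer barcodes are tried first, so a short barcode that is a prefix of a
    longer one cannot capture its reads. Returns (sample, sequence without barcode).
    """
    for bc in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(bc):
            return barcodes[bc], seq[len(bc):]
    return "unassigned", seq


def demultiplex(
    r1: Iterable[Record], r2: Iterable[Record], barcodes: Dict[str, str]
) -> Iterator[Tuple[str, Record, Record]]:
    """Yield (sample, barcode-trimmed R1 record, R2 record) for each read pair.

    Assumes R1 and R2 list the pairs in the same order and that only R1
    carries an inline barcode.
    """
    for (h1, s1, q1), rec2 in zip(r1, r2):
        sample, trimmed = assign_sample(s1, barcodes)
        cut = len(s1) - len(trimmed)  # barcode length, 0 if unassigned
        yield sample, (h1, trimmed, q1[cut:]), rec2
```

In a real run, the records would come from `read_fastq` applied to paired files such as example_Dutchsubset_R1_Run1.fq.gz and example_Dutch_subset_R2_Run1.fq.gz, and each pair would be written to a per-sample output file instead of being kept in memory.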
The Dryad overview file (overview_DOCS.pptx) links each available file to the bioinformatic step of the pipeline it belongs to, to help navigate the data files.
Extended descriptions of the bioinformatics are available in the supplemental data of the article.
Raw data for the figures and tables of the manuscript (RAW_DATA_Wagemaker_et_al_2020_MER_FINAL.xlsx) are available, as are the meta-reference genomes.