NGS amplicon metagenomic 16S seq of soybean rhizosphere under contrasting nutrient-deficient and acidic-stress soils
Data files
Mar 17, 2025 version files 1.23 MB
- abundance_genus.csv (5.05 KB)
- abundance_phylum.csv (582 B)
- AG.txt (76.30 KB)
- AK.txt (76.49 KB)
- alpha.txt (270 B)
- dada2_stat.txt (217 B)
- expr.asv.fasta (376.91 KB)
- NC.txt (76.31 KB)
- NM.txt (76.28 KB)
- r-packages.txt (532 B)
- README.md (13.16 KB)
- seq_quality.pdf (337.43 KB)
- seqs_errorplot.pdf (82.07 KB)
- shared_and_unique_taxa_venn01.txt (9.19 KB)
- shared_and_unique_taxa_venn02.txt (9.31 KB)
- tax_assignment.csv (90.50 KB)
- top10.relative_abundance_genus_group.csv (286 B)
- top10.relative_abundance_phylum_group.csv (296 B)
Abstract
Acidic and nutrient stress conditions are key factors limiting soybean productivity and sustainability in Indonesia. They are closely associated with the structure and diversity of bacterial communities in the rhizosphere, which play a crucial role in plant health and productivity. This study explores the diversity and structure of rhizosphere bacterial communities under acidic stress and nutrient-deficient conditions, knowledge that is essential for rhizomicrobiome engineering to enhance soybean productivity. A metagenomic investigation was conducted in soybean rhizospheres under two contrasting abiotic stress conditions: highly acidic, nutrient-deficient soil, and slightly acidic to neutral soil with moderate fertility. High-throughput next-generation sequencing of 16S rRNA gene amplicons was performed to profile microbial diversity and community composition across the pH stress gradient. The findings demonstrate that soil acidity and nutrient deficiency significantly influence the structure and diversity of bacterial communities in the soybean rhizosphere. Acidic stress alters microbial composition, increasing the relative abundance of Acidobacteriota and Patescibacteria, which are well adapted to low pH conditions, while reducing Verrucomicrobiota and Myxococcota, which are more sensitive to acidic environments. Alpha diversity analysis revealed greater microbial richness and evenness in acidic soils, whereas beta diversity metrics indicated distinct clustering patterns associated with soil pH levels. Heatmap analysis showed that Chloroflexi were most abundant in acidic soils, whereas Myxococcota predominated in non-acidic soils. Functional predictions suggest an upregulation of genes associated with acid resistance, nutrient cycling, and stress adaptation in acidic soils, highlighting the potential role of acid-tolerant bacterial taxa in promoting sustainable soybean cultivation.
These findings contribute to a deeper understanding of the interactions between soil acidity, nutrient availability, and microbial ecology, providing a foundation for microbial-based strategies to enhance crop resilience in acidic and nutrient-deficient environments.
https://doi.org/10.5061/dryad.tqjq2bw98
Description of the data and file structure
Soil samples were collected from acidic and non-acidic areas. Bacterial communities from soil samples were analyzed using next-generation sequencing (NGS) of 16S rRNA genes on an Illumina MiSeq platform (Singapore). The total gDNA from soil samples was extracted using a Magnetic Soil and Stool DNA Kit (TianGen, China, Catalog #: DP712). gDNA samples were amplified with target-specific primers (16S V3-V4). All PCR reactions were carried out with 15 µL of Phusion® High-Fidelity PCR Master Mix (New England Biolabs). The mixture of PCR products was purified with a Universal DNA Purification Kit (TianGen, China, Catalog #: DP214). Libraries were sequenced on a paired-end Illumina platform to generate 250 bp paired-end raw reads. The library was checked with Qubit and real-time PCR for quantification, while a bioanalyzer was used for size distribution detection. Quantified libraries were pooled and sequenced on Illumina platforms. Adapter and PCR primer sequences from the paired-end reads were removed using Cutadapt (Bellemain et al., 2010). DADA2 was used to correct sequencing errors and remove low-quality sequences and chimera errors. The resulting ASV data was used for taxonomic classification against the SILVA (silva_nr99_v138.1) (16S) database.
Files and variables
File: dada2_stat.txt
Description: an output from the DADA2 pipeline, a tool used for processing and analyzing amplicon sequencing data, particularly 16S. This file contains quality control and filtering statistics related to the sequence processing steps. The file has a tab-separated format with the following columns:
- SampleID – Name of the sample (AG, AK, NC, NM).
- Trimmed – Number of reads remaining after trimming (removal of primers, adapters, or low-quality bases).
- Filtered – Number of reads passing the quality filtering step.
- DenoisedF – Number of reads that passed the denoising step for forward reads.
- DenoisedR – Number of reads that passed the denoising step for reverse reads.
- Merged – Number of successfully merged forward and reverse reads (if paired-end).
- Nonchim – Number of non-chimeric reads (final usable reads for analysis).
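The per-step retention percentages quoted for this dataset can be recomputed from a table with these columns. The following is a minimal sketch, assuming the percentages are expressed relative to the Trimmed column; the two rows in `EXAMPLE` are hypothetical placeholders and are not the values stored in dada2_stat.txt.

```python
import csv
import io

# Hypothetical rows in the column layout described above.
# These are NOT the values in dada2_stat.txt.
EXAMPLE = (
    "SampleID\tTrimmed\tFiltered\tDenoisedF\tDenoisedR\tMerged\tNonchim\n"
    "S1\t1000\t500\t480\t470\t300\t280\n"
    "S2\t2000\t1200\t1100\t1150\t800\t760\n"
)

STEPS = ["Filtered", "DenoisedF", "DenoisedR", "Merged", "Nonchim"]


def retention_table(handle):
    """Return {sample: {step: percent of Trimmed reads retained}}."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        trimmed = int(row["Trimmed"])
        result[row["SampleID"]] = {
            step: round(100.0 * int(row[step]) / trimmed, 1) for step in STEPS
        }
    return result


table = retention_table(io.StringIO(EXAMPLE))
for sample, steps in table.items():
    print(sample, steps)
```

To apply it to the real file, pass `open("dada2_stat.txt", newline="")` instead of the `io.StringIO` wrapper, after confirming that the header names match.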
File: seq_quality.pdf
Description: sequence quality report.
- Filtering Step:
  - A substantial share of reads is lost at the filtering step (roughly 49–56% retained).
  - Sample AG has the lowest retention (~49.4%), while NM has the highest (~55.5%).
- Denoising Step:
  - NM retains the highest proportion (~52% of trimmed reads), suggesting higher data quality.
- Merging Efficiency:
  - AG has the lowest merging rate (~13.7%), while NM has the highest (~30.1%).
- Final Non-Chimeric Reads:
  - The proportion of final non-chimeric reads ranges from 13.0% (AG) to 29.1% (AK).
File: seqs_errorplot.pdf
Description: provides error rate plots for different nucleotide transitions in the sequencing data, generated by DADA2. These plots compare observed error rates to expected values, helping to evaluate sequencing quality and error modeling.
The graph displays:
- X-axis: Consensus quality score (how confident the sequencing machine is about each base call).
- Y-axis (log scale): Error frequency (how often a base is miscalled).
- Colored Lines: Different types of nucleotide transitions (e.g., T→A, G→C, etc.).
Each panel represents one substitution type, where X2Y denotes a true base X read as Y (A2A, C2C, G2G, and T2T are correct calls):
- T2A, T2C, T2G, T2T (true base thymine)
- G2A, G2C, G2G, G2T (true base guanine)
- C2A, C2C, C2G, C2T (true base cytosine)
- A2A, A2C, A2G, A2T (true base adenine)
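The expected error rate for a given quality score follows from the standard Phred definition, Q = -10 · log10(p). The sketch below shows that conversion only; it does not reproduce the error model that DADA2 fits to the observed rates, and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Nominal probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score corresponding to error probability p."""
    return -10 * math.log10(p)


# Nominal error probability at a few common quality scores.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Observed error frequencies that sit well above this nominal curve at high quality scores would indicate that the reported qualities are optimistic for that substitution type.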
File: expr.asv.fasta
Description: DNA sequences from ASV1 to ASV1320.
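A small reader such as the one below can be used to count the records in a FASTA file and look up individual ASV sequences. It is a sketch using only the Python standard library; the two records in `EXAMPLE_FASTA` are made up for illustration and do not come from expr.asv.fasta.

```python
import io

# Made-up records for illustration; not taken from expr.asv.fasta.
EXAMPLE_FASTA = ">ASV1\nACGTACGT\nACGT\n>ASV2\nTTGACA\n"


def read_fasta(handle):
    """Return {record_id: sequence}, joining sequences wrapped over several lines."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


asvs = read_fasta(io.StringIO(EXAMPLE_FASTA))
print(len(asvs), "records")
print({name: len(seq) for name, seq in asvs.items()})
```

Counting the records is a quick way to check the ASV range stated here against the number of rows in tax_assignment.csv.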
File: tax_assignment.csv
Description: contains taxonomic classifications for ASVs from the DADA2 pipeline.
- Total ASVs: 875
- Columns Detected: The dataset appears to have sample counts and taxonomic classifications in a tab-separated format.
- Taxonomic Levels Included:
  - ASV ID
  - Sample Abundances (AG, AK, NC, NM)
  - Kingdom → Phylum → Class → Order → Family → Genus → Species
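Because the file carries a .csv extension but is described as tab-separated, it is safer to check the delimiter before parsing. The sketch below guesses the delimiter from the header line and then sums ASV counts per taxon at a chosen rank. The column names (`ASV`, `Phylum`, `Genus`) and the three rows are assumptions made for illustration; the actual header in tax_assignment.csv should be checked first.

```python
import csv
import io
from collections import defaultdict

SAMPLES = ["AG", "AK", "NC", "NM"]

# Hypothetical rows; column names and counts are assumptions, not file contents.
EXAMPLE = (
    "ASV\tAG\tAK\tNC\tNM\tKingdom\tPhylum\tGenus\n"
    "ASV1\t10\t0\t5\t0\tBacteria\tProteobacteria\tBradyrhizobium\n"
    "ASV2\t0\t7\t0\t3\tBacteria\tProteobacteria\tAcidibacter\n"
    "ASV3\t4\t4\t0\t0\tBacteria\tActinobacteriota\tAcidothermus\n"
)


def guess_delimiter(header_line):
    """Pick tab or comma, whichever occurs more often in the header."""
    return "\t" if header_line.count("\t") >= header_line.count(",") else ","


def collapse_by_rank(text, rank):
    """Sum per-sample ASV counts for each taxon at the given rank."""
    delimiter = guess_delimiter(text.splitlines()[0])
    totals = defaultdict(lambda: dict.fromkeys(SAMPLES, 0))
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        taxon = row[rank] or "Unassigned"
        for sample in SAMPLES:
            totals[taxon][sample] += int(row[sample])
    return dict(totals)


phylum_counts = collapse_by_rank(EXAMPLE, "Phylum")
print(phylum_counts)
```

Collapsing at `"Genus"` or `"Phylum"` in this way should yield tables with the same shape as abundance_genus.csv and abundance_phylum.csv, although whether those files were produced exactly like this is not stated.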
File: abundance_genus.csv
Description: the dataset contains genus-level abundance data for microbiome samples.
- Total Genera Detected: 213
- Columns Detected: The dataset contains genus names and abundance counts for samples (AG, AK, NC, NM).
- Most Abundant Genera (Top 10 by Total Counts Across Samples):
  - Acidothermus – High in AG (956) and AK (3185).
  - Acidibacter – Particularly high in NM (2582).
  - Acidicaldus, Acinetobacter, Actinoallomurus – Found in small amounts in select samples.
File: abundance_phylum.csv
Description: the dataset contains microbiome abundance data at the phylum level across four sample groups (AG, AK, NC, NM). Phylum column lists different bacterial phyla. AG, AK, NC, NM columns contain abundance values for each phylum in different sample groups.
- The dataset contains 20 bacterial phyla with abundance values across four sample groups (AG, AK, NC, NM).
- The mean abundance per phylum varies, with NM having the highest average abundance (1123.75) and AG the lowest (392.2).
- NM (22,475) and AK (20,031) have the highest total abundance.
- AG (7,844) and NC (10,069) have lower total abundance, suggesting differences in microbial composition.
- AG, AK, and NM: The most abundant phylum is Proteobacteria.
- NC: The most abundant phylum is Verrucomicrobiota.
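The totals, per-sample relative abundances, and dominant phylum described above can be derived from a count table as sketched below. The counts in `COUNTS` are hypothetical and cover only two samples and three phyla; they are not the values in abundance_phylum.csv.

```python
# Hypothetical counts for illustration; not the values in abundance_phylum.csv.
COUNTS = {
    "Proteobacteria": {"AG": 60, "NC": 20},
    "Verrucomicrobiota": {"AG": 15, "NC": 50},
    "Acidobacteriota": {"AG": 25, "NC": 30},
}


def sample_totals(counts):
    """Total count per sample, summed over all taxa."""
    totals = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


def relative_abundance(counts):
    """Percent of each sample's total contributed by each taxon."""
    totals = sample_totals(counts)
    return {
        taxon: {
            sample: (100.0 * n / totals[sample]) if totals[sample] else 0.0
            for sample, n in per_sample.items()
        }
        for taxon, per_sample in counts.items()
    }


def dominant_taxon(counts, sample):
    """Taxon with the highest count in the given sample."""
    return max(counts, key=lambda taxon: counts[taxon][sample])


totals = sample_totals(COUNTS)
percent = relative_abundance(COUNTS)
print(totals)
print(dominant_taxon(COUNTS, "AG"), dominant_taxon(COUNTS, "NC"))
```

The per-phylum means quoted above are the sample totals divided by the number of phyla, for example 22,475 / 20 = 1123.75 for NM.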
File: top10.relative_abundance_phylum_group.csv
Description: The dataset contains relative abundances of the top 10 bacterial phyla across two groups: AS (Group 1) and NS (Group 2). The values represent the proportion of each phylum within the microbiome community.
- Dominant Phyla: Actinobacteriota is the most abundant phylum in both groups (28.87% in AS, 30.11% in NS). Chloroflexi is highly abundant in AS (16.84%) but very low in NS (1.53%), a marked difference between the two groups.
- Phyla with Similar Abundance Across Groups: Actinobacteriota (AS: 28.87%, NS: 30.11%) shows only a minor difference.
- Phyla with Higher Abundance in AS: Chloroflexi (AS: 16.84%, NS: 1.53%) shows a strong preference for AS. Acidobacteriota (AS: 5.41%, NS: 3.09%) and Firmicutes (AS: 5.12%, NS: 2.06%) are also more abundant in AS.
- Phyla with Higher Abundance in NS: Bacteroidota (NS: 3.27% vs. AS: 1.67%).
File: top10.relative_abundance_genus_group.csv
Description: The dataset contains relative abundances of the top 10 bacterial genera across two groups: AS (Group 1) and NS (Group 2). The values represent the proportion of each genus within the microbiome community.
- Genera with Strong Differences Between Groups: Acidothermus is highly abundant in AS (54.93%) but almost absent in NS (0.42%). Actinomadura and Candidatus Udaeobacter are much more abundant in NS than in AS. Acidibacter is considerably more abundant in NS (15.67%) than in AS (3.97%).
- Genera More Abundant in AS: Acidothermus (AS: 54.93%, NS: 0.42%) shows a drastic difference. Bradyrhizobium (AS: 10.08%, NS: 1.87%) is also more dominant in AS.
- Genera More Abundant in NS: Actinomadura (AS: 1.63%, NS: 21.17%) is strongly enriched in NS. Candidatus Udaeobacter (AS: 2.68%, NS: 21.87%) follows the same trend.
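One way to rank such between-group differences is a log2 fold change of the relative abundances. The sketch below uses the genus percentages quoted in this description; the pseudocount of 0.01 is an arbitrary choice added here to keep near-zero values from dominating, and this is a descriptive ranking rather than a statistical test.

```python
import math

# (AS %, NS %) pairs as quoted in the genus-level description.
GENUS_PERCENT = {
    "Acidothermus": (54.93, 0.42),
    "Bradyrhizobium": (10.08, 1.87),
    "Acidibacter": (3.97, 15.67),
    "Actinomadura": (1.63, 21.17),
    "Candidatus Udaeobacter": (2.68, 21.87),
}


def log2_fold_change(as_pct, ns_pct, pseudocount=0.01):
    """Positive values mean enrichment in AS, negative values enrichment in NS."""
    return math.log2((as_pct + pseudocount) / (ns_pct + pseudocount))


# Genera ordered from most AS-enriched to most NS-enriched.
ranked = sorted(
    GENUS_PERCENT,
    key=lambda genus: log2_fold_change(*GENUS_PERCENT[genus]),
    reverse=True,
)
for genus in ranked:
    print(genus, round(log2_fold_change(*GENUS_PERCENT[genus]), 2))
```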
File: alpha.txt
Description: The file contains alpha diversity metrics for four groups (AG, AK, NC, NM). These metrics include:
- Observed – The number of unique species (richness).
- Shannon Index – Accounts for both richness and evenness of species.
- Simpson Index – Measures dominance (values closer to 1 indicate more even communities).
- Inverse Simpson Index – The effective number of equally abundant species.
- Species Richness (Observed): AK has the highest richness (346), followed by NC (219), AG (197), and NM (148). This indicates AK has the most diverse microbiome, while NM has the lowest species richness.
- Shannon Index (Diversity & Evenness): Highest in AK (5.39), indicating a well-distributed microbiome. NM (3.46) has the lowest, suggesting lower evenness or dominance by a few species.
- Simpson Index (Dominance Measure): AK (0.993) and AG (0.989) have the highest values, indicating high evenness. NC (0.932) and NM (0.937) are lower, meaning these communities are more dominated by a few species.
- Inverse Simpson Index (Effective Species Count): AK (145.6) and AG (99.3) have the highest effective number of species. NC (14.6) and NM (15.9) have much lower diversity, reinforcing that their microbiomes are dominated by fewer species.
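The four metrics can be computed from a vector of per-ASV counts as sketched below. This assumes the conventions commonly used by vegan and phyloseq (Shannon with the natural logarithm, Simpson reported as 1 − Σp², inverse Simpson as 1 / Σp²); the file itself does not state which conventions were applied.

```python
import math


def alpha_diversity(counts):
    """Observed richness, Shannon, Simpson (1 - sum p^2), and inverse Simpson."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    sum_sq = sum(p * p for p in props)
    return {
        "Observed": len(counts),
        "Shannon": -sum(p * math.log(p) for p in props),
        "Simpson": 1.0 - sum_sq,
        "InvSimpson": 1.0 / sum_sq,
    }


# A perfectly even community versus one dominated by a single taxon.
even = alpha_diversity([10, 10, 10, 10])
skewed = alpha_diversity([97, 1, 1, 1])
print(even)
print(skewed)
```

Under these definitions the inverse Simpson index equals 1 / (1 − Simpson), which is consistent with the listed values up to rounding (for NM, 1 / (1 − 0.937) ≈ 15.9).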
File: AG.txt
Description: dataset used for Krona visualizations. It displays hierarchical relationships in sample AG representing taxonomic levels (e.g., Kingdom → Phylum → Genus → Species).
File: AK.txt
Description: dataset used for Krona visualizations. It displays hierarchical relationships in sample AK representing taxonomic levels (e.g., Kingdom → Phylum → Genus → Species).
File: NC.txt
Description: dataset used for Krona visualizations. It displays hierarchical relationships in sample NC representing taxonomic levels (e.g., Kingdom → Phylum → Genus → Species).
File: NM.txt
Description: dataset used for Krona visualizations. It displays hierarchical relationships in sample NM representing taxonomic levels (e.g., Kingdom → Phylum → Genus → Species).
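The Krona input files can also be summarized without the Krona viewer. The sketch below assumes they follow Krona's text-import layout, in which each line holds a quantity followed by the tab-separated lineage; this layout is an assumption about AG.txt, AK.txt, NC.txt, and NM.txt and should be confirmed against the files. The lines in `EXAMPLE` are invented.

```python
from collections import Counter

# Invented lines in the assumed layout: quantity, then tab-separated lineage.
EXAMPLE = (
    "12\tBacteria\tProteobacteria\tBradyrhizobium\n"
    "5\tBacteria\tProteobacteria\tAcidibacter\n"
    "3\tBacteria\tChloroflexi\n"
)


def totals_at_depth(text, depth):
    """Sum quantities over lineages truncated to the first `depth` levels."""
    totals = Counter()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        quantity = float(fields[0])
        lineage = tuple(fields[1:1 + depth])
        totals[lineage] += quantity
    return totals


by_phylum = totals_at_depth(EXAMPLE, 2)
by_kingdom = totals_at_depth(EXAMPLE, 1)
print(by_phylum.most_common())
```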
File: shared_and_unique_taxa_venn01.txt
Description: This dataset represents Amplicon Sequence Variants (ASVs) detected in two groups: AS and NS. It identifies ASVs unique to AS, ASVs unique to NS, and ASVs shared between AS and NS.
- AS-Only ASVs: AS contains a large number of unique ASVs (e.g., ASV16, ASV17, ASV20, etc.). This suggests distinct microbial species in AS, potentially linked to environmental or biological factors.
- NS-Only ASVs: NS also has a distinct microbiome, with many unique ASVs (e.g., ASV1, ASV3, ASV4, etc.). This indicates that AS and NS have distinct microbial compositions, likely shaped by different selective pressures.
- Shared ASVs (AS and NS): Only 12 ASVs are common between AS and NS (e.g., ASV12, ASV13, ASV14, ASV76, ASV204, etc.). This low overlap suggests that the microbiomes of AS and NS have a high level of differentiation.
File: shared_and_unique_taxa_venn02.txt
Description: This dataset presents Amplicon Sequence Variants (ASVs) found in four different groups (AG, AK, NC, NM) and their shared or unique distribution.
- Unique ASVs per Group: AG has a large number of unique ASVs (e.g., ASV50, ASV72, ASV87). AK also has a distinct microbiome with many unique ASVs (e.g., ASV17, ASV26, ASV27). NC and NM have their own unique ASVs, suggesting distinct ecological pressures influencing microbiome composition.
- ASVs Shared Across Groups: Only one ASV (ASV13) is shared by all four groups (AG, AK, NC, NM), indicating a core microbial species present across all environments. ASV14 is shared by AG, AK, and NC, while ASV56 is shared by AG, AK, and NM. ASVs shared among three groups are rare, suggesting high microbiome differentiation.
- Pairwise Shared ASVs: Some ASVs are shared between specific pairs of groups, such as: AG & AK: ASV16, ASV20, ASV34, etc. NC & NM: ASV64, ASV224, ASV343. AK & NC: ASV381, ASV873. These overlaps suggest that microbiomes from similar conditions (e.g., samples from similar soil conditions) share common taxa.
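Shared and unique ASV lists of this kind can be derived from a presence/absence view of the count table. The sketch below assigns each ASV to the exact set of samples in which its count is above zero; the ASV names and counts in `COUNTS` are hypothetical, and the presence threshold of zero is an assumption.

```python
from collections import defaultdict

# Hypothetical counts; ASV names and values are for illustration only.
COUNTS = {
    "ASV_a": {"AG": 3, "AK": 1, "NC": 2, "NM": 5},
    "ASV_b": {"AG": 4, "AK": 2, "NC": 0, "NM": 0},
    "ASV_c": {"AG": 0, "AK": 0, "NC": 7, "NM": 0},
    "ASV_d": {"AG": 0, "AK": 0, "NC": 0, "NM": 0},
}


def venn_regions(counts):
    """Map each exact combination of samples to the ASVs found only in that combination."""
    regions = defaultdict(set)
    for asv, per_sample in counts.items():
        members = frozenset(s for s, n in per_sample.items() if n > 0)
        if members:  # skip ASVs absent from every sample
            regions[members].add(asv)
    return dict(regions)


regions = venn_regions(COUNTS)
core = regions.get(frozenset({"AG", "AK", "NC", "NM"}), set())
for members, asvs in regions.items():
    print(sorted(members), sorted(asvs))
```

The region keyed by all four samples corresponds to the core set, and regions keyed by a single sample hold the ASVs unique to it.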
File: r-packages.txt
Description: This dataset lists R packages used for microbiome data analysis along with their documentation links.
Code/software
Downstream analysis and visualizations were performed using packages in RStudio (R version 4.2.3) (https://www.R-project.org/), Krona Tools (https://github.com/marbl/Krona), and PICRUSt2 (https://github.com/picrust/picrust2). The R packages used were dada2 (https://benjjneb.github.io/dada2/), ggplot2 (https://ggplot2.tidyverse.org/), ggpicrust2 (https://github.com/cafferychen777/ggpicrust2), MicEco (https://github.com/Russel88/MicEco), microbiomeMarker (https://github.com/yiluheihei/microbiomeMarker), microbiomeutilities (https://microsud.github.io/microbiomeutilities/), MicrobiotaProcess (https://github.com/YuLab-SMU/MicrobiotaProcess), phyloseq (https://joey711.github.io/phyloseq/), and vegan (https://github.com/vegandevs/vegan).
Library Preparation & Sequencing
Bacterial communities from soil samples were analyzed using next-generation sequencing (NGS) of 16S rRNA genes on an Illumina MiSeq platform (Singapore). The total gDNA from soil samples was extracted using a Magnetic Soil and Stool DNA Kit (TianGen, China, Catalog #: DP712). gDNA samples were amplified with target-specific primers (16S V3-V4). All PCR reactions were carried out with 15 µL of Phusion® High-Fidelity PCR Master Mix (New England Biolabs), 0.2 µM of forward and reverse primers, and about 10 ng template DNA. Thermal cycling consisted of initial denaturation at 98 ℃ for 1 min, followed by 30 cycles of denaturation at 98 ℃ for 10 s, annealing at 50 ℃ for 30 s, and elongation at 72 ℃ for 30 s, with a final extension at 72 ℃ for 5 min. Library preparation was performed using the final PCR products. The PCR products of proper size were selected through 2% agarose gel electrophoresis. PCR products were mixed in equidensity ratios. Then, the mixture of PCR products was purified with a Universal DNA Purification Kit (TianGen, China, Catalog #: DP214). Equal amounts of PCR product from each sample were pooled, end-repaired, A-tailed, and further ligated with Illumina adapters. Libraries were sequenced on a paired-end Illumina platform to generate 250 bp paired-end raw reads. The library was checked with Qubit and real-time PCR for quantification, while a bioanalyzer was used for size distribution detection. Quantified libraries were pooled and sequenced on Illumina platforms according to the effective library concentration and data amount required.
Data Processing and Analysis
Adapter and PCR primer sequences from the paired-end reads were removed using Cutadapt (Bellemain et al., 2010). DADA2 was used to correct sequencing errors and remove low-quality sequences and chimera errors (Martin et al., 2011). The resulting ASV data was used for taxonomic classification against the SILVA (silva_nr99_v138.1) (16S) database. Downstream analysis and visualizations were performed using packages in RStudio (R version 4.2.3) (https://www.R-project.org/), Krona Tools (https://github.com/marbl/Krona), and PICRUSt2 (https://github.com/picrust/picrust2).