Data from: Resistance and resilience of soil microbiomes under climate change

Boyle, Julia¹; Murphy, Bridget¹; Ensminger, Ingo¹; Stinchcombe, John¹; Frederickson, Megan¹

Published Oct 31, 2024 on Dryad. https://doi.org/10.5061/dryad.dbrv15f6r

Abstract

Soil microbiomes play key roles in plant productivity and nutrient cycling, and we need to understand whether and how they will withstand the effects of global climate change. We exposed in situ soil microbial communities to multiple rounds of heat, drought, or both treatments, and profiled microbial communities with 16S rRNA and ITS amplicon sequencing during and after these climatic changes, and then tested how domain and symbiotic lifestyle affected responses. Fungal community composition strongly shifted due to drought and its legacy. In contrast, bacterial community composition resisted change during the experiment, but still was affected by the legacy of drought. We identified fungal and bacterial taxa with differential abundance due to heat and drought and found that taxa affected during climate events are not necessarily the taxa affected in recovery periods, showing the complexity and importance of legacy effects. Additionally, we found evidence that symbiotic groups of microbes important to plant performance respond in diverse ways to climate treatments and their legacy, suggesting plants may be impacted by past climatic events like drought and warming even if they do not experience the event themselves.

This dataset includes the sequencing data and metadata from soil warming arrays that experienced drought, heat, both, or control conditions.
We sequenced the 16S V4 region for bacteria and the ITS region for fungi.
Sequencing data was first assembled, classified into Amplicon Sequence Variants (ASVs), and assigned taxonomic classification using QIIME2.
The ASV tables, taxonomy files, and rooted trees were obtained from this preliminary step in QIIME2, and these files were used in downstream analysis in R. The metadata files are sometimes separated by time point, since we collected soil at three distinct times (two sampling times in 2021 and one in 2022), but there are also metadata files that contain all three time points.

An Excel file containing all automatic and manual soil monitoring data for temperature and volumetric water content is included ("Data_Soil_temp_and_VWC.xlsx"), as well as the individual saved .csv files used in the analysis (.csv files starting with KSR20_21_SoilData_Auto).

Some files are outputs of analyses. We classified fungi into functional guilds using FUNGuild to see if guild membership affected microbial resistance and resilience, and the output files from that classification are included as .csv files with 'guilds' in the name. We used a statistical method called ANCOM-BC to identify taxa that were differentially abundant due to heat and drought and to estimate by how much they were affected; the model output is included as a .csv file starting with ANCOM. We used a network analysis to determine significantly correlated genera, and the list of correlated genera is included here as "Data_Network_analysis_significant_correlations.csv".

Description of the data and file structure

There are two main categories of data: fungal reads and bacterial reads. In file names, fungal data are denoted by ITS, while bacterial data are denoted by 16S.
There are also three subcategories of data based on sampling time: samples taken in 2021, samples taken in 2022, and multiyear data (2021 and 2022 data merged together). Samples from 2022 are always marked '2022', and multiyear data are always marked 'multiyear', 'merged', or 'combinedyears'; however, samples from 2021 either do not have a date in the name or have '2021'. Note that in the Plot_merged_sample_metadatas files, 'merged' refers to the merging of subsamples within a plot rather than to the merging of years.
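
The naming convention above can be sketched as a small helper for sorting the files by marker and sampling period. This is only an illustration and is not part of the deposited code; the function name `classify_file` is hypothetical. It assumes that an explicit year in the name takes precedence over the word 'merged', because the Plot_merged_sample_metadatas files carry 'merged' for a different reason (merged subsamples).

```python
def classify_file(name):
    """Guess (marker, period) from a sequencing or metadata file name.

    Hypothetical helper, following the naming convention described above.
    """
    # Marker: ITS denotes fungi, 16S denotes bacteria (case-sensitive match).
    if "ITS" in name:
        marker = "ITS"
    elif "16S" in name:
        marker = "16S"
    else:
        marker = None  # e.g. metadata files shared by both markers

    # Period: unambiguous multiyear tags first, then explicit years.
    if "multiyear" in name or "combinedyears" in name:
        period = "multiyear"
    elif "2022" in name:
        period = "2022"
    elif "2021" in name:
        period = "2021"
    elif "merged" in name:
        # 'merged' without a year marks the merged 2021 + 2022 files.
        period = "multiyear"
    else:
        # 2021 files may carry no date at all.
        period = "2021"
    return marker, period


if __name__ == "__main__":
    for f in ["rooted-tree-ITS.qza",
              "table-10readsmin-noplant-16S 2022.qza",
              "merged-taxonomy-16S.qza",
              "Plot_merged_sample_metadatas_2021.csv"]:
        print(f, classify_file(f))
```

The rule is meant for the sequencing and metadata files only; the soil monitoring files follow their own naming.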

All .qza files are from QIIME2 and, as shown in the code, can be combined with the appropriate metadata files to make phyloseq objects.
Metadata files are included for each timepoint.

Specific files and their descriptions

ITS/Fungi

2021

table-10readsmin-noplant-UNITE9 ITS-2021.qza : Feature table of fungal samples in 2021, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

rooted-tree-ITS.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because inference based on ITS is too rough and may be erroneous.

taxonomy-ITS-UNITE9-2021.qza : Taxonomy file linking ASVs in 2021 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal classification database. Created using QIIME2.

2022

table-10readsmin-noplant-ITS UNITE9-2022.qza : Feature table of fungal samples in 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

rooted-tree-ITS-2022.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because inference based on ITS is too rough and may be erroneous.

taxonomy-UNITE9-ITS-2022.qza : Taxonomy file linking ASVs in 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal classification database. Created using QIIME2.

2021 and 2022 merged

table-UNITE9-ITS-multiyear.qza : Feature table of concatenated fungal samples in 2021 and 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

merged-rooted-tree-ITS.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because inference based on ITS is too rough and may be erroneous.

merged-taxonomy-ITS-UNITE9.qza : Taxonomy file linking ASVs in 2021 and 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal classification database. Created using QIIME2.

16S/Bacteria

2021

table-10readsmin-noplant-16S.qza : Feature table of bacterial samples in 2021, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

rooted-tree-16S.qza : Sequence tree made in QIIME2 based on 16S sequences, for samples in 2021.

taxonomy-16S.qza : Taxonomy file linking ASVs in 2021 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the Greengenes classification database. Created using QIIME2.

2022

table-10readsmin-noplant-16S 2022.qza : Feature table of bacterial samples in 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

rooted-tree-16S-2022.qza : Sequence tree made in QIIME2 based on 16S sequences, for samples in 2022.

taxonomy-16S-2022.qza : Taxonomy file linking ASVs in 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the Greengenes classification database. Created using QIIME2.

2021 and 2022 merged

table-16S-multiyear.qza : Feature table of concatenated bacterial samples in 2021 and 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata 'sample-id' columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.

merged-rooted-tree-16S.qza : Sequence tree made in QIIME2 based on 16S sequences, with concatenated samples from both years.

merged-taxonomy-16S.qza : Taxonomy file linking ASVs in 2021 and 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the Greengenes classification database. Created using QIIME2.

Metadata

warmingsoil_metadata.tsv : Metadata information of plots in 2021, with separate lines for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (Yes) or not (No).

warmingsoil_metadata_2022.tsv : Metadata information of plots in 2022, with separate lines for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (Yes) or not (No).

warmingsoil_metadata_combinedyears.tsv : Concatenated metadata information of plots in 2021 and 2022, with separate lines for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. Time.point column indicates when soil was sampled (YYYY-MONTH-DD). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (Yes) or not (No).

Plot_merged_sample_metadatas_2021.csv : Metadata information of plots in 2021, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows a letter J (June) or S (September) followed by the plot number. Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (1) or not (0).

Plot_merged_sample_metadatas_2022.csv : Metadata information of plots in 2022, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows the plot number. Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (1) or not (0).

Plot_merged_sample_metadatas_multiyear.csv : Metadata information of plots in 2021 and 2022, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows the sampling date followed by the plot number. Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. Drought_applied and Heat_applied columns indicate whether drought or heat respectively was applied (1) or not (0).

Analysis and analysis output files

Resistance_and_resilience_code.Rmd : R markdown file containing all the code needed to reproduce the analysis. Open using R.

Data_Network_analysis_significant_correlations.csv : Each row contains two genera that were significantly correlated in the network analysis. The Treatment column represents the active treatment the soil sample received. Var1 and Var2 are the genus names, and the cor column shows their correlation. The p column is the p value, while the fdr column is the p value corrected for false discovery rate. Var1domain and Var2domain give the domain (Bacteria or Fungi) to which the Var1 and Var2 genera belong, respectively.

ANCOM_all_results_modeloutput.csv : This is the output from the ANCOM-BC analysis in the .Rmd file, showing only genera that were significantly differentially abundant at one or more time points. Each row describes a genus that was differentially abundant at one time point. The Taxa column contains the genus name. In all column names, Heat_Yes means heat was applied, Drought_Yes means drought was applied, and Heat_Yes:Drought_Yes represents the interaction of heat and drought. The Coef_Heat_Yes, Coef_Drought_Yes, and Coef_Heat_Yes:Drought_Yes columns give the log fold change of that genus under applied heat, applied drought, or the interaction of heat and drought, respectively. The SE columns show the standard error of each coefficient estimate. The TestStats columns show the test statistic of this analysis (analogous to a 't' or 'F' value in other statistical tests), whose magnitude determines significance. The p column represents the raw p value, whereas the padjs columns show the p value corrected with Holm's method. The Differential_expr columns state whether that genus is significantly differentially abundant due to heat, drought, or the heat*drought interaction, depending on the appended label (True = it is differentially abundant; False = it is not). The Time column shows the time period being considered in that row. The Kingdom column gives the kingdom/domain the genus belongs to.
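
For readers unfamiliar with the correction behind the padjs columns, the following is a minimal sketch of the standard Holm step-down adjustment. It is not taken from the .Rmd file, the function name `holm_adjust` is hypothetical, and which set of p values was adjusted together in the actual analysis is not stated in this README.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Illustrative p values only, not values from the dataset.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

With the illustrative input above, the adjusted values are approximately 0.03, 0.06, and 0.06: the smallest p value is multiplied by 3, the next by 2, and the largest inherits the running maximum.
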
taxonomy-UNITE9 ITS.taxa.guilds2021.csv : Output of FUNGuild, which matches the taxonomy of an ASV with known functional guild data about that taxon. Results for ASVs in 2021. The OTU column contains the ASV name. The Kingdom, Phylum, Order, Class, Family, Genus, and Species columns are the taxonomic classifications for that ASV from the QIIME2 taxonomy files listed above; 'na' in these columns indicates that the taxonomy could not be resolved any further based on the sequence. The rest of the columns are output of FUNGuild; if FUNGuild could not match or assign functional guild data, then these columns are 'na', but if a match was made and information for a column is missing or unknown, then this is denoted with 'NULL'. The 'taxon' column has a value when FUNGuild has matched that ASV's taxonomy to a known taxon in its database (this column states the matching taxon name). The TaxonomicLevel column describes the taxonomic resolution at which it was matched in the database. TrophicMode describes the general trophic strategy of the taxon. Guild describes the functional guild of the taxon. Trait describes any extra known traits of the taxon. The growthForm column describes known growth forms of the taxon. Confidence is the level of certainty the database assigns to the guild information associated with the taxon. The Notes and citationSource columns are also generated by FUNGuild to give extra information and the source for the guild classifications made by the database.

taxonomy-UNITE9 ITS.taxa.guilds2022.csv : Output of FUNGuild, which matches the taxonomy of an ASV with known functional guild data about that taxon. Results for ASVs in 2022. Same description as file "taxonomy-UNITE9 ITS.taxa.guilds2021.csv".

Soil monitoring data

Data_Soil_temp_and_VWC.xlsx : This file contains both manual and automatic soil temperature and volumetric water content (VWC) measurements, for completeness.

Each plot was equipped with a soil sensor (Models 5TM and TE11, METER Group, Pullman, WA, USA) in the center at a depth of 15-20 cm to record soil VWC and soil temperature every 30 minutes using a CR1000 datalogger (Campbell Scientific Inc., Edmonton, AB, Canada). Several manual soil VWC measurements were taken during the experiments in 2020 and 2021 using a HydroSense system with the CS620 sensor (Campbell Scientific Inc., Edmonton, AB, Canada) at a depth of 20 cm to match the depth of the automatic soil sensors, with seven manual soil VWC measurement time points in 2020 and eight time points in 2021. At each time point, we measured soil VWC at seven points within each plot and calculated a mean value for each plot.

For each year, a correction factor was calculated per plot to correct for sensor drift and aging of the 5TM and TE11 probes that were permanently installed in the experimental plots. The correction factors were calculated as the ratio between the manual soil VWC means and the single automatic soil VWC sensor reading at multiple time points per experiment. The automatic VWC sensor readings were then multiplied by the calculated correction factor across the full sensor datasets.

The "Automatic Soil Data for 2020" and "Automatic Soil Data for 2021" tabs show automatic measurements, with the date (YYYY-MM-DD) and time columns indicating when measurements were taken. Soil temperature was measured in degrees Celsius, and column names containing SoilTemp end with the plot number (1-12). Treatments were assigned as follows: Drought = plots 1, 5, 10; Heat = plots 7, 11, 12; Control = plots 3, 4, 8; Heat+Drought = plots 2, 6, 9. Columns with VWC denote volumetric water content of the soil (units of m^3/m^3), and the number (1-12) denotes the plot; columns with Raw appended are the raw values, while columns with Corr are corrected values based on manual measurements, as described above.

The tabs "Manual Soil VWC for 2020" and "Manual Soil VWC for 2021" contain the manual water content measurements. Here, the date is given in a column (YYYY-MM-DD), and there is a column for plot. The columns with Manual are the VWC measurements (units of m^3/m^3), and the number appended is the replicate of the measurement (7 measurements for each plot). The Manual_Mean column shows the mean of the 7 measurements, while the automatic measurement taken at the same time as the manual measurements is shown in the Automatic column.
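
The VWC correction described above can be sketched as follows. This is an illustration rather than the authors' code: the function names are hypothetical, the numbers are made up, and the sketch assumes the per-plot factor is the mean of the per-time-point ratios (manual mean divided by automatic reading), which the description does not specify. The plot-to-treatment mapping is taken directly from the description.

```python
from statistics import mean

# Plot-to-treatment assignment as listed in this README.
PLOT_TREATMENT = {
    1: "Drought", 5: "Drought", 10: "Drought",
    7: "Heat", 11: "Heat", 12: "Heat",
    3: "Control", 4: "Control", 8: "Control",
    2: "Heat+Drought", 6: "Heat+Drought", 9: "Heat+Drought",
}


def correction_factor(manual_means, automatic_readings):
    """Per-plot, per-year correction factor.

    manual_means: mean of the 7 manual VWC readings at each time point.
    automatic_readings: automatic sensor reading at the same time points.
    Assumption: the factor is the mean of the per-time-point ratios.
    """
    if not manual_means or len(manual_means) != len(automatic_readings):
        raise ValueError("need matching, non-empty manual and automatic series")
    ratios = [m / a for m, a in zip(manual_means, automatic_readings)]
    return mean(ratios)


def correct_vwc(raw_vwc, factor):
    """Multiply every raw automatic VWC reading by the plot's factor."""
    return [v * factor for v in raw_vwc]


if __name__ == "__main__":
    # Made-up values in m^3/m^3 for one plot and two manual time points.
    factor = correction_factor([0.30, 0.20], [0.25, 0.20])
    print(PLOT_TREATMENT[1], factor, correct_vwc([0.10, 0.20], factor))
```

In the spreadsheet, the Raw columns correspond to the input of `correct_vwc` and the Corr columns to its output, with one factor per plot and year.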

KSR20_21_SoilData_Auto20.csv : This is the tab "Automatic Soil Data for 2020" from the Data_Soil_temp_and_VWC.xlsx file. Saved as a .csv file for analysis in R.

KSR20_21_SoilData_Auto21v2.csv : This is the tab "Automatic Soil Data for 2021" from the Data_Soil_temp_and_VWC.xlsx file. Saved as a .csv file for analysis in R.

Sharing/Access information

Raw sequences are available from NCBI's Sequence Read Archive under accession PRJNA1177093 : Resistance and resilience of soil microbiomes under climate change.

Links to other publicly accessible locations of the data kept on Dryad:

https://github.com/JuliaBoyle/resistance-resilience

Code/Software

There is a .Rmd (R Markdown) file that contains all code required to run the analyses and create figures.
I mainly used R and RStudio to run the analysis.
The FUNGuild analysis partially used bash and Python (code included in the .Rmd file); however, the output files from this analysis are included as guilds .csv files.
