Data from: Resistance and resilience of soil microbiomes under climate change
Data files
Oct 31, 2024 version files 27.30 MB
- ANCOM_all_results_model_output.csv (29.97 KB)
- Data_Network_analysis_significant_correlations.csv (5.87 KB)
- Data_Soil_temp_and_VWC.xlsx (4.60 MB)
- KSR20_21_SoilData_Auto20.csv (1.98 MB)
- KSR20_21_SoilData_Auto21v2.csv (2.68 MB)
- merged-rooted-tree-16S.qza (1.43 MB)
- merged-rooted-tree-ITS.qza (345.66 KB)
- merged-taxonomy-16S.qza (1.94 MB)
- merged-taxonomy-ITS-UNITE9.qza (456.24 KB)
- Plot_merged_sample_metadatas_2021.csv (886 B)
- Plot_merged_sample_metadatas_2022.csv (421 B)
- Plot_merged_sample_metadatas_multiyear.csv (1.64 KB)
- README.md (17.60 KB)
- Resistance_and_resilience_code.Rmd (207.35 KB)
- rooted-tree-16S-2022.qza (912.62 KB)
- rooted-tree-16S.qza (909.12 KB)
- rooted-tree-ITS-2022.qza (138.75 KB)
- rooted-tree-ITS.qza (236.29 KB)
- table-10readsmin-noplant-16S-2022.qza (1.01 MB)
- table-10readsmin-noplant-16S.qza (1.18 MB)
- table-10readsmin-noplant-ITS-UNITE9-2022.qza (200.50 KB)
- table-10readsmin-noplant-UNITE9-ITS-2021.qza (314.96 KB)
- table-16S-multiyear.qza (1.79 MB)
- table-UNITE9-ITS-multiyear.qza (549.04 KB)
- taxonomy-16S-2022.qza (1.22 MB)
- taxonomy-16S.qza (1.24 MB)
- taxonomy-ITS-UNITE9-2021.qza (310.21 KB)
- taxonomy-UNITE9-ITS-2022.qza (159.96 KB)
- taxonomy-UNITE9-ITS.taxa.guilds2021.csv (2.50 MB)
- taxonomy-UNITE9-ITS.taxa.guilds2022.csv (917.40 KB)
- warmingsoil_metadata_2022.tsv (1.41 KB)
- warmingsoil_metadata_combinedyears.tsv (5.91 KB)
- warmingsoil_metadata.tsv (4.08 KB)
Abstract
Soil microbiomes play key roles in plant productivity and nutrient cycling, and we need to understand whether and how they will withstand the effects of global climate change. We exposed in situ soil microbial communities to multiple rounds of heat, drought, or both treatments, and profiled microbial communities with 16S rRNA and ITS amplicon sequencing during and after these climatic changes, and then tested how domain and symbiotic lifestyle affected responses. Fungal community composition strongly shifted due to drought and its legacy. In contrast, bacterial community composition resisted change during the experiment, but still was affected by the legacy of drought. We identified fungal and bacterial taxa with differential abundance due to heat and drought and found that taxa affected during climate events are not necessarily the taxa affected in recovery periods, showing the complexity and importance of legacy effects. Additionally, we found evidence that symbiotic groups of microbes important to plant performance respond in diverse ways to climate treatments and their legacy, suggesting plants may be impacted by past climatic events like drought and warming even if they do not experience the event themselves.
This dataset includes the sequencing data and metadata from soil warming arrays that experienced drought, heat, both, or control conditions.
We sequenced the 16S V4 region for bacteria and the ITS region for fungi.
Sequencing data were first assembled, resolved into Amplicon Sequence Variants (ASVs), and assigned taxonomy using QIIME2.
The ASV tables, taxonomy files, and rooted trees produced by this preliminary QIIME2 step were used for downstream analysis in R. Some metadata files are separated by time point, since we collected soil at three distinct times (two sampling times in 2021, one in 2022); other metadata files contain all three time points.
An Excel file containing all automatic and manual soil monitoring data for temperature and volumetric water content is included (Data_Soil_temp_and_VWC.xlsx), as well as the individual .csv files used in the analysis (files starting with KSR20_21_SoilData_Auto).
Some files are outputs of analyses. We classified fungi into functional guilds using FUNGuild to test whether guild affected microbial resistance and resilience; the output files from that classification are included as .csv files with ‘guilds’ in the name. We used ANCOM-BC to identify taxa that were differentially abundant due to heat and drought and to estimate the size of the effect; the model output is included as a .csv file starting with ANCOM. We used a network analysis to identify significantly correlated genera; the list of correlated genera is included as Data_Network_analysis_significant_correlations.csv.
Description of the data and file structure
There are two main categories of data: Fungi reads and Bacteria reads. Fungi data is denoted by ITS, while bacteria data is denoted by 16S.
There are also three subcategories of data based on sampling time: samples taken in 2021, samples taken in 2022, and multiyear (2021 and 2022 data merged together). Samples from 2022 are always marked ‘2022’, and multiyear data are always marked ‘multiyear’, ‘merged’, or ‘combinedyears’. Samples from 2021 either have ‘2021’ in the name or have no date in the name. In the Plot_merged_sample_metadatas files, ‘merged’ refers to the merged subsamples of each plot, not to merged years.
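The naming convention above can be sketched as a small Python helper. This is an illustration only, not part of the authors' R code; the function name `classify_file` and the exact matching rules are assumptions derived from the file names listed in this record, and the helper is meant for sequence and metadata files, not the soil monitoring files.

```python
import re

def classify_file(name):
    """Return (marker, period) for a sequence or metadata file name.

    marker is "16S", "ITS", or None; period is "2021", "2022", or "multiyear".
    The rules are an assumption based on the naming convention in this record.
    """
    # Marker gene: look for 16S or ITS as a delimited token in the name.
    if re.search(r"(^|[-_.])16S([-_.]|$)", name):
        marker = "16S"
    elif re.search(r"(^|[-_.])ITS([-_.]|$)", name):
        marker = "ITS"
    else:
        marker = None

    lower = name.lower()
    # "merged" only signals multiyear data as a leading "merged-" prefix;
    # in Plot_merged_* files it refers to merged subsamples instead.
    if "multiyear" in lower or "combinedyears" in lower or lower.startswith("merged-"):
        period = "multiyear"
    elif "2022" in lower:
        period = "2022"
    else:
        # Either "2021" appears in the name or no date is given.
        period = "2021"
    return marker, period


if __name__ == "__main__":
    for example in (
        "table-10readsmin-noplant-16S.qza",
        "taxonomy-UNITE9-ITS-2022.qza",
        "merged-rooted-tree-16S.qza",
        "Plot_merged_sample_metadatas_2021.csv",
    ):
        print(example, classify_file(example))
```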
All .qza files are QIIME2 artifacts and, as shown in the code, can be combined with the appropriate metadata files to build phyloseq objects.
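For readers without a QIIME2 or R environment, a .qza artifact can also be inspected directly, because it is a zip archive whose payload sits under a top-level directory named after the artifact's UUID. The sketch below uses only the Python standard library; the function names are hypothetical, and the authors' own workflow reads these files in R instead.

```python
import zipfile

def list_qza_data_files(qza_path):
    """Return the payload paths stored under '<uuid>/data/' in a .qza archive."""
    with zipfile.ZipFile(qza_path) as archive:
        members = []
        for name in archive.namelist():
            parts = name.split("/")
            # Payload files live in the 'data' directory directly below the UUID.
            if len(parts) >= 3 and parts[1] == "data" and not name.endswith("/"):
                members.append(name)
        return sorted(members)

def read_qza_metadata(qza_path):
    """Return the text of the top-level metadata.yaml (UUID, type, format)."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Skip the metadata.yaml copies nested under provenance/.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                return archive.read(name).decode("utf-8")
    raise ValueError("no top-level metadata.yaml found in " + str(qza_path))

# Example usage (assumes the file has been downloaded to the working directory):
#   print(read_qza_metadata("table-16S-multiyear.qza"))
#   print(list_qza_data_files("table-16S-multiyear.qza"))
```

Feature tables are typically stored as a BIOM file, taxonomy as a tab-separated file, and trees as a Newick file inside the data directory.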
Metadata files are included for each timepoint.
Specific files and their descriptions
ITS/Fungi
2021
- table-10readsmin-noplant-UNITE9-ITS-2021.qza : Feature table of fungal samples in 2021, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- rooted-tree-ITS.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because phylogenetic inference based on ITS is too coarse and may be erroneous.
- taxonomy-ITS-UNITE9-2021.qza : Taxonomy file linking ASVs in 2021 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal reference database. Created using QIIME2.
2022
- table-10readsmin-noplant-ITS-UNITE9-2022.qza : Feature table of fungal samples in 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- rooted-tree-ITS-2022.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because phylogenetic inference based on ITS is too coarse and may be erroneous.
- taxonomy-UNITE9-ITS-2022.qza : Taxonomy file linking ASVs in 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal reference database. Created using QIIME2.
2021 and 2022 merged
- table-UNITE9-ITS-multiyear.qza : Feature table of concatenated fungal samples in 2021 and 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- merged-rooted-tree-ITS.qza : Sequence tree made in QIIME2 based on ITS sequences. Not used for any analyses because phylogenetic inference based on ITS is too coarse and may be erroneous.
- merged-taxonomy-ITS-UNITE9.qza : Taxonomy file linking ASVs in 2021 and 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the UNITE9 fungal reference database. Created using QIIME2.
16S/Bacteria
2021
- table-10readsmin-noplant-16S.qza : Feature table of bacterial samples in 2021, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- rooted-tree-16S.qza : Sequence tree made in QIIME2 based on 16S sequences, for samples in 2021.
- taxonomy-16S.qza : Taxonomy file linking ASVs in 2021 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the GreenGenes reference database. Created using QIIME2.
2022
- table-10readsmin-noplant-16S-2022.qza : Feature table of bacterial samples in 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- rooted-tree-16S-2022.qza : Sequence tree made in QIIME2 based on 16S sequences, for samples in 2022.
- taxonomy-16S-2022.qza : Taxonomy file linking ASVs in 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the GreenGenes reference database. Created using QIIME2.
2021 and 2022 merged
- table-16S-multiyear.qza : Feature table of concatenated bacterial samples in 2021 and 2022, where rows are ASVs and columns are sample IDs (corresponding to metadata ‘sample-id’ columns). This is the resulting table after using QIIME2 to remove chloroplasts and mitochondria, and filter out ASVs that had fewer than 10 reads across all samples.
- merged-rooted-tree-16S.qza : Sequence tree made in QIIME2 based on 16S sequences, with concatenated samples from both years.
- merged-taxonomy-16S.qza : Taxonomy file linking ASVs in 2021 and 2022 samples with assigned taxonomy (Kingdom; Phylum; Class; Order; Family; Genus; Species) using the GreenGenes reference database. Created using QIIME2.
Metadata
- warmingsoil_metadata.tsv : Metadata for plots in 2021, with a separate line for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. The Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (Yes) or not (No).
- warmingsoil_metadata_2022.tsv : Metadata for plots in 2022, with a separate line for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. The Time.point column indicates when soil was sampled (MONTH-DAY-YYYY). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (Yes) or not (No).
- warmingsoil_metadata_combinedyears.tsv : Concatenated metadata for plots in 2021 and 2022, with a separate line for each of the 3 subsamples of each plot. The sample-id column shows a unique ID for the soil sample. The Time.point column indicates when soil was sampled (YYYY-MONTH-DD). Plot indicates the array plot sampled, replicate indicates which subsample, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (Yes) or not (No).
- Plot_merged_sample_metadatas_2021.csv : Metadata for plots in 2021, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows a letter J (June) or S (September) followed by the plot number. The Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (1) or not (0).
- Plot_merged_sample_metadatas_2022.csv : Metadata for plots in 2022, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows the plot number. The Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (1) or not (0).
- Plot_merged_sample_metadatas_multiyear.csv : Metadata for plots in 2021 and 2022, but with the 3 subsamples of each plot merged into one observation (used after the merging steps in the R .Rmd file). The sample-id column shows the sampling date followed by the plot number. The Time.point column indicates when soil was sampled (MONTH-DD-YYYY). Plot indicates the array plot sampled, and the treatment column shows what treatment was applied. The Drought_applied and Heat_applied columns indicate whether drought or heat, respectively, was applied (1) or not (0).
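Because the subsample-level files code treatments as Yes/No while the plot-level files use 1/0, it can help to normalize both before combining them. The following Python sketch is an illustration under stated assumptions: the helper names are hypothetical, the plot-to-treatment mapping is copied from the soil monitoring description in this record, and the treatment labels may not match the exact strings used in the treatment column.

```python
# Plot-to-treatment mapping as stated in the soil monitoring description.
# The label strings are an assumption; the metadata files may spell them differently.
PLOT_TREATMENT = {
    1: "Drought", 5: "Drought", 10: "Drought",
    7: "Heat", 11: "Heat", 12: "Heat",
    3: "Control", 4: "Control", 8: "Control",
    2: "Heat+Drought", 6: "Heat+Drought", 9: "Heat+Drought",
}

def flag_to_bool(value):
    """Convert a Drought_applied/Heat_applied entry (Yes/No or 1/0) to a bool."""
    text = str(value).strip().lower()
    if text in ("yes", "1"):
        return True
    if text in ("no", "0"):
        return False
    raise ValueError("unexpected treatment flag: " + repr(value))

def parse_2021_plot_id(sample_id):
    """Split a 2021 plot-level sample-id such as 'J7' into (month, plot number)."""
    months = {"J": "June", "S": "September"}
    prefix = sample_id[0].upper()
    if prefix not in months:
        raise ValueError("unexpected sample-id prefix: " + repr(sample_id))
    return months[prefix], int(sample_id[1:])


if __name__ == "__main__":
    month, plot = parse_2021_plot_id("S11")
    print(month, plot, PLOT_TREATMENT[plot], flag_to_bool("Yes"))
```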
Analysis and analysis output files
- Resistance_and_resilience_code.Rmd : R Markdown file containing all the code needed to reproduce the analysis. Open using R.
- Data_Network_analysis_significant_correlations.csv : Each row contains two genera that were significantly correlated in the network analysis. The Treatment column represents the active treatment the soil sample received. Var1 and Var2 are the genus names, and the cor column shows their correlation. The p column is the p value, while the fdr column is the p value corrected for false discovery rate. Var1domain and Var2domain give the domain of the respective genus (Bacteria or Fungi).
- ANCOM_all_results_model_output.csv : Output of the ANCOM-BC analysis in the .Rmd file, showing only genera that were significantly differentially abundant at one or more time points. Each row describes one genus at one time point at which it was differentially abundant. The Taxa column contains the genus name. The Coef_Heat_Yes, Coef_Drought_Yes, and Coef_Heat_Yes:Drought_Yes columns give the log fold change of that genus under applied heat, applied drought, or the interaction of heat and drought, respectively. The same suffixes apply to all other columns: Heat_Yes means heat was applied, Drought_Yes means drought was applied, and Heat_Yes:Drought_Yes is the interaction of heat and drought. The SE columns show the standard error of the coefficient estimate. The TestStats columns show the test statistic (analogous to a t or F value in a standard statistical test), whose magnitude determines significance. The p columns give the raw p value, whereas the padjs columns give the p value after Holm correction. The Differential_expr columns state whether that genus is significantly differentially abundant due to heat, drought, or the heat*drought interaction, depending on the appended label (True = differentially abundant, False = not). The Time column shows the time period considered in that row. The Kingdom column gives the kingdom/domain the genus belongs to.
- taxonomy-UNITE9-ITS.taxa.guilds2021.csv : Output of FUNGuild, which matches the taxonomy of an ASV with known functional guild data about that taxon. Results for ASVs in 2021. The OTU column contains the ASV name. The Kingdom, Phylum, Order, Class, Family, Genus, and Species columns are the taxonomic classifications for that ASV from the QIIME2 taxonomy files listed above; ‘na’ in these columns indicates that taxonomic resolution could not go finer based on the sequence. The remaining columns are FUNGuild output; if FUNGuild could not match or assign functional guild data, these columns are ‘na’, but if a match was made and information for a column is missing or unknown, this is denoted with ‘NULL’. The ‘taxon’ column has a value when FUNGuild matched that ASV’s taxonomy to a known taxon in its database (this column states the matching taxon name). The TaxonomicLevel column describes the taxonomic resolution at which the match was made. TrophicMode describes the general trophic strategy of the taxon. Guild describes the functional guild of the taxon. Trait describes any extra known traits of the taxon. The growthForm column describes known growth forms of the taxon. Confidence is the level of certainty the database assigns to the guild information associated with the taxon. The Notes and citationSource columns are also generated by FUNGuild and give extra information and the source for the guild classifications made by the database.
- taxonomy-UNITE9-ITS.taxa.guilds2022.csv : Output of FUNGuild, which matches the taxonomy of an ASV with known functional guild data about that taxon. Results for ASVs in 2022. Same description as file “taxonomy-UNITE9-ITS.taxa.guilds2021.csv”.
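The two missing-value markers in the guild files carry different meanings (‘na’ for no FUNGuild match, ‘NULL’ for a match with unknown information), so they are worth handling separately when reading the files. The Python sketch below shows one way to do this with the standard library; the function names are hypothetical, the column names are taken from the description above, and the toy rows are invented placeholders rather than values from the dataset.

```python
import csv
import io

# Tokens treated as "no usable value" when reading a single cell.
MISSING_TOKENS = {"", "na", "NULL"}

def match_status(row):
    """Return 'unmatched' when FUNGuild assigned nothing (taxon is 'na')."""
    taxon = (row.get("taxon") or "").strip()
    return "unmatched" if taxon in ("", "na") else "matched"

def known(value):
    """Return the cell text, or None for 'na', 'NULL', or an empty cell."""
    text = (value or "").strip()
    return None if text in MISSING_TOKENS else text

def summarise(handle):
    """Count matched/unmatched ASVs and tally guilds among matched ASVs."""
    counts = {"matched": 0, "unmatched": 0}
    guilds = {}
    for row in csv.DictReader(handle):
        status = match_status(row)
        counts[status] += 1
        guild = known(row.get("Guild"))
        if status == "matched" and guild is not None:
            guilds[guild] = guilds.get(guild, 0) + 1
    return counts, guilds

# Invented placeholder rows, only to show the three cases.
TOY_CSV = """OTU,Genus,taxon,Guild,Trait
asv_a,GenusA,GenusA,ExampleGuild,NULL
asv_b,na,na,na,na
asv_c,GenusC,GenusC,NULL,NULL
"""

if __name__ == "__main__":
    print(summarise(io.StringIO(TOY_CSV)))
```

To run this on the real files, pass an open file handle, for example `open("taxonomy-UNITE9-ITS.taxa.guilds2021.csv", newline="")`, in place of the `io.StringIO` object.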
Soil monitoring data
- Data_Soil_temp_and_VWC.xlsx : This file contains both manual and automatic temperature and volumetric water content measurements, for completeness. Each plot was equipped with a soil sensor (Models 5TM and TE11, METER Group, Pullman, WA, USA) in the center at a depth of 15-20 cm to record soil volumetric water content (VWC) and soil temperature every 30 minutes using a CR1000 datalogger (Campbell Scientific Inc., Edmonton, AB, Canada). Several manual soil VWC measurements were taken during the experiments in 2020 and 2021 using a HydroSense system with the CS620 sensor (Campbell Scientific Inc., Edmonton, AB, Canada) at a depth of 20 cm to match the depth of the automatic soil sensors, with seven manual soil VWC measurement time points in 2020 and eight time points in 2021. At each time point, we measured soil VWC at seven points within each plot and calculated a mean value for each plot. For each year, a correction factor was calculated per plot to correct for sensor drift and aging of the 5TM and TE11 probes that were permanently installed in the experimental plots. The correction factors were calculated as the ratio between the manual soil VWC means and the single automatic soil VWC sensor reading at multiple time points per experiment. The automatic VWC sensor readings were then multiplied by the calculated correction factor across the full sensor datasets. The “Automatic Soil Data for 2020” and “Automatic Soil Data for 2021” tabs show automatic measurements, with the date (YYYY-MM-DD) and time columns indicating when measurements were taken. Soil temperature was measured in degrees Celsius, and columns with SoilTemp are appended with the plot number (1-12). Drought = plots 1, 5, 10; Heat = plots 7, 11, 12; Control = plots 3, 4, 8; Heat+Drought = plots 2, 6, 9.
Columns with VWC denote volumetric water content (units in m^3/m^3) of the soil, and the number (1-12) denotes the plot; columns with Raw appended are the raw values, while columns with Corr are corrected values based on manual measurements, as described above. The tabs “Manual Soil VWC for 2020” and “Manual Soil VWC for 2021” contain manual water content measurements. Here, the date is indicated in a column (YYYY-MM-DD), and there is a column for plot. The columns with Manual are the VWC measurements (units in m^3/m^3), and the number appended is the replicate of the measurement (7 measurements for each plot). The Manual_Mean column shows the mean of the 7 measurements, while the automatic measurement taken at the same time as the manual measurements is shown in the Automatic column.
- KSR20_21_SoilData_Auto20.csv : This is the tab “Automatic Soil Data for 2020” from the Data_Soil_temp_and_VWC.xlsx file. Saved as a .csv file for analysis in R.
- KSR20_21_SoilData_Auto21v2.csv : This is the tab “Automatic Soil Data for 2021” from the Data_Soil_temp_and_VWC.xlsx file. Saved as a .csv file for analysis in R.
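The per-plot VWC correction described for Data_Soil_temp_and_VWC.xlsx can be sketched in a few lines of Python. The description does not say how the ratios from multiple time points were combined into one factor, so this sketch assumes a simple average of the per-time-point ratios; the function names and the example numbers are hypothetical.

```python
from statistics import mean

def correction_factor(manual_means, automatic_readings):
    """Average the manual/automatic VWC ratios over the manual time points.

    Assumption: the per-time-point ratios are averaged into one factor per
    plot and year; the source text does not specify how they were combined.
    """
    if len(manual_means) != len(automatic_readings) or not manual_means:
        raise ValueError("need equally sized, non-empty series")
    ratios = [manual / auto for manual, auto in zip(manual_means, automatic_readings)]
    return mean(ratios)

def apply_correction(raw_series, factor):
    """Multiply every automatic (Raw) reading by the plot's correction factor."""
    return [value * factor for value in raw_series]


if __name__ == "__main__":
    # Hypothetical values for one plot, in m^3/m^3.
    manual = [0.30, 0.20]      # Manual_Mean at two manual time points
    automatic = [0.25, 0.20]   # Automatic reading at the same time points
    factor = correction_factor(manual, automatic)
    print(factor, apply_correction([0.10, 0.25], factor))
```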
Sharing/Access information
Raw sequences are available at NCBI’s Sequence Read Archive under BioProject PRJNA1177093: Resistance and resilience of soil microbiomes under climate change.
Code/Software
There is an .Rmd (R Markdown) file that contains all code required to run the analyses and create figures.
We mainly used R and RStudio to run the analyses.
The FUNGuild analysis partially used bash and Python (code included in the .Rmd file); the output files from this analysis are included as the guilds .csv files.
We conducted this study at the Koffler Scientific Reserve (KSR, www.ksr.utoronto.ca) in Ontario, Canada (latitude 44°01'48”N, longitude 79°32'01”W) in an old field environment.
We collected soil from a temperature free-air controlled enhancement experiment with plots that were droughted, heated, both droughted and heated, or left as controls. There were two periods of active treatment: rainout structures were present July-November 2020 and June-October 2021, for a total of 8 months; heaters were active in heated plots August-December 2020 and August 2021-January 2022, for a total of 9 months. There were three replicate plots per climate treatment, and we took three subsamples of soil per plot, per time point. Soil was sampled in June 2021 (first recovery period), September 2021 (active treatment), and June 2022 (second recovery period).
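As a quick check on the design described above, the expected number of samples per marker gene can be worked out from the stated counts. This Python sketch assumes every planned sample was collected and sequenced, which the text does not state, so the actual feature tables may contain fewer samples; the treatment labels are illustrative.

```python
TREATMENTS = ("Control", "Drought", "Heat", "Heat+Drought")
PLOTS_PER_TREATMENT = 3
SUBSAMPLES_PER_PLOT = 3
TIMEPOINTS = ("June 2021", "September 2021", "June 2022")

def expected_counts():
    """Return expected plot and sample counts per marker gene (16S or ITS)."""
    plots = len(TREATMENTS) * PLOTS_PER_TREATMENT
    return {
        "plots": plots,
        # One row per subsample, as in the warmingsoil_metadata files.
        "subsample_level": plots * SUBSAMPLES_PER_PLOT * len(TIMEPOINTS),
        # One row per plot and time point, as in the Plot_merged files.
        "plot_level": plots * len(TIMEPOINTS),
    }


if __name__ == "__main__":
    print(expected_counts())
```

Under these assumptions there are 12 plots, 108 subsample-level samples, and 36 plot-level observations per marker gene.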
We extracted microbial DNA from the soil and performed amplicon sequencing of the 16S rRNA gene V4 hypervariable region (primer pair 515F-806R) and the ITS region (primer pair ITS1FP2-58A2RP3). Raw data can be found on NCBI's Sequence Read Archive under PRJNA1177093. Reads from subsamples of each plot were then merged so that each plot had one set of reads per time point.
In QIIME2, we removed ASVs that had fewer than 10 reads across all samples, and assigned taxonomy using the ‘sklearn’ feature classifier (Pedregosa et al. 2011); we used Greengenes 16S V4 region reference for bacteria (McDonald et al. 2012), and UNITE (Nilsson et al. 2019) version 9.0 with dynamic clustering of global and 97% singletons for fungi. We then filtered out reads assigned as cyanobacteria and mitochondria to remove plant and animal DNA. Finally, for each of the bacterial and fungal datasets we constructed a phylogeny using QIIME2’s MAFFT (Katoh & Standley 2013) and FastTree (Price et al. 2010) functions to obtain a rooted tree.
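The two filtering steps above were run in QIIME2; the Python sketch below only illustrates their logic on a toy in-memory table and is not the authors' pipeline. The function names, the table layout (a dict of ASV to per-sample counts), the toy taxonomy strings, and the default excluded terms are all assumptions; the excluded terms are a parameter because the file descriptions name chloroplasts and mitochondria.

```python
def filter_min_total_reads(table, min_reads=10):
    """Keep ASVs whose read count summed over all samples is at least min_reads.

    table maps ASV id -> {sample id: read count}.
    """
    return {
        asv: counts
        for asv, counts in table.items()
        if sum(counts.values()) >= min_reads
    }

def drop_excluded_taxa(table, taxonomy, excluded=("chloroplast", "mitochondria")):
    """Drop ASVs whose taxonomy string contains any excluded term (case-insensitive)."""
    kept = {}
    for asv, counts in table.items():
        lineage = taxonomy.get(asv, "").lower()
        if not any(term in lineage for term in excluded):
            kept[asv] = counts
    return kept


if __name__ == "__main__":
    # Toy data with invented ASV ids, sample ids, and lineages.
    toy_table = {
        "asv_1": {"s1": 8, "s2": 5},   # total 13, kept by the read filter
        "asv_2": {"s1": 3, "s2": 2},   # total 5, removed by the read filter
        "asv_3": {"s1": 40, "s2": 0},  # total 40, removed as chloroplast
    }
    toy_taxonomy = {
        "asv_1": "k__Bacteria; p__Proteobacteria",
        "asv_2": "k__Bacteria; p__Firmicutes",
        "asv_3": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    }
    filtered = filter_min_total_reads(toy_table)
    print(sorted(drop_excluded_taxa(filtered, toy_taxonomy)))
```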
The .qza files were used downstream in the included R code.
Metadata files (.csv, .tsv) can be opened in Excel.
QIIME2 files (.qza) can be opened in a QIIME2 environment on the command line, or in R following the .Rmd code.
The analysis file (.Rmd) can be opened in R and RStudio.