Bacteria living on and in leaves and roots influence many aspects of plant health, so the extent of a plant’s genetic control over its microbiota is of great interest to crop breeders and evolutionary biologists. Laboratory-based studies, because they poorly simulate true environmental heterogeneity, may misestimate or totally miss the influence of certain host genes on the microbiome. Here we report a large-scale field experiment to disentangle the effects of genotype, environment, age and year of harvest on bacterial communities associated with leaves and roots of Boechera stricta (Brassicaceae), a perennial wild mustard. Host genetic control of the microbiome is evident in leaves but not roots, and varies substantially among sites. Microbiome composition also shifts as plants age. Furthermore, a large proportion of leaf bacterial groups are shared with roots, suggesting inoculation from soil. Our results demonstrate how genotype-by-environment interactions contribute to the complexity of microbiome assembly in natural environments.
PRJEB10570
Study number PRJEB10570 at the European Nucleotide Archive contains the raw MiSeq reads from high-throughput sequencing of the 16S rRNA gene in hundreds of leaf, root, and soil samples. Note that some of these samples are associated with other experiments and were not analyzed for the Wagner et al. 2016 study.
16S rRNA gene copy number estimates
Estimates of 16S rRNA gene copy number for OTUs analyzed in the Wagner et al. 2016 study. Generated following the method of Kembel et al. (2012), "Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance," PLoS Computational Biology 8 (10), e1002743.
Eco_Field_copynumest_forR.txt
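As a rough illustration of how copy number estimates like these are typically applied, the sketch below divides each OTU's read count by its estimated 16S rRNA gene copy number. The column layout of Eco_Field_copynumest_forR.txt is not described here, so the example works on in-memory dictionaries with made-up OTU names and values; the R scripts archive remains the reference for how the correction was actually applied in the study.

```python
def correct_for_copy_number(counts, copy_numbers):
    """Divide each OTU's read count by its estimated 16S copy number.

    Both arguments are dicts keyed by OTU identifier (hypothetical IDs here).
    """
    corrected = {}
    for otu, count in counts.items():
        copy_number = copy_numbers.get(otu)
        if copy_number is None or copy_number <= 0:
            raise ValueError(f"no usable copy number estimate for {otu}")
        corrected[otu] = count / copy_number
    return corrected


# Toy, made-up values; real estimates would come from the file above.
counts = {"OTU_1": 120, "OTU_2": 30}
copy_numbers = {"OTU_1": 4.0, "OTU_2": 1.5}
corrected = correct_for_copy_number(counts, copy_numbers)
```

Raising an error on a missing estimate, rather than silently assuming one copy, makes it obvious when the OTU table and the estimate file are out of sync.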
Glucosinolate data
Collection of glucosinolate data is described in Supplementary Note 1 of the article. HPLC peaks were called and area under each curve was integrated automatically by Agilent ChemStation software. Columns beginning with "area_" include the area under chromatogram peaks corresponding to the internal standard (Sinigrin) or one of the glucosinolate compounds (2OH1ME, 1ME, 6MSOH, 1MP, or I3M). "Weight" is the mass of air-dried samples, in mg. "Batch" is the identifier for groups of samples measured in the same HPLC run. "HPLC_ID" is a unique identifier for a glucosinolate sample, and "Plant_ID" is a unique identifier for individual plants in the field experiment.
Ecotypes_field_glucosinolates.txt
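One simple way to make the peak areas comparable across samples is to scale each compound's area by the internal standard area and the sample mass. The sketch below does only that. It assumes a tab-delimited file and assumes column names built as "area_" plus the compound label (for example "area_Sinigrin"); both are guesses from the description above, and the toy row is made up. It is not the quantification procedure of Supplementary Note 1, which may apply additional calibration.

```python
import csv
import io

# Compound labels as listed in the file description.
COMPOUNDS = ["2OH1ME", "1ME", "6MSOH", "1MP", "I3M"]


def relative_peak_areas(row, standard_col="area_Sinigrin"):
    """Scale each compound's peak area by the internal standard and sample mass.

    The exact column names are an assumption based on the "area_" prefix.
    """
    standard_area = float(row[standard_col])
    weight_mg = float(row["Weight"])
    if standard_area <= 0 or weight_mg <= 0:
        raise ValueError("internal standard area and weight must be positive")
    return {
        compound: float(row["area_" + compound]) / standard_area / weight_mg
        for compound in COMPOUNDS
    }


# Toy, made-up row standing in for one line of the real file.
toy_text = (
    "HPLC_ID\tPlant_ID\tBatch\tWeight\tarea_Sinigrin\t"
    "area_2OH1ME\tarea_1ME\tarea_6MSOH\tarea_1MP\tarea_I3M\n"
    "h001\tp001\tb1\t10\t200\t400\t100\t0\t0\t0\n"
)
rows = list(csv.DictReader(io.StringIO(toy_text), delimiter="\t"))
relative = relative_peak_areas(rows[0])
```

Because samples in the same "Batch" share an HPLC run, any downstream model would likely want to keep "Batch" alongside these values.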
OTU phylogeny
Phylogeny of all OTUs (based on 97% 16S rRNA gene sequence identity). Generated from representative sequences using the midpoint method in QIIME.
phylogeny.tre
Field site coordinates
Latitude and longitude of the 5 sites where genotypes were collected from natural B. stricta populations and/or where experimental common gardens were located.
site_coords_Ecotypes.txt
Soil / environmental data
Data on soil and vegetation characteristics from field sites used in this experiment. "PlantDiv" = number of plant morphospecies present in each block; "Veg" = percent of each block covered by vegetation (aggregate of estimated percent cover for 50 sub-plots of 10 x 10 cm per block). All other values are in ppm, except pH (unitless) and conductivity (umho/cm).
soildata.txt
Sample metadata
Microbiome sample metadata. "SampleID" = unique identifier for a sequenced 16S rDNA amplicon pool corresponding to a single sample; "Name" = more descriptive unique sample name; "Plant_ID" = unique identifier for individual plants (may correspond to >1 SampleID when both roots and leaves were sampled from the same plant); "Experiment" = sub-experiment that a sample belongs to ("ecotypes" = main field experiment described in Wagner et al. 2016; "ecoGH" = greenhouse experiment described in the supplementary information of Wagner et al. 2016; "rilGH" and "fieldBCMA" are not presented in Wagner et al. 2016); "Cohort" = year the individual (Plant_ID) was planted, if applicable; "Harvested" = year the sample was harvested; "Treatment" = type of soil the individual was planted in (only applicable to greenhouse experiments); "oldPlate" = identifier of the 96-well plate containing the sample before full randomization; "newPlate" = identifier of the MiSeq run, i.e., the amplicon library into which the sample was pooled (the 96-well plate containing the sample after full randomization); "Analysis" = broadest experimental grouping of samples, either "Ecotypes" (samples described in Wagner et al. 2016) or "BCMA" (samples not analyzed or described in Wagner et al. 2016).
SMD.txt
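Because the metadata mixes several sub-experiments, a first step for most re-analyses is to keep only the samples of interest and to group them by plant. The sketch below selects rows whose "Experiment" is "ecotypes" and collects the "SampleID" values for each "Plant_ID". The column names come from the description above; the tab delimiter and the toy rows (including the years) are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def samples_by_plant(handle, experiment="ecotypes"):
    """Map each Plant_ID to its SampleIDs for one sub-experiment."""
    reader = csv.DictReader(handle, delimiter="\t")  # delimiter is assumed
    grouped = defaultdict(list)
    for row in reader:
        if row["Experiment"] == experiment:
            grouped[row["Plant_ID"]].append(row["SampleID"])
    return dict(grouped)


# Toy, made-up rows; with the real file, pass open("SMD.txt", newline="").
toy_text = (
    "SampleID\tPlant_ID\tExperiment\tHarvested\n"
    "s1\tp1\tecotypes\t2011\n"
    "s2\tp1\tecotypes\t2011\n"
    "s3\tp2\tecoGH\t2012\n"
    "s4\tp3\tecotypes\t2012\n"
)
grouped = samples_by_plant(io.StringIO(toy_text))
```

Plants with two SampleIDs in the result are those for which both a leaf and a root sample were sequenced.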
OTU table (97%)
OTU table (for OTUs picked based on 97% 16S rRNA gene sequence identity) that forms the basis for most analyses in Wagner et al. 2016. Generated from raw MiSeq reads (see "Bioinformatics pipeline"). Columns are sample names (correspond to "SampleID" column in "Sample metadata" file); rows are OTUs. Decompress file using the Linux/UNIX command ' bunzip2 otuTable97.txt.bz2 '
otuTable97.txt.bz2
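The table can also be read without decompressing it on disk. The sketch below uses Python's standard bz2 module and assumes the decompressed file is tab-delimited, with a header row whose first cell labels the OTU column and whose remaining cells are sample names. That layout is an assumption, not something stated above, so the example runs on a small made-up table written to a temporary file.

```python
import bz2
import csv
import os
import tempfile


def read_otu_table(path):
    """Read a bzip2-compressed, tab-delimited OTU table (assumed layout).

    Returns (sample_ids, counts), where counts maps OTU ID -> list of counts
    in the same order as sample_ids.
    """
    with bz2.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        sample_ids = header[1:]  # first cell labels the OTU column
        counts = {
            row[0]: [int(float(value)) for value in row[1:]]
            for row in reader
            if row
        }
    return sample_ids, counts


# Toy, made-up table standing in for otuTable97.txt.bz2.
toy_text = "OTU_ID\tS1\tS2\nOTU_1\t5\t0\nOTU_2\t2\t7\n"
toy_path = os.path.join(tempfile.mkdtemp(), "toy_otu_table.txt.bz2")
with bz2.open(toy_path, "wt") as out:
    out.write(toy_text)

sample_ids, counts = read_otu_table(toy_path)
```

With the real data, the call would be read_otu_table("otuTable97.txt.bz2"), and the sample names could then be matched against the "SampleID" column of SMD.txt.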
Contaminant OTUs
List of OTUs identified as likely contaminants, based on comparison to a list of known contaminant 16S rRNA gene sequences (generated using custom scripts; see "Bioinformatics pipeline").
contaminants.fasta
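A list like this is usually applied by dropping the flagged OTUs from the OTU table before analysis. The sketch below collects identifiers from FASTA header lines and removes matching entries from a dictionary of counts. It assumes that the first whitespace-delimited token of each header is the OTU identifier used in the OTU table, which is not confirmed here; the sequences and IDs are made up.

```python
import io


def fasta_ids(handle):
    """Collect the first token of every FASTA header line (assumed OTU ID)."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def drop_contaminants(counts, contaminant_ids):
    """Return a copy of counts without the OTUs flagged as contaminants."""
    return {otu: values for otu, values in counts.items()
            if otu not in contaminant_ids}


# Toy, made-up data; with the real file, pass open("contaminants.fasta").
toy_fasta = ">OTU_2 some description\nACGTACGT\n>OTU_7\nTTGACCA\n"
toy_counts = {"OTU_1": [5, 0], "OTU_2": [2, 7], "OTU_3": [1, 1]}

contaminant_ids = fasta_ids(io.StringIO(toy_fasta))
clean_counts = drop_contaminants(toy_counts, contaminant_ids)
```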
OTU taxonomic assignments (97%)
Taxonomic assignments of OTUs (binned at 97% 16S rRNA gene sequence identity), generated by comparing representative OTU sequences to the Greengenes database using the RDP classifier in QIIME (see "Bioinformatics pipeline").
taxAssignments97.txt
Taxonomic assignments for OTUs binned at 99%
Taxonomic assignments of OTUs (binned at 99% 16S rRNA gene sequence identity), generated by comparing representative OTU sequences to the Greengenes database using the RDP classifier in QIIME. * Note that these are OTUs based on 99% identity, NOT the OTUs that were used for most analyses in Wagner et al. 2016 *
taxAssignments99.txt
Representative OTU 16S rRNA gene sequences
Representative sequence for each OTU (based on 97% sequence identity), in FASTA format. Generated from raw MiSeq reads using FLASH and UPARSE software (see "Bioinformatics pipeline").
OTUrepSeqs97.fa
OTU table (binned at 99%)
OTU table for OTUs picked based on 99% 16S rRNA gene sequence identity. Generated from raw MiSeq reads. Columns are sample names (correspond to "SampleID" column in "Sample metadata" file); rows are OTUs. Decompress file using the Linux/UNIX command ' bunzip2 otuTable99.txt.bz2 ' ** Note that most analyses in Wagner et al. 2016 used OTUs picked based on 97% sequence identity, NOT the OTUs in this table; the 97% table is in a different file (otuTable97.txt.bz2) **
otuTable99.txt.bz2
Key to Plant IDs
Master list of all individuals originally planted in the field experiment. Columns are as in the "Sample metadata" file (SMD.txt).
plant_key.txt
R scripts for data analysis
Archive of R scripts used to produce results in Wagner et al. 2016. See README file for notes on contents and recommendations for use.
R_code_Wagner_etal_2016.tar
Bioinformatics pipeline
A description of the bioinformatics pipeline used to process raw MiSeq data, including parameter values and sample syntax.
bioinformatics_pipeline_Wagner_etal_2016.txt