Supplementary code and data for: Symbiotic bacteria and fungi proliferate during diapause and may enhance overwintering survival in a solitary bee
Data files
May 23, 2024 version files (2.62 GB total):
- 16Srawfiles.zip (801.14 MB)
- ITSrawfiles.zip (506.88 MB)
- README.md (12.58 KB)
- Supplemental_code_and_data.zip (1.31 GB)
Abstract
Host-microbe interactions underlie the development and fitness of many macroorganisms, including bees. Whereas many social bees benefit from vertically transmitted gut bacteria, current data suggest that solitary bees, which comprise the vast majority of species diversity within bees, lack a specialized gut microbiome. Here we examine the composition and abundance of bacteria and fungi throughout the complete life cycle of the ground-nesting solitary bee Anthophora bomboides stanfordiana. In contrast to expectations, immature bee stages maintain a distinct core microbiome consisting of Actinobacterial genera (Streptomyces, Nocardioides) and the fungus Moniliella spathulata. Dormant (diapausing) larval bees hosted the most abundant and distinctive bacteria and fungi, attaining 33 and 52 times their initial copy number, respectively. We tested two adaptive hypotheses regarding microbial functions for diapausing bees. First, using isolated bacteria and fungi, we found that Streptomyces from brood cells inhibited the growth of multiple pathogenic filamentous fungi, suggesting a role in pathogen protection during overwintering, when bees face high pathogen pressure. Second, sugar alcohol composition changed in tandem with major changes in fungal abundance, suggesting links with bee cold tolerance or overwintering biology. We find that Anthophora bomboides hosts a conserved core microbiome that may provide key fitness advantages through larval development and diapause, which raises the question of how this microbiome is maintained and faithfully transmitted between generations. Our results suggest that focus on microbiomes of mature or active insect developmental stages may overlook stage-specific symbionts and microbial fitness contributions during host dormancy.
https://doi.org/10.5061/dryad.gtht76ht1
This dataset contains data and code in three files. "Supplemental_code_and_data.zip" contains all of the files and folders listed below. "16Srawfiles.zip" and "ITSrawfiles.zip" contain the raw bacterial and fungal data, for use in the "Amplicon" subfolder once "Supplemental_code_and_data.zip" has been unzipped (as described below).
Raw Illumina Amplicon Data
"16Srawfiles.zip" and "ITSrawfiles.zip" unzip to folders that each contain the raw forward and reverse ".fastq.gz" files from the Illumina sequencing of each sample for bacteria (16S) and fungi (ITS), respectively. The file names correspond to the sample number (e.g., "Sample15" or "S140") and to forward ("R1") or reverse ("R2") reads. These raw data are used in the Amplicon R code described below.
These data were obtained at the Integrated Microbiome Resource (IMR) at Dalhousie University with the protocol described in Comeau and Kwawukume (2023; dx.doi.org/10.17504/protocols.io.4r3l277k3g1y/v1). Bacterial primers 799F and 1115R (799F = 5'-AACMGGATTAGATACCCKG-3' / 1115R = 5'-AGGGTTGCGCTCGTTG-3') were used to amplify and sequence the ~300bp V5/V6 region of the bacterial 16S rRNA gene; these were selected to reduce plastid (chloroplast) amplification. Fungal primers ITS1F and ITS2 (ITS1F = 5'-CTTGGTCATTTAGAGGAAGTAA-3' / ITS2 = 5'-GCTGCGTTCTTCATCGATGC-3') were used to amplify and sequence the variable-length Internal Transcribed Spacer region.
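The 799F primer contains the IUPAC degenerate bases M and K, so it stands for a small pool of concrete sequences. The following is a minimal Python sketch, not part of the dataset's R code, that expands a degenerate primer into its variants. The function name `expand_primer` is hypothetical; the lookup table uses the standard IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

# 799F as given in this README: M = A or C, K = G or T.
variants = expand_primer("AACMGGATTAGATACCCKG")
for seq in variants:
    print(seq)
```

Primers without degenerate positions, such as 1115R, ITS1F, and ITS2 as written above, expand to a single sequence.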
Code and Other Data
"Supplemental_code_and_data.zip" has data and code in these categories: Illumina amplicon sequencing of bacteria and fungi from solitary bee (Anthophora bomboides) life stages and environmental samples; qPCR of the same samples; inhibition trials examining growth inhibition of fungal pathogens by Streptomyces isolated from brood cells; and sugar/sugar alcohol (SSA) HPLC analysis of bee developmental stages from just before to just after diapause. This is the code and data used for the analyses and figures in the manuscript named above; further details on methods are in the manuscript's methods section and supplemental methods document.
The dataset is organized into folders for the four major experiments/analyses: Illumina amplicon sequencing (Amplicon), qPCR (qPCR), inhibition trials (Inhibition), and sugar/sugar alcohol analysis (SSA). Each is described below.
Amplicon
- ‘Amplicon’: Contains "Bacteria" and "Fungi" subfolders representing the analysis of bacterial (V5V6 16S rRNA gene) and fungal (ITS gene) Illumina sequencing, and one .CSV file of metadata
- 'sample_metadata.CSV' - with columns:
- Sample name- Unique sample number
- match- The sample number/name as it appears in the raw data file names, used to match each sample to its metadata
- site- The site from which the sample was collected: either Bodega Head or McClure’s Beach - both on the coast of California
- date.collected- The date on which the sample was collected. Samples were placed into coolers for transport back to the lab, then kept at -80 deg. C until processing.
- Category- Broad category of sample, options being “bee” (any bee sample), “environmental” (water, dirt, or flowers), or “blank” (DNA extraction blanks)
- Type- The finest classification of the samples, by stage or source. Stages: Egg, First-second instar, Third instar, Fourth instar, Summer prepupa, Oct. prepupa, Dec. prepupa, Pupa, Unemerged adult, Adult crop, Adult gut; Sources: Water, Soil, extraction blank, Lupine, Radish, Ice plant, Poppy, Sea rocket, Gum plant, Seaside daisy.
- Type1- Broader classification (one step down from Category): bee samples are divided into in brood cell and out of brood cell; environmental samples into water, dirt, and flower; blanks remain blank.
- Type2- Broader classification (one step up from Type): groups egg-2nd instar (egg-early), 3rd-4th instar provisions (mid-late), and larvae (mid.LARVA); Summer, October, December, pupa, unemerged, and adult are the same as in Type; all flower samples are ‘flower’.
- Type3- Separates broad categories of the actual substance of the sample, eg- pollen provision, larva, prepupa, pupa, unemerged, adult, water, dirt, flower, blank.
- stage.number- numbers for ordering the various stages, from 1 to 9 for bees, and non-bee samples arbitrarily as ’10’
- Seq_set- numbers 1 or 2 for whether the sample was sequenced in the first or second set of samples submitted to IMR at Dalhousie
- Notes- notes on the sample, if applicable
- 'Bacteria' contains subfolders:
- 'code' which holds three .R scripts of the DADA2 pipeline and analysis:
- 'code1_V5V6_Anthophora_DADA2.R' - This code goes through the initial DADA2 pipeline for the bacterial raw reads (which are found in '16Srawfiles.zip'). Further instructions on how to proceed are in the code itself at the top.
- 'code2_V5V6_Anthophora_CleanupPS.R' - This code goes through the creation of the phyloseq (PS) object and cleanup, resulting in the final phyloseq object containing the samples and reads to be analyzed.
- 'code3_V5V6_Anthophora_Analysis.R' - This code starts with the final phyloseq (PS) object created in code2, but this final object is also supplied as a .RDS file (in the 'RDS' folder, see below) if you just want to look at analysis of the PS object and not go through the entire pipeline. This code goes through all of the analysis, results, statistics, and figures related to amplicon data, as described in the manuscript.
- 'raw' which holds a single .zip file containing all of the raw Illumina amplicon samples/reads: this is a separate zip file, as described above, which needs to be unzipped in the 'raw' folder for use by 'code1'.
- 'RDS' which contains 6 .RDS files (files that save whole R objects) and one .CSV
- '16S_unfilt_phyloseq_obj.RDS' - the phyloseq object prior to filtering
- 'BC_dist_for_psfd_ord.RDS' - Bray Curtis distance matrix of the final phyloseq object for ordination
- 'Bee_subset_bact.RDS' - final phyloseq object subset to only bee samples (this is also used for qPCR)
- 'psf_decontam_300plusreads.RDS' - FINAL PS object after filtering, decontam, and removing samples with fewer than 300 reads. Can access at the beginning of code 3 with readRDS('filelocation')
- 'psfd_ord_broodcell.RDS' - Final PS object, subset to just the broodcell samples.
- 'psfd_ord_nmds_broodcell.RDS' - Ordination of brood cell samples by stage
- '16S_final_track_reads.csv'- CSV version of read tracking, for use in qPCR code for mitochondria removal
- 'Fungi' contains subfolders:
- 'code' which holds three .R scripts of the DADA2 pipeline and analysis:
- 'code1_ITS_Anthophora_DADA2.R' - This code goes through the initial DADA2 pipeline for the fungal raw reads (which are found in 'ITSrawfiles.zip'). Further instructions on how to proceed are in the code itself at the top.
- 'code2_ITS_Anthophora_CleanupPS.R' - This code goes through the creation of the phyloseq (PS) object and cleanup, resulting in the final phyloseq object containing the samples and reads to be analyzed.
- 'code3_ITS_Anthophora_Analysis.R' - This code starts with the final phyloseq (PS) object created in code2, but this final object is also supplied as a .RDS file (in the 'RDS' folder, see below) if you just want to look at analysis of the PS object and not go through the entire pipeline. This code goes through all of the analysis, results, statistics, and figures related to amplicon data, as described in the manuscript.
- 'raw' which holds a single .zip file containing all of the raw Illumina amplicon samples/reads: this is a separate zip file, as described above, which needs to be unzipped in the 'raw' folder for use by 'code1'.
- 'RDS' which contains one .RDS file (files that save whole R objects) and one .CSV
- 'ITS_final_track_reads.CSV' - shows read tracking over dada2 pipeline
- 'post_decontam_ps_obj.RDS' - FINAL ITS PS object after filtering, decontam, and removing samples with fewer than 300 reads. Can access at the beginning of code 3 with readRDS('filelocation')
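As an illustration of how the metadata columns described above can be used to subset samples, here is a minimal Python sketch, independent of the dataset's R scripts. It assumes the header names match the column names listed in this README ('Sample name', 'match', 'site', 'Category', 'Type'); the three rows are invented stand-ins, not real records, and `rows_in_category` is a hypothetical helper.

```python
import csv
import io

# Invented stand-in rows; the real file is 'sample_metadata.CSV' in the Amplicon folder.
demo = io.StringIO(
    "Sample name,match,site,Category,Type\n"
    "1,Sample1,Bodega Head,bee,Egg\n"
    "2,Sample2,McClure's Beach,environmental,Soil\n"
    "3,Sample3,Bodega Head,blank,extraction blank\n"
)

def rows_in_category(handle, category):
    """Return the metadata rows whose 'Category' column equals `category`."""
    return [row for row in csv.DictReader(handle) if row["Category"] == category]

bee_rows = rows_in_category(demo, "bee")
# The 'match' value is what links a metadata row to its raw read file names.
print([row["match"] for row in bee_rows])
```

To try this on the real file, replace `demo` with an open file handle and check the header spellings first, since they are assumed here.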
qPCR
- 'qPCR' contains three files: 2 .CSV files and one .R script
- 'raw_bacterial_qPCR.csv'- 5 columns:
- 1st is just row numbers; then Well (indicating the well location on the qPCR plate with row indicated by letter and number indicating column); Sample (the sample number, as corresponds to the metadata); Cq (the Cq as reported as raw value from the qPCR run); plate (which plate the sample was run on)
- 'raw_fungal_qPCR.csv' - the same as above but for the fungal qPCR data, with an additional column in the 4th position, “content”, which indicates whether the sample is an unknown (‘Unkn’), positive control (‘Pos Ctrl’), or blank (‘NTC’).
- 'qPCR_code.R' - contains all of the formatting, analysis, statistics and graphing code for both the bacterial and fungal qPCR data, graphs, and integration with amplicon data. Further instructions and required packages are listed at the top of the file.
Inhibition
- ‘Inhibition’ contains two files, one .CSV and one .R script
- ‘Inhibition_data.csv’ - contains 10 columns:
- Plate- unique plate identifier number
- Inhibitor.ID- shorthand strain identifier plus replicate letter (A, B, C, D, E) for experimental plates, or ‘neg control’ for control plates
- Inhibitor.Spp- genus and strain identifier of the inhibitor species, without the replicate letter
- Date.inhibitor.plated- date the inhibitor species was put on the plate or, for controls, the date on which that cohort of plates received the inhibitor
- Fungus_ID- shorthand for the fungus later tested on the plate
- Fungal_Spp- full genus and species of the plated fungus
- Date_Fungal_plug_added- date the fungus was added to the pre-inoculated Streptomyces plate (or control); this is day 1
- Radius- measurement from the edge of the fungal plug to the leading edge of the fungal hyphae, measured perpendicular to the inoculated vertical lines of inhibitor
- datemeasured- date the radius measurement was taken
- Days_past- days between addition of the fungal plug and the radius measurement
- ‘Inhibition_code.R’ - contains all of the formatting, analysis, and graphing code for fungal inhibition by Streptomyces, graphs, and statistics. Required packages are listed at the top of the file.
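The methods for this dataset state that the two radius measurements per plate were averaged for each replicate plate. A minimal Python sketch of that step follows; it is not the code in ‘Inhibition_code.R’. It assumes one row per measurement with the ‘Plate’ and ‘Radius’ columns described above; the values are invented, and `mean_radius_per_plate` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

def mean_radius_per_plate(rows):
    """Average all Radius values recorded for each Plate identifier."""
    radii = defaultdict(list)
    for row in rows:
        radii[row["Plate"]].append(float(row["Radius"]))
    return {plate: mean(values) for plate, values in radii.items()}

# Invented measurements: two sides of the plug on each of two plates.
demo_rows = [
    {"Plate": "1", "Radius": "12.0"},
    {"Plate": "1", "Radius": "14.0"},
    {"Plate": "2", "Radius": "30.0"},
    {"Plate": "2", "Radius": "31.0"},
]
plate_means = mean_radius_per_plate(demo_rows)
print(plate_means)
```

If a plate was measured on more than one date, the grouping key would also need to include ‘datemeasured’ or ‘Days_past’.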
Sugar and Sugar Alcohols
- ‘SSA’ contains two files, one .CSV and one .R (SSA stands for sugar and sugar alcohols)
- ‘SSA_data.csv’ - 5 columns of metadata, followed by 17 columns of each sugar or sugar alcohol component.
- Vial- position of the sample or standard vial in the autosampler tray
- Type- whether the vial held a bee sample (‘Ab’, for Anthophora bomboides), a ‘control’ (nothing), or a ‘standard’ (a known concentration of a known SSA)
- Name- unique identifier of each sample, or the name of the known standard added to that vial
- Type2- stage of each bee sample (empty or NA for controls and standards)
- stage_number- numbers used for ordering the sample types, from 0 to 5
- Columns 6:22- the components, named by their corresponding known standard or, if unknown, by the retention time of the associated peak. Values in these columns are the area under the peak of each component, as calculated in the Chromeleon software.
- ‘HPLC_SSA_core.R’ - contains all of the formatting, analysis, statistics and graphing code for Sugar and Sugar Alcohol HPLC data and graphs. Required packages are listed at the top of the file.
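The component columns are named after a known standard when a peak matches one, and by retention time otherwise. The Python sketch below illustrates that naming rule only; the actual assignment was done in Chromeleon. The retention times, the 0.05-minute tolerance, the `RT_` label format, and the function `name_peak` are all assumptions made for this example.

```python
def name_peak(retention_time, standards, tolerance=0.05):
    """Name a peak after the closest standard within `tolerance`, else by its retention time."""
    name, standard_rt = min(standards.items(), key=lambda kv: abs(kv[1] - retention_time))
    if abs(standard_rt - retention_time) <= tolerance:
        return name
    # No standard is close enough: fall back to a retention-time label.
    return f"RT_{retention_time:.2f}"

# Hypothetical retention times in minutes (not measured values from this dataset).
demo_standards = {"glycerol": 1.20, "trehalose": 5.80}

print(name_peak(1.22, demo_standards))
print(name_peak(3.40, demo_standards))
```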
Code/Software
Software and primary packages (further packages listed in intro to each '.R' file):
R (4.1.1); DADA2 (1.22.0); phyloseq (1.38.0); vegan (2.6.4); microbiome (1.23.1); ggplot2 (3.4.2)
This dataset includes data and code for the following: amplicon sequencing (1A), qPCR (1B), plate inhibition (2), and sugar/sugar alcohol analysis (3). For further method details, see the manuscript.
- Amplicon data and qPCR are from stages throughout the lifecycle of a solitary bee, as well as environmental samples. All samples were added whole to DNA extraction, following preprocessing. Extraction for all samples was done per manufacturer’s instructions with the DNeasy PowerSoil Pro kit. Four blanks were included in DNA extractions. Extracted DNA was stored in the included extraction buffer at -80 °C for amplicon sequencing and qPCR.
- Amplicon sequencing of extracted DNA was done to assess bacterial and fungal community composition using the 16S rRNA (V5/6) gene and ITS gene at the Integrated Microbiome Resource (IMR) at Dalhousie University in Halifax, Nova Scotia. Phusion Plus high-fidelity polymerase was used with fusion primers, which include the sequences below with Illumina adaptors + indices for multiplexing; sequencing was then performed on Illumina MiSeq. Samples were de-multiplexed at IMR. For bacteria, primers 799F/1115R amplifying V5/V6 region of the 16S gene were used to limit mitochondria and chloroplast amplification (799F= 5'-AACMGGATTAGATACCCKG-3'/ 1115R= 5'-AGGGTTGCGCTCGTTG-3'). These primers amplify a ~300bp length target sequence. For fungi, primers ITS1F/ITS2 were used (ITS1F= 5'-CTTGGTCATTTAGAGGAAGTAA-3'/ ITS2=5'-GCTGCGTTCTTCATCGATGC-3'). These primers amplify the variable length ITS1/2 region.
- qPCR: Bacterial copy number was quantified with standard DNA intercalating dye (SYBR) based qPCR. The same extracted samples that were sent for amplicon sequencing were run through this procedure. Identical primers (799F= 5'-AACMGGATTAGATACCCKG-3' /1115R= 5'-AGGGTTGCGCTCGTTG-3') were used so that compositional and quantitative data could be directly compared and merged. A 1:10 dilution of extracted DNA was chosen after dilution testing with a representative subset of samples, as it gave in-range Cq values. Master mix, per reaction, was composed of 5ul SSO Advanced Universal SYBR Supermix (Catalog# 1725271), 0.3ul of each primer (10uM), 3.4ul molecular grade water, and 1ul of extracted DNA (diluted 1:10 in molecular grade water). Reactions were performed in triplicate for each sample, and arranged semi-randomly across plates to avoid possible correlations of plate and developmental stage. Blanks and standards were included in each plate, and a Cq cutoff for blanks was established at 31. Fungal quantification was done with FungiQuant, using the 18S rRNA gene primers FungiQuant-F= 5′-GGRAAACTCACCAGGTCCAG-3′ and FungiQuant-R = 5′-GSWCTATCCCCAKCACGA-3′, along with the fluorescent probe FungiQuant-Prb = (6FAM) 5′-TGGTGCATGGCCGTT-3′ (MGBNFQ). As with bacteria, dilution testing of samples was done to bring Cq values into the optimal range, and a 1:20 dilution was chosen. Master mix, per reaction, was composed of 5ul PCR Biosystems qPCRBIO Probe Mix (No-ROX) (Catalog# 17-512B), 0.3ul of each primer (10uM), 0.3ul fluorescent probe (10uM), 3.1ul molecular grade water, and 1ul of extracted DNA (diluted 1:20 in molecular grade water). Reactions were performed in triplicate for each sample, and arranged semi-randomly across plates to avoid possible correlations of plate and developmental stage. Blanks and standards were included in each plate.
- Plate Inhibition: For all inhibition trials we used TSA media without any antimicrobials. For consistency, we used a template to mark the underside of every plate with a cross “+” in the center and two parallel lines, each 30mm long and 20mm from the center point. These served as guides for inoculations. Streptomyces strains were inoculated onto the 30mm parallel lines with 1ul loops from stock plates (TSA without antifungals). Five replicate plates were made for each comparison (25 per trial, including 5 control plates). These were allowed to grow for 10 days; then plugs from stock plates of each fungus (also TSA, no antimicrobials) were inserted into the center “+” of each plate. Care was taken to ensure that plugs were all taken from just inside the leading edge of the fungal hyphae on the stock plates. These were allowed to grow for seven days. Measurements were taken on the backs of the plates, as the distance from the leading edge of the growing fungus to the center “+”, directly perpendicular to the parallel lines, on both sides. A tabletop light pad was used for imaging to qualitatively assess the density of the fungal hyphae, ensuring even back-lighting for the plates. Radius measurements (two per plate, one on each side of the “+”) were averaged for each replicate plate.
- Sugar/ Sugar Alcohol Profiles: Samples of whole larvae, prepupae and pupae, as well as one pollen provision from a 4th instar larva were extracted for sugar and sugar alcohol analysis. Whole samples were placed in tubes with metal beads and 1mL of 100% ethanol and run on a bead beater for 8 minutes at full speed with 20s breaks every minute. These were then centrifuged for 30 seconds at 10k rcf. For each sample, the top 700ul of ethanol was moved to a new tube, 700ul 100% hexane was added, and then vortexed for 30 seconds. To this, 100ul MilliQ water was added, and vortexed for another 30 seconds. Once hexane had separated from the aqueous phase, it was removed (800ul). The remaining 1mL of aqueous phase was centrifuged for 2 minutes at 16k rcf, and the bottom 500ul was filtered through a 0.2 micron syringe filter and placed in a new tube in a lyophilizer for 6 hours, without heat. The dried samples were kept in a -20C freezer until analysis, at which time they were re-suspended in 300ul 1:1 water: acetonitrile. Standards of erythritol, sorbitol, fructose, glucose, sucrose, xylose and maltose were made at 0.5 mg/mL, standards of glycerol and trehalose were made at 5mg/mL and 1mg/mL respectively, all in 1:1 water: acetonitrile. Separation of sugars was performed on Thermo UltiMate 3000 HPLC system according to the Waters Application Note: WA60110, except for the following: column was Phenomenex Luna Omega 3um SUGAR (50x2.1mm, Part#: 00B-4775-AN), and flow rate was 0.2mL/min; detection was by CAD (Corona Veo; Dionex). Each sample was run twice, standards were run 2-5 times. Analysis of peaks was performed with Thermo Fisher Chromeleon software. Peak identities were assigned based on retention times of standards, and unassigned peaks were then named by their retention times. Peak area was calculated by the software.
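The per-reaction qPCR recipes given in the qPCR item above can be checked and scaled with a few lines of arithmetic. This Python sketch uses the volumes stated in this README; the 10% pipetting overage and the function `scale_mix` are assumptions for illustration and are not described in the methods.

```python
# Per-reaction volumes in microliters, as stated in the qPCR methods above.
BACTERIAL_MIX_UL = {
    "SYBR supermix": 5.0,
    "forward primer (10 uM)": 0.3,
    "reverse primer (10 uM)": 0.3,
    "water": 3.4,
    "template DNA (1:10)": 1.0,
}
FUNGAL_MIX_UL = {
    "probe mix": 5.0,
    "forward primer (10 uM)": 0.3,
    "reverse primer (10 uM)": 0.3,
    "probe (10 uM)": 0.3,
    "water": 3.1,
    "template DNA (1:20)": 1.0,
}

def scale_mix(per_reaction, n_reactions, overage=0.10):
    """Scale the shared components (everything except template) to n reactions.

    `overage` is a hypothetical pipetting allowance, not a value from the methods.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in per_reaction.items()
        if not name.startswith("template")
    }

# Both recipes come to 10 ul per reaction.
print(round(sum(BACTERIAL_MIX_UL.values()), 2), round(sum(FUNGAL_MIX_UL.values()), 2))
# Shared mix for 8 samples in triplicate (24 reactions).
print(scale_mix(BACTERIAL_MIX_UL, 24))
```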
