
Data from: Microbiomes of a specialist caterpillar are consistent across different habitats but also resemble the local soil microbial communities

Cite this dataset

Gomes, Sofia et al. (2020). Data from: Microbiomes of a specialist caterpillar are consistent across different habitats but also resemble the local soil microbial communities [Dataset]. Dryad.


Background: Insect-associated microorganisms can provide a wide range of benefits to their host, but insect dependency on these microbes varies greatly. The origin and functionality of insect microbiomes are not well understood. Many caterpillars can harbor symbionts in their gut that impact host metabolism, nutrient uptake and pathogen protection. Although the ecological factors driving microbiome assemblages of wild caterpillars are poorly known, these assemblages seem to be highly variable and influenced by diet and environment. Several recent studies have shown that shoot-feeding caterpillars acquire part of their microbiome from the soil. Here, we examine microbiomes of a monophagous caterpillar (Tyria jacobaeae) collected from its natural host plant (Jacobaea vulgaris) growing in three different environments: coastal dunes, natural inland grasslands and riverine grasslands. We compare the bacterial communities of the wild caterpillars to those of soil samples collected from underneath each of the host plants from which the caterpillars were collected.

Results: The microbiomes of the caterpillars were dominated by Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. Only 5% of the total bacterial diversity accounted for 86.2% of the caterpillar microbiome. Interestingly, we found a high consistency of dominant bacteria within the family Burkholderiaceae in all caterpillar samples across the three habitats. One amplicon sequence variant belonging to the genus Ralstonia represented on average 53% of total community composition across all caterpillars. On average, one quarter of the caterpillar microbiome was shared with the soil.

Conclusions: We found that the monophagous caterpillars collected from fields located more than 100 kilometers apart were all dominated by a single Ralstonia. The remainder of the bacterial communities that were present resembled the local microbial communities in the soil in which the host plant was growing. Our findings provide an example of a caterpillar that has just a few key associated bacteria, but that also contains a community of low-abundance bacteria characteristic of soil communities.


Study sites and sample collection

We selected three characteristic habitats in the Netherlands in which ragwort plants and cinnabar moths occur in their native range: coastal dunes (Meijendel), inland natural grasslands (Veluwe) and riverine grasslands (Wageningen), and sampled three localities within each habitat. At each locality, ten Jacobaea vulgaris plants on which Tyria jacobaeae caterpillars were feeding were selected (SI Figure S1). Plants were located at least 10 m apart (except for two sampling localities in riverine grasslands, where the smallest distances between two plants were 8.3 and 9.2 m). Around the stem of each plant, five soil samples from the top 5 cm layer were taken using a 20 cm soil borer with 2 cm diameter and pooled together. Caterpillars collected from each individual plant were kept together (with a minimum of 3 and a maximum of 10 caterpillars collected per plant). All samples were stored in a cooler with ice until processing in the laboratory on the same day. Fresh weight of each individual caterpillar was recorded (SI Figure S2; SI Table S1). All caterpillars were surface sterilized by dipping them for 30 seconds in 70% ethanol and 2.0% bleach and then rinsing them twice with autoclaved demineralized water. Surface sterilization was used to enrich the samples for gut rather than surface microbes while leaving gut microbes intact [20]. Both caterpillar and soil samples were stored at -20 °C until further processing. Caterpillar samples were lyophilized prior to DNA extractions.

DNA extraction and library preparation

To obtain a representative sample of caterpillars feeding on one plant, DNA was extracted collectively from three homogenized caterpillars, from approximately 10 mg of dry, lyophilized sample. In total, 270 caterpillars were used in this study (9 study locations × 10 plants × 3 caterpillars per sample). Extractions were performed using the MP Biomedicals FastDNA™ Spin Kit (MP Biomedicals, Solon, Ohio, USA) following the manufacturer’s protocol with the following modifications. Samples homogenized with Cell Lysis Solution in the FastPrep® Instrument (MP Biomedicals) for 20 seconds (speed setting of 6.0) were incubated at room temperature for 1 hour. An extra washing step was included, and the final eluted DNA was additionally precipitated prior to further purification using a standard ethanol precipitation method with potassium acetate.

DNA was extracted from 90 soil samples (approximately 0.35 g of soil per sample) using the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol.

Approximately 10 ng of template DNA was used for PCR using primers 515FB [21] and 806RB [22] targeting the V4 region of the 16S rRNA gene [23]. The PCR mixture (25 µl) contained 12.5 µl Phusion Flash High Fidelity PCR Master Mix (Thermo Scientific) and 1.25 µl of each of the primers (10 µM). Cycling conditions were 45 s at 98 °C, followed by 30 cycles (caterpillar samples) or 25 cycles (soil samples) of 98 °C for 5 s, 55 °C for 5 s, and 72 °C for 10 s, with a final extension of 1 min at 72 °C. The PCR products were purified using Agencourt AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA). Adapters and barcodes were added to samples using Nextera XT DNA library preparation kit sets A-B (Illumina, San Diego, CA, USA). The final PCR product was purified again with AMPure beads, verified using agarose gel electrophoresis and quantified with a Nanodrop spectrophotometer (Thermo Scientific, Hudson, NH, USA) before equimolar pooling. Separate libraries were prepared for bacterial communities derived from caterpillar and soil samples (96 samples per library), including extraction negatives. Libraries were sequenced at McGill University and Genome Quebec Innovation Center.

Sequence processing

Raw reads were processed into ASVs (amplicon sequence variants) using the DADA2 pipeline [24], and taxonomic identification was performed by querying against the SILVA database with the SINA classifier [25]. Reads that could not be assigned to Bacteria (i.e. Archaea, Eukaryotes, mitochondria, chloroplast, and unidentified) and all the bacterial ASVs present in the extraction negatives were excluded. The percentage of mitochondria and chloroplast reads was variable between samples, as is often reported in insect gut microbiome studies [26], but did not significantly differ between localities or regions (SI Figure S2). Caterpillar and soil datasets were filtered separately due to their inherent differences in bacterial diversity. Only samples with a sequencing depth between one fifth of and five times the mean sequencing depth of caterpillar or soil samples were kept for further analysis. To explore general patterns of bacterial diversity among samples, the caterpillar and soil datasets were resampled to the lowest sequencing depth of 3,849 and 13,731 reads, respectively. This resulted in removing three caterpillar samples (one from each habitat), while all soil samples were kept. The combined rarefied datasets resulted in 1,570,653 reads assigned to 41,089 bacterial ASVs, of which 11,685 ASVs were present in high abundance (>0.1% relative abundance).
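The depth filter and resampling step described above can be illustrated with a minimal Python sketch. It is not the code used for this dataset: the function names, the toy counts and the fixed random seed are assumptions made for illustration, and resampling is implemented here as simple subsampling without replacement.

```python
import random

def filter_by_depth(samples, fold=5.0):
    """Keep samples whose read depth lies between mean/fold and mean*fold."""
    depths = {name: sum(counts.values()) for name, counts in samples.items()}
    mean_depth = sum(depths.values()) / len(depths)
    low, high = mean_depth / fold, mean_depth * fold
    return {name: counts for name, counts in samples.items()
            if low <= depths[name] <= high}

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's ASV counts."""
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("requested depth exceeds the number of reads in the sample")
    rarefied = {}
    for asv in rng.sample(pool, depth):
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied

def rarefy_to_minimum(samples, seed=0):
    """Filter by depth, then rarefy every kept sample to the lowest kept depth."""
    rng = random.Random(seed)
    kept = filter_by_depth(samples)
    target = min(sum(counts.values()) for counts in kept.values())
    return {name: rarefy(counts, target, rng) for name, counts in kept.items()}, target

# Hypothetical toy data: ASV read counts per sample (not values from the dataset).
toy_samples = {
    "cat_A": {"asv1": 60, "asv2": 30, "asv3": 10},   # 100 reads
    "cat_B": {"asv1": 80, "asv2": 40},               # 120 reads
    "cat_C": {"asv1": 2, "asv3": 1},                 # 3 reads, below mean/5
}
rarefied, target = rarefy_to_minimum(toy_samples, seed=1)
```

In this toy example the very shallow sample is dropped by the depth filter, and the remaining samples are brought to a common depth so that diversity comparisons are not driven by sequencing effort.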

Microbiome analysis

Alpha diversity for caterpillar and soil was assessed on the rarefied datasets by calculating the estimated species richness, Shannon diversity (the exponential of Shannon entropy), and Simpson diversity (the inverse Simpson concentration) [27], using the iNEXT R package [28]. These indices are based on sample-size interpolation and extrapolation sampling curves [29], and represent the diversity estimates for rarefied and extrapolated samples with respect to the number of samples within localities. The estimated diversity was compared between habitats using the nonparametric Wilcoxon test with pairwise adjusted (Holm) p-values. Caterpillars in the natural populations varied in size (see SI Table S1 and Figure S3 for average weights). To assess potential associations between the fresh biomass of caterpillars and their microbial species diversity or community composition, we used the Spearman rank correlation coefficient and multivariate GLMs (see method description below), respectively. Moreover, to test whether the spatial distribution of the sampled localities had an effect on the structure of bacterial communities in caterpillars, we computed a Mantel test correlation between the distance among localities in kilometers, calculated with the geosphere R package [30] from the geographic coordinates, and the Bray-Curtis dissimilarity matrix of the bacterial composition of caterpillars. The rarefied relative abundance data were square-root transformed and Wisconsin double-standardized before the calculation of the Bray-Curtis dissimilarity matrix using the vegan R package [31]. This standardization scales the variability of different samples to each other.
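As a point of reference for the three diversity measures named above, the following Python sketch computes their plain empirical versions from the ASV counts of one sample. It does not reproduce the interpolation and extrapolation estimators of iNEXT; the function name and the toy counts are assumptions for illustration only.

```python
import math

def hill_numbers(counts):
    """Return (richness, exp(Shannon entropy), inverse Simpson concentration)."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    props = [c / total for c in positive]
    richness = len(props)                                    # number of observed ASVs
    shannon_entropy = -sum(p * math.log(p) for p in props)
    simpson_concentration = sum(p * p for p in props)
    return richness, math.exp(shannon_entropy), 1.0 / simpson_concentration

# Hypothetical toy samples: one perfectly even, one dominated by a single ASV.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]

even_div = hill_numbers(even_sample)
dominated_div = hill_numbers(dominated_sample)
```

For a perfectly even sample all three values equal the number of ASVs, whereas strong dominance by one ASV pulls the Shannon and Simpson diversities toward 1 while richness stays unchanged.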

To display differences in microbiome composition of caterpillar and soil samples among localities within the three habitats, we generated Principal Coordinates Analysis (PCoA) ordination plots using the ape R package [32], based on the square-root transformed and Wisconsin double-standardized data. To reduce the influence of taxa present in few samples on the overall community composition, only ASVs present in at least three caterpillar or soil samples with a relative abundance higher than 0.1% in each dataset were included in the PCoA. The homogeneity of dispersion of the microbiome data was tested using the betadisper function in the vegan R package; for both caterpillars and soil in the nine localities within the three habitats, data were overdispersed. We recognize that distance-based approaches can perform poorly when the data present strong mean-variance relationships [33]; we therefore used the ordination only as a means to visualize differences in microbiomes.
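The transformation behind the dissimilarity matrix can be sketched in plain Python as follows. The sketch assumes that Wisconsin double standardization means dividing each ASV by its maximum across samples and then each sample by its total, which is how the vegan documentation describes it; the function names and the toy table are hypothetical and are not taken from the dataset.

```python
import math

def sqrt_wisconsin(table):
    """Square-root transform, then divide ASVs by their maxima and samples by their totals."""
    rooted = [[math.sqrt(v) for v in row] for row in table]
    n_asv = len(rooted[0])
    col_max = [max(row[j] for row in rooted) for j in range(n_asv)]
    by_max = [[v / col_max[j] if col_max[j] > 0 else 0.0 for j, v in enumerate(row)]
              for row in rooted]
    standardized = []
    for row in by_max:
        total = sum(row)
        standardized.append([v / total if total > 0 else 0.0 for v in row])
    return standardized

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator > 0 else 0.0

def dissimilarity_matrix(table):
    """Pairwise Bray-Curtis dissimilarities on the standardized table."""
    std = sqrt_wisconsin(table)
    n = len(std)
    return [[bray_curtis(std[i], std[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy table: rows are samples, columns are ASVs.
toy_table = [
    [90, 10, 0],
    [80, 20, 0],
    [0, 0, 50],
]
dist = dissimilarity_matrix(toy_table)
```

In the toy table the first two samples share both of their ASVs and end up close together, while the third shares none with them and reaches the maximum dissimilarity of 1. A matrix of this kind is the input to both the PCoA and the Mantel test.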

To assess caterpillar and soil microbiome community differences between localities and within habitats, we used a multivariate model-based approach [34], using the mvabund R package [35]. The microbiome structure of caterpillars or soil within locations or habitats was tested in separate models. We used a GLM with negative binomial distribution and a log-link function to account for the overdispersion of the data. In the multivariate GLM, a model was fit to each ASV and the log-likelihood ratio (LR) of each model was summed to create an overall sum-of-LR [34]. Examination of the residual plots from the models showed no clear patterns, indicating that the negative binomial GLM was appropriate. Significance of the models was evaluated using 999 resampling iterations with PIT-trap resampling [36]. In addition, a linear discriminant analysis coupled with effect size measurements (LEfSe) based on the Wilcoxon rank-sum test [37] was used to screen for differentially abundant bacteria at any taxonomic level among the three habitats.



Dutch Research Council, Award: 865.14.006