Skip to main content

Individual variation in the avian gut microbiota: The influence of host state and environmental heterogeneity


Somers, Shane et al. (2023), Individual variation in the avian gut microbiota: The influence of host state and environmental heterogeneity, Dryad, Dataset,


The gut microbiome has important consequences for fitness, yet the complex, interactive nature of ecological factors that influence the gut microbiome has scarcely been investigated in natural populations. We sampled the gut microbiota of wild great tits (Parus major) at different life stages and across multiple conifer and mixed woodland fragments, allowing us to evaluate multiple factors that relate to within-individual gut microbiota acquisition, including habitat type, nest position and life history traits. The gut microbiota varied with both environment and life-history in ways that were largely dependent on age. Notably, it was the nestling, as opposed to the adult gut, microbiota that was most sensitive to ecological variation, pointing to a high degree of developmental plasticity. Individual nestling differences in gut microbiota were consistently different (repeatable) from one to two weeks of life, driven entirely by the effect of sharing the same nest. Our findings point to important early developmental windows in which the gut microbiota are most sensitive to environmental variation and suggest reproductive timing, and hence parental quality or food availability, interact with the microbiome.


Field monitoring and microbiota sampling

Birds were sampled from nine small woodland fragments across Co. Cork, Ireland, five of which were mixed/deciduous and four coniferous woodlands (see O’Shea et al. (2018)). We collected 262 faecal samples from 204 great tits from 63 nests (see table S1 for final sample sizes post bioinformatic processing) for 16S rRNA gene sequencing. Nest boxes were monitored during April–June 2016 to determine lay dates, hatching dates and nestling survival. Typically, all individuals in a nest were sampled except for cases where nestlings did not produce a sample within 10–15 minutes of being placed in the sampling apparatus and risked becoming chilled or weak. Individual nestlings were sampled when they were 8 days old (+/- 1 day), and again, if they survived, when they were 15 days old (D8 and D15 birds respectively), at which point parents were also sampled. Birds were placed individually into sterile holding bags inside a heated holding case and naturally-produced faecal samples were collected. Urea has the potential to affect downstream sequencing and was removed through absorption by coffee filters placed as lining in the sampling bags (Khan et al., 1991). The faecal matter was collected within 15–20 minutes of placing birds in the sampling bag, after which birds were returned to the nest. Faecal sacks were ruptured immediately using a sterile inoculation loop and placed in a microcentrifuge tube containing 500uL of 100% ethanol. Samples were stored at -20°C within 8 hours of collection until DNA extraction. D8 nestlings were weighed, and individually identified by clipping the tip of one of their nails, taking care to avoid the blood vessel. Nestling birds were again weighed at D15 and ringed with a unique identifiable metal ring (British Trust for Ornithology). Samples from nestlings for which we had multiple samples (i.e. for both D8 and Day-15, n = 41) are referred to as ‘repeat samples’. Both parents were trapped on the nest where possible and, if not already ringed, were fitted with a British Trust for Ornithology ring, weighed, and aged as either ‘immature’ (first year breeding) or ‘mature’ (second year/+ breeding) and sexed using plumage indicators.

DNA Extractions

DNA was extracted from the dried faecal contents of all birds using the Qiagen QIAamp DNA Stool Kit, following the ‘Isolation of DNA from Stool for Pathogen Detection’ protocol (June 2012 edition), with modifications described in Shutt et al. (2020) to accommodate dried avian faeces. DNA was stored at -20°C. Full extraction methods are described in the Supporting Information of Davidson et al. (2021).

Illumina MiSeq sequencing

Full library preparation details are described in the Supporting Information of Davidson et al. (2021). Briefly, the V3-V4 variable region of the 16S rRNA gene was amplified from the DNA extracts using the 16S metagenomic sequencing library protocol (Illumina). The DNA was amplified with primers specific to the V3-V4 region of the 16S rRNA gene which also incorporates the Illumina overhang adaptor. Samples were sequenced on the MiSeq sequencing platform (Clinical Microbiomics, Denmark), using a 2 x 300 cycle kit, following standard Illumina sequencing protocols. Samples were randomized across plates to avoid systematic bias resulting from batch effects. Two negative controls were carried through to sequencing. These samples were subsequently excluded from any analyses following bioinformatic processing. These negative controls indicated there was a small number of potentially contaminant ASV’s (<0.01% of ASV’s) detected by the decontam package (Davis et al., 2018). Therefore, our sequence data may contain some ASV’s from the external environment though this should not have systematically biased our results.


The DADA2 pipeline was used to process the raw sequencing data (Callahan et al., 2016) in R version 3.5 (R Core Team, 2019). Sequence quality was visually inspected. Sequences were trimmed to remove adapters and lower quality reads (median quality scores below 25-30 threshold) at the extremities of the sequence and filtered to remove sequences with expected errors above 1. Read errors were estimated before dereplication. Forward and reverse reads were merged to construct 'contig' sequences, which were used to construct a sequence table of Amplicon Sequence Variants (ASV's), which in turn counts the number of times each unique sequence is detected. The previous steps were performed for each run separately. Then the separate sequence tables were merged and chimeras removed using the 'consensus' method. Taxonomy was assigned to each ASV by RDP's Naive Bayes Classifier (Wang et al., 2007) against the Silva reference database (version 132)  (Quast et al., 2012).

The DADA2 outputs were assembled into a single Phyloseq object (McMurdie & Holmes, 2013). Sequences identified as mitochondrial or chloroplast were removed. Sample completeness curves were plotted using vegan (Oksanen et al., 2019) and helped determine the lower cut-off for sample reads at 10,000 reads. Low read samples (<10,000 reads, 11 samples) were removed leaving 195 (adult=51, Day-8=81, Day-15=114) samples for the analysis. Alpha diversity (both Shannon and Chao1 diversity) was calculated using the ‘estimate_richness’ function from the phyloseq package on the filtered dataset.

After removing low read samples (<10,000 reads, n = 11), chloroplast sequences and mitochondria sequences, there were 18,890,006 total reads clustered into 54,343 ASV’s in 246 samples (see table S1 for sample breakdown). Reads ranged from 10,220 to 557,336, with a mean of 76,789 reads per sample. For the analyses presented here, the reads were not rarefied prior to alpha diversity calculation or relative abundance analyses (McMurdie & Holmes, 2014; Willis, 2019). In case alpha diversity estimates were affected by library size, the alpha diversity analyses were rerun with a dataset rarefied to 10220 reads (the minimum library size after removing small libraries) and there was no change in the results.


Irish Research Council