The genus Bombus (bumble bees) includes approximately 265 species, many of which are in decline in North America and Europe. To estimate the colony abundance of bumble bees in natural and agricultural habitats, sibship relationships are often reconstructed from genetic data with the assumption that colonies have one monandrous queen. However, some species such as the North American common eastern bumble bee (Bombus impatiens Cresson) can display low levels of polyandry, which may bias estimates of colony abundance based on monandrous sibship reconstructions. To accurately quantify rates of polyandry in wild and commercially mated queens of this species, we empirically estimated mating frequencies using a novel statistical model and genotypes from 730 bees. To genotype individuals, we used a highly polymorphic set of microsatellites on colonies established from 20 wild-caught gynes and 10 commercial colonies. We found multiple fathers in three of the wild colonies and three of the commercial colonies. This resulted in average effective mating frequencies of 1.075 ± 0.18 and 1.154 ± 0.25 for wild and commercial colonies, respectively. These findings agree with previous reports of low rates of polyandry for B. impatiens. Using a large empirical dataset, we demonstrate that assuming monandry for colony abundance estimation in species that violate this assumption results in an overestimation of the number of colonies. Our results emphasize the importance of studying mating frequencies in social species of conservation concern and economic importance for the accuracy of colony abundance estimation and for understanding their ecology and sociobiology.
Specimen Collection and Rearing
Wild B. impatiens gynes were collected between April and May of 2018 from three locations in Pennsylvania, USA: University Park, Newport, and Landisville. Twenty colonies established from these wild-caught gynes were then reared following standard protocols (Treanore et al. 2021). Colonies were held in a walk-in incubator under constant darkness, at 28–30 ºC, 60% RH, and supplied ad libitum with a 60% sucrose solution and honey bee-collected pollen. Colony maintenance and offspring collection were performed under red light. To quantify mating frequencies in wild colonies, we collected the first 10 emerging workers, and if possible, 10 additional workers approximately 3-4 weeks after to capture any differences in paternity in the later emerging workers. The queen was removed and frozen along with the rest of the workers after collections of the second cohort. Additional bees with unknown emergence dates were also collected for a total of 13-32 bees per wild colony. For the assessment of mating frequencies in managed colonies, ten commercial colonies were purchased from Koppert Biological Systems (Howell Michigan, USA). Commercial colonies were purchased and left in the field until late summer when the entire colony was removed from the field and frozen. The original queen and 14-21 workers per colony were removed for microsatellite genotyping analysis.
Quality Assessment of Microsatellite Loci Population Genetic Analysis
We used 11 microsatellite markers previously developed for B. impatiens and other Bombus spp (Supp 2). Because the family relationships within our dataset violate assumptions of Hardy-Weinberg equilibrium, we used the genetic data published by McGrady et al (2021) to estimate parameters of allele frequencies of the source population for the 11 loci included in this study. We calculated the number of alleles per locus and evaluated the probability of linkage disequilibrium using the package poppr (v2.9.3, Kamvar, Tabima and Grünwald, 2014) in R Statistical Software (v4.1.2; R Core Team 2021). We performed null allele analysis according to Chakraborty et al. (1994) using the package PopGenReport (v3.0.7, Adamack and Gruber 2014). To evaluate deviations from Hardy-Weinberg Equilibrium, we performed a test with chi-squared comparisons using the R package pegas (v1.1, Paradis, 2010). Given that we included 11 loci for all tests, we used the corresponding Bonferroni adjusted α value of 0.0045 for Hardy Weinberg analysis and 0.0009 for linkage disequilibrium to correct for multiple comparisons.
Microsatellite Genotyping
After pinning, the right mesothoracic leg from each bee was removed and placed in a 96-well PCR plate for DNA extraction. A total of 150 μl Chelex® 100 (5%, in milli-q H20) and 5 μl of 10 mg/mL Proteinase K were added to each well of the plate containing a leg. Each sample was then heated in a Mastercycler® pro thermocycler for 60 minutes at 55 °C, 15 minutes at 99 °C, 1 minute at 37 °C, and 15 minutes at 99 °C. Extracted DNA was stored for up to 3 weeks in a 4°C refrigerator. We amplified 11 microsatellite loci with different concentrations optimized for multiplex amplification (Supp. 2). Polymerase chain reactions were performed using a Mastercycler® pro thermocycler with the following cycle settings: 3 min 30 s initial denaturation at 95 °C, followed by 30 cycles of 30 s at 95 ºC, 1 m 15 s at an annealing temperature of 55 ºC and 45 s at 72 ºC, then a final 15-minute extension at 72 ºC with a hold temperature at 15 ̊C. Following amplification, PCR products were diluted 1:10 with molecular water, then sized on an ABI3730XL® DNA Analyzer (Applied Biosystems) with GeneScan LIZ 500 internal size standard (Applied Biosystems) at the Penn State Genomics Core Facility.
Paternity Reconstruction and Power Analysis
Fragment results were imported into Geneious Prime v.2022.0.1 (Biomatters Ltd) for genotyping and alleles were scored manually. Dropout and mistyping error rates were estimated jointly with the posterior probability of the number of fathers on a colony-by-colony basis. To jointly estimate paternity and error rates while including known queen genotypes, we developed a Bayesian approach to paternity reconstruction based on the genotyping error model in Wang (2004). Briefly, the model assumes that observed genotypes are perturbed from true genotypes by two classes of errors: dropouts (ɛ1 where a heterozygous genotype appears homozygous) and mistyping (ɛ2 where a given allele appears to be another allele). The likelihood is calculated by summing (integrating) over possible maternal, paternal, and offspring genotypes conditional on a paternity assignment. We used a nonparametric prior (a Dirichlet process) on full sibling groups and developed a Gibbs sampler that targets the joint posterior distribution of error rates and paternity assignments. The model and Gibbs sampler are detailed in Supplementary Methods and are implemented in an R package (https://github.com/nspope/paternity_DP). To assess the accuracy of the method under realistic conditions, we applied it to 100 simulations for each combination of number of fathers (1 to 6) and error rates (0.01, 0.05, and 0.1 for both classes of errors but consistent across loci). Each simulation used parental genotypes that were sampled from the observed maternal allele frequencies from wild colonies; and produced observed genotypes for 20 offspring and a mother. In each simulation, one father was assigned the majority of offspring and each additional father was allocated only one offspring, as this represents the most challenging conditions for distinguishing genotyping errors from distinct paternal genotypes. To determine the impacts of different numbers of loci on the accuracy in jointly estimating paternity and error rates, 100 simulations were repeated with 2, 4, 8, and 16 loci. To simplify the analysis, each locus was assigned a conservative number of 3 alleles at equal frequencies and a fixed error rate of 0.05 for both dropout and mistyping.
To reconstruct the paternity of our focal wild and commercial colonies, we applied our paternity reconstruction method to the empirical genotypes from each colony, after removing bees/loci with >50% missing genotypes, monomorphic loci, and “drifter” bees. These ‘drifters’ are individuals who originated from a different mother colony and likely escaped their original colony and returned to a different colony during colony maintenance procedures. Because the genotypes of the queens were known, the identification of drifters was accomplished by including a second unobserved queen in the model. After paternity reconstruction, we calculated effective mating frequency– defined as where pi is the proportion of each offspring that belongs to father i (Starr 1984) – by averaging over 1,000 Markov chain Monte Carlo (MCMC) samples for each colony. All following statistical tests were computed in R v.4.1.1 (R Core Team 2020).
Colony Abundance Estimation From Sampled Worker Genotypes
To determine the influence of polyandrous or monandrous assumptions when recreating sibships from genetic data, we reanalyzed a previously published dataset of 6,306 B. impatiens workers collected from 30 agricultural field sites in Pennsylvania (USA) (McGrady et al. 2021) that used the same loci and approach for genotyping used in our study. For each site, the genotypes of the workers were imported into COLONY v.2.0.6.5 (Jones and Wang 2010; Wang 2004) to reconstruct sibship relationships among individuals. We analyzed the data twice in COLONY: once with the assumption of female monandry, and once with the assumption of female polyandry, as the software only provides a binary parameter for mating frequency. For monandry, the number of full-sibling groups was recorded as the number of detected colonies, and the number of individuals comprising each colony was recorded. In order to simplify the analysis, we did not reject any full-sibling groups based on COLONY’s statistical probability of accuracy. For polyandry, the estimated number of mothers was recorded as the number of detected colonies to include half-siblings. Additionally, the number of workers comprising each colony was also recorded. Mating frequencies were calculated from the output of COLONY with polyandry to compare it with our empirical estimation of mating frequencies in B. impatiens. A Wilcoxon rank-sum test was used to determine whether mating frequency differed between monandrous and polyandrous colonies. Total colony abundance was estimated in R with the package CAPWIRE (Miller et al. 2005) for both monandrous and polyandrous colonies from each site. A Welch’s paired t-test was performed in R v.4.1.1 (R Core Team 2020) to determine if the number of detected and estimated colonies differed when calculated under assumptions of monandry and polyandry.