Genetic Distances between badgers (based on microsatellite-derived estimates of relatedness) and M. bovis isolates (based on pairwise SNP distances) alongside categories describing social group membership, age and sex categories
Data files
Jun 10, 2025 version; files total 3.35 MB
Abstract
Pathogens rarely mix freely throughout host populations, and the presence of barriers to transmission can be detected as patterns of increased genetic isolation among pathogen isolates. Despite the importance of transmission patterns in host societies and the risk of epizootics from wildlife disease systems, barriers to open pathogen transmission are poorly understood in wild hosts. We tested the influence of host kinship and social structure on genetic divergence among strains of Mycobacterium bovis, the causative agent of bovine tuberculosis, in a wild badger population. We measured genetic distances between M. bovis isolates from badger hosts that varied in their own genetic similarity (a proxy for kinship) and in their social group affiliations. Using jack-knifing analyses to control for pseudoreplication, we found that genetic distances between pathogen isolates decreased with increasing kinship of host dyads, but only when hosts shared the same social group. Our findings suggest that the open transmission of bTB in wild hosts is constrained by a combination of social and kin structure, in particular the sharing of similar pathogen strains among kin within social groups. We discuss the implications of these transmission structures for the understanding and management of wildlife disease.
https://doi.org/10.5061/dryad.8w9ghx3wq
Description of the data and file structure
We tested the influence of host kinship and social structure on the population genetics of Mycobacterium bovis, the causative agent of bovine tuberculosis, in a wild badger population. Previous evidence suggested that social group boundaries constrain pathogen mixing and that cub infection risk is heightened by increasing prevalence of infected kin within social groups. We measured genetic distances between M. bovis isolates from badger hosts that varied in their own genetic similarity (a proxy for kinship) and in their social group affiliations.
Files and variables
File: Bentonetal_HostPathogenGeneticDistances_2025.csv
Description:
Variables
- Focal.Animal.ID: Unique ID of each badger
- Focal.Animal.Sex: Badger Sex
- Focal.Animal.Age.FC: Badger age at first capture event (adult or cub)
- Focal.Animal.Year.FC: Calendar year when the badger was first caught
- Focal.Animal.Age.at.Sample.Year: Badger age when the corresponding M. bovis sample was collected from it
- Focal.Animal.Current.AgeClass: Current estimated age class of the badger
- Near.Animal.ID: Unique ID of the second badger in the pairwise comparison
- Near.Animal.Sex: Sex of the second badger in the pairwise comparison
- Near.Animal.Age.FC: Age at first capture of the second badger in the pairwise comparison (adult or cub)
- Near.Animal.Year.FC: Calendar year at first capture of the second badger in the pairwise comparison
- Near.Animal.Age.at.Sample.Year: Estimated age of the second badger in the pairwise comparison in the year when the sample was collected
- Near.Animal.Current.Age.Class: Estimated age class of the second badger in the pairwise comparison in the year when the sample was collected
- Sex_Grouping: Sex combination of the two badgers in the pair: male/male, female/male, female/female
- SexCategory: Sex combination with between-sex comparisons pooled; male/female and female/male are grouped as ‘between sexes’
- Age_Grouping: Age-class combination of the two badgers in the pair: adult/cub, etc.
- AdultCategory: Age-class combination with inverse comparisons pooled; adult/cub and cub/adult are grouped as the same category
- AdultCategory_GROUPED: Age-class combination with yearlings and cubs pooled as ‘young’ because of small sample sizes
- Related_Category: Classification of the pair as likely relatives or non-relatives based on the Queller and Goodnight relatedness estimate
- Focal.Sample.ID: ID of the first M. bovis sample in the pairwise comparison
- Near.Sample.ID: ID of the second M. bovis sample in the pairwise comparison
- Isolate.Genetic.Distance: Number of SNPs (single nucleotide polymorphisms) between the compared M. bovis isolates
- Focal.Isolate.Year: Year when the first M. bovis isolate was collected from a clinical badger sample
- Near.Isolate.Year: Year when the second M. bovis isolate was collected from a clinical badger sample
- Time.Diff: Number of years between samples (directional)
- Time.Diff.Without.Direction: Number of years between samples (non-directional)
- GroupMatch: Whether the badgers in the pair were resident in the same social group
- SumCategory: Grouping of relatedness and same or different group categories
- AllCategory_Sum: Grouping of all demographic categories
- nonkin: categorical measure of badger genetic similarity (nonkin = below average similarity)
- kinship: Estimated relatedness between badgers based on the Queller and Goodnight estimator (derived from microsatellite marker information)
Code/software
Code generated using R version 4.5.0
Packages used: lme4
Access information
NA
Host Genotyping
On first capture of each individual badger, a hair sample was routinely taken (approx. 20-30 hairs plucked from the rump) and stored in 80% ethanol before being submitted for DNA extraction and genotyping (Carpenter, Pope et al. 2005). Full details on genotyping are available elsewhere (Marjamäki, Dugdale et al. 2019); however, in brief, individuals trapped between 1993 and 2002 were genotyped based on the DNA extraction protocols set out in Carpenter, Pope et al. 2005, whilst hair samples from individuals trapped after this period were genotyped at the NERC Biomolecular Analysis Facility (University of Sheffield), as described in Marjamäki et al. 2019. As hair genotyping had taken place in batches, cross-validation protocols were used, including re-genotyping a subset of subsamples; this is described in detail elsewhere (Marjamäki, Dugdale et al. 2019). Genotype data from 108 individual badgers were used in the study (51 males, 57 females from 25 different social groups, trapped between 1993 and 2019 inclusive). Details of the capture histories of the badgers included in this study are shown in Supplementary Figure 1.
We used 20 microsatellite markers to derive genotypes, each with 4-7 alleles (Carpenter, Dawson et al. 2003). Additional information on marker performance and comparison of relatedness estimators is available in the Supplementary Material. Pairwise relatedness was estimated between all pairs of individuals in the study (n = 108) using the Queller and Goodnight rxy estimator (Queller and Goodnight 1989). The expected rxy relatedness coefficients between individuals based on relationship are as follows: parent/offspring and full sibling 0.5, half sibling 0.25, and unrelated 0 (Gautschi, Jacob et al. 2003). Negative values of rxy may occur if gene frequencies of the two compared individuals differ from the population mean in opposite directions (Queller and Goodnight 1989). The relatedness coefficients allowed the assignment of categorical relationships between pairs of badgers (identifying full and half siblings, etc.); however, simulation testing suggested that the power of 20 microsatellite markers to accurately differentiate between relatedness classes was likely to be limited, hence the relatedness coefficient was used as a continuous variable in the analyses.
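The rxy calculation described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study (which relied on the R package ‘related’). It assumes diploid genotypes, uses the ratio-of-sums form of the Queller and Goodnight estimator averaged over both directions, and takes population allele frequencies as given. The function name `qg_relatedness` and the toy genotypes are hypothetical.

```python
def qg_relatedness(x, y, freqs):
    """Symmetric Queller and Goodnight rxy for two diploid individuals.

    x, y  : list of (allele, allele) tuples, one per locus
    freqs : list of dicts, one per locus, mapping allele -> population frequency
    Assumption: no missing genotypes; a locus where a heterozygote's two
    allele frequencies sum to 1 would need special handling if it were the
    only locus (zero denominator).
    """
    def one_way(ref, other):
        num = den = 0.0
        for (a, b), (c, d), p in zip(ref, other, freqs):
            # Half the number of allele matches between ref and other
            shared = 0.5 * ((a == c) + (a == d) + (b == c) + (b == d))
            num += shared - p[a] - p[b]
            den += 1 + (a == b) - p[a] - p[b]
        return num / den

    # Average the two directional estimates so that r(x, y) == r(y, x)
    return 0.5 * (one_way(x, y) + one_way(y, x))


# Toy example with two loci (hypothetical alleles and frequencies)
freqs = [{"A": 0.2, "B": 0.3, "C": 0.5}, {"C": 0.4, "D": 0.6}]
badger_1 = [("A", "B"), ("C", "C")]
badger_2 = [("A", "C"), ("C", "D")]
r_self = qg_relatedness(badger_1, badger_1, freqs)   # identical genotypes
r_pair = qg_relatedness(badger_1, badger_2, freqs)
```

A pair sharing no alleles at a locus contributes a negative numerator, which is how negative rxy values arise.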
Pathogen Genotyping
Clinical samples routinely collected from anaesthetised badgers during sampling included oesophageal and tracheal aspirates, urine, faeces, and swabs of bite-wounds or abscesses. Supplementary Figure 1 shows the culture sampling histories of the 108 badgers included in this study. These samples were processed and seeded onto media selective for M. bovis. When bacterial growth was observed (which may take 6-12 weeks), a single colony was inoculated in HPLC water and spoligotyped; the remaining growth was harvested into two tubes, each containing 1 ml of 7H9 Middlebrook broth medium with 20% glycerol, and stored at -20 °C. A total of 241 isolates from 108 badgers collected between 2000 and 2019 inclusive were recovered from the archive. Slopes of modified 7H11 media (Cohn, Waggoner et al. 1968) were seeded with the defrosted samples in October 2020 and incubated at 37 °C for a six-week period for re-growth under containment level 3 conditions. Successfully regrown isolates were heat-killed in hot blocks at 80 °C for 30 minutes, and standard Illumina protocols (NexteraXT kits) for generating libraries from heat-killed cell suspensions were applied. These were then run on the Illumina NextSeq instrument with 2 × 150 bp paired-end reads. Further details on bioinformatics protocols and SNP calling are included as supplementary material.
Pairwise genetic distances (defined as the raw number of sites that differed in their genetic profile at 1569 quality-filtered SNPs) between M. bovis isolates were calculated in R (v 4.0.2) using the package ‘ape’ (v. 5.6-2) (Paradis and Schliep 2019).
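As an illustration of the distance definition above (raw count of differing sites), the following Python sketch counts mismatches between aligned SNP strings. It is a stand-in, not the ‘ape’ call used in the study, whose exact function and options are not stated here. It assumes equal-length sequences with no missing or ambiguous bases; the isolate names and sequences are made up.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of aligned sites at which two SNP sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_distances(isolates):
    """Map each unordered pair of isolate IDs to its raw SNP distance."""
    return {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }


# Hypothetical isolates with four SNP sites each (the study used 1569 sites)
toy_isolates = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "TCGA"}
toy_distances = pairwise_distances(toy_isolates)
```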
Data analysis
Analyses were carried out using R software version 4.5.0 (R Core Development Team 2025). The impact of social group membership and host genetic population structure on the genetic distance between M. bovis isolates was analysed by constructing mixed models using the R package ‘lme4’ (Bates 2010). This was coupled with jack-knifing procedures to remove biases and pseudoreplication associated with the multiple representation of pathogen isolates and host individuals in the measurement of pairwise distances (Clarke, Rothery et al. 2002). In reviews of statistical methods to deal with the inherent pseudoreplication found in dissimilarity matrices, jack-knifing by “population” (here, host and pathogen isolate identity) has been found to be superior to other methods, especially following critique of the classic Mantel-testing approach (Balkenhol, Waits et al. 2009, Legendre, Fortin et al. 2015, Zeller, Creech et al. 2016, Shirk, Landguth et al. 2018). Genetic distances amongst badger genotypes were based on differences in alleles found across a set of microsatellite markers, and estimated using the Queller and Goodnight rxy estimator of relatedness (Queller and Goodnight 1989) in the R package ‘related’ (Pew, Muir et al. 2015).
Each badger’s main social group was defined as the group in which it had been resident for the longest period of its capture history (according to trapping records), as this was previously found to be the best spatial predictor of genetic distance between isolates in a smaller dataset from the same population (Crispell, Benton et al. 2019). In the majority of cases, this was also the individual’s natal group: of the 97 badgers first caught as cubs, 92 (95%) were recaptured in the same social group as the one they were assigned to at first capture, i.e., their natal group. The main social group is therefore analogous to the natal social group in the vast majority of cases, but not exclusively.
Genetic distances for pairwise combinations of M. bovis isolates were categorised according to whether the badgers from which they were collected shared the same or different social groups in the year when the isolate was collected. Pairwise distances were also categorised according to the age classes of the badgers as follows: comparisons between isolates from two adult badgers (‘Adult-Adult’), between an adult and a cub or yearling (‘Adult-Young’), and between pairs of cubs/yearlings (‘Young-Young’). Data were restricted to pairwise comparisons where isolates had been collected within 3 years of one another, so as to include only individuals likely to have been in the population at the same time. This is in line with a recent estimate of 3.6 years for the median survival time of badgers in this population (Konzen, Delahay et al. 2024). We also categorised pairwise isolate distances according to the sex of the host badgers. To account for temporal divergence in pathogen genotypes (consistent with the molecular clock hypothesis (Bromham and Penny 2003)), the number of years between isolate collection dates was also included in the model as a fixed effect.
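The age-class pooling and the 3-year restriction described above could be expressed along these lines. This is a hedged Python sketch, not the original R code: the function names are hypothetical, the age-class labels ("adult", "yearling", "cub") are assumed spellings, and "within 3 years" is read as an absolute gap of at most 3 years.

```python
YOUNG = {"cub", "yearling"}  # assumed labels; pooled as 'Young'


def age_category(age_a, age_b):
    """Pool a pair of age classes into one of the three comparison types."""
    n_young = (age_a.lower() in YOUNG) + (age_b.lower() in YOUNG)
    return ("Adult-Adult", "Adult-Young", "Young-Young")[n_young]


def within_window(year_a, year_b, max_gap=3):
    """True if the two isolates were collected at most max_gap years apart."""
    return abs(year_a - year_b) <= max_gap


# Hypothetical pairs: (age class 1, age class 2, isolate year 1, isolate year 2)
toy_pairs = [
    ("adult", "cub", 2010, 2012),
    ("yearling", "cub", 2005, 2011),
    ("adult", "adult", 2015, 2018),
]
kept = [
    (age_category(a, b), abs(y1 - y2))
    for a, b, y1, y2 in toy_pairs
    if within_window(y1, y2)
]
```

Because the category depends only on how many of the two badgers are young, adult/cub and cub/adult pairs fall into the same class.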
To test for the effect of social group membership and kin structure on M. bovis isolate distances, we constructed a mixed effects model using the R package ‘lme4’ (Bates 2010) with genetic distance between pathogen isolates as the response variable. Genetic similarity between hosts (modelled as a continuous variable using the rxy relatedness estimator (Queller and Goodnight 1989)), age-class comparison, sex-based comparison (M:M, F:F, M:F), time between isolate collection dates and social group membership (whether the badgers were in the same social group at the time of sampling) were all included as fixed effects. A term was also included for the interaction between host genetic similarity and social group membership. As individual badgers contributed multiple dyadic data points to the response variable, the identities of the badgers in each pairwise comparison were included as random effects in the model. Counts of SNP differences in the response variable were treated as Poisson-distributed.
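Written out, the model described above has roughly the following form. This is a sketch under stated assumptions: the canonical log link is assumed (the link function is not stated), the two badger identities are assumed to enter as separate random intercepts, and all symbols are introduced here for illustration. Here d_ij is the SNP distance between isolates from badgers i and j, r_ij their rxy relatedness, G_ij an indicator for sharing a social group, a_ij and s_ij the indicator vectors for age-class and sex comparison, and t_ij the number of years between isolate collection dates.

```latex
\begin{aligned}
d_{ij} &\sim \mathrm{Poisson}(\mu_{ij}) \\
\log \mu_{ij} &= \beta_0 + \beta_1 r_{ij} + \beta_2 G_{ij} + \beta_3\, r_{ij} G_{ij}
  + \boldsymbol{\beta}_A^{\top} \mathbf{a}_{ij}
  + \boldsymbol{\beta}_S^{\top} \mathbf{s}_{ij}
  + \beta_4 t_{ij} + u_i + v_j \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_j \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```

The coefficient on the product term, \beta_3, is the interaction of interest: it captures how the relationship between host relatedness and isolate distance differs between same-group and different-group pairs.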
A jack-knifing procedure was used to remove bias from the mixed model’s estimates of fixed effects caused by pseudoreplication due to multiple representation of individual hosts and pathogen isolates in pairwise comparisons (Clarke, Rothery et al. 2002). The procedure removed each pathogen isolate in turn from the dataset, refitted the model, and calculated pseudo-values, from which jack-knifed estimates of the variance components and fixed effects of the model were derived. These estimates provided jack-knifed 95% confidence intervals for the parameters of interest. Two isolates collected from badgers in the study area exhibited unusually large genetic distances compared to the remainder of the isolates (Supplementary Figure 2). These isolates are known to be derived from cattle herds, as evidenced by their closest ancestral isolate on the wider phylogenetic tree being from a cow in the study area rather than a badger, and by their very high genetic distance from other isolates as determined by SNP differences. They were removed from the dataset prior to analysis.
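The delete-one-isolate jack-knife described above can be illustrated with a small Python sketch. To stay self-contained it swaps in a simple estimator (the mean pairwise distance) for the refitted mixed model, and it builds the 95% interval from a normal quantile of 1.96, which is an assumption rather than the study's stated method. The function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def jackknife_by_isolate(pairs, estimator):
    """Delete-one-isolate jack-knife for a statistic of pairwise data.

    pairs     : list of (isolate_a, isolate_b, distance) tuples
    estimator : function mapping a list of pairs to a number
                (a stand-in for refitting the model and extracting a coefficient)
    """
    isolates = sorted({iso for a, b, _ in pairs for iso in (a, b)})
    n = len(isolates)
    full = estimator(pairs)
    pseudo = []
    for iso in isolates:
        # Drop every pair that involves the left-out isolate
        kept = [p for p in pairs if iso not in p[:2]]
        pseudo.append(n * full - (n - 1) * estimator(kept))
    estimate = mean(pseudo)
    se = stdev(pseudo) / sqrt(n)
    # Assumption: normal-approximation 95% interval
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


def mean_distance(pairs):
    return mean(d for _, _, d in pairs)


# Hypothetical SNP distances among four isolates
toy_pairs = [
    ("i1", "i2", 2), ("i1", "i3", 4), ("i2", "i3", 6),
    ("i1", "i4", 3), ("i2", "i4", 5), ("i3", "i4", 7),
]
jk_estimate, jk_se, jk_ci = jackknife_by_isolate(toy_pairs, mean_distance)
```

Leaving out a whole isolate, rather than a single pair, is what addresses the non-independence: every pair that shares the isolate is dropped together.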