
Combining epidemiological and ecological methods to quantify social effects on E. coli transmission

Cite this dataset

Farthing, Trevor et al. (2021). Combining epidemiological and ecological methods to quantify social effects on E. coli transmission [Dataset]. Dryad.


Enteric microparasites like Escherichia coli use multiple transmission pathways to propagate within and between host populations. Characterizing the relative transmission risk attributable to host social relationships and to direct physical contact between individuals is paramount for understanding how microparasites like E. coli spread within affected communities and for estimating colonization rates. To measure these effects, we carried out commensal E. coli transmission experiments in two cattle (Bos taurus) herds in which all individuals were equipped with real-time location tracking devices. Following transmission experiments in this model system, we derived temporally dynamic social and contact networks from location data. Estimated social affiliations and dyadic contact frequencies during transmission experiments informed pairwise accelerated failure time models that we used to quantify the effects of these sociobehavioral variables on weekly E. coli colonization risk in these populations. We found that sociobehavioral variables alone were ultimately poor predictors of E. coli colonization in feedlot cattle, but that they can have significant effects on colonization hazard rates (p ≤ 0.05). We show, however, that observed effects were not consistent between similar populations. This work demonstrates that transmission experiments can be combined with real-time location data collection and processing procedures to create an effective framework for quantifying sociobehavioral effects on microparasite transmission.


Two transmission experiments were carried out over two distinct study periods: between 5/22/2017 and 7/10/2017, and between 5/21/2018 and 7/30/2018. During each period, 70 beef cattle, approximately 15 months old, were introduced to and kept in a single 30.5 × 38 m outdoor pen at a commercial cattle feedlot research center in Manhattan, KS. All individuals were castrated males that were unfamiliar with the enclosure prior to entering the study. None of the 70 individuals in the 2017 study were retained for the 2018 experiment. The number of individuals included in each transmission experiment (n = 70) was intended to mimic stocking rates observed in U.S. concentrated animal feeding operations, so that observed colonization rates would reflect those in these agricultural systems. All animal care, handling, and monitoring procedures were approved by the Kansas State University Institutional Animal Care and Use Committee.

To facilitate continuous location tracking over the course of each study period, all calves were outfitted with radio-transmitting ear tags (Smartbow GmbH, Weibern, Austria) that communicated with receivers around the pen. System software triangulated each calf's (x, y) position during each communication event and logged positional data on a central server. According to company documentation (Smartbow GmbH, Weibern, Austria), 90% of the (x, y) coordinate pairs obtained through this real-time location system fall within ±0.5 m of individuals' true locations. To reduce error-induced noise and standardize the temporal resolution of our data at 10 seconds, we filtered and smoothed the data following the procedure we previously outlined in Dawson et al. (2019). In accordance with this procedure, prior to smoothing, points that fell outside the pen area or implied that individuals were moving faster than 10 m/s were assumed to be erroneous and removed from the data set. We then generated a smoothed data set for subsequent network production, in which (x, y) coordinates represent individuals' average location in each 10-second interval over the course of the study period.
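The filtering and smoothing steps above can be sketched as follows. This is a minimal illustration, not the Dawson et al. (2019) code: it assumes hypothetical pen coordinates running from (0, 0) to (30.5, 38) m, fixes stored as (t_seconds, x, y) tuples for one animal, and implied speed measured against the previous retained fix.

```python
from collections import defaultdict

# Hypothetical pen bounds in metres (assumes the pen origin is at (0, 0)).
PEN_X, PEN_Y = 30.5, 38.0
MAX_SPEED = 10.0  # m/s; faster implied movement is treated as tag error
BIN = 10          # smoothing interval in seconds


def filter_fixes(fixes):
    """Drop fixes outside the pen or implying speeds above MAX_SPEED.

    fixes: iterable of (t_seconds, x, y) for a single animal. Speed is
    computed relative to the previous *retained* fix (an assumption).
    """
    kept = []
    for t, x, y in sorted(fixes):
        if not (0 <= x <= PEN_X and 0 <= y <= PEN_Y):
            continue  # outside the pen area
        if kept:
            t0, x0, y0 = kept[-1]
            dt = t - t0
            dist = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
            if dt <= 0 or dist / dt > MAX_SPEED:
                continue  # duplicate timestamp or implausible jump
        kept.append((t, x, y))
    return kept


def smooth(fixes, bin_s=BIN):
    """Average retained positions within each bin_s-second interval."""
    bins = defaultdict(list)
    for t, x, y in fixes:
        bins[t // bin_s * bin_s].append((x, y))
    return {
        b: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for b, pts in sorted(bins.items())
    }
```

Comparing against the last retained fix, rather than the raw previous fix, keeps a single outlier from causing the next valid point to be discarded as well.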

Here we present two data files for use in pairwise accelerated failure time models:

1.) Longitudinal E. coli prevalence data for the 2017 and 2018 study populations ("eColiShedding.csv"),

2.) Weekly weighted inter-animal contact and social network edge sets with appended social-network degree metrics ("AFTCovariates.csv"). 

The data processing methodologies used to generate each file are described below. For more information, see the Methods section in our accompanying manuscript.

1.) Longitudinal E. coli prevalence data for the 2017 and 2018 study populations

At the beginning of each study period, five steers were randomly selected from the pen of 70 for inoculation. To maximize the probability of successfully establishing shedding in inoculated calves, each was orally inoculated daily for five consecutive days with 10⁹ colony forming units (CFU) of a single E. coli strain made resistant to nalidixic acid and rifampicin. The remaining animals (n = 65) were screened for E. coli carrying this dual resistance prior to the experiment, and only animals that tested negative were used in the study. We chose to inoculate five individuals because preliminary data suggested that this number would successfully trigger an E. coli outbreak in the remaining, non-inoculated population. Fecal samples were collected from all steers weekly for the duration of each study period. Samples were spiral plated on MacConkey agar supplemented with nalidixic acid (50 µg/ml) and rifampicin (50 µg/ml) to quantify the concentration of the resistant E. coli strain. Samples that were negative (i.e., not quantifiable) by spiral plating were enriched in E. coli broth for 6 hours at 37 °C and plated on MacConkey agar supplemented with nalidixic acid (50 µg/ml) and rifampicin (50 µg/ml) to detect the inoculated strain and establish positive or negative shedding status. Any positive (i.e., CFU > 0) fecal samples obtained from individuals that were not initially challenged with E. coli were assumed to result from successful field transmission of the inoculated strain.
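The two-stage decision that assigns shedding status can be written as a small rule. The function names and arguments below are hypothetical; the sketch only encodes the logic described above (spiral plating first, enrichment only for non-quantifiable samples, and attribution of positives in non-inoculated animals to transmission).

```python
def shedding_status(spiral_cfu, enrichment_positive=None):
    """Return 1 if the inoculated strain was detected, else 0.

    spiral_cfu: CFU quantified by spiral plating on the supplemented
    MacConkey agar (0 when not quantifiable).
    enrichment_positive: result of the broth-enrichment step, which is
    only consulted when spiral plating found nothing.
    """
    if spiral_cfu > 0:
        return 1
    if enrichment_positive is None:
        raise ValueError("a non-quantifiable sample needs an enrichment result")
    return 1 if enrichment_positive else 0


def is_field_transmission(inoculated, status):
    """Positive samples from non-inoculated animals are attributed to transmission."""
    return (not inoculated) and status == 1
```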

2.) Weekly weighted inter-animal contact and social network edge sets with appended social-network degree metrics

Prior to creating contact networks, we removed point locations observed during the first three days (when RFID tags begin transmitting at different times and calves are acclimating to their new environment) and the last day (when tags are removed at different times) of each study period, to prevent abnormal contact patterns associated with these periods from biasing results. The spatial threshold for identifying contact has been shown to be influential in transmission models that use point-based location data, and it represents a tradeoff between including biologically relevant behavior and including non-contact noise that can obscure actual contact patterns (Dawson et al. 2019). Here we chose 0.71 m as our proximity-based contact threshold because it approximates the estimated maximum distance between two calves' tags during shoulder or chest allogrooming events (0.5 m) while accounting for the positional accuracy of our system (Farthing et al. 2020). Using the location-data processing procedures described by Dawson et al. (2019), we used a temporal sampling window of 10 seconds and created 1104 and 1608 hourly-aggregated contact networks for our 2017 and 2018 herds, respectively. Edges in these networks were weighted by the number of contacts observed between each dyad within each hour. Hourly contact networks were used to derive social networks and were later aggregated to the day level for hazard modelling.
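A minimal sketch of proximity-based contact detection with hourly aggregation is shown below. It assumes smoothed positions are stored as a hypothetical `{t_seconds: {animal_id: (x, y)}}` mapping at 10-second resolution; it is not the authors' R code. The helper `n_hourly_networks` (also hypothetical) checks the reported network counts, assuming both study-period dates are inclusive and that the trimmed three initial days and one final day are whole days.

```python
import math
from collections import Counter
from datetime import date
from itertools import combinations

THRESHOLD = 0.71  # m, proximity-based contact threshold from the text


def hourly_contacts(positions, threshold=THRESHOLD):
    """Build one weighted, undirected edge set per hour.

    positions: {t_seconds: {animal_id: (x, y)}} sampled every 10 s.
    Returns {hour_index: Counter({(id_a, id_b): n_contacts})}.
    """
    networks = {}
    for t, snap in positions.items():
        edges = networks.setdefault(t // 3600, Counter())
        for a, b in combinations(sorted(snap), 2):
            (xa, ya), (xb, yb) = snap[a], snap[b]
            if math.hypot(xa - xb, ya - yb) <= threshold:
                edges[(a, b)] += 1  # one contact per qualifying 10-s time step
    return networks


def n_hourly_networks(start, end, trim_start_days=3, trim_end_days=1):
    """Number of hours left after trimming the first and last study days."""
    days = (end - start).days + 1 - trim_start_days - trim_end_days
    return days * 24
```

Under these assumptions, `n_hourly_networks(date(2017, 5, 22), date(2017, 7, 10))` gives 46 × 24 = 1104 and the 2018 dates give 67 × 24 = 1608, matching the counts reported above.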

To create social networks, we first generated 100 randomized location data sets by shuffling observed 24-hour, individual-level movement paths across the entire empirical data set (i.e., in randomized sets, calves visited the same locations as they did in the empirical data set, but not necessarily on the same day). We then created hourly-aggregated null contact graphs by applying our contact-network creation procedure to each randomized location set and averaging dyadic contact weights in each hour-long graph across the random replicates. These null graphs represent the contact distribution we would expect in any given hour if all contact events occurred solely by chance (Spiegel et al. 2016).
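The path-shuffling null model can be sketched as below, assuming each animal's data are stored as a hypothetical `{animal_id: {day: path}}` mapping and that each animal's daily paths are permuted across days independently of other animals. `contact_fn` is a stand-in for whatever contact-network procedure is applied to each randomized set; its keys could be `(hour, dyad)` pairs so that averaging happens per hour, as in the text.

```python
import random


def randomize_paths(paths, rng):
    """Permute each animal's 24-h paths across days (independently per animal).

    paths: {animal_id: {day: path}}. Every animal keeps all of its own
    observed paths, but they may be reassigned to different days.
    """
    out = {}
    for animal, by_day in paths.items():
        days = sorted(by_day)
        shuffled = days[:]
        rng.shuffle(shuffled)
        out[animal] = {d: by_day[s] for d, s in zip(days, shuffled)}
    return out


def null_weights(paths, contact_fn, n_reps=100, seed=1):
    """Average contact weights over n_reps randomized data sets.

    contact_fn(paths) -> {key: weight}; keys absent from a replicate
    contribute zero to that replicate's total.
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_reps):
        for key, w in contact_fn(randomize_paths(paths, rng)).items():
            totals[key] = totals.get(key, 0) + w
    return {key: w / n_reps for key, w in totals.items()}
```

Because each animal retains its own set of daily paths, spatial preferences are preserved while the timing of co-location is broken, which is the property the null model relies on (Spiegel et al. 2016).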

Because social behaviors associated with increased dyadic contact rates (e.g., allogrooming and headbutting) occur at different frequencies during daytime hours than at night, when animals are primarily immobile and resting, we examined social relationships between calves during daytime hours only, when animals were most active. Given observed trends in data sparsity and contact behavior each year, we defined the active-hour sets for 2017 and 2018 as 06:00:00 to 21:59:59 UTM and 06:00:00 to 19:59:59 UTM, respectively, and subset the empirical and null contact graphs accordingly. Active-hour subsets were then aggregated to the week level so that weekly social relationships between calves could be identified at the same temporal resolution as E. coli sample collection.
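The active-hour subsetting and weekly aggregation can be sketched as follows. The sketch assumes hourly graphs are keyed by the datetime of each hour's start, that those timestamps use the same clock as the active windows above, and that study week 0 begins on the first study date; the week definition in particular is a hypothetical choice.

```python
from collections import Counter
from datetime import date, datetime, time

# Active windows from the text, keyed by study year.
ACTIVE = {
    2017: (time(6, 0, 0), time(21, 59, 59)),
    2018: (time(6, 0, 0), time(19, 59, 59)),
}


def weekly_active(hourly, study_start):
    """Keep active hours only and sum contact weights per study week.

    hourly: {datetime_of_hour_start: Counter({dyad: n_contacts})}
    study_start: date of the first study day (week 0 starts here).
    Returns {week_index: Counter({dyad: n_contacts})}.
    """
    start, end = ACTIVE[study_start.year]
    weeks = {}
    for ts, edges in hourly.items():
        if not (start <= ts.time() <= end):
            continue  # outside the year's active window
        week = (ts.date() - study_start).days // 7
        weeks.setdefault(week, Counter()).update(edges)
    return weeks
```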

For each week-long graph in the empirical and null sets, we calculated the maximum number of potential contacts that could exist between each dyad (for a detailed description of the equation we used, see Section 2.2.2 in our associated manuscript). We then subtracted the number of realized dyadic contacts from the number of potential contacts to estimate the number of time points when both dyad members were represented in our empirical and randomized point-location sets but were not in contact with one another. We used a series of binomial exact tests to compare dyadic in-contact and out-of-contact time-point counts in weekly empirical graphs to their null-model counterparts. We chose binomial exact tests rather than chi-square goodness-of-fit tests because 2018 empirical graphs and all null models include instances when expected contact values are very small, in which case the p-values approximated by a chi-square test may be inaccurate. To control the type I error rate across this large number of tests, we used a Bonferroni-corrected α-level of 2.07e-05 to determine “significant” deviations from null-distribution contact rates. For each empirical week-long “active” graph, we identified dyads that had more contacts than would be expected at random (p ≤ 2.07e-05). When node pairs (i.e., cattle) had more contacts with one another than would be expected at random in a given week, we assigned an edge between them in the weekly social network. Thus, edges in our social networks indicate underlying social relationships or behaviors that increased contact frequency between node pairs.
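One way to implement the edge-assignment test is sketched below. It is an interpretation, not the manuscript's exact procedure: it assumes a one-sided exact test of the empirical in-contact count against the null contact proportion (null contacts divided by null potential contacts), with out-of-contact counts implied as potential minus realized. The reported α is consistent with 0.05 divided by the 2,415 possible undirected dyads among 70 calves; that derivation is our inference. The tail probability is summed in log space because weekly potential-contact counts can be large enough to overflow naive binomial coefficients. SciPy users could instead call `scipy.stats.binomtest(k, n, p, alternative="greater")`.

```python
from math import comb, exp, lgamma, log, log1p

N_CALVES = 70
N_DYADS = comb(N_CALVES, 2)  # 2415 undirected dyads
ALPHA = 0.05 / N_DYADS       # ~2.07e-05, consistent with the reported value


def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q, const = log(p), log1p(-p), lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        total += exp(const - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def social_edge(emp_contacts, emp_potential, null_contacts, null_potential,
                alpha=ALPHA):
    """True if a dyad had significantly more contacts than the null predicts.

    emp_potential - emp_contacts is the dyad's out-of-contact count.
    """
    p_null = null_contacts / null_potential
    return binom_upper_tail(emp_contacts, emp_potential, p_null) <= alpha
```

For example, a dyad with 50 contacts out of 1,000 potential time points, compared against a null contact proportion of 0.01, receives a social edge, whereas 12 contacts does not.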


  1. Dawson, D.E., T.S. Farthing, M.W. Sanderson, & C. Lanzas. (2019). Transmission on empirical dynamic contact networks is influenced by data processing decisions. Epidemics 26:32-42.
  2. Farthing, T.S., D.E. Dawson, M.W. Sanderson, & C. Lanzas. (2020). Accounting for space and uncertainty in real-time-location-system-derived contact networks. Ecol Evol 10(11):4702-4715. 
  3. Spiegel, O., S.T. Leu, A. Sih, & C.M. Bull. (2016). Socially interacting or indifferent neighbors? Randomization of movement paths to tease apart social preference and spatial constraints. Methods Ecol Evol 7(8): 971-979.

Usage notes

We provide annotated R code for constructing and analyzing pairwise accelerated failure time models to quantify effects of contact- and social-network connectivity on E. coli colonization risk in feedlot cattle ("Pairwise_Accelerated_Failure_Time_Modeling-Ecoli_colonization_in_cattle.Rmd"). 

The data set we refer to as "eColiShedding.csv" contains the sampling data from the trials described above. There are 1,120 observations within this set. Columns are described immediately below.

year: Year that the study took place (i.e., 2017 or 2018).

calf_id: Unique ids for calves within the study.

REDI_id: ID of the tag used to identify each calf during the study (note that REDI tags were often reused for different individuals across years).

inoc_date: Date on which inoculated calves were inoculated. If NA, the individual was not inoculated with E. coli, and any observed E. coli colonization was the result of successful transmission. Dates are given in MDY format.

Week: Integer ID, unique within the given year, of the week in which the sample was collected.

sampling_date: Date of sample collection. Dates are given in MDY format.

shedding_fecal: Binary. If "1", the individual's fecal sample tested positive (i.e., at least 1 colony forming unit) for the inoculated E. coli strain in a given week. If "0", the fecal sample tested negative.
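Before fitting survival models, a common preparatory step is to reduce the weekly sampling records to a time of first colonization per susceptible calf. The sketch below does this for rows read from a file with the columns listed above, assuming that missing inoculation dates are stored as `NA` (or left blank) and that header names match the column names exactly. It illustrates one data-preparation step only; it is not the authors' pairwise accelerated failure time model.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text with the eColiShedding.csv columns into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def first_colonization(rows):
    """Return {(year, calf_id): (week, event)} for non-inoculated calves.

    event = 1 at the first week with shedding_fecal == 1; otherwise
    event = 0 and week is the last sampled week (right-censored).
    """
    out = {}
    for r in sorted(rows, key=lambda r: int(r["Week"])):
        if r["inoc_date"] not in ("", "NA"):
            continue  # inoculated animals are sources, not susceptibles
        key = (int(r["year"]), r["calf_id"])
        if out.get(key, (None, 0))[1] == 1:
            continue  # already colonized; keep the first positive week
        out[key] = (int(r["Week"]), int(r["shedding_fecal"]))
    return out
```

With the real file, `rows = list(csv.DictReader(open("eColiShedding.csv", newline="")))` could replace `load_rows`.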

The data set we refer to as "AFTCovariates.csv" contains dyad-level contact and social network metrics to be used as covariates in pairwise accelerated failure time models. There are 77,592 observations within this set. Columns are described immediately below.

from: Individual from which a directed contact originates (note that all contact edges are represented twice in this data set, as each individual involved appears as both a “from” and a “to” node).

to: Individual to which a directed contact is made (note that all contact edges are represented twice in this data set, as each individual involved appears as both a “from” and a “to” node).

dyadID: Unique combination of “from” and “to” node IDs.

contacts.avg: The average number of daily contacts between the dyad members during the specified time block, where a contact is an instance when the individuals' RFID tags were observed within 0.71 m of one another.

social: Binary variable describing if dyad members also share an edge in the social network during the specified time block. If "1", individuals were connected by an edge in the social network. If "0", they were not.

socialDeg.from: The number of individuals connected to the “from” node in the social network for a specific time block. (An equivalent degree value, the number of individuals connected to the “to” node in the social network for that time block, is also reported.)

block: Unique id for the time block during which contact and social networks were created.
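Because every contact edge appears twice (once per direction), some analyses need the rows collapsed to one per undirected dyad and block, while pairwise models that treat “from” as a potential source and “to” as a susceptible use the directed form. The sketch below shows both operations on rows read as dicts with the column names listed above. The degree helper recomputes a “from”-node social degree from the `social` column; it will match `socialDeg.from` only if every dyad in the social network has a row in the block, which is an assumption.

```python
def undirected_rows(rows):
    """Keep the first row for each (block, unordered dyad) combination."""
    seen = set()
    out = []
    for r in rows:
        key = (r["block"], frozenset((r["from"], r["to"])))
        if key in seen:
            continue  # mirrored duplicate of an edge already kept
        seen.add(key)
        out.append(r)
    return out


def social_degree(rows):
    """Count social-network neighbours of each 'from' node per block."""
    deg = {}
    for r in rows:
        key = (r["block"], r["from"])
        deg[key] = deg.get(key, 0) + int(r["social"])
    return deg
```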


National Institute of General Medical Sciences, Award: R01GM117618