Data and code from: Viral outbreak dynamics and evolution in wildlife at the interface with humans

Giglio, Rachael 1; Westmoreland, Aaron 1; Wilber, Mark 2; Wilson-Henjum, Grete 1; Chan, Aung 3; Gardner, Billy 2; Horpiencharoen, Wantida 4; Gagne, Roderick 5; Corondi, Avery 6; Baker, Alec 4; Combs, Matthew 1; Chandler, Jeffrey 1; Manlove, Kezia 3; Pepin, Kim 1

Research facility: United States Department of Agriculture

Published Dec 02, 2025 on Dryad. https://doi.org/10.5061/dryad.fbg79cp81

Data files

Dec 02, 2025 version files (7.17 MB):

- CDC_Primers.csv (11.25 KB)
- Giglio_et_al_2025_Beast_Analysis_AccessionIDs.csv (4.33 KB)
- Giglio_et_al_2025_Full_Data_Accession_IDs.csv (698.78 KB)
- MoveSTIR_BiologyLetters.zip (347.20 KB)
- README.md (13.17 KB)
- SIRMultinomial.zip (6.06 MB)
- TreasureLake_SCV2_CaptureDiagnostics_Glossary.docx (16.25 KB)
- TreasureLake_SCV2_CaptureDiagnostics.csv (20.19 KB)

Abstract

In this study, we used a multi-faceted approach to understand patterns of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission and persistence in a wild white-tailed deer population. Serology data indicated transmission of SARS-CoV-2 and persistence during the seven-month sampling period. Traditional disease modeling based on deer-to-deer transmission indicated relatively low prevalence, with an R0 of 1.2 and a recovery period of 5 days; however, individual-based modeling informed by GPS-tracked movement data captured a potential transmission event. Phylogenetic analyses revealed a recurring pattern of divergent groups of deer-derived sequences, with human-derived sequences falling close to each deer-derived cluster. Further, human-derived sequences were frequently sampled months prior to the deer-derived sequences, indicating repeated human-to-deer spillover. Using multiple types of data as well as both fine- and broad-scale analyses, we have characterized a pattern of localized outbreaks of SARS-CoV-2 within white-tailed deer populations that are likely recurring due to frequent spillover events. Our results suggest that while deer-to-deer transmission occurs over small spatiotemporal scales, SARS-CoV-2 persistence over longer periods and across larger regions is likely driven by repeated spillover from human populations.

Dataset DOI: 10.5061/dryad.fbg79cp81

Description of the data and file structure

We captured 39 white-tailed deer (31 females and 8 males) via chemical immobilization in winter (January-March) and summer (June-August) of 2024 (Supplementary Info). Biological samples, including serum isolated by centrifuging blood drawn by jugular venipuncture, as well as nasal and oral swabs, were gathered at capture for SARS-CoV-2 testing and shipped to the USDA National Wildlife Research Center (Fort Collins, CO) for diagnostic testing. This information was used to classify the disease status of each animal.

We conducted phylogenetic analyses using complete SARS-CoV-2 genomes collected from both humans and white-tailed deer. Of the sampled individuals, 3 were SARS-CoV-2 positive, and we were able to sequence full viral genomes from their samples. These 3 viral genomes and all other genomic sequences used in the phylogenetic analyses are available from GISAID (gisaid.org); we provide the accession IDs here.

Files and variables

The data include supporting serology information, a data glossary explaining what the columns in the serology file mean, and accession IDs for the sequences used in phylogenetic analyses (the sequences can be downloaded from GISAID at https://gisaid.org/).

File: Giglio_et_al_2025_Full_Data_Accession_IDs.csv

Description: Accession IDs for all sequences considered before filtering down to the human-derived sequences most closely related to the deer-derived sequences. All sequences can be downloaded from GISAID at gisaid.org.

Variables

GISAID_Accession: the accession ID to look up the genetic sequence
Species: the host species of the virus

File: Giglio_et_al_2025_Beast_Analysis_AccessionIDs.csv

Description: Accession IDs for all data (human- and deer-derived viral sequences) used in the phylogenetic analyses. All sequences can be downloaded from GISAID at gisaid.org.

Variables

GISAID_Accession: the accession ID to look up the genetic sequence
Species: the host species of the virus
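The two accession files share the same columns, so they can be compared directly. The following is a minimal sketch, assuming both files are plain comma-separated text with a header row containing `GISAID_Accession` and `Species`. The rows below are placeholders for illustration only (they are not real accession IDs), and the `Species` labels "Human" and "Deer" are assumed rather than taken from the files.

```python
import csv
import io
from collections import Counter


def read_accessions(handle):
    """Return a dict mapping GISAID_Accession -> Species from an open CSV handle."""
    reader = csv.DictReader(handle)
    return {row["GISAID_Accession"].strip(): row["Species"].strip() for row in reader}


# Placeholder rows for illustration only; these are NOT real accession IDs,
# and the Species labels are assumed.
full_csv = io.StringIO(
    "GISAID_Accession,Species\n"
    "EPI_ISL_PLACEHOLDER_1,Human\n"
    "EPI_ISL_PLACEHOLDER_2,Human\n"
    "EPI_ISL_PLACEHOLDER_3,Deer\n"
)
beast_csv = io.StringIO(
    "GISAID_Accession,Species\n"
    "EPI_ISL_PLACEHOLDER_1,Human\n"
    "EPI_ISL_PLACEHOLDER_3,Deer\n"
)

full = read_accessions(full_csv)
beast = read_accessions(beast_csv)

# Number of sequences per host species in the full list.
species_counts = Counter(full.values())

# IDs present in the full list but absent from the analysis subset.
dropped_by_filtering = sorted(set(full) - set(beast))

print(species_counts)
print(dropped_by_filtering)
```

To run this against the real files, replace each `io.StringIO(...)` with something like `open("Giglio_et_al_2025_Full_Data_Accession_IDs.csv", newline="")`.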

File: TreasureLake_SCV2_CaptureDiagnostics.csv

Description: Serology data used to characterize the disease status of each deer.

Variables: See TreasureLake_SCV2_CaptureDiagnostics_Glossary.docx for definition of each column.

File: TreasureLake_SCV2_CaptureDiagnostics_Glossary.docx

Description: Definitions for each column of serology data found in TreasureLake_SCV2_CaptureDiagnostics.csv

File: CDC_Primers.csv

Description: Primer sequences used to amplify and sequence the SARS-CoV-2 genome.

File: SIRMultinomial.zip

Description: Code to run the SIBR model. The SIBR model is implemented in R (v4.2.0) using the targets workflow manager. It requires the following packages to run: nimble, dplyr, ggplot2, ggthemes, coda, Rcpp, RcppEigen, testthat, deSolve, digest, tidyr, scoringRules, BH, abind, lubridate, stringr, xtable, data.table, directlabels, ggpubr, bayesplot, cowplot.

The workflow itself is somewhat involved, so the following instructions are intended for those interested in reproducing the results. To run the entire workflow, open the make.R script in R and run the tar_make() function with the working directory set to the location of the unzipped SIRMultinomial directory. This should reproduce all model outputs and figures in the output directory. The combined trace plots and epidemiological curves should be saved as a PDF in the working directory from which you run the script. The workflow produces ~9 GB of MCMC outputs and posterior summaries along with the various output plots, and takes about 2 hours to run all the way through on an M2 Mac. Performance may vary on other systems.

For those interested in the details of model fitting, these are somewhat distributed throughout the workflow. All of the necessary files are contained in the R directory, while Rscripts contains scripts associated with running simulation-based power analyses.

R
- sir: contains solvers for the underlying ODE model implemented in C++ (all .o, .h, .cpp files), as well as scripts for compiling with Rcpp (distributions_approx.R, distributions.R, sir_tests.R).
- targets: contains code for sample processing, model configuration, model fitting, posterior summaries, posterior processing, writing outputs, and writing plots.
  - data
    - R/targets/data/surveillance_data.R: Specifies data files to be analysed
    - R/targets/data/assay_performance: Quantitative values used for the specificity and sensitivity of diagnostic tests are set here
  - local_config: convenience script for skipping expensive steps of the workflow
  - model
    - template_model.R: nimble code for SIBR model
  - nimble
    - analysis_combinations.R: specifies the combination of model parametrizations to be fit to each data set. In particular, you can opt to estimate sensitivity/specificity or treat them as fixed by adjusting this script.
    - fit_data_manifest.R: aggregates model parameterizations based on analysis_combinations.R.
    - fit_data.R: fits model to specified data files
    - fit_data_diagnostics.R: generates diagnostic summaries for the model-fitting process
  - plots: contains scripts for a broad range of summary plots from the SIBR model. Only manuscript/treasure_lake_plots.R is used in the manuscript, so we focus on it here.
    - manuscript
      - treasure_lake_plots.R: processes model output and sample data to generate the posterior trace plots, posterior predictive epidemic curves with sample data overlaid, and formats into a multi-panel figure.
  - posterior_predictive: sets of scripts processing posterior samples and then generating summaries
    - posterior_epidemic_samples.R: processes samples from posterior for underlying ODE model
    - posterior_surveillance_samples.R: processes samples from posterior for observed data from ODE model, accounting for specificity and sensitivity of assays
    - posterior_residual_samples.R: processes residuals from posterior samples
    - posterior_fit_samples.R: calculate validation scores for fit data
    - posterior_epidemic_summaries.R: generate posterior summaries for the underlying ODE model
    - posterior_surveillance_summaries.R: generate posterior summaries for observation model
    - posterior_residual_summaries.R: generate posterior summaries for residuals
  - simulation: contains scripts for simulation-based power analyses not currently implemented in this manuscript, but included for those who wish to explore how sampling schema may affect confidence in results
  - tables: generates comparison tables of model parameterizations and specific model parameters.
- util: contains utility functions for custom nimble functions (gamma_params.R and negative_binomial_params.R), compiling lists of model priors, inits, and other information needed by nimble (model_lists.R), and implementing checkpoints for MCMC (runCheckpointMCMC.R).
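As a rough illustration of the kind of ODE model and observation step this workflow handles, the sketch below integrates a plain SIR model using the R0 of 1.2 and 5-day recovery period reported in the abstract, then adjusts true prevalence for imperfect assay performance. It is not the SIBR model in this archive: the forward Euler integrator, the population size of 1000, the single initial infection, and the sensitivity and specificity values are all assumptions chosen for illustration.

```python
def simulate_sir(r0, recovery_days, n, i0, days, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    Returns a list of (day, S, I, R) tuples sampled once per day.
    """
    gamma = 1.0 / recovery_days  # recovery rate per day
    beta = r0 * gamma            # transmission rate implied by R0 = beta / gamma
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    out = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt  # new infections in this step
            new_rec = gamma * i * dt         # new recoveries in this step
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        out.append((day, s, i, r))
    return out


def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected fraction testing positive given imperfect assay performance."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


# Hypothetical population size and initial condition; roughly seven months of days.
N = 1000
traj = simulate_sir(r0=1.2, recovery_days=5, n=N, i0=1, days=210)

# Day on which the number of infected individuals is largest.
peak_day, _, peak_i, _ = max(traj, key=lambda row: row[2])
print("peak day:", peak_day, "peak prevalence:", peak_i / N)

# Hypothetical assay sensitivity and specificity.
print("apparent prevalence at peak:", apparent_prevalence(peak_i / N, 0.95, 0.99))
```

With an R0 this close to 1, peak prevalence stays low, which is consistent with the "relatively low prevalence" described in the abstract; the fitted values themselves come from the nimble workflow, not from a sketch like this.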

The original code for the SIBR model was published alongside Hewitt et al. 2022 and is stored here: https://agdatacommons.nal.usda.gov/articles/workflow/Data_and_reproducibility_scripts_from_A_method_for_characterizing_disease_emergence_curves_from_paired_pathogen_detection_and_serology_data/25922608?file=46644076. We have modified the code, both to handle our data and to produce new plots as outputs of the workflow. In particular, the script Treasure_Lake_Plots.R is a novel addition to the workflow that remixes existing elements to produce posterior estimates of epidemic curves with our sample data overlaid.

Data
- Data: Directory containing data necessary to reproduce SIBR results.
  - File: First_date_TL_both_thests.rds: Contains PCR and serological test data from both serum and Nobuto strip tests. Individuals were counted as positive if they tested positive on either test.

    Variables:
    - HarvestDate: Date of sample collection, measured in number of days since the start of the sampling period.
    - pcr_positive: Whether a sample was PCR positive (TRUE/FALSE)
    - sero_positive: Whether a sample was seropositive (TRUE/FALSE)
  - File: First_date_TL_nobuto.rds: Contains PCR and serological test data from Nobuto strip tests only. Not used in the manuscript, included only for those who may want to explore how test choice affects inference. Same variables as the above.
  - File: First_date_TL_svnt.rds: Contains PCR and serological test data from serum tests only. Not used in the manuscript, included only for those who may want to explore how test choice affects inference. Same variables as the above.
  - File: full_data.rds: Contains all sample dates for all individuals. Used for plotting full data over model predictions.

    Variables:
    - Collection_Date: Date of sample collection (YYYY/MM/DD)
    - Animal_ID: Unique identifier for each individual tested.
    - date: Re-typing of Collection_Date as a date variable.
    - o_pcr_positive: PCR test result for oral swab
    - n_pcr_positive: PCR test result for nasal swab
    - wt_svnt_pos: Wildtype SVNT result from serum
    - om_svnt_pos: Omicron SVNT result from serum
    - wt_nobuto: Wildtype SVNT result from Nobuto strip
    - om_nobuto: Omicron SVNT result from Nobuto strip
    - days: Collection date measured as days from the beginning of the sampling period.
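Two of the derived variables described above, days since the start of sampling and positivity on either test, can be sketched as follows. This is a hedged illustration rather than the authors' processing code: it assumes the start of the sampling period is the earliest collection date, that test results are stored as booleans, and it combines only the wildtype serum and Nobuto columns (how the wildtype and Omicron results were combined is not stated here). The records are made up.

```python
from datetime import datetime


def days_since_start(date_strings, fmt="%Y/%m/%d"):
    """Convert date strings to whole days elapsed since the earliest date."""
    dates = [datetime.strptime(d, fmt).date() for d in date_strings]
    start = min(dates)  # assumption: sampling period starts at the first sample
    return [(d - start).days for d in dates]


def positive_on_either(serum_result, nobuto_result):
    """Count an individual as seropositive if either test was positive."""
    return bool(serum_result) or bool(nobuto_result)


# Made-up records for illustration; not taken from the data set.
records = [
    {"Collection_Date": "2024/01/15", "wt_svnt_pos": False, "wt_nobuto": False},
    {"Collection_Date": "2024/02/01", "wt_svnt_pos": True, "wt_nobuto": False},
    {"Collection_Date": "2024/06/20", "wt_svnt_pos": False, "wt_nobuto": True},
]

days = days_since_start([r["Collection_Date"] for r in records])
sero_positive = [positive_on_either(r["wt_svnt_pos"], r["wt_nobuto"]) for r in records]

print(days)           # [0, 17, 157]
print(sero_positive)  # [False, True, True]
```

The .rds files themselves are R objects, so in practice they would be read in R with readRDS() or exported to CSV before being used from Python.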

File: MoveSTIR_BiologyLetters.zip

Description: Code to run the MoveSTIR model and simulations. These scripts use the TreasureLake_SCV2_CaptureDiagnostics.csv file described above and movement data deposited on MoveBank (Study ID 3312262359). MoveSTIR relies on the following software packages: Python v3.12, scipy, numpy, matplotlib, pandas, geopandas, numba, rasterio, jupyter, ipython, and seaborn. Where no version number is given, the latest version is compatible with the software. All dependencies can be installed with the Anaconda package management system using the environment.yml included in Users/mqwilber/Repos/scv2_persistence. It also requires the R packages ctmm, data.table, and parallel.

File: fit_and_predict_ctmm.R: Convert each GPS trajectory to a 5-minute continuous-time movement model (CTMM) trajectory using the ctmm package, and save the CTMM predictions on a 5-minute scale.
File: process_transmission_kernels.py: construct the time-varying transmission matrix and save to disk.
File: simulate_sibr.py: simulate the SIBR model on a dynamic moveSTIR network
File: summarize_movestir_results.ipynb: Jupyter notebook for summarizing the results and generating plots from the manuscript.
Users
- mqwilber
  - Repos
    - scv2_persistence
      - File: environment.yml: YAML file containing necessary Python packages to run MoveSTIR.
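The general idea behind the pipeline above, placing trajectories on a common 5-minute grid and then deriving a time-varying contact structure between individuals, can be sketched in a simplified form. This is not the MoveSTIR implementation: linear interpolation stands in for the CTMM predictions, a hard distance threshold stands in for whatever transmission kernel process_transmission_kernels.py uses, and the function names, coordinates, and 10-unit radius are all hypothetical.

```python
import math


def interpolate_track(times, xs, ys, step=5.0):
    """Linearly interpolate (x, y) fixes onto a regular grid of `step` minutes.

    `times` must be increasing and contain at least two fixes.
    """
    grid = []
    t = times[0]
    k = 0
    while t <= times[-1]:
        # Advance to the pair of fixes that brackets time t.
        while times[k + 1] < t:
            k += 1
        w = (t - times[k]) / (times[k + 1] - times[k])
        x = xs[k] + w * (xs[k + 1] - xs[k])
        y = ys[k] + w * (ys[k + 1] - ys[k])
        grid.append((t, x, y))
        t += step
    return grid


def contact_matrices(tracks, radius):
    """For each shared time step, build an N x N 0/1 matrix of pairs within `radius`.

    Assumes all tracks start at the same time and use the same grid step.
    """
    ids = sorted(tracks)
    n_steps = min(len(tracks[i]) for i in ids)
    mats = []
    for s in range(n_steps):
        mat = [[0] * len(ids) for _ in ids]
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                _, xa, ya = tracks[ids[a]][s]
                _, xb, yb = tracks[ids[b]][s]
                if math.hypot(xa - xb, ya - yb) <= radius:
                    mat[a][b] = mat[b][a] = 1
        mats.append(mat)
    return ids, mats


# Two hypothetical deer walking toward each other along the x-axis (minutes, map units).
tracks = {
    "deer_A": interpolate_track([0.0, 10.0], [0.0, 10.0], [0.0, 0.0]),
    "deer_B": interpolate_track([0.0, 10.0], [20.0, 10.0], [0.0, 0.0]),
}
ids, mats = contact_matrices(tracks, radius=10.0)
contact_series = [m[0][1] for m in mats]
print(ids, contact_series)  # ['deer_A', 'deer_B'] [0, 1, 1]
```

A sequence of matrices like this is one simple form a "time-varying transmission matrix" could take; the archive's scripts should be consulted for how contacts are actually weighted.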

Access information

Other publicly accessible locations of the data:

GISAID (gisaid.org)
MoveBank

Data was derived from the following sources:

GISAID (gisaid.org)
