Genomic data for Tracing SARS-CoV-2 clusters across local scales: Greater Houston, January–October 2021
Data files
Version of June 20, 2025 (total size: 6.17 MB)
- age_updated.xml (1.93 MB)
- GLMcounty_updated.xml (1.95 MB)
- README.md (1.10 KB)
- sex_updated.xml (1.87 MB)
- texasAndContexual.tsv (418.80 KB)
Abstract
A quantitative understanding of local transmission dynamics is essential for designing effective prevention strategies. In this study, we developed a novel algorithm to identify introductions and trace locally circulating clusters. We analyzed over 26,000 SARS-CoV-2 genomes and their associated metadata, collected between January and October 2021, to explore introduction and dispersal patterns in Greater Houston, a major metropolitan area known for its demographic diversity. Our analysis identified more than 1,000 independent introduction events, resulting in clusters of varying sizes. The majority of introductions originated from domestic sources, while international introductions occurred earlier and were associated with larger cluster sizes. An analysis of locally circulating clusters revealed age-structured transmission dynamics. Geographic reconstruction of cluster spread identified Harris County as the primary viral source for surrounding counties. Harris County sustained the local epidemic with fewer external introductions and longer persistence times of circulating lineages. Overall, our high-resolution spatiotemporal reconstruction of the epidemic provides essential insights into the local-scale transmission landscape, supporting outbreak-specific, regional response strategies and public health planning.
https://doi.org/10.5061/dryad.0000000ds
Description of the data and file structure
Here, we provide the GISAID accession IDs of the genomes analyzed, listed in a TSV file, and the associated demographic metadata, stored in XML files formatted for BEAST analyses.
Files and variables
File: texasAndContexual.tsv
Description: Lists the GISAID accession IDs of the genomes used in this study. The dataset comprises 26,138 complete SARS-CoV-2 genomes: 9,186 sampled in Houston and 16,952 contextual sequences from around the world.
Variables
- IDs
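A minimal sketch for loading the accession IDs, assuming texasAndContexual.tsv is tab-separated with a header row whose ID column is named `IDs` (the Variables list names the variable but not the exact file layout). The helper names `load_accession_ids` and `is_gisaid_accession` are hypothetical, and the `EPI_ISL_` prefix check reflects the usual GISAID EpiCoV accession format rather than anything stated in this README.

```python
import csv
import re

# GISAID EpiCoV accession IDs usually take the form EPI_ISL_ followed by digits.
GISAID_ID = re.compile(r"EPI_ISL_\d+")


def load_accession_ids(path, column="IDs"):
    """Read one ID column from a tab-separated file, skipping blank cells.

    Assumption (hypothetical): the file has a header row and the ID column
    is named "IDs"; pass a different `column` if the real header differs.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Fail early if the assumed column is not in the header row.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not found in {path}")
        ids = []
        for row in reader:
            # Short rows yield None for missing cells; treat them as blank.
            value = (row.get(column) or "").strip()
            if value:
                ids.append(value)
    return ids


def is_gisaid_accession(value):
    """Return True if `value` matches the usual EPI_ISL_<digits> pattern."""
    return GISAID_ID.fullmatch(value) is not None
```

For example, `len(load_accession_ids("texasAndContexual.tsv"))` can be compared against the 26,138 genomes described above, and `is_gisaid_accession` can flag any entries that do not look like EpiCoV accessions.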
File: age_updated.xml
Description: BEAST XML file containing the age group of each sampled patient in Houston.
File: GLMcounty_updated.xml
Description: BEAST XML file containing the sampling location (county) of each sampled patient in Houston.
File: sex_updated.xml
Description: BEAST XML file containing the sex of each sampled patient in Houston.
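The XML files are described only by the trait each one carries. A minimal sketch for extracting one trait per taxon, assuming the BEAST 1.x convention of storing discrete traits as `<attr name="...">` children of `<taxon id="...">` elements; the trait names (for example `age`) and the helpers `read_taxon_traits` and `trait_counts` are hypothetical and should be checked against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def read_taxon_traits(source, trait):
    """Map taxon id -> value of one discrete trait from a BEAST-style XML.

    Assumption (hypothetical layout):
        <taxon id="..."><attr name="TRAIT">value</attr></taxon>
    `source` may be a filename or an open file object.
    """
    root = ET.parse(source).getroot()
    traits = {}
    for taxon in root.iter("taxon"):
        taxon_id = taxon.get("id")
        # <taxon idref="..."/> elements are references (e.g. inside an
        # alignment or tree block), not definitions, so skip them.
        if taxon_id is None:
            continue
        for attr in taxon.findall("attr"):
            if attr.get("name") == trait and attr.text is not None:
                traits[taxon_id] = attr.text.strip()
    return traits


def trait_counts(traits):
    """Tally how many taxa fall into each trait category."""
    return Counter(traits.values())
```

For instance, `trait_counts(read_taxon_traits("sex_updated.xml", "sex"))` would tabulate sampled patients per category if the attribute is indeed named `sex`. Because each dictionary is keyed by taxon ID, the outputs from the three files can be joined on that key to cross-tabulate age, county, and sex.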
This dataset includes genomic data and associated demographic metadata used to trace SARS-CoV-2 clusters in Greater Houston from January to October 2021.