Harmonized datasets to support seroepidemiology of Trachoma for the elimination endgame
Data files
Jun 11, 2025 version files 51.65 MB
Abstract
Trachoma is targeted for global elimination as a public health problem by 2030. Measurement of IgG antibodies in children is being considered for surveillance and programmatic decision-making. There are currently no guidelines for applications of serology, which represents a generalizable problem in seroepidemiology and disease elimination. We collated Chlamydia trachomatis Pgp3 and CT694 IgG measurements from 48 serosurveys, including surveys across Africa, Latin America, and the Pacific Islands to estimate population-level seroconversion rates (SCR) along a gradient of trachoma endemicity.
https://doi.org/10.5061/dryad.5qfttdzhx
Description of the data and file structure
Overview
This repository contains harmonized data to support research on the seroepidemiology of trachoma and Chlamydia trachomatis. The data were harmonized and made publicly available under the NIH-funded study "Seroepidemiology of trachoma for the elimination endgame" (R01-AI158884).
We are currently making two datasets public: one with individual-level records (trachoma_serology_public_data_indiv_v4) and one with cluster-level records (trachoma_serology_public_data_cluster_v4).
The cluster-level data were generated by summarising the individual-level data by evaluation unit and age. Note, however, that some additional data were available only at the cluster level and not at the individual level, as highlighted in https://www.nature.com/articles/s41467-023-38940-5#Sec9.
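The kind of summary described above can be sketched in base R. This is only an illustration on toy data: the column names (eu, age_years, pgp3_pos) and the summary measures are hypothetical and are not taken from the codebooks, which remain the reference for the actual variables.

```r
# Toy individual-level records; all column names and values are hypothetical
indiv <- data.frame(
  eu        = c("A", "A", "A", "B", "B"),  # evaluation unit
  age_years = c(1, 1, 2, 1, 1),            # age in years
  pgp3_pos  = c(0, 1, 1, 0, 0)             # 1 = seropositive, 0 = seronegative
)

# Number tested and number seropositive per evaluation unit and age
n_tested <- aggregate(pgp3_pos ~ eu + age_years, data = indiv, FUN = length)
n_pos    <- aggregate(pgp3_pos ~ eu + age_years, data = indiv, FUN = sum)
names(n_tested)[3] <- "n_tested"
names(n_pos)[3]    <- "n_pos"

# One row per evaluation unit and age
cluster <- merge(n_tested, n_pos, by = c("eu", "age_years"))
cluster$seroprev <- cluster$n_pos / cluster$n_tested
print(cluster)
```

Because some cluster-level records have no individual-level counterpart, a summary like this would not be expected to reproduce the published cluster-level file exactly.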
The .rds files (trachoma_serology_public_data_indiv_v4 and trachoma_serology_public_data_cluster_v4) include encodings, such as factor labels and levels. Each dataset is accompanied by a codebook (.html) with details about its contents and machine-readable metadata. The datasets contain too many variables to list here, so please consult the codebooks for their descriptions.
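As a minimal sketch of why the .rds format matters, the base R example below saves and reloads a toy data frame and checks that factor levels survive the round trip. The toy columns (id, sex, pgp3_mfi) are hypothetical stand-ins, not variables confirmed by the codebooks.

```r
# Toy data frame standing in for a real dataset; all names are hypothetical
toy <- data.frame(
  id       = 1:4,
  sex      = factor(c("female", "male", NA, "female"),
                    levels = c("female", "male")),
  pgp3_mfi = c(120, NA, 3400, 85)
)

# Write to a temporary .rds file and read it back
tmp <- tempfile(fileext = ".rds")
saveRDS(toy, tmp)
restored <- readRDS(tmp)

# Factor levels are preserved by the .rds format
print(levels(restored$sex))

# Count missing values (NA) per column
na_counts <- colSums(is.na(restored))
print(na_counts)
```

With a downloaded file, the same readRDS() call would take the local path to that file instead of the temporary one, assuming the file keeps the .rds extension.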
The file summary-changes-public-v4.txt includes version history.
The individual-level dataset includes a field named PMID, which lists the PubMed ID(s) for the primary research articles associated with the contributing data. We recommend you refer to the primary research articles for details on study design, sample collection, and laboratory testing protocols that generated the primary data.
Some data were previously made public in other online locations, and for these, the public_data field in the individual-level dataset will include a URL for the original online source.
The individual-level data are version-controlled, and the version is recorded in the version variable. The version changes whenever the repository is updated, e.g., when a new variable or new data are added. The current release is "v4", so the next update will be "v5".
Files and variables
Descriptions of variables and abbreviations used are in the accompanying HTML files (trachoma_serology_public_data_cluster_codebook_v4 and trachoma_serology_public_data_indiv_codebook_v4) in the dataset. Missing values are indicated by "NA".
Code/software
- R / RStudio
Access information
Other publicly accessible locations of the data:
- osf.io/ykjc4
Data were derived from the following sources:
- See the "PMID" field in the trachoma_serology_public_data_indiv_v4 dataset for the original publications of the studies that generated the data.
Should you have questions about the contents of this repository, please contact Ben Arnold at UCSF, the PI for the study (ben.arnold@ucsf.edu). https://profiles.ucsf.edu/benjamin.arnold
Human subjects data
All primary data were collected after obtaining informed consent from all participants or their guardians under separate, local human subjects research protocols in accordance with the Declaration of Helsinki. The data were de-identified using randomly generated integers to mask original variables.
