Supplementary data from: Effect of mass azithromycin distribution on antibiotic resistance in the gut and nasopharynx: A cluster-randomized trial
Data files
Mar 11, 2026 version, files total 23.12 KB
- README.md (5.34 KB)
- table_sequencing_depth_NP_samples.csv (12.18 KB)
- table_sequencing_depth_rectal_samples.csv (5.60 KB)
Abstract
The data in this repository were collected as part of a clinical trial in Niger that studies the effects of repeated twice-yearly azithromycin mass drug administration (MDA) in a pediatric population. MDA has been shown to reduce all-cause childhood mortality, but the potential selection for antimicrobial resistance (AMR) is a major public health concern. The repository contains supporting material describing the sequencing quality of the samples; the host-cleaned sequencing data can be found in the Sequence Read Archive (SRA) under BioProject ID PRJNA1337442.
Description of the data and file structure
This repository contains supporting material that describes the sequencing depth of nasopharyngeal and rectal samples. The sequencing data (host-cleaned fastq files) can be found in the Sequence Read Archive (SRA) under BioProject ID PRJNA1337442.
These data were collected as part of a trial that studies the effects of repeated twice-yearly azithromycin mass drug administration (MDA) in Niger. MDA has been shown to reduce all-cause childhood mortality, but the potential selection for antimicrobial resistance (AMR) is a major public health concern.
Data availability
Individual-level metadata linking samples to pooling, site, and treatment arm information contains human subjects data with direct and indirect identifiers and cannot be shared in this public repository. Requests for more information are subject to approval by the AVENIR Study Group and must comply with legal and regulatory requirements. Requests can be made to the PIs, Tom.Lietman@ucsf.edu and/or Kieran.Obrien@ucsf.edu, and will be addressed within 120 days. A data transfer agreement may be required.
Files and variables
File: table_sequencing_depth_NP_samples.csv
This table reports the number of read pairs before and after quality filtering and host removal for the nasopharyngeal (NP) samples (both DNA-seq and AMR-enriched).
Variables
- sample_id: a randomized integer code that identifies each pooled sample.
- Original read pairs DNA-seq: the number of read pairs in the original sequencing fastq file for the DNA-seq samples.
- Host-cleaned & quality-filtered read pairs DNA-seq: the number of read pairs in the quality-filtered and host-cleaned sequencing reads of the DNA-seq samples. These reads were uploaded to the SRA (BioProject ID PRJNA1337442).
- Fraction of reads kept (%) DNA-seq: percentage of DNA seq reads kept after the QC and host-removal steps for the DNA-seq samples.
- Original read pairs AMR-enriched: the number of read pairs in the original sequencing fastq file for the AMR-enriched samples.
- Host-cleaned & quality-filtered read pairs AMR-enriched: the number of read pairs in the quality-filtered and host-cleaned sequencing reads of the AMR-enriched samples.
- Fraction of reads kept (%) AMR-enriched: percentage of DNA seq reads kept after the QC and host-removal steps for the AMR-enriched samples.
- Combined read pairs: total number of read pairs used for taxonomic classification and AMR detection pipelines. (Combined DNA-seq and AMR-enriched reads.)
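The relationships between these variables can be sketched in a few lines of Python. This is an illustrative sketch, not the study's pipeline: it assumes the CSV header matches the variable names listed above exactly, that counts are stored as plain integers, and that the combined count is the sum of the two host-cleaned counts. The example row uses made-up numbers.

```python
import csv
import io

# Assumed column names, copied from the variable list; adjust them if the
# actual CSV header differs.
ORIG_DNA = "Original read pairs DNA-seq"
CLEAN_DNA = "Host-cleaned & quality-filtered read pairs DNA-seq"
ORIG_AMR = "Original read pairs AMR-enriched"
CLEAN_AMR = "Host-cleaned & quality-filtered read pairs AMR-enriched"


def percent_kept(original: int, cleaned: int) -> float:
    """Percentage of read pairs surviving QC and host removal."""
    if original == 0:
        return 0.0
    return 100.0 * cleaned / original


def derive_np_metrics(handle):
    """Yield per-sample derived metrics from an open CSV file handle."""
    for row in csv.DictReader(handle):
        orig_dna, clean_dna = int(row[ORIG_DNA]), int(row[CLEAN_DNA])
        orig_amr, clean_amr = int(row[ORIG_AMR]), int(row[CLEAN_AMR])
        yield {
            "sample_id": row["sample_id"],
            "kept_dna_pct": percent_kept(orig_dna, clean_dna),
            "kept_amr_pct": percent_kept(orig_amr, clean_amr),
            # Assumption: combined = sum of the two host-cleaned counts.
            "combined": clean_dna + clean_amr,
        }


# Made-up example values, not taken from the real table.
example = io.StringIO(
    "sample_id,Original read pairs DNA-seq,"
    "Host-cleaned & quality-filtered read pairs DNA-seq,"
    "Original read pairs AMR-enriched,"
    "Host-cleaned & quality-filtered read pairs AMR-enriched\n"
    "101,1000,800,500,250\n"
)
np_metrics = list(derive_np_metrics(example))
```

To apply the same check to the real table, pass an open handle to table_sequencing_depth_NP_samples.csv instead of the in-memory example and compare the derived values with the corresponding columns in the file.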
File: table_sequencing_depth_rectal_samples.csv
This table reports the number of read pairs before and after quality filtering and host removal for the rectal DNA-seq samples.
Variables
- sample_id: a randomized integer code that identifies each pooled sample.
- Original read pairs: the number of read pairs in the original sequencing fastq file.
- Host-cleaned & quality-filtered read pairs: the number of read pairs in the quality-filtered and host-cleaned sequencing reads. These are the input files for the taxonomic classification and AMR detection pipelines. These reads were uploaded to the SRA (BioProject ID PRJNA1337442).
- Fraction of reads kept (%): percentage of reads kept after the QC and host-removal steps.
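One possible use of this table is a quick depth summary per pooled sample. The sketch below assumes the CSV header matches the variable names listed above and that counts are plain integers. The `min_pairs` threshold is a hypothetical parameter chosen for illustration and is not a cutoff used by the study; the example rows are made up.

```python
import csv
import io
import statistics

# Assumed column name, copied from the variable list.
CLEAN = "Host-cleaned & quality-filtered read pairs"


def depth_summary(handle, min_pairs):
    """Summarize host-cleaned sequencing depth across pooled samples."""
    depths = {
        row["sample_id"]: int(row[CLEAN]) for row in csv.DictReader(handle)
    }
    return {
        "n_samples": len(depths),
        "median_pairs": statistics.median(depths.values()),
        # Samples whose host-cleaned depth falls below the chosen threshold.
        "below_threshold": sorted(
            sid for sid, depth in depths.items() if depth < min_pairs
        ),
    }


# Made-up example values, not taken from the real table.
example = io.StringIO(
    "sample_id,Original read pairs,"
    "Host-cleaned & quality-filtered read pairs,"
    "Fraction of reads kept (%)\n"
    "1,1000,900,90.0\n"
    "2,2000,500,25.0\n"
    "3,1500,1200,80.0\n"
)
rectal_summary = depth_summary(example, min_pairs=600)
```

Replacing the in-memory example with an open handle to table_sequencing_depth_rectal_samples.csv would give the same summary for the deposited data.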
Human subjects data
This dataset was collected as part of the AVENIR clinical trial, which received ethics approval. Informed consent was obtained in accordance with the study protocol. Participants (or their guardians) provided consent for the collection and use of biological samples for research purposes, including the sharing of de-identified, aggregate-level data.
The data deposited here contain only sequencing quality metrics (read pair counts and filtering statistics) and do not include any direct or indirect identifiers of individual participants. De-identification was achieved through the following measures:
- Sample pooling: Each sample represents a pool of multiple individuals' nasopharyngeal or rectal specimens. Sequencing reads were host-cleaned before analysis, making identification at the individual level impossible from the sequencing data alone.
- Randomized sample identifiers: Sample IDs are randomized integer codes with no link to participant identities.
- Restricted content: Only aggregate sequencing depth metrics (original read counts, quality-filtered read counts, and filtering percentages) are shared. No sequence-level data, demographic information, clinical metadata, or any other information that could be used to identify individual participants is included in this repository.
Individual-level metadata linking samples to participants, pooling schemes, sites, and treatment arms is not shared publicly and is available only upon request to the study PIs, subject to approval by the AVENIR Study Group and applicable legal and regulatory requirements.
