Data and code for: Pneumococcus co-colonization and the stress-gradient-hypothesis
Data files
Mar 18, 2024 version files 1.48 MB
- All_Serotype_Freq_and_low_high_mu.xlsx
- Fisher_test_analysis_with_the_correction.xlsx
- JaccardSimilarity.csv
- METADATA_SGH_SPN.xlsx
- README.md
- Serotype_Occurrences_2muGroups.xlsx
- SerotypeCo-occurrence_FisherTest.xlsx
- SerotypesSGH_SPN.xlsx
- Unique_Serotypes_for_each_country.xlsx
Abstract
Pneumococcus serotype co-colonization, caused by the polymorphic bacterium Streptococcus pneumoniae, has been increasingly investigated and reported in recent years. Yet there is limited information on how co-colonization patterns vary globally, which is critical for understanding the evolution and transmission dynamics of these bacteria. Here we report on a rich dataset of cross-sectional pneumococcal colonization studies collected from the literature, in which we quantified patterns of transmission intensity and co-colonization variation in child populations across different epidemiological settings. Fitting these data to an SIS model with co-colonization, under the assumption of quasi-neutrality among multiple interacting strains, our analysis reveals strong patterns of negative co-variation between transmission intensity R0 and susceptibility to co-colonization k, in support of the stress-gradient-hypothesis (SGH) in ecology. According to this hypothesis, ecological interactions between organisms shift positively as environmental stress increases. In our model, higher environmental stress is represented by lower values of the basic reproduction number R0, and a shift towards positive interactions is represented by higher vulnerability to co-colonization (higher k) between pneumococcus serotypes.
README: Data and code for: Pneumococcus co-colonization and the stress-gradient-hypothesis
https://doi.org/10.5061/dryad.hqbzkh1p0
This dataset contains information on pneumococcus colonization and co-colonization in children, gathered for a meta-analysis of 19 studies (17 geographic locations around the world) conducted between 2000 and 2017. We extracted the prevalences of susceptible, singly-colonized (hosts carrying one serotype), and co-colonized children (hosts carrying more than one serotype) as reported in these studies. We also extracted the sets of serotypes present in each setting and performed similarity comparisons of serotype composition across sites.
Summarized general information can be found in the file METADATA_SGH_SPN.xlsx. The serotypes reported in each study, the statistics of their occurrence and co-occurrence, and their rankings in the global dataset are provided in the file SerotypesSGH_SPN.xlsx. We also provide post-processed data files, for example JaccardSimilarity.csv, for serotype composition comparisons across studies, and Serotype_Occurrences_2muGroups.xlsx, for serotype occurrences in two study subsets (those with a high ratio of single to co-colonization and those with a lower ratio). Finally, we provide the R code developed for our original data analysis and our test of the stress-gradient hypothesis.
Empty cells = data unavailable
Description of the data and file structure
Primary data files
METADATA_SGH_SPN.xlsx
This file has 3 worksheets. In the worksheet "SGH-FINAL DATA" we report:
- Sample size: the number of children considered in the study
- Vaccine: before/after vaccination
- Vaccination status: the percentage of children vaccinated, for studies in the post-vaccine era (when reported)
- Time period: the period when the samples were collected
- URL: the website link of each study
- age(< years): the age group for each country
- Method: the method used for SPN carriage detection and serotyping
- Nr. of Serotypes: the number of serotypes reported in each study
- NT: non-typable serotype
In the worksheet "Table 1. S I D and T" we report the values of Susceptible (S), Colonization (I) and Co-colonization (D) prevalences for each country. In the worksheet "mu, R0 and k" we report the empirical values of the single-to-co-colonization ratio I/D (mu) and the estimated epidemiological parameters R0 and k.
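To make the quantities in the "mu, R0 and k" worksheet concrete, here is a minimal Python sketch of how mu, R0 and k could be derived from one study's prevalences. It assumes the equilibrium relations S = 1/R0 and mu = I/D = 1/(k(R0 - 1)) of the SIS co-colonization framework cited in the Methods section. These relations are not stated in this README, and the authors' actual fitting procedure may differ. The function name and the prevalence values are hypothetical.

```python
def summarize_study(s, i, d):
    """Derive T, mu, R0 and k from prevalences S, I, D of a single study.

    ASSUMPTION (not stated in the README): the population is at the
    equilibrium of an SIS model with co-colonization, where
    S = 1/R0 and mu = I/D = 1/(k * (R0 - 1)).
    """
    if abs(s + i + d - 1.0) > 1e-6:
        raise ValueError("prevalences S, I and D must sum to 1")
    if not (0.0 < s < 1.0) or i <= 0.0 or d <= 0.0:
        raise ValueError("S must lie in (0, 1) and I, D must be positive")

    total_colonized = i + d              # T: overall carriage prevalence
    mu = i / d                           # empirical single-to-co-colonization ratio
    r0 = 1.0 / s                         # assumed relation S = 1/R0
    k = 1.0 / (mu * (r0 - 1.0))          # assumed relation mu = 1/(k(R0 - 1))
    return {"T": total_colonized, "mu": mu, "R0": r0, "k": k}


# Hypothetical prevalences, not taken from the dataset.
example = summarize_study(s=0.40, i=0.45, d=0.15)
```

Under these assumptions, a study with fewer susceptibles (higher R0) and the same mu yields a lower k, which is the direction of the trade-off the dataset is used to test.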
SerotypesSGH_SPN.xlsx
This file has 3 worksheets. In the worksheet "Serotype reported in each study" we list for each study the identities of serotypes found in carriage. In the worksheet "Matrix occ and co-occ", we report the matrix of pairwise co-occurrences between all serotypes in the entire dataset. In the worksheet "Ranked serotypes" we provide a ranking in descending order of all serotypes across all datasets on the basis of the number of times they were reported as present.
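As an illustration of how such occurrence and co-occurrence tables can be built from per-study serotype lists, here is a short Python sketch. It assumes that two serotypes "co-occur" when both are reported in the same study; the README does not spell out the definition, so treat this as one plausible reading. The study names and serotype lists are hypothetical, and the function names are ours.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(studies):
    """Count, across studies, how often each serotype and each pair is reported.

    `studies` maps a study label to the serotypes reported in that study.
    ASSUMPTION: a pair co-occurs when both serotypes appear in the same study.
    """
    occurrences = Counter()
    pairs = Counter()
    for serotypes in studies.values():
        unique = sorted(set(serotypes))          # ignore duplicates within a study
        occurrences.update(unique)
        pairs.update(combinations(unique, 2))    # unordered pairs, sorted by label
    return occurrences, pairs


def rank_serotypes(occurrences):
    """Rank serotypes by number of studies reporting them (descending)."""
    return sorted(occurrences.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical input, not taken from the dataset.
studies = {
    "study_A": ["6A", "19F", "23F"],
    "study_B": ["19F", "23F", "14"],
    "study_C": ["19F", "6A"],
}
occurrences, pairs = cooccurrence_counts(studies)
ranking = rank_serotypes(occurrences)
```

Pair keys are tuples of labels in sorted string order, so `pairs[("19F", "23F")]` is the lookup to use rather than `pairs[("23F", "19F")]`.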
Post-processing data files
JaccardSimilarity.csv: a matrix of Jaccard similarity (number of shared serotypes divided by the total number of distinct serotypes reported in the two studies) between any two studies
Serotype_Occurrences_2muGroups.xlsx: summarizes pairwise co-occurrence patterns between serotypes in each subgroup of studies
SerotypeCo-occurrence_FisherTest.xlsx: details, for each serotype pair, the frequencies of single occurrences and co-occurrences, and the results of the test for independence
Fisher_test_analysis_with_the_correction.xlsx: details the results of applying the Benjamini-Hochberg correction to the co-occurrence independence test in groups A and B (significant serotype pairs are highlighted in red)
Unique_Serotypes_for_each_country.xlsx: details the serotypes reported exclusively in a single study
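For readers who want to reproduce the kind of test summarized in the Fisher test files, the following Python sketch shows a two-sided Fisher exact test on a 2x2 table followed by a Benjamini-Hochberg adjustment of the resulting p-values. It uses only the standard library. How the authors constructed each 2x2 table is not described here, so the layout in the docstring (studies with both serotypes, only one, or neither) is an assumption, and the function names are ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    ASSUMED layout for a serotype pair (X, Y): a = studies with both,
    b = X only, c = Y only, d = neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # walk from largest to smallest p
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A pair would then be flagged as significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.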
R Code
- Main test for the stress-gradient hypothesis (trade-off between R0 and k) : SGH clustering and regression.R
- Jaccard index calculation for serotype composition similarity: Jaccard Index calculation.R
- Chord diagram visualization of serotype co-occurrence patterns: Chordiagram.R
- Ratio (single to co-colonization) I/D vs. studies by continents: Continents geoDist mu.R
- Regression analysis of Jaccard indices vs. geographic distance: geo dist vs jaccard code.R
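The Jaccard index and its regression against geographic distance can be illustrated with the short Python sketch below. It is written independently of the R scripts listed above and is not a translation of them. The serotype sets are hypothetical, the helper names are ours, and a plain least-squares line is assumed for the regression step.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: shared serotypes / distinct serotypes in either study."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(studies):
    """Jaccard similarity for every unordered pair of studies."""
    return {
        (x, y): jaccard(studies[x], studies[y])
        for x, y in combinations(sorted(studies), 2)
    }


def ols_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept.

    ASSUMPTION: a simple linear fit; the authors' regression may differ.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical serotype sets, not taken from the dataset.
studies = {
    "study_A": {"6A", "19F", "23F"},
    "study_B": {"19F", "23F", "14"},
    "study_C": {"3", "11A"},
}
similarity = pairwise_jaccard(studies)
```

Given a list of pairwise distances and the matching similarities, `ols_fit(distances, similarities)` returns the slope and intercept; a negative slope would indicate that serotype composition becomes less similar with distance.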
Sharing/Access information
Please refer to our paper and references contained therein (DOI link is located in the Related Works section). For more details, see tables and figures in the Supplementary Material files linked to our published article.
Methods
These data have been synthesized from studies that report Streptococcus pneumoniae colonization and co-colonization in child populations worldwide. We provide primary data files (metadata and extracted epidemiological variables, as well as serotype compositions), processed data files, and auxiliary R code for analysis. The main purpose of our initial analyses was to investigate the stress-gradient-hypothesis in pneumococcus and to link the mathematical modeling framework of previous papers (Gjini and Madec, 2021; Madec and Gjini, 2020) to a concrete epidemiological context.
- Gjini, Erida, and Sten Madec. "The ratio of single to co‐colonization is key to complexity in interacting systems with multiple strains." Ecology and Evolution 11.13 (2021): 8456-8474. https://doi.org/10.1002/ece3.7259
- Madec, Sten, and Erida Gjini. "Predicting N-strain coexistence from co-colonization interactions: epidemiology meets ecology and the replicator equation." Bulletin of Mathematical Biology 82.11 (2020): 142. https://doi.org/10.1007/s11538-020-00816-w