Data and code for: Pneumococcus co-colonization and the stress-gradient-hypothesis

Dekaj, Ermanda; Gjini, Erida

Research facility: University of Lisbon

Published Mar 18, 2024 on Dryad. https://doi.org/10.5061/dryad.hqbzkh1p0

Data files

Mar 18, 2024 version files (1.48 MB total)

All_Serotype_Freq_and_low_high_mu.xlsx : 15.21 KB
Fisher_test_analysis_with_the_correction.xlsx : 726.22 KB
JaccardSimilarity.csv : 4.07 KB
METADATA_SGH_SPN.xlsx : 357.70 KB
README.md : 4.53 KB
Serotype_Occurrences_2muGroups.xlsx : 99.07 KB
SerotypeCo-occurrence_FisherTest.xlsx : 205.77 KB
SerotypesSGH_SPN.xlsx : 53.21 KB
Unique_Serotypes_for_each_country.xlsx : 11.59 KB

Abstract

Pneumococcus serotype co-colonization, caused by the polymorphic bacterium Streptococcus pneumoniae, has been increasingly investigated and reported in recent years. Yet there is limited information on how co-colonization patterns vary globally, which is critical for understanding the evolution and transmission dynamics of these bacteria. Here we report on a rich dataset of cross-sectional pneumococcal colonization studies collected from the literature, where we quantified patterns of transmission intensity and co-colonization variation in child populations across different epidemiological settings. Fitting these data to an SIS model with co-colonization under the assumption of quasi-neutrality among multiple interacting strains, our analysis reveals strong patterns of negative co-variation between transmission intensity R₀ and susceptibility to co-colonization k, in support of the stress-gradient-hypothesis (SGH) in ecology. According to this hypothesis, ecological interactions between organisms shift positively as environmental stress increases. In our model, higher environmental stress is represented via lower values of the basic reproduction number R₀, and a shift towards positive interactions is represented via higher vulnerability to co-colonization (higher k) between pneumococcus serotypes.

This dataset contains information on pneumococcus colonization and co-colonization in children, gathered for a meta-analysis of 19 studies (17 geographic locations around the world) collected between 2000 and 2017. We extracted the prevalences of susceptible, singly-colonized (hosts carrying one serotype), and co-colonized children (hosts carrying more than one serotype) as reported in these studies. We also extracted the sets of serotypes present in any given setting and performed similarity comparisons for serotype composition across sites.

Summarized general information can be found in the file METADATA_SGH_SPN.xlsx. The serotypes reported in each study, the statistics of their occurrence/co-occurrence, and their rankings in the global dataset are provided in the file SerotypesSGH_SPN.xlsx. We also provide post-processing data files produced by our analysis: for example, JaccardSimilarity.csv for serotype composition comparisons across studies, and Serotype_Occurrences_2muGroups.xlsx for serotype occurrences in 2 study subsets (those with a high ratio of single-to-co-colonization and those with a lower ratio). Finally, we provide the R code developed for our original data analysis and our test of the stress-gradient hypothesis.

Empty cells = data unavailable

Description of the data and file structure

Primary data files

METADATA_SGH_SPN.xlsx

This file has 3 worksheets.

The worksheet "SGH-FINAL DATA" reports, for each study:

Sample size : the number of children considered in the study
Vaccine : whether the study took place before or after vaccination
Vaccination status : the percentage of children vaccinated, for studies in the post-vaccine era (when reported)
Time period : the time when the samples were collected
URL : the web link to the study
age(< years) : the age group for each country
Method : the method used for SPN carriage detection and serotyping
Nr. of Serotypes : the number of serotypes reported in the study
NT : non-typeable serotype

The worksheet "Table 1. S I D and T" reports the prevalences of susceptible (S), singly-colonized (I), and co-colonized (D) children for each country.

The worksheet "mu, R0 and k" reports the empirical values of the single-to-co-colonization ratio I/D (mu) and the estimated epidemiological parameters R0 and k.
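To illustrate how the quantities in the "mu, R0 and k" worksheet relate to the prevalences in "Table 1. S I D and T", here is a minimal Python sketch (the authors' own code is in R). The ratio mu = I/D is as defined above. The two relations used to obtain R0 and k, namely S = 1/R0 and I/D = 1/(k(R0 - 1)), are an assumption: they are the endemic-equilibrium relations commonly used for SIS models with co-colonization and are not stated in this README, so the authors' actual fitting procedure may differ. The function name and the example prevalences are hypothetical.

```python
def estimate_parameters(S, I, D):
    """Return (mu, R0, k) from susceptible (S), singly-colonized (I)
    and co-colonized (D) prevalences or counts.

    ASSUMPTION (not stated in this README): endemic equilibrium of an
    SIS model with co-colonization, where S = 1/R0 and
    I/D = 1/(k * (R0 - 1)).
    """
    if S <= 0 or I <= 0 or D <= 0:
        raise ValueError("S, I and D must all be positive")
    total = S + I + D
    # Normalize so that the three classes sum to 1
    S, I, D = S / total, I / total, D / total
    mu = I / D                      # single-to-co-colonization ratio
    R0 = 1.0 / S                    # assumed relation S = 1/R0
    k = 1.0 / (mu * (R0 - 1.0))     # assumed relation mu = 1/(k*(R0-1))
    return mu, R0, k


# Hypothetical prevalences, not taken from the dataset
mu, R0, k = estimate_parameters(S=0.5, I=0.4, D=0.1)
print(f"mu = {mu:.2f}, R0 = {R0:.2f}, k = {k:.2f}")
```

With these made-up inputs the script prints mu = 4.00, R0 = 2.00, k = 0.25. Under the assumed relations, a study with the same mu but a lower R0 yields a higher k.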

SerotypesSGH_SPN.xlsx

This file has 3 worksheets. In the worksheet "Serotype reported in each study" we list for each study the identities of serotypes found in carriage. In the worksheet "Matrix occ and co-occ", we report the matrix of pairwise co-occurrences between all serotypes in the entire dataset. In the worksheet "Ranked serotypes" we provide a ranking in descending order of all serotypes across all datasets on the basis of the number of times they were reported as present.
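As a rough illustration of how an occurrence/co-occurrence matrix and a serotype ranking can be derived from per-study serotype lists, consider the Python sketch below. The matrix layout is an assumption (diagonal = number of studies reporting a serotype, off-diagonal = number of studies reporting both serotypes of a pair) and may not match the worksheet exactly. The study labels and serotype sets are invented for the example.

```python
from itertools import combinations


def cooccurrence_matrix(study_serotypes):
    """Build a symmetric serotype-by-serotype count matrix.

    Diagonal entries count the studies reporting a serotype;
    off-diagonal entries count the studies reporting both serotypes.
    """
    labels = sorted({s for sero in study_serotypes.values() for s in sero})
    index = {s: i for i, s in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for sero in study_serotypes.values():
        present = sorted(set(sero))
        for s in present:
            matrix[index[s]][index[s]] += 1
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return labels, matrix


# Hypothetical input: three made-up studies
studies = {
    "study_A": ["19F", "23F", "6B"],
    "study_B": ["19F", "6B"],
    "study_C": ["14", "19F"],
}
labels, M = cooccurrence_matrix(studies)

# Rank serotypes by how many studies report them (descending)
ranked = sorted(
    ((s, M[i][i]) for i, s in enumerate(labels)),
    key=lambda pair: (-pair[1], pair[0]),
)
print(ranked)
```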

Post-processing data files

JaccardSimilarity.csv : a matrix of Jaccard similarity (number of shared serotypes / total number of distinct serotypes in the two studies) between any two studies
Serotype_Occurrences_2muGroups.xlsx : summarizes pairwise co-occurrence patterns between serotypes in each subgroup of studies
SerotypeCo-occurrence_FisherTest.xlsx : details, for each serotype pair, the frequencies of single occurrences and co-occurrence and the results of the test for independence
Fisher_test_analysis_with_the_correction.xlsx : details from applying the Benjamini-Hochberg correction to the co-occurrence independence test in groups A and B (significant serotype pairs are highlighted in red)
Unique_Serotypes_for_each_country.xlsx : details the serotypes reported exclusively in just one study
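The independence test and multiple-testing correction behind the two Fisher test files can be sketched as follows. This is a standard-library Python illustration, not the authors' R code. It assumes a two-sided Fisher exact test on a 2x2 presence/absence table per serotype pair (studies with both serotypes, only the first, only the second, neither), followed by Benjamini-Hochberg adjustment. That table construction, the function names, and all counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables at most as likely as the observed one
    p = sum(q for q in map(prob, support) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts over 19 studies for three serotype pairs:
# (both present, only first, only second, neither)
tables = {
    ("19F", "6B"): (12, 2, 1, 4),
    ("14", "23F"): (5, 5, 4, 5),
    ("3", "15B"): (1, 8, 7, 3),
}
pairs = list(tables)
raw = [fisher_exact_two_sided(*tables[p]) for p in pairs]
adj = benjamini_hochberg(raw)
for pair, p, q in zip(pairs, raw, adj):
    flag = "significant" if q < 0.05 else "not significant"
    print(pair, f"p = {p:.4f}", f"BH-adjusted = {q:.4f}", flag)
```

The 0.05 threshold is an illustrative choice; the README does not state the significance level used.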

R Code

Main test for the stress-gradient hypothesis (trade-off between R0 and k) : SGH clustering and regression.R
Jaccard index calculation for serotype composition similarity: Jaccard Index calculation.R
Chord diagram visualization of serotype co-occurrence patterns: Chordiagram.R
Ratio (single to co-colonization) I/D vs. studies by continents: Continents geoDist mu.R
Regression analysis of Jaccard indices vs. geographic distance: geo dist vs jaccard code.R
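For readers who do not use R, the Jaccard index calculation and its regression against geographic distance can be approximated in a few lines of Python. This sketch is independent of the scripts listed above. The serotype sets and distances are invented, and a plain ordinary least-squares fit is assumed, which may differ from the regression in the authors' code.

```python
def jaccard(a, b):
    """Jaccard similarity: shared serotypes / distinct serotypes in both."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(study_serotypes):
    """Pairwise Jaccard similarity between all studies."""
    names = list(study_serotypes)
    matrix = [
        [jaccard(study_serotypes[x], study_serotypes[y]) for y in names]
        for x in names
    ]
    return names, matrix


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical serotype sets for three made-up studies
studies = {
    "study_A": {"19F", "23F", "6B"},
    "study_B": {"19F", "6B"},
    "study_C": {"14", "19F"},
}
names, J = jaccard_matrix(studies)

# Hypothetical pairwise distances in km: (A,B), (A,C), (B,C)
distances = [500.0, 8000.0, 7600.0]
similarities = [J[0][1], J[0][2], J[1][2]]
slope, intercept = linear_fit(distances, similarities)
print(f"slope = {slope:.2e} per km, intercept = {intercept:.3f}")
```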

Sharing/Access information

Please refer to our paper and references contained therein (DOI link is located in the Related Works section). For more details, see tables and figures in the Supplementary Material files linked to our published article.
