Data and code for: Dihydrothiazolo ring-fused 2-pyridone antimicrobial compounds effectively treat Streptococcus pyogenes skin and soft tissue infection
Data files
Apr 26, 2024 version files (123.50 KB total)
· Differential_Expression_PCA.csv (122.07 KB)
· README.md (1.43 KB)
Abstract
We have developed GmPcides from a peptidomimetic dihydrothiazolo ring-fused 2-pyridone scaffold that have antimicrobial activities against a broad spectrum of Gram-positive pathogens. Here we examine the treatment efficacy of GmPcides using models of skin and soft tissue infection (SSTI) and biofilm formation by Streptococcus pyogenes. Screening our compound library for minimal inhibitory (MIC) and minimal bactericidal (MBC) concentrations identified GmPcide PS757 as highly active against S. pyogenes. Treatment of S. pyogenes biofilm with PS757 revealed robust efficacy against all phases of biofilm formation by preventing initial biofilm development, halting biofilm maturation and eradicating mature biofilm. In a murine model of S. pyogenes SSTI, subcutaneous delivery of PS757 resulted in reduced levels of tissue damage, decreased bacterial burdens and accelerated rates of wound-healing, which were associated with down-regulation of key virulence factors, including M protein and the SpeB cysteine protease. These data demonstrate that GmPcides show considerable promise for treating S. pyogenes infections.
README
Title of Dataset: Data and code for: Dihydrothiazolo ring-fused 2-pyridone antimicrobial compounds effectively treat Streptococcus pyogenes skin and soft tissue infection
We have submitted our raw data (Differential_Expression_PCA) and R script (RNASeq_PCA). Access this data on Dryad (10.5061/dryad.pvmcvdntj).
Descriptions
· Sample: two groups of samples. (1) GP757: RNA-seq analysis of Streptococcus pyogenes cells treated with GmPcide PS757. This is the experimental group. (2) DMSO: RNA-seq analysis of Streptococcus pyogenes cells treated with DMSO. This is the control group.
· Row 1: gene names. (1) Genes with annotated protein functions, e.g., aroE_1. (2) Genes annotated as hypothetical proteins, e.g., NA.
· Rows 2 to 7: values representing the differential expression levels of Streptococcus pyogenes genes under the two treatment conditions, GP757 and DMSO.
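The layout described above (a header row of gene names followed by one row per sample) can be read with a short Python sketch such as the one below. It is only an illustration under stated assumptions: the name of the first column, the sample labels (`GP757_1`, `DMSO_1`), and the numeric values are made up for the demo and are not taken from Differential_Expression_PCA.csv. Because several genes may share the name NA, values are kept in positional lists rather than in a dictionary keyed by gene name.

```python
import csv
import io


def load_expression(handle):
    """Read a samples-by-genes table: a header row of gene names,
    then one row per sample (assumed layout)."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]  # assumption: the first cell labels the sample column
    samples = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        samples[row[0]] = [float(value) for value in row[1:]]
    return genes, samples


def split_by_group(samples):
    """Group sample labels by treatment prefix (GP757 or DMSO)."""
    groups = {"GP757": [], "DMSO": []}
    for name in samples:
        for prefix in groups:
            if name.startswith(prefix):
                groups[prefix].append(name)
    return groups


# Made-up stand-in for the real file; labels and values are hypothetical.
demo = io.StringIO(
    "Sample,aroE_1,NA,NA\n"
    "GP757_1,5.1,2.0,7.3\n"
    "DMSO_1,4.2,2.5,6.9\n"
)
genes, samples = load_expression(demo)
groups = split_by_group(samples)
```

To try this on the actual file, replace `demo` with `open("Differential_Expression_PCA.csv", newline="")` and check the first few rows to confirm that the assumed layout holds.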
Key information sources
Differential expression levels of Streptococcus pyogenes genes under the two treatment conditions were derived from data in the NCBI database under project accession no. PRJNA1040846.
Code/Software
R is required to run RNASeq_PCA; the script was created using version 4.1.1. The script is annotated throughout, covering (1) library loading, (2) dataset loading and cleaning, (3) analyses, and (4) figure creation.
Methods
RNA Sequencing. Microplate (96-well) culture in C medium was conducted as described above with the addition of 0.4 µM PS757 or vehicle (DMSO). At 24 h, multiple wells were harvested and pooled for further processing, with the experiment repeated in triplicate. RNA was extracted with the Direct-zol RNA Miniprep Plus Kit (Zymo Research, R2072), and the quality of the purified RNA was determined by spectroscopy (NanoDrop 2000, Thermo Fisher). Libraries for Illumina sequencing were prepared using the FastSelect RNA kit (Qiagen, 334222) according to the manufacturer’s protocol, and sequences were determined using an Illumina NovaSeq 6000. Basecalls and demultiplexing were performed with Illumina’s bcl2fastq software and a custom Python demultiplexing program with a maximum of one mismatch in the indexing read. RNA-seq reads were then aligned to the Ensembl release 101 primary assembly with STAR version 2.7.9a (1). Gene counts were derived from the number of uniquely aligned unambiguous reads by Subread:featureCounts version 2.0.3 (2). Isoform expression of known Ensembl transcripts was quantified with Salmon version 1.5.2 (3) and assessed for the total number of aligned reads, total number of uniquely aligned reads, and features detected. The ribosomal fraction, known junction saturation, and read distribution over known gene models were quantified with RSeQC version 4.0 (4).
Comparative Transcriptomic Analysis. All gene counts obtained from RNA-seq were imported into the R/Bioconductor package edgeR (5), and TMM normalization size factors were calculated to adjust for differences in library size. Ribosomal genes and genes not expressed above one count-per-million in at least the smallest group size minus one samples were excluded from further analysis. The TMM size factors and the matrix of counts were then imported into the R/Bioconductor package limma (6). Weighted likelihoods based on the observed mean-variance relationship of every gene and sample were calculated for all samples, and the count matrix was transformed to moderated log2-counts-per-million with limma’s voomWithQualityWeights (7). The performance of all genes was assessed with plots of the residual standard deviation of every gene against its average log-count, with a robustly fitted trend line of the residuals. Differential expression analysis was then performed to test for differences between conditions, with results filtered for only those genes with Benjamini-Hochberg false-discovery rate adjusted p-values less than or equal to 0.05. A principal component analysis (PCA) was performed on differential expression data to distinguish differences between conditions (8). To find the significantly regulated genes, the limma voomWithQualityWeights transformed log2-counts-per-million expression data were then analyzed via weighted gene correlation network analysis with the R/Bioconductor package WGCNA (9). Briefly, all genes were correlated with each other by Pearson correlations and clustered by expression similarity into unsigned modules using a power threshold empirically determined from the data. An eigengene was then created for each de novo cluster, and its expression profile was correlated across all coefficients of the model matrix.
Because these clusters of genes were created by expression profile rather than known functional similarity, the clustered modules were named after arbitrary colors; grey is the only module with a pre-defined meaning, as it contains genes that do not cluster well with others. The information for all clustered genes in each module was then combined with the respective statistical significance results from limma to determine whether those features were also significantly differentially expressed.
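Two of the numerical steps named above, the Benjamini-Hochberg adjustment and the unsigned soft-thresholded Pearson correlation used to build WGCNA-style networks, can be illustrated with a minimal Python sketch. This is not the analysis code used for the dataset, which relied on limma and WGCNA in R. The function names, the toy inputs, and the power of 6 are assumptions made for illustration only; the power actually used was determined empirically from the data and is not stated here.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(x, y, power):
    """Unsigned network adjacency: |Pearson correlation| raised to a power."""
    return abs(pearson(x, y)) ** power


# Toy inputs; all values are made up for the demo.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]

HYPOTHETICAL_POWER = 6  # assumption, not the value used in the study
adj_opposite = unsigned_adjacency([1, 2, 3], [6, 4, 2], HYPOTHETICAL_POWER)
adj_partial = unsigned_adjacency([1, 2, 3], [1, 3, 2], HYPOTHETICAL_POWER)
```

In an unsigned network, two genes with perfectly opposite profiles receive the same adjacency as two perfectly co-expressed genes, which is why `adj_opposite` is computed from an anti-correlated pair.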
References
1. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
2. Y. Liao, G. K. Smyth, W. Shi, featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
3. R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417-419 (2017).
4. L. Wang, S. Wang, W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184-2185 (2012).
5. M. D. Robinson, D. J. McCarthy, G. K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140 (2010).
6. M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, G. K. Smyth, limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015).
7. R. Liu, A. Z. Holik, S. Su, N. Jansz, K. Chen, H. S. Leong, M. E. Blewitt, M. L. Asselin-Labat, G. K. Smyth, M. E. Ritchie, Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43, e97 (2015).
8. Z. Zou, R. F. Potter, W. H. McCoy IV, J. A. Wildenthal, G. L. Katumba, P. J. Mucha, G. Dantas, J. P. Henderson, E. coli catheter-associated urinary tract infections are associated with distinctive virulence and biofilm gene determinants. JCI Insight 8, (2023).
9. P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).