Commensal microbiome dysbiosis in keloid disease
Data files
Jul 16, 2024 version files 19.94 GB
File | Size |
---|---|
README.md | 7.92 KB |
RNA-Seq-Con1_1.fq.gz | 1.45 GB |
RNA-Seq-Con1_2.fq.gz | 1.55 GB |
RNA-Seq-Con2_1.fq.gz | 1.64 GB |
RNA-Seq-Con2_2.fq.gz | 1.74 GB |
RNA-Seq-expression_table.xls | 65.89 MB |
RNA-Seq-IL81_1.fq.gz | 1.61 GB |
RNA-Seq-IL81_2.fq.gz | 1.73 GB |
RNA-Seq-IL82_1.fq.gz | 1.55 GB |
RNA-Seq-IL82_2.fq.gz | 1.67 GB |
RNA-Seq-TGFbeta1_1.fq.gz | 1.59 GB |
RNA-Seq-TGFbeta1_2.fq.gz | 1.69 GB |
RNA-Seq-TGFbeta2_1.fq.gz | 1.47 GB |
RNA-Seq-TGFbeta2_2.fq.gz | 1.54 GB |
sw_K01_1.fastq.gz | 9.24 MB |
sw_K01_2.fastq.gz | 9.31 MB |
sw_K02_1.fastq.gz | 8.22 MB |
sw_K02_2.fastq.gz | 7.97 MB |
sw_K03_1.fastq.gz | 7.52 MB |
sw_K03_2.fastq.gz | 7.68 MB |
sw_K04_1.fastq.gz | 7.37 MB |
sw_K04_2.fastq.gz | 7.29 MB |
sw_K06_1.fastq.gz | 7.98 MB |
sw_K06_2.fastq.gz | 7.90 MB |
sw_K07_1.fastq.gz | 8.53 MB |
sw_K07_2.fastq.gz | 8.59 MB |
sw_K08_1.fastq.gz | 6.41 MB |
sw_K08_2.fastq.gz | 6.46 MB |
sw_K09_1.fastq.gz | 9.32 MB |
sw_K09_2.fastq.gz | 9.53 MB |
sw_K11_1.fastq.gz | 9.26 MB |
sw_K11_2.fastq.gz | 9.35 MB |
sw_K12_1.fastq.gz | 7.92 MB |
sw_K12_2.fastq.gz | 8.02 MB |
sw_K13_1.fastq.gz | 5.95 MB |
sw_K13_2.fastq.gz | 6 MB |
sw_K15_1.fastq.gz | 5.69 MB |
sw_K15_2.fastq.gz | 5.67 MB |
sw_N01_1.fastq.gz | 6.05 MB |
sw_N01_2.fastq.gz | 6.14 MB |
sw_N02_1.fastq.gz | 8.71 MB |
sw_N02_2.fastq.gz | 8.77 MB |
sw_N03_1.fastq.gz | 7.22 MB |
sw_N03_2.fastq.gz | 7.38 MB |
sw_N04_1.fastq.gz | 9.30 MB |
sw_N04_2.fastq.gz | 9.35 MB |
sw_N06_1.fastq.gz | 8.37 MB |
sw_N06_2.fastq.gz | 8.37 MB |
sw_N07_1.fastq.gz | 8.52 MB |
sw_N07_2.fastq.gz | 8.60 MB |
sw_N08_1.fastq.gz | 9.62 MB |
sw_N08_2.fastq.gz | 9.99 MB |
sw_N09_1.fastq.gz | 8.96 MB |
sw_N09_2.fastq.gz | 9.06 MB |
sw_N11_1.fastq.gz | 8.73 MB |
sw_N11_2.fastq.gz | 8.99 MB |
sw_N12_1.fastq.gz | 7.88 MB |
sw_N12_2.fastq.gz | 8.09 MB |
sw_N13_1.fastq.gz | 8.77 MB |
sw_N13_2.fastq.gz | 8.95 MB |
sw_N15_1.fastq.gz | 9.31 MB |
sw_N15_2.fastq.gz | 9.28 MB |
ts_K1_1.fastq.gz | 19.41 MB |
ts_K1_2.fastq.gz | 19.76 MB |
ts_K2_1.fastq.gz | 15.21 MB |
ts_K2_2.fastq.gz | 15.52 MB |
ts_K3_1.fastq.gz | 17.71 MB |
ts_K3_2.fastq.gz | 18.64 MB |
ts_K4_1.fastq.gz | 23.48 MB |
ts_K4_2.fastq.gz | 24.25 MB |
ts_K5_1.fastq.gz | 19.32 MB |
ts_K5_2.fastq.gz | 18.14 MB |
ts_N1_1.fastq.gz | 3.64 MB |
ts_N1_2.fastq.gz | 4.19 MB |
ts_N2_1.fastq.gz | 5.46 MB |
ts_N2_2.fastq.gz | 6.17 MB |
ts_N3_1.fastq.gz | 3.95 MB |
ts_N3_2.fastq.gz | 4.49 MB |
ts_N4_1.fastq.gz | 6.53 MB |
ts_N4_2.fastq.gz | 7.33 MB |
ts_N5_1.fastq.gz | 3.99 MB |
ts_N5_2.fastq.gz | 4.49 MB |
Abstract
Wound healing is an intensely studied topic involved in many relevant pathophysiological processes, including fibrosis. Despite the large interest in fibrosis, the network relating commensal microbiota to skin fibrosis remains poorly understood. Here, we focus on keloid, a classical yet intractable skin fibrotic disease, to establish the association between commensal microbiota and scarring tissue. Our histological data reveal the presence of microbiota in keloids. 16S rRNA sequencing characterizes the microbial composition and its divergence between pathological and normal skin tissue. Our research provides insights into the pathology of human fibrotic diseases, advocating commensal bacteria and IL-8 signaling as useful targets in future interventions for recurrent keloid disease.
https://doi.org/10.5061/dryad.d51c5b0bt
These datasets contain the original fastq.gz reads from 16S rDNA sequencing experiments and the scripts used to analyze them. The experiments were performed on patients with keloids, a fibrotic skin disease. We ran two similar experiments and analyzed both in the same way:
- Swab test: we used sterile cotton swabs to acquire bacteria from the surface of undamaged skin covering keloid and adjacent normal skin. Therefore, we have 12 keloid and 12 normal swabs paired together.
- Tissue test: the keloid tissue with a margin of normal tissue was acquired from samples after surgery (5 patients).
Additionally, we provide the files from bulk RNA sequencing of human dermal fibroblasts treated with IL-8 and TGF-beta here.
Description of the data and file structure
16S rDNA data
File naming convention
All the data, additional files, and R scripts are in the same folder. The .fastq.gz files follow this naming pattern: xy_Tnn_z.fastq.gz
with the meaning explained below:
- xy is the type of sample: sw for swab tests and ts for tissue tests
- T is the type of tissue: N for normal adjacent tissue and K for pathological keloid tissue
- nn is the patient identifier, from 01 to 15
- z takes the value 1 for the forward sequencing read and 2 for the reverse read
Some examples:
- sw_K13_2.fastq.gz is a file from the swab test, patient 13, acquired in the keloid area, reverse primer sequencing
- ts_N02_1.fastq.gz is a file from the tissue test, patient 02, acquired from normal tissue, forward primer sequencing
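The naming pattern can be decoded mechanically. The following is a minimal sketch, not part of the deposited scripts: the function and class names are hypothetical, and accepting a one- or two-digit patient ID is an assumption made because the tissue files in the file listing use single digits (for example ts_K1_1.fastq.gz).

```python
import re
from typing import NamedTuple

# xy_Tnn_z.fastq.gz; one- or two-digit patient IDs are accepted (assumption).
FILENAME_RE = re.compile(r"^(sw|ts)_([NK])(\d{1,2})_([12])\.fastq\.gz$")


class SampleFile(NamedTuple):
    experiment: str  # "swab" or "tissue"
    tissue: str      # "normal" or "keloid"
    patient: int     # patient identifier within that experiment
    read: str        # "forward" or "reverse"


def parse_sample_filename(name: str) -> SampleFile:
    """Decode a 16S file name into its experiment, tissue, patient, and read."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a 16S sample file name: {name!r}")
    xy, tissue, patient, read = match.groups()
    return SampleFile(
        experiment={"sw": "swab", "ts": "tissue"}[xy],
        tissue={"N": "normal", "K": "keloid"}[tissue],
        patient=int(patient),
        read={"1": "forward", "2": "reverse"}[read],
    )
```

For instance, `parse_sample_filename("sw_K13_2.fastq.gz")` describes a swab sample from the keloid area of patient 13, reverse read.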
List of raw sequencing files
According to the above rules, here is a list of samples:
Swab samples
File name | Patient ID | Primer | Tissue type |
---|---|---|---|
sw_K01_1.fastq.gz | 01 | forward | Keloid |
sw_K01_2.fastq.gz | 01 | reverse | Keloid |
sw_N01_1.fastq.gz | 01 | forward | Normal |
sw_N01_2.fastq.gz | 01 | reverse | Normal |
sw_K02_1.fastq.gz | 02 | forward | Keloid |
sw_K02_2.fastq.gz | 02 | reverse | Keloid |
sw_N02_1.fastq.gz | 02 | forward | Normal |
sw_N02_2.fastq.gz | 02 | reverse | Normal |
The remaining files follow this pattern. The complete list contains data from patients 01, 02, 03, 04, 06, 07, 08, 09, 11, 12, 13, and 15. Total N = 12 patients, total normal files forward = 12, total normal files reverse = 12, total keloid files forward = 12, and total keloid files reverse = 12. In summary, there are 48 files for the swab test.
Please note some missing numbers. Patients 05, 10, and 14 were excluded due to poor data quality/insufficient reads from sequencing. We still keep the identical patient IDs because other associated data for patients 05, 10, and 14, like histology or flow cytometry, were technically correct.
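As a quick check of the file count, the expected swab file names can be generated from the patient list. This is an illustrative sketch only; the constant and function names are hypothetical and do not come from keloid.R.

```python
# Swab-test patients retained after excluding 05, 10, and 14.
SWAB_PATIENTS = [1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 15]


def expected_swab_files(patients=SWAB_PATIENTS):
    """Return every swab file name: 2 tissue types x 2 reads per patient."""
    return [
        f"sw_{tissue}{patient:02d}_{read}.fastq.gz"
        for patient in patients
        for tissue in ("K", "N")
        for read in (1, 2)
    ]
```

With 12 patients, 2 tissue types, and 2 reads each, the list has 12 x 2 x 2 = 48 entries, and comparing it against a directory listing would show any missing download.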
Tissue samples
File name | Patient ID | Primer | Tissue type |
---|---|---|---|
ts_K01_1.fastq.gz | 01 | forward | Keloid |
ts_K01_2.fastq.gz | 01 | reverse | Keloid |
ts_N01_1.fastq.gz | 01 | forward | Normal |
ts_N01_2.fastq.gz | 01 | reverse | Normal |
…and so on, like with swab samples. There are no missing patients here; we have patient IDs 01, 02, 03, 04, and 05, and the total number of files (N and K, forward and reverse) is 20 files.
Relations between files
The swab test and tissue test are separate experiments, even though both are processed the same way later in the R script. Within each experiment, files are paired by patient: for example, the swab test files from normal tissue of patient 03 are paired with the swab test files from keloid tissue of patient 03, because they were acquired from the same patient. Similarly, the files ts_K02_1.fastq.gz and ts_N02_1.fastq.gz are both from patient 02, forward primer. The first file is from keloid tissue, and the second is from the same patient’s normal adjacent tissue.
However, the patients were different individuals in each study. Thus, patient 01 from the swab test is NOT patient 01 from the tissue test. Groups sw and ts have no relation or overlap.
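The pairing rule can be sketched as a grouping keyed by both experiment and patient, so that swab patient 03 and tissue patient 03 never end up in the same group. This is a hedged illustration with a hypothetical function name; it is not how keloid.R organizes samples, which relies on the sample_info files.

```python
import re
from collections import defaultdict

# One- or two-digit patient IDs are accepted (assumption based on the file listing).
_NAME_RE = re.compile(r"^(sw|ts)_([NK])(\d{1,2})_([12])\.fastq\.gz$")


def pair_by_patient(filenames):
    """Group files as {(experiment, patient): {"K": [...], "N": [...]}}.

    The experiment prefix is part of the key because sw and ts patients
    are different individuals even when their numbers coincide.
    """
    groups = defaultdict(lambda: {"K": [], "N": []})
    for name in filenames:
        match = _NAME_RE.match(name)
        if match is None:
            continue  # skip README, RNA-Seq files, and anything else
        experiment, tissue, patient, _read = match.groups()
        groups[(experiment, int(patient))][tissue].append(name)
    return {
        key: {tissue: sorted(names) for tissue, names in group.items()}
        for key, group in groups.items()
    }
```

A group whose "K" or "N" list is empty signals an unpaired sample.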
Other files
The keloid.R file uses sample_info_keloid_swab.txt and sample_info_keloid_tissue.txt for further processing and analysis.
Bulk RNA-Seq data
The fibroblasts were treated with the cytokines interleukin 8 (IL-8) and transforming growth factor-beta (TGF-beta) for RNA sequencing. The list of files and their meaning is as follows:
File | Treatment | Replicate | Primer |
---|---|---|---|
RNA-Seq-Con1_1.fq.gz | Control | 1 | forward |
RNA-Seq-Con1_2.fq.gz | Control | 1 | reverse |
RNA-Seq-Con2_1.fq.gz | Control | 2 | forward |
RNA-Seq-Con2_2.fq.gz | Control | 2 | reverse |
RNA-Seq-TGFbeta1_1.fq.gz | TGF-beta | 1 | forward |
RNA-Seq-TGFbeta1_2.fq.gz | TGF-beta | 1 | reverse |
RNA-Seq-TGFbeta2_1.fq.gz | TGF-beta | 2 | forward |
RNA-Seq-TGFbeta2_2.fq.gz | TGF-beta | 2 | reverse |
RNA-Seq-IL81_1.fq.gz | IL-8 | 1 | forward |
RNA-Seq-IL81_2.fq.gz | IL-8 | 1 | reverse |
RNA-Seq-IL82_1.fq.gz | IL-8 | 2 | forward |
RNA-Seq-IL82_2.fq.gz | IL-8 | 2 | reverse |
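The RNA-Seq file names encode treatment, replicate, and read direction in a similar way. A minimal sketch of decoding them follows; the function name and the returned labels are hypothetical, and the pattern is inferred from the file names listed in this deposit rather than from any script.

```python
import re

# RNA-Seq-<treatment><replicate>_<read>.fq.gz, e.g. RNA-Seq-IL81_2.fq.gz
_RNASEQ_RE = re.compile(r"^RNA-Seq-(Con|IL8|TGFbeta)([12])_([12])\.fq\.gz$")

_TREATMENTS = {"Con": "Control", "IL8": "IL-8", "TGFbeta": "TGF-beta"}
_READS = {"1": "forward", "2": "reverse"}


def parse_rnaseq_filename(name):
    """Return (treatment, replicate, read) for a bulk RNA-Seq file name."""
    match = _RNASEQ_RE.match(name)
    if match is None:
        raise ValueError(f"not a bulk RNA-Seq read file: {name!r}")
    treatment, replicate, read = match.groups()
    return _TREATMENTS[treatment], int(replicate), _READS[read]
```

Note that in a name such as RNA-Seq-IL81_2.fq.gz the digit directly after "IL8" is the replicate number, not part of the cytokine name.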
Additionally, the file RNA-Seq-expression_table.xls shows the expression values after processing. We used Trimmomatic to check data quality and filter the data. We removed adapter sequences, reads shorter than 50 bp, and reads with only one end. The columns of this file, in order, are as follows:
- GeneID: Ensembl database gene identifier (this is the gene ID, not a transcript ID)
- chr, start, end, length and strand: chromosome, location of primers, length of the transcript, and the sense (+) or antisense (-) DNA strand on which the gene is expressed.
- name: the name of the gene as a gene symbol. The value NA means that a given DNA region was transcribed but does not have a specific gene name. Some genes can have several transcripts from different regions of the same gene. The data in this file were not collapsed. For example, Y_RNA transcripts can be found in rows 110, 118, 120, 177, 685, 728, and others. Though this example is special, other genes may or may not have several different transcripts.
- Con1_COUNT, Con2_COUNT columns, etc.: FPKM (fragments per kilobase of transcript sequence per million mapped fragments) values for the gene in each row. Con1, Con2, IL81… refer to the treatments or controls as shown in the file list, after filtering.
- Description: this column shows the long names of genes and some additional description. The symbol - means that such a description was not available from our method of initial data processing.
- GO_Term: the initial annotation of a given transcript to the Gene Ontology database.
- Pathway_Name: as above, but for the KEGG database.
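Because the table is not collapsed, a gene symbol such as Y_RNA can appear in several rows. The sketch below shows one possible way to collapse rows by gene symbol after the table has been read into a list of dictionaries. It is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, summing values is only one of several possible choices, and rows named NA are kept apart under their GeneID so that unrelated unnamed regions are not merged.

```python
from collections import defaultdict


def collapse_by_name(rows, value_columns):
    """Sum value_columns over rows that share the same gene symbol.

    rows: iterable of dicts with at least "GeneID", "name", and the value columns.
    Rows whose name is "NA" are keyed by GeneID instead of being merged.
    """
    totals = defaultdict(lambda: dict.fromkeys(value_columns, 0.0))
    for row in rows:
        key = row["name"] if row["name"] != "NA" else row["GeneID"]
        for column in value_columns:
            totals[key][column] += float(row[column])
    return dict(totals)
```

Whether summing is appropriate depends on the downstream analysis; keeping the rows separate, as in the deposited file, preserves the most information.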
Sharing/Access information
There are no linked sources, and the data were not published anywhere else at the time of deposition to Dryad.
All data and scripts are in the public domain.
Code/Software
These data are shared for publication in PNAS Nexus. 16S rDNA data were processed with the code in the keloid.R file, with the supporting keloid_functions.R file. All code is in the R language, with some parts requiring Krona software and an additional database (SILVA). All the information is provided in the keloid.R file as comments.
There are no other related files or scripts.
16S rDNA sequencing data
The data files here are raw 16S rDNA sequencing accompanied by R scripts that were used to analyze these data.
The source of the data was: (1) a swab test from the surface of normal and keloid skin; (2) the tissues of keloid patients from deeper parts of the skin.
Surface microbiota samples were collected from the pathological location or the normal lateral location of patients using a swab (Catch-all Sample Collection Swab, Epicenter) moistened in Yeast Cell Lysis Buffer (from MasterPure Yeast DNA Purification Kit; Epicenter). Samples were snap-frozen on dry ice, and DNA was isolated from specimens using the PureLink Genomic DNA Mini Kit (Invitrogen).
Amplification of the 16S V3+V4 region was performed according to the manufacturer’s specifications. Sequencing of 16S rRNA amplicons was conducted by Apexbio Co., Shanghai, China, using the Illumina NovaSeq platform. The data were analyzed with the attached R scripts.
Bulk RNA-Seq
For RNA sequencing, human dermal fibroblasts were treated with interleukin 8 and transforming growth factor beta in vitro. RNA was extracted using the RNeasy Mini Kit (Qiagen). RNA purity and concentration measurement, RNA library preparation, and transcriptome sequencing were conducted by TIANGEN Co. We used the HISAT2 software to align clean reads to the reference genome. Next, we applied HTSeq-count (v0.6.0) to quantify the gene expression level of each sample (mode: union). Finally, FPKM (fragments per kilobase of transcript sequence per million mapped fragments) was used to estimate gene expression levels, as seen in the expression table.
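For readers unfamiliar with FPKM, the standard definition can be written out as a short calculation. This is a sketch of the textbook formula with hypothetical function and parameter names; the exact computation used by the sequencing provider's pipeline is not described in this deposit.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = count / (length in kb) / (total mapped fragments in millions)
         = count * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

As a worked example, 500 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments give 500 / 2 kb / 10 million = 25 FPKM.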