Lower airway microbiota in COPD and healthy controls
Data files
Dec 18, 2024 version files 14.97 MB
- COPDvC_ASV.qza (3.05 MB)
- COPDvC_RepSeq.qza (2.92 MB)
- COPDvC_rootedtree.qza (2.92 MB)
- COPDvC_Taxonomy.qza (2.98 MB)
- COPDvCAncomASV.qza (3.03 MB)
- COPDvCMetadata.txt (21.46 KB)
- COPDvControl_Rcommand.R (31.52 KB)
- COPDvsControl_Q2_code.txt (5.91 KB)
- README.md (5.53 KB)
Abstract
The lower airway microbiota in patients with chronic obstructive pulmonary disease (COPD) are likely altered compared with the microbiota in healthy individuals. Information on how the microbiota is affected by smoking, the use of inhaled corticosteroids (ICS), and COPD severity is still scarce. In the MicroCOPD Study, participant characteristics were obtained through standardised questionnaires and clinical measurements at a single centre from 2012 to 2015. Protected bronchoalveolar lavage samples from 97 patients with COPD and 97 controls were paired-end sequenced with the Illumina MiSeq System. Data were analysed in QIIME 2 and R. Alpha-diversity was lower in patients with COPD than controls (Pielou evenness: COPD=0.76, control=0.80, p=0.004; Shannon entropy: COPD=3.98, control=4.34, p=0.01). Beta-diversity differed with smoking only in the COPD cohort (weighted UniFrac: permutational analysis of variance R2=0.04, p=0.03). Nine genera were differentially abundant between COPD and controls. Genera enriched in COPD belonged to the Firmicutes phylum. Pack years were linked to differential abundance of taxa in controls only (ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction) log-fold difference/q-values: Haemophilus -0.05/0.048; Lachnoanaerobaculum -0.04/0.03). Oribacterium was absent in smoking patients with COPD compared with non-smoking patients (ANCOM-BC log-fold difference/q-values: -1.46/0.03). We found no associations between the microbiota and COPD severity or ICS. The lower airway microbiota is equal in richness in patients with COPD to controls, but less even. Genera from the Firmicutes phylum thrive particularly in COPD airways. Smoking has different effects on diversity and taxonomic abundance in patients with COPD compared with controls. COPD severity and ICS use were not linked to the lower airway microbiota.
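The two alpha-diversity measures reported in the abstract can be illustrated with a minimal Python sketch. This is not the deposited analysis code, which is in QIIME 2 and R. The function names are hypothetical, and the logarithm base used for Shannon entropy (base 2 here) is an assumption. Pielou evenness is Shannon entropy divided by its maximum possible value for the observed number of ASVs, so it does not depend on the base.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of a vector of ASV counts (base is an assumption)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def pielou_evenness(counts):
    """Shannon entropy divided by the log of the number of observed ASVs."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not informative with fewer than two ASVs
    return shannon_entropy(counts, base=math.e) / math.log(observed)


# Two hypothetical samples with the same richness (4 ASVs) but different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(pielou_evenness(even_sample), pielou_evenness(skewed_sample))
```

Both hypothetical samples contain four ASVs, but the skewed sample is dominated by one of them and therefore scores lower on evenness, which is the pattern the abstract describes as "equal in richness, but less even".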
README: Lower airway microbiota in COPD and healthy controls
This note was written by Solveig Tangedal, on Nov 29th, 2023, and edited on Dec 17th, 2024.
We have submitted:
- deidentified participant data (metadata) in .xlsx and .txt format (COPDvCMetadata.xlsx, COPDvCMetadata.txt),
- amplicon sequence variant (ASV) tables in qza format for all ASVs and for ASVs eligible for differential abundance analyses (COPDvC_ASV.qza, COPDvCAncomASV.qza),
- representative sequences tables in qza format for all ASVs (COPDvC_RepSeq.qza),
- rooted phylogenetic trees in qza format (COPDvC_rootedtree.qza),
- taxonomy tables in qza format (COPDvC_Taxonomy.qza),
- QIIME 2 code in txt format (COPDvsControl_Q2_code.txt),
- R code in RScript format (COPDvControl_Rcommand.R),
- and the manuscript abstract in docx format (ABSTRACT.docx).
The MicroCOPD study is a large study, with amplicon (16S rRNA) sequencing of more than 3000 samples over more than 30 RUNs.
More than 2500 of these are airway samples, spread over 30 RUNs.
Data cleanup and curation were performed for all 30 RUNs together to apply a shared protocol and make the best use of our negative
control samples (explained in detail in the supplement of the published paper "The airway microbiota and exacerbations of COPD" by
Leiten et al.). The fastq files alone would far exceed our Dryad storage capacity. Since this study covers only a subset of the study
participants and only one sample type (BAL), we have chosen to deposit only the data matching the stripped-down metadata file.
The FeatureData file is named COPDvC_RepSeq.qza.
The FeatureTable files are named COPDvC_ASV.qza and COPDvCAncomASV.qza. Both are curated, but only the latter was filtered according to the advice
for ANCOM-BC analyses. The curation is described in the Supplement of the current manuscript.
Taxonomy was classified using the q2 feature-classifier classify-sklearn command, producing the COPDvC_Taxonomy.qza file.
The phylogeny was built using the q2 phylogeny align-to-tree-mafft-fasttree command, producing the COPDvC_rootedtree.qza file.
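QIIME 2 artifacts (.qza) are zip archives, so their contents can be inspected without a QIIME 2 installation. The following Python sketch is not part of the deposited code. The function name `summarize_qza` is hypothetical, and the sketch assumes the usual artifact layout: a single top-level directory containing `metadata.yaml` and a `data/` folder.

```python
import zipfile


def summarize_qza(path):
    """Return (semantic type, data file names) for a QIIME 2 artifact.

    Assumes the usual layout <uuid>/metadata.yaml and <uuid>/data/...
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # The artifact-level metadata.yaml sits exactly one directory deep.
        meta_name = next(
            n for n in names
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        semantic_type = None
        for line in archive.read(meta_name).decode("utf-8").splitlines():
            if line.startswith("type:"):
                semantic_type = line.split(":", 1)[1].strip().strip("'\"")
        root = meta_name.split("/")[0]
        data_members = [
            n for n in names
            if n.startswith(root + "/data/") and not n.endswith("/")
        ]
    return semantic_type, data_members
```

Calling `summarize_qza("COPDvC_ASV.qza")` on a downloaded copy should return the artifact's semantic type together with the paths of the files stored under `data/`.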
Metadata are given as xlsx and txt files because the two file types are needed for different import steps in R, as can be seen in the R code.
Code: All information needed for obtaining data and editing metadata is written out in the code files. Metadata is explained and
recoded in the QIIME 2 and R documents.
Description metadata
- sampleid: Unique sample identifier
- RUN: Sequencing run in which the samples were analysed
- diagnose: Classification of participants into COPD or Control
- diagnose_NonSmk1yr: Classification of participants into smokers who had smoked within the last 12 months before inclusion (COPD_S Control_S_) or non-smokers who had not smoked in the last 12 months before inclusion (COPD_NS Control_NS_)
- Sex: Male or female sex
- Agecat: Categorised age groups: <60 60-69.99 70+ (years)
- NonSmk1yr: Dichotomised smoking variable: Non-smoking = no smoking in the last 12 months before inclusion; Smoking = smoked within the last 12 months before inclusion
- Packyear: 20 cigarettes daily for 1 year equals 1 packyear
- fev1fvcratio: Forced expiratory volume in 1 second/Forced vital capacity (Spirometry)
- fev1prcpred: Forced expiratory volume in 1 second as percent of predicted value (Spirometry)
- fvcprcpred: Forced vital capacity as percent of predicted value (Spirometry)
- gold_3cat: Classification of participants into GOLD stages GOLD I/II, GOLD III/IV and Control
- ICS: Dichotomised inhaled corticosteroid (ICS) use variable for COPD only: Yes No
- Freqexac: Dichotomised exacerbation variable for COPD only: Yes No
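Two of the derived variables above follow simple rules that can be written out as a minimal Python sketch. The deposited recoding is done in the QIIME 2 and R files. The function names here are hypothetical, and how boundary ages are handled is an assumption based on the category labels.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years: 20 cigarettes daily for 1 year equals 1 pack-year."""
    return (cigarettes_per_day / 20.0) * years_smoked


def age_category(age_years):
    """Map age in years to the Agecat groups: <60, 60-69.99, 70+."""
    if age_years < 60:
        return "<60"
    if age_years < 70:
        return "60-69.99"
    return "70+"
```

For example, a participant who smoked 10 cigarettes a day for 30 years has accumulated 15 pack-years.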
COPDvC_ASV.qza COPDvCAncomASV.qza COPDvC_RepSeq.qza
The Illumina amplicon sequences were quality- and chimera-filtered using the microbiota pipeline Quantitative Insights Into Microbial Ecology 2 (QIIME 2, v. 2018.8), the Divisive Amplicon Denoising Algorithm 2 (DADA2), and VSEARCH.
DADA2 assigns sequences to amplicon sequence variants (ASVs) using denoising methods that generate an error model based on the quality of the sequencing run. The model is then used to discriminate between probable biological variation and
probable sequencing errors. Sequences interpreted as truly biologically identical are assigned to the same ASV.
Having negative control samples, we could apply the prevalence-based method in the Decontam algorithm in R to identify amplicon sequences that were interpreted as contamination of the samples. Our research group has published both
an evaluation of Decontam and a detailed methodological consideration discussing sample curation.
COPDvC_rootedtree.qza
After de novo alignment with MAFFT, a phylogenetic tree for diversity analyses was built using FastTree.
COPDvC_Taxonomy.qza
ASVs created by DADA2 were assigned taxonomy using a self-trained Naive Bayes classifier and the Silva database v.138.1.
The database was processed through RESCRIPt in QIIME 2. The chosen dereplication mode was “majority”, meaning that if identical sequences
were present with inconsistent annotation, the annotation seen in the majority of cases was chosen for all of those sequences. Only reads
derived from sequencing of the V3-V4 region of the 16S rRNA gene were considered.
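The "majority" dereplication rule described above can be illustrated with a minimal Python sketch. It is not the RESCRIPt implementation. The function name and the example records are hypothetical, and tie-breaking (the annotation encountered first wins) is an assumption made only for this illustration.

```python
from collections import Counter, defaultdict


def dereplicate_majority(records):
    """Collapse identical sequences to their most frequent annotation.

    records: iterable of (sequence, taxonomy) pairs.
    Returns a dict mapping each unique sequence to one taxonomy string.
    """
    annotations = defaultdict(Counter)
    for sequence, taxonomy in records:
        annotations[sequence][taxonomy] += 1
    # most_common(1) returns the annotation with the highest count;
    # ties go to the annotation seen first (an assumption of this sketch).
    return {
        sequence: counts.most_common(1)[0][0]
        for sequence, counts in annotations.items()
    }


# Hypothetical reference records: one sequence carries conflicting labels.
records = [
    ("ACGT", "g__Haemophilus"),
    ("ACGT", "g__Haemophilus"),
    ("ACGT", "g__Aggregatibacter"),
    ("TTGA", "g__Oribacterium"),
]
print(dereplicate_majority(records))
```

In this example the sequence with conflicting labels keeps the annotation that appears twice, and the sequence with a single annotation is left unchanged.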
Code/Software
R is required to run COPDvControl_Rcommand.R, and QIIME 2 is required to run the commands in COPDvsControl_Q2_code.txt.