Dryad

Data from: COPD screening using time–frequency features of self-recorded respiratory sounds

Data files

Aug 11, 2025 version, files: 1.03 MB

Abstract

Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide, with up to 70% of cases remaining undiagnosed. This paper proposes a COPD screening tool based on time–frequency representation features of self-recorded respiratory sounds. Respiratory sound samples (breath and cough sounds) from COPD and asymptomatic non-COPD volunteers were extracted from a large, scientific-purpose database. We analysed 39 time–frequency representation features of breath and cough sounds, combined with age, sex, and smoking status, using autoencoder neural networks and random forest algorithms. We compared the performance of three types of breath and cough random forest models built to detect COPD: models based exclusively on sound features, models based exclusively on sociodemographic characteristics, and models based on both. Models including breathing features outperformed models based exclusively on sociodemographic characteristics. Specifically, the model combining sociodemographic characteristics and breathing features achieved an AUC, accuracy, sensitivity, and specificity of 0.901, 0.836, 0.871, and 0.761, respectively, in the test set, a substantial increase in AUC over the model based exclusively on sociodemographic characteristics (0.901 vs. 0.818). Our results suggest that a lightweight collection of time–frequency representation features of self-recorded breathing sounds could effectively improve the predictive performance of COPD screening or case-finding questionnaires. COPD screening through self-recorded breathing sounds could be easily integrated as a low-cost first step in case-finding programs, potentially helping to mitigate COPD underdiagnosis.
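For readers working with the data files, the four metrics reported above (AUC, accuracy, sensitivity, specificity) can be computed from true labels and model scores roughly as follows. This is a minimal sketch using only the Python standard library, not the authors' evaluation code: the function name `screening_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the dataset.

```python
from typing import Dict, Sequence


def screening_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off; the abstract does not state one
) -> Dict[str, float]:
    """Compute AUC, accuracy, sensitivity, and specificity for a binary screen.

    y_true holds 1 for COPD and 0 for non-COPD; scores are predicted
    probabilities (or any score where higher means "more likely COPD").
    Assumes at least one positive and one negative sample are present.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1

    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]

    # AUC as the probability that a random positive outranks a random
    # negative, with ties counted as one half (Mann-Whitney formulation).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    auc = wins / (len(positives) * len(negatives))

    return {
        "auc": auc,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy example with made-up values (not from the dataset).
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.1]
print(screening_metrics(toy_labels, toy_scores))
```

AUC is threshold-independent, whereas accuracy, sensitivity, and specificity all depend on the chosen cut-off, so comparisons such as 0.901 vs. 0.818 are most meaningful on AUC.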