Trust Shattered: A Study on Breath Sound Analysis Using Physicians and Artificial Intelligence

Although crackles have long been regarded as a hallmark finding in physical examinations, this study revealed their unreliability not only among human physicians but also in artificial intelligence systems.

No significant difference in the area under the receiver operating characteristic curve (AUROC) was noted between physicians and the AI across the different breath sounds. However, both physicians and the AI showed a lower AUROC for identifying crackles than for identifying wheezes.

Auscultation has long been a valuable tool for diagnosing diseases and assessing their severity in a real-time, non-invasive, and cost-effective manner. However, the reliability of breath sound interpretation is heavily dependent on physicians’ experience, preferences, and auscultatory skills. Additionally, the inherent characteristics of adventitious breath sounds pose significant classification challenges. More importantly, artificial intelligence (AI) encounters similar difficulties.

The Emergency Department of National Taiwan University Hospital Hsinchu Branch and the Department of Electrical Engineering at National Tsing Hua University jointly established an online breath sound database named the "Formosa Archive of Breath Sound." The database comprises 11,532 breath sound recordings, all captured in the emergency department with clinical fidelity. Using this extensive dataset and advanced data augmentation techniques, including SpecAugment, Gamma Patch-Wise Correction Augmentation, and Mixup, the team developed an AI system for breath sound identification with performance comparable to that of human physicians.
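For readers unfamiliar with Mixup, the general idea can be sketched as follows. This is a minimal illustration of the technique in its commonly described form, not the team's implementation: the function name `mixup`, the `alpha` value, and the toy feature vectors are all assumptions made for the example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two training examples and their one-hot labels.

    The mixing weight is drawn from a Beta(alpha, alpha) distribution.
    alpha=0.2 is an illustrative default, not a value from the study.
    """
    lam = rng.betavariate(alpha, alpha)
    # Features (e.g. flattened spectrogram values) are mixed element-wise.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # Labels are mixed with the same weight, giving a "soft" label.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example: hypothetical feature vectors for a crackle and a wheeze clip.
crackle_x, crackle_y = [1.0, 0.0], [1.0, 0.0]
wheeze_x, wheeze_y = [0.0, 1.0], [0.0, 1.0]
mixed_x, mixed_y, weight = mixup(crackle_x, crackle_y, wheeze_x, wheeze_y,
                                 rng=random.Random(0))
```

The mixed example sits between the two originals, which gives a model additional, smoother training data when real recordings are limited.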

To evaluate performance, both the physicians and the AI system were tasked with identifying abnormal breath sounds. Crackles proved problematic: they are difficult to recognize because they are discontinuous and transient and, unlike wheezes, lack a musical tonal quality. Surprisingly, the AI system did not outperform human physicians on this task. Specificity, inter-rater agreement, and the area under the ROC curve were likewise lower for crackles in the AI analyses.
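As a rough illustration of two of the metrics named above, the sketch below computes AUROC and specificity from scratch. It is not the study's analysis code: the function names and the toy labels and scores are invented for the example, and labels are assumed to be 1 for "crackle present" and 0 for "crackle absent."

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity(labels, predictions):
    """Share of truly negative recordings that were called negative."""
    tn = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    return tn / (tn + fp)


# Made-up data for four recordings (not from the study).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]   # a rater's or model's confidence
toy_calls = [1, 0, 1, 0]            # the resulting yes/no decisions

toy_auroc = auroc(toy_labels, toy_scores)      # 3 of 4 pairs ranked correctly
toy_spec = specificity(toy_labels, toy_calls)  # 1 of 2 negatives called negative
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so a lower AUROC for crackles means raters and models separated crackle from non-crackle recordings less cleanly.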

These findings, which underscore the shared limitations of human and AI auscultation in distinguishing crackles, were published on October 15, 2024, in the journal npj Primary Care Respiratory Medicine.

"This shared weakness renders crackles an unreliable physical finding. Consequently, medical decisions based on crackles should be approached with caution and verified through additional examinations. Moreover, the low signal-to-noise ratio, crackle-like noise artifacts, and irregular loudness contribute to the difficulty AI systems face in identifying crackles. Future AI training for breath sound identification should focus more intensively on improving the recognition of crackles," said Dr. Huang.

 

Professor Edward Pei-Chuan Huang’s email address: [email protected]

Published: 29 Nov 2024

Contact details:

No.1, Section 4, Roosevelt Road, Taipei.

Funding information:

This work was supported by Taiwan National Funds through the National Science and Technology Council (grant numbers MOST 111-2320-B-002-054 and NSTC 112-2320-B-002-044).