Biomarkers
Abnormal Respiratory Sound Detection Based on Multi-Feature Learning Using Various Audio Spectrograms

PUBLICATION	Journal of Internet Computing and Services
AUTHORS	변규린, 양희규, 추현승

ABSTRACT

Since the pandemic, demand for remote care and smart healthcare technology has grown sharply, increasing the need for automated analysis that supports early diagnosis and long-term monitoring of respiratory disease. This study proposes a multi-spectral CNN–TCN model for abnormal respiratory sound classification, built on two public lung sound datasets (HF Lung V1 and ICBHI). Three spectral representations (Mel spectrogram, MFCC, and Chroma) are processed in separate branches. In each branch, a VGG16-based CNN pretrained on ImageNet extracts local time-frequency features, and a Temporal Convolutional Network (TCN) with dilated convolutions (dilation = 1, 2, 4) then captures short- and long-term temporal dependencies. In experiments, the proposed model improves the mean F1-score by roughly 4-6% or more over conventional LSTM- and GRU-based models while remaining lightweight, at about 270,000 parameters or fewer (under 0.3 million), which gives it both high efficiency and good generalization. These results indicate that combining multi-spectral input with dilated temporal convolution can effectively improve the accuracy and robustness of abnormal respiratory sound classification.
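
To illustrate how the dilated convolutions named in the abstract (dilation = 1, 2, 4) widen the temporal context, the following is a minimal pure-Python sketch, not the authors' implementation. The kernel size of 3, the causal (left-padded) form, the all-ones weights, and the function names are assumptions made for illustration; the abstract specifies only the dilation rates. Nonlinearities and residual connections, which a full TCN would normally include, are omitted.

```python
from typing import List, Sequence

DILATIONS = (1, 2, 4)  # dilation rates stated in the abstract
KERNEL_SIZE = 3        # assumption: the abstract does not give the kernel size


def dilated_causal_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_i weights[i] * x[t - i * dilation].

    Samples before t = 0 are treated as zeros (causal left padding),
    so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def tcn_stack(
    x: Sequence[float], weights: Sequence[float], dilations: Sequence[int]
) -> List[float]:
    """Apply one dilated causal convolution per dilation rate, in order."""
    y = list(x)
    for d in dilations:
        y = dilated_causal_conv1d(y, weights, d)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input time steps that can influence a single output step."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Feed a unit impulse through the stack: with all-positive weights,
    # the count of non-zero outputs equals the receptive field.
    impulse = [1.0] + [0.0] * 31
    response = tcn_stack(impulse, [1.0] * KERNEL_SIZE, DILATIONS)
    print(receptive_field(KERNEL_SIZE, DILATIONS))     # 15
    print(sum(1 for v in response if v != 0.0))        # 15
```

Under these assumptions, three layers cover 15 time frames, whereas three undilated layers with the same kernel size would cover only 7. This is the sense in which stacking dilations lets a small network capture both short- and long-range dependencies.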
