TY - GEN
T1 - Lung Sound Classification Using Snapshot Ensemble of Convolutional Neural Networks
AU - Nguyen, Truc
AU - Pernkopf, Franz
PY - 2020/7
Y1 - 2020/7
N2 - We propose a robust and efficient lung sound classification system using a snapshot ensemble of convolutional neural networks (CNNs). A robust CNN architecture is used to extract high-level features from log mel spectrograms. The CNN architecture is trained on a cosine cycle learning rate schedule. Capturing the best model of each training cycle allows to obtain multiple models settled on various local optima from cycle to cycle at the cost of training a single mode. Therefore, the snapshot ensemble boosts performance of the proposed system while keeping the drawback of expensive training of ensembles moderate. To deal with the class-imbalance of the dataset, temporal stretching and vocal tract length perturbation (VTLP) for data augmentation and the focal loss objective are used. Empirically, our system outperforms state-of-the-art systems for the prediction task of four classes (normal, crackles, wheezes, and both crackles and wheezes) and two classes (normal and abnormal (i.e. crackles, wheezes, and both crackles and wheezes)) and achieves 78.4% and 83.7% ICBHI specific micro-averaged accuracy, respectively. The average accuracy is repeated on ten random splittings of 80% training and 20% testing data using the ICBHI 2017 dataset of respiratory cycles.
AB - We propose a robust and efficient lung sound classification system using a snapshot ensemble of convolutional neural networks (CNNs). A robust CNN architecture is used to extract high-level features from log mel spectrograms. The CNN architecture is trained on a cosine cycle learning rate schedule. Capturing the best model of each training cycle allows to obtain multiple models settled on various local optima from cycle to cycle at the cost of training a single mode. Therefore, the snapshot ensemble boosts performance of the proposed system while keeping the drawback of expensive training of ensembles moderate. To deal with the class-imbalance of the dataset, temporal stretching and vocal tract length perturbation (VTLP) for data augmentation and the focal loss objective are used. Empirically, our system outperforms state-of-the-art systems for the prediction task of four classes (normal, crackles, wheezes, and both crackles and wheezes) and two classes (normal and abnormal (i.e. crackles, wheezes, and both crackles and wheezes)) and achieves 78.4% and 83.7% ICBHI specific micro-averaged accuracy, respectively. The average accuracy is repeated on ten random splittings of 80% training and 20% testing data using the ICBHI 2017 dataset of respiratory cycles.
UR - http://www.scopus.com/inward/record.url?scp=85091015126&partnerID=8YFLogxK
U2 - 10.1109/EMBC44109.2020.9176076
DO - 10.1109/EMBC44109.2020.9176076
M3 - Conference paper
AN - SCOPUS:85091015126
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 760
EP - 763
BT - 42nd Annual International Conferences of the IEEE Engineering in Medicine and Biology Society
PB - Institute of Electrical and Electronics Engineers
T2 - 42nd Annual International Conferences of the IEEE Engineering in Medicine and Biology Society
Y2 - 20 July 2020 through 24 July 2020
ER -