#330: Orthogonal Region Selection Network for Laryngeal Closure Detection in Laryngoscopy Videos

Y. Y. Wang, A. S. Hamad, T. E. Lever, and F. Bunyak

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pgs. 2167-2172, 2020


Vocal folds (VFs) play a critical role in breathing, swallowing, and speech production. VF dysfunction caused by various medical conditions can significantly reduce patients' quality of life and lead to life-threatening conditions such as aspiration pneumonia, caused by food and/or liquid "invasion" into the windpipe. Laryngeal endoscopy is routinely used in clinical practice to inspect the larynx and to assess VF function. Unfortunately, the resulting videos are only visually inspected, leading to loss of valuable information that could be used for early diagnosis and for disease or treatment monitoring. In this paper, we propose a deep learning-based image analysis solution for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos. Laryngeal endoscopy image analysis is a challenging task because of anatomical variations and various imaging problems. Analysis of LAR events is made more challenging by data imbalance, since these events are rare. To tackle this problem, we propose a deep learning system that consists of a two-stream network with a novel orthogonal region selection subnetwork. To the best of our knowledge, this is the first deep learning network that learns to directly map its input to a VF open/close state without first segmenting or tracking the VF region, which drastically reduces the labor-intensive manual annotation needed for mask or track generation. The proposed two-stream network and the orthogonal region selection subnetwork allow integration of local and global information for improved performance. The experimental results show promising performance for the automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos.

Clinical relevance: This paper presents an objective, quantitative, and automatic deep learning-based system for detection of laryngeal adductor reflex (LAR) events in laryngoscopy videos.
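The abstract describes a network that maps its input to a VF open/close state, but it does not say how per-frame states are turned into LAR events. As an illustration only, the sketch below shows one plausible post-processing step: grouping consecutive "closed" frames into events. The function name `closure_events`, the 0/1 encoding (1 = closed), and the `min_len` threshold are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def closure_events(states: List[int], min_len: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive 'closed' frames into (start, end) index pairs.

    Hypothetical helper: `states` holds one predicted label per frame,
    with 1 meaning closed and 0 meaning open (an assumed encoding).
    Runs shorter than `min_len` frames are discarded. `end` is inclusive.
    """
    events: List[Tuple[int, int]] = []
    start = None  # index where the current closed run began, if any
    for i, s in enumerate(states):
        if s == 1 and start is None:
            start = i  # a closed run begins
        elif s != 1 and start is not None:
            if i - start >= min_len:
                events.append((start, i - 1))  # run ended on the previous frame
            start = None
    # Handle a run that lasts until the final frame.
    if start is not None and len(states) - start >= min_len:
        events.append((start, len(states) - 1))
    return events


if __name__ == "__main__":
    frame_states = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    print(closure_events(frame_states, min_len=2))  # [(2, 4), (9, 10)]
```

With `min_len=2`, the single closed frame at index 7 is treated as noise and dropped; whether such filtering is appropriate for real laryngoscopy data is a design choice this sketch does not settle.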