#298: Automated segmentation of the vocal folds in laryngeal endoscopy videos using deep convolutional regression networks

A. Hamad, M. Haney, T. E. Lever, and F. Bunyak

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 140-148, 2019



Swallowing and breathing are vital, life-sustaining upper airway functions that require precise, reciprocal coordination of the vocal folds (VFs). During swallowing, the VFs must fully close to prevent aspiration of food or liquid into the lungs, whereas during breathing, the VFs must remain open to prevent obstruction of airflow into and out of the lungs. This coordination may become impaired by a variety of neurological conditions and diseases. Clinical evaluation relies on transnasal endoscopy to visualize the VFs within the larynx and on clinicians' subjective interpretation of VF function. However, objective, quantitative, and high-throughput analysis of VF function is important for early diagnosis, monitoring of disease progression, treatment monitoring, and treatment discovery. In this paper, we propose a fully automated, deep learning-based VF segmentation system for the analysis of VF motion behavior captured using flexible endoscopes with low-speed capability. Experiments on human laryngeal videos yielded promising results that were robust to challenges arising from imaging, anatomical, and behavioral variations. The proposed segmentation and tracking system will be used to compute quantitative outcome measures describing VF motion behavior in order to support clinical practice and scientific discovery.