Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning


Abstract

Various neurological disorders, such as Parkinson’s disease (PD), stroke, and amyotrophic lateral sclerosis (ALS), cause oromotor dysfunction that results in significant speech and swallowing impairments. Assessment and monitoring of speech disorders offer effective, non-invasive opportunities for differential diagnosis and treatment monitoring of neurological disorders. Oral diadochokinesis (oral-DDK) is a test widely used by speech-language pathologists (SLPs) to assess speech impairments. Unfortunately, analysis of oral-DDK tests relies on perceptual judgments by SLPs and is often subjective and qualitative, which limits its clinical value. In this paper, we propose a multi-modal oral-DDK analysis system that automatically processes complementary 1D audio and 2D video signals of both speech and swallowing function. The system aims to automatically generate objective, quantitative measures from oral-DDK tests to aid early diagnosis and treatment monitoring of neurological disorders. The audio analysis component of the proposed system is built on a novel multi-scale deep learning network. The video analysis component tracks mouth and jaw motion during speech tests using our visual landmark tracking software. The proposed system has been evaluated on speech files covering 9 different DDK syllables. The experimental results demonstrate promising audio syllable detection performance, with an average count error of 1.6% across the different oral-DDK speech tasks. Moreover, our preliminary results demonstrate the added value of combining audio and video signal analysis.
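
To make the multi-scale audio idea concrete, the sketch below shows one possible design: parallel 1D convolutional branches with different kernel sizes capture acoustic events at several temporal scales, and pooled features are regressed to a syllable count. This is a minimal illustration only, not the network described in the paper; the class name MultiScaleAudioNet, the log-mel input representation, and all hyperparameters are assumptions.

```python
# Illustrative sketch (hypothetical, not the authors' architecture):
# a multi-scale 1D CNN that maps an audio spectrogram of an oral-DDK
# recording to a predicted syllable count.
import torch
import torch.nn as nn


class MultiScaleAudioNet(nn.Module):  # hypothetical name
    def __init__(self, n_mels: int = 64, channels: int = 32,
                 kernel_sizes=(3, 7, 15)):
        super().__init__()
        # One branch per temporal scale; padding keeps frame counts aligned.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(n_mels, channels, k, padding=k // 2),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
            )
            for k in kernel_sizes
        ])
        # Pool over time, then regress a single syllable count.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(channels * len(kernel_sizes), 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mels, frames) log-mel spectrogram of a DDK recording
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.head(multi_scale).squeeze(-1)  # predicted syllable count


if __name__ == "__main__":
    net = MultiScaleAudioNet()
    dummy = torch.randn(2, 64, 500)  # two clips, ~100 frames/s (assumed)
    print(net(dummy).shape)  # torch.Size([2])
```

In such a design, concatenating the branch outputs lets a single regression head weigh short-scale cues (bursts, closures) against longer-scale cues (syllable rhythm); the actual network in the paper may differ substantially.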