#19: Gaussian mixture density modeling, decomposition and applications


X. Zhuang, Y. Huang, K. Palaniappan, and Y. Zhao

IEEE Trans. Image Processing, Volume 5, pgs. 1293--1302, 1996

classification, data mining, machine learning


Abstract

Gaussian mixture density modeling and decomposition is a classic yet challenging research topic. We present a new approach to the modeling and decomposition of Gaussian mixtures by using robust statistical methods. The mixture distribution is viewed as a (severely) contaminated Gaussian density. Using this model and the model-fitting (MF) estimator, we propose a recursive algorithm called the Gaussian mixture density decomposition (GMDD) algorithm for successively identifying each Gaussian component in the mixture. The proposed decomposition scheme has several distinct advantages that are desirable but lacking in most existing techniques. In the GMDD algorithm the number of components does not need to be specified a priori, the proportion of noisy data in the mixture can be large, the parameter estimation of each component is virtually initial independent, and the variability in the shape and size of the component densities in the mixture is taken into account. Gaussian mixture density modeling and decomposition has been widely applied in a variety of disciplines that require signal or waveform characterization for classification and recognition, including remote sensing, target identification, spectroscopy, electrocardiography, speech recognition, or scene segmentation. We apply the proposed GMDD algorithm to the identification and extraction of clusters, and the estimation of unknown probability densities. Probability density estimation by identifying a decomposition using the GMDD algorithm, that is, a superposition of normal distributions, is successfully applied to the difficult biomedical problem of automated cell classification. Computer experiments using both real data and simulated data demonstrate the validity and power of the GMDD algorithm for various models and different noise assumptions.
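
The contaminated-density view mentioned in the abstract can be written out explicitly. The following is only a sketch in notation assumed here for illustration (mixing weights \pi_i, means \mu_i, covariances \Sigma_i, component count k); the paper's own symbols and exact formulation may differ.

```latex
% Notation is assumed for illustration; the paper may use different symbols.
% A k-component Gaussian mixture with weights \pi_i > 0 and \sum_i \pi_i = 1:
\[
  f(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i).
\]
% Singling out any one component j (for k >= 2) and regrouping the other terms:
\[
  f(x) = \pi_j \, \mathcal{N}(x;\, \mu_j, \Sigma_j) + (1 - \pi_j)\, h_j(x),
  \qquad
  h_j(x) = \sum_{i \neq j} \frac{\pi_i}{1 - \pi_j}\, \mathcal{N}(x;\, \mu_i, \Sigma_i).
\]
```

Under this regrouping h_j is itself a probability density, and the contamination fraction 1 - \pi_j exceeds one half whenever \pi_j < 1/2, which is consistent with the abstract's description of the contamination as severe. A robust estimate of (\mu_j, \Sigma_j) obtained in the presence of h_j would then be the single step that a recursive scheme repeats on the remaining data.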