Determination of the order of finite mixture model and its applications

Project: Research project

Project Details


Finite mixture models are natural extensions of parametric models based on classical distribution functions, and provide a mathematical approach to the statistical modeling of a wide variety of random phenomena. These models are widely applied in scientific fields such as biology, genetics, engineering, economics, medical science, and social science, and they underpin a variety of statistical techniques, including cluster analysis, discriminant analysis, image analysis and survival analysis. However, when the order (the number of components) of a finite mixture model is unknown, the model is not a classical regular statistical model. In practice, with too many components the mixture may over-fit the data and yield poor interpretations, whereas with too few components it may not be flexible enough to approximate the true underlying data structure. An important issue in finite mixture modeling is therefore the selection of the number of components.

We will investigate penalized likelihood methods for finite mixture models. We first consider finite continuous multivariate Gaussian mixture models and propose a penalized method to determine the number of components of such a model. The asymptotic consistency of the proposed method is to be investigated, together with a revised EM algorithm to implement it. We then intend to extend the proposed method to finite non-Gaussian mixture models, such as the finite Poisson mixture model.

It is also well known that any continuous distribution can be approximated arbitrarily well by a finite mixture of normal densities (Lindsay, 1995; McLachlan and Peel, 2000), so we will investigate how to use a finite Gaussian mixture model to estimate a multivariate density function. Such an estimate has a parametric closed form and gives a simple way to interpret the multivariate density function. In particular, we will consider the effect of the number of components on the final multivariate density estimate, and find an appropriate way to determine the number of components of the finite mixture model used to estimate the multivariate density.

The finite mixture regression model is an important regression model with wide applications. We will propose a uniformly penalized method to determine the number of mixture regression components and simultaneously select the variables of the model. Most dimension reduction methods (such as Sliced Inverse Regression) depend heavily on the linear condition, which requires that the multivariate random observations asymptotically follow an elliptical distribution. Based on finite Gaussian mixture models, we intend to introduce a procedure that reduces the dimension of the observed multivariate data and so avoids the linear condition. The mixture of factor analyzers model is very useful in pattern recognition and other scientific fields, and can be used to exploit the local properties of data. We will investigate how to determine the number of mixture components and latent factors simultaneously.
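As a rough illustration of choosing the number of components by penalizing the likelihood, the sketch below fits univariate Gaussian mixtures by plain EM and compares them using the Bayesian information criterion (BIC). BIC is a standard criterion swapped in here for illustration only; it is not the penalized method proposed in this project, and the EM routine is not the revised EM algorithm mentioned above. The function names (`fit_gmm_1d`, `bic`), the simulated two-cluster data, the quantile-based starting values and the variance floor are all assumptions made for this example.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, weights, means, variances):
    """Log of weight times density for each mixture component at x."""
    return [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component univariate Gaussian mixture by plain EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    variances = [sum((x - centre) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: weighted updates; floors guard against degenerate components.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)

    loglik = sum(log_sum_exp(component_logs(x, weights, means, variances))
                 for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC with k means, k variances and k - 1 free mixing weights."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


# Simulated data (an assumption): two well-separated Gaussian clusters.
rng = random.Random(0)
data = ([rng.gauss(-3.0, 1.0) for _ in range(150)]
        + [rng.gauss(3.0, 1.0) for _ in range(150)])

bic_by_k = {}
for k in range(1, 5):
    _, _, _, loglik = fit_gmm_1d(data, k)
    bic_by_k[k] = bic(loglik, k, len(data))

# Smaller BIC is better: fit is traded off against the parameter count.
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

The penalty term grows with the number of parameters, so an extra component is accepted only if it raises the log-likelihood enough to offset that cost. This is the same over-fitting versus under-fitting trade-off that motivates the project.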

Nowadays, in many data sets a single observation has thousands to billions of dimensions. We will investigate finite mixture models in a high dimensional setting. In particular, we will seek an efficient method to determine the number of components of finite mixture and mixture regression models in a high dimensional setting, and then apply it to a high dimension reduction method for cases where either the high dimensional observations do not follow the elliptical distribution or the linear condition is not satisfied.
Effective start/end date: 1/01/13 to 30/06/16

