The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models.
IEEE Transactions on Neural Networks
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered, in part, by the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for the parameter estimation of a binary latent variable model, a binary version of the GTM, by adopting a variational approximation to the binomial likelihood. This approximation provides a log-likelihood which is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text-based documents.
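As an illustration of how a variational approximation can turn a logistic (binomial) log-likelihood term into a quadratic one, the sketch below uses the well-known Jaakkola-Jordan lower bound on the log-sigmoid. The abstract does not state which bound the paper adopts, so this choice is an assumption, and the function names (`log_sigmoid`, `lam`, `quadratic_lower_bound`) are hypothetical. The bound is log s(x) >= log s(xi) + (x - xi)/2 - lam(xi)(x^2 - xi^2), with lam(xi) = tanh(xi/2)/(4 xi), where s is the logistic sigmoid and xi is a free variational parameter.

```python
import math


def log_sigmoid(x):
    """Numerically stable log of the logistic sigmoid, log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def lam(xi):
    """Curvature term lam(xi) = tanh(xi / 2) / (4 * xi); its limit at xi = 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def quadratic_lower_bound(x, xi):
    """Jaakkola-Jordan bound on log_sigmoid(x); quadratic in x for fixed xi.

    Assumption: this is one standard bound of the kind the abstract describes,
    not necessarily the exact form used in the paper.
    """
    return log_sigmoid(xi) + 0.5 * (x - xi) - lam(xi) * (x * x - xi * xi)


if __name__ == "__main__":
    # Compare the exact value with the bound for a few inputs at fixed xi.
    xi = 1.5
    for x in (-3.0, -1.5, 0.0, 1.5, 3.0):
        print(x, log_sigmoid(x), quadratic_lower_bound(x, xi))
```

Because the bound is quadratic in x for fixed xi, and x is typically linear in the model weights, maximizing the bounded log-likelihood over the weights reduces to a linear least-squares style problem, which is consistent with the abstract's remark that no iterative M-step is needed.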
|Keywords:||data clustering, data mining, data visualization, generative modeling, probabilistic modeling, self-organization, text document processing, unsupervised learning, MAPS|
|UCL classification:||UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science|