Department of Mathematical Sciences
Thursday, April 23, 2020
1:15pm – 2:15pm
Yuan Fang, Binghamton University
Variational inference of logistic-normal multinomial mixture model for clustering microbiome data
Discrete data, such as the microbiome taxa count data produced by 16S rRNA sequencing, are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional and overdispersed, and they can only reveal relative abundance, so they are treated as compositional. Analyzing compositional data presents many challenges because the data are restricted to a simplex. With the logistic-normal multinomial model, one can map the relative abundances from the simplex to a latent variable in real Euclidean space using the additive log-ratio transformation. While this latent-variable approach adds flexibility in modeling the data, it carries a heavy computational cost when implemented with Bayesian techniques. In this paper, we extend the logistic-normal multinomial model to a finite mixture model framework in order to cluster microbiome data. Parameters are estimated with a variational EM algorithm based on variational Gaussian approximations (VGA), in which the complex posterior distributions of the latent variables are approximated by computationally convenient Gaussian densities, chosen by minimizing the Kullback-Leibler (KL) divergence between the true and the approximating posterior densities. Adopting a variational Gaussian approximation delivers accurate approximations of the complex posterior while substantially reducing computational overhead. The proposed method provides favorable clustering results and accurate parameter recovery, as illustrated using simulated and real data.
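As a small illustration of the additive log-ratio transformation mentioned in the abstract, the following Python sketch maps a composition of K+1 relative abundances to a point in K-dimensional real space and back. It is only a minimal sketch under stated assumptions, not the speaker's implementation: the function names `alr` and `alr_inverse`, the choice of the last taxon as the reference part, and the toy counts are all hypothetical.

```python
import numpy as np

def alr(p):
    """Additive log-ratio: map a (K+1)-part composition to R^K.

    Assumption: the last part is used as the reference (denominator).
    All parts must be strictly positive.
    """
    p = np.asarray(p, dtype=float)
    return np.log(p[..., :-1] / p[..., -1:])

def alr_inverse(y):
    """Map a point in R^K back to a (K+1)-part composition on the simplex."""
    y = np.asarray(y, dtype=float)
    # Append a zero for the reference part, then normalize the exponentials
    # (subtracting the maximum first for numerical stability).
    z = np.concatenate([y, np.zeros(y.shape[:-1] + (1,))], axis=-1)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical taxa counts for a single sample (illustrative values only).
counts = np.array([30.0, 50.0, 20.0])
p = counts / counts.sum()   # relative abundances on the simplex
y = alr(p)                  # unconstrained latent-scale coordinates
p_back = alr_inverse(y)     # recovers the original composition
```

In the logistic-normal multinomial setting, a Gaussian distribution is placed on the transformed coordinates (here `y`), and the inverse map supplies the multinomial probabilities for the observed counts.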