##Statistics Seminar##\\ Department of Mathematical Sciences
^ **DATE:**|Thursday, April 23, 2020 |
^ **TIME:**|1:15pm -- 2:15pm |
^ **LOCATION:**|Zoom meeting |
^ **SPEAKER:**|Yuan Fang, Binghamton University |
^ **TITLE:**|Variational inference of logistic-normal multinomial mixture model for clustering microbiome data |
\\
**Abstract**
Discrete data, such as the microbiome taxa count data
resulting from 16S rRNA sequencing, are routinely encountered in
bioinformatics. Taxa count data in microbiome studies are typically
high-dimensional and overdispersed, and they can only reveal relative
abundance; they are therefore treated as compositional. Analyzing
compositional data presents many challenges because the data are
restricted to a simplex. Through the
logistic-normal multinomial model, one can map the relative abundance from
a simplex to a latent variable in real Euclidean space
using the additive log-ratio transformation. While this approach based on
latent variables brings flexibility in modeling the data, it comes with
a heavy computational cost when implemented using Bayesian techniques.
In this paper, we extend the logistic-normal multinomial model to
a finite mixture model framework to cluster microbiome data. A variational
EM algorithm based on variational Gaussian approximations (VGA), in which complex
posterior distributions are approximated using computationally convenient
Gaussian densities by minimizing the Kullback-Leibler (KL) divergence
between the true and the approximating densities of the posterior of the
latent variables, was utilized for parameter estimation. Adopting a
variational Gaussian approximation delivers accurate approximations of the
complex posterior while reducing computational overhead substantially. The
proposed method provides favorable clustering results and accurate
parameter recovery, as illustrated using simulated and real data.
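
To make the additive log-ratio transformation mentioned in the abstract concrete, here is a minimal Python sketch. It is not the speaker's implementation: the function names, the example counts, and the pseudocount used to avoid taking the logarithm of zero are all assumptions made for illustration. The last taxon is treated as the reference category.

```python
import math


def alr(counts, pseudocount=0.5):
    """Additive log-ratio transform of K+1 taxa counts into K real values.

    The last taxon serves as the reference. The pseudocount is an
    assumption of this sketch; it keeps log() defined for zero counts.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    props = [s / total for s in shifted]  # relative abundances on the simplex
    ref = props[-1]
    return [math.log(p / ref) for p in props[:-1]]


def alr_inverse(y):
    """Map K unconstrained real values back to K+1 proportions summing to 1."""
    exps = [math.exp(v) for v in y] + [1.0]  # reference taxon has log-ratio 0
    total = sum(exps)
    return [e / total for e in exps]


counts = [120, 30, 0, 50]  # hypothetical counts for four taxa
y = alr(counts)            # three unconstrained log-ratios
p = alr_inverse(y)         # proportions back on the simplex
```

In a logistic-normal multinomial model the latent variable plays the role of ''y'' above: it is given a Gaussian distribution, and the inverse transformation turns it into the multinomial probabilities.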