Data Science Seminar
Hosted by Department of Mathematical Sciences
Discrete data, such as the microbiome taxa counts produced by 16S rRNA sequencing, are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional and overdispersed, and they reveal only relative abundance, so they are treated as compositional. Analyzing these data is challenging because compositions are restricted to a simplex. Additionally, microbiome taxa counts are affected by biological and/or environmental covariates such as age, gender, and diet. Here, we develop regression-based mixtures of logistic normal multinomial models for clustering microbiome data. These models partition samples into homogeneous subpopulations and allow investigation of the relationship between bacterial abundance and biological and/or environmental covariates within each inferred group. We use an efficient framework for parameter estimation based on variational Gaussian approximations (VGA). The proposed method is illustrated on simulated datasets.
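To make the model class concrete, the following is a minimal simulation sketch of a mixture of logistic normal multinomial regressions, not the speaker's implementation. It assumes an additive log-ratio parameterization with the last taxon as reference, a fixed sequencing depth per sample, and arbitrary illustrative parameter values; the function names `inverse_alr` and `simulate_lnm_regression` are hypothetical. It covers only the generative side and does not attempt the VGA-based estimation.

```python
import numpy as np


def inverse_alr(y):
    """Map latent vectors in R^(K-1) to compositions on the K-part simplex."""
    # Append a zero for the reference taxon, then apply a numerically stable softmax.
    z = np.concatenate([y, np.zeros((y.shape[0], 1))], axis=1)
    z = z - z.max(axis=1, keepdims=True)
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)


def simulate_lnm_regression(X, betas, sigmas, weights, depth, rng):
    """Draw taxa counts from a mixture of logistic normal multinomial regressions.

    X       : (n, p) covariate matrix (assumed to include an intercept column)
    betas   : list of (p, K-1) coefficient matrices, one per mixture component
    sigmas  : list of (K-1, K-1) covariance matrices, one per mixture component
    weights : mixing proportions, summing to 1
    depth   : total count per sample (assumed fixed here for simplicity)
    """
    n = X.shape[0]
    n_taxa = betas[0].shape[1] + 1
    # Latent component membership for each sample.
    labels = rng.choice(len(weights), size=n, p=weights)
    counts = np.empty((n, n_taxa), dtype=np.int64)
    for i in range(n):
        g = labels[i]
        # The covariates shift the mean of the latent Gaussian within component g.
        mean = X[i] @ betas[g]
        y = rng.multivariate_normal(mean, sigmas[g])
        # Transform to relative abundances, then draw multinomial counts.
        p = inverse_alr(y[None, :])[0]
        counts[i] = rng.multinomial(depth, p)
    return counts, labels


# Illustrative usage with made-up parameters: two components, four taxa,
# an intercept plus one continuous covariate.
rng = np.random.default_rng(0)
n, K = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
betas = [rng.normal(size=(2, K - 1)), rng.normal(size=(2, K - 1))]
sigmas = [0.5 * np.eye(K - 1), 0.2 * np.eye(K - 1)]
counts, labels = simulate_lnm_regression(
    X, betas, sigmas, weights=[0.6, 0.4], depth=1000, rng=rng
)
```

Each row of `counts` sums to the fixed depth, which is why only relative abundance is identifiable from such data; the Gaussian layer on the log-ratio scale is what allows for overdispersion beyond a plain multinomial.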