--- seminars:stat:181108 [2018/10/31 18:14]
qyu created
+++ seminars:stat:181108 [2018/10/31 18:15] (current)
qyu
@@ Line 1: / Line 1: @@
+<WRAP centeralign>##Statistics Seminar##\\ Department of Mathematical Sciences</WRAP>
+<WRAP 70% center>
+^  **DATE:**|Thursday, November 8, 2018 |
+^  **TIME:**|1:15pm -- 2:15pm |
+^  **LOCATION:**|WH 100E |
+^  **SPEAKER:**|Fan Yang, Binghamton University |
+^  **TITLE:**|Visualizing Topics with Multi-Word Expressions  |
+</WRAP>
+\\
+<WRAP center box 80%>
+<WRAP centeralign>**Abstract**</WRAP>
+We describe a new method for visualizing topics, the distributions over
+terms that are automatically extracted from large text corpora using latent variable
+models. Our method finds significant n-grams related to a topic, which are then
+used to help understand and interpret the underlying distribution. Compared with
+the usual visualization, which simply lists the most probable topical terms, the
+multi-word expressions provide a better intuitive impression of what a topic is
+“about.” Our approach is based on a language model of arbitrary-length expressions,
+for which we develop a new methodology based on nested permutation tests to find
+significant phrases. We show that this method outperforms the more standard use of
+chi-square and likelihood ratio tests. We illustrate the topic presentations on
+corpora of scientific abstracts and news articles.
+</WRAP>