seminars:stat:181108

Differences

This shows you the differences between two versions of the page.

seminars:stat:181108 [2018/10/31 18:14]
qyu created
seminars:stat:181108 [2018/10/31 18:15] (current)
qyu
Line 1: Line 1:
 +<WRAP centeralign>##Statistics Seminar##\\ Department of Mathematical Sciences</WRAP>
 +
 +<WRAP 70% center>
 +^  **DATE:**|Thursday, Month 31, 2017 |
 +^  **TIME:**|1:15pm -- 2:15pm |
 +^  **LOCATION:**|WH 100E |
 +^  **SPEAKER:**|Fan Yang, Binghamton University |
 +^  **TITLE:**|Visualizing Topics with Multi-Word Expressions |
 +</WRAP>
 +\\ 
 +
 +<WRAP center box 80%>
 +<WRAP centeralign>**Abstract**</WRAP>
 +We describe a new method for visualizing topics, the distributions over
 +terms that are automatically extracted from large text corpora using latent variable
 +models. Our method finds significant n-grams related to a topic, which are then
 +used to help understand and interpret the underlying distribution. Compared with
 +the usual visualization, which simply lists the most probable topical terms, the
 +multi-word expressions provide a better intuitive impression of what a topic is
 +“about.” Our approach is based on a language model of arbitrary-length expressions,
 +for which we develop a new methodology based on nested permutation tests to find
 +significant phrases. We show that this method outperforms the more standard use of
 +chi-square and likelihood ratio tests. We illustrate the topic presentations on
 +corpora of scientific abstracts and news articles.
 +</WRAP>
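The abstract mentions permutation tests for finding significant phrases. As a rough illustration of the general idea only, the sketch below runs a plain (non-nested) permutation test on a single bigram: it compares the observed bigram count with the counts obtained after shuffling the token order. This is not the speaker's nested procedure; the function names, the shuffling scheme, and the toy corpus are all assumptions made for illustration.

```python
import random
from typing import Sequence


def count_bigram(tokens: Sequence[str], first: str, second: str) -> int:
    """Count adjacent occurrences of (first, second) in the token sequence."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)


def bigram_permutation_pvalue(
    tokens: Sequence[str],
    first: str,
    second: str,
    n_permutations: int = 999,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for a bigram (hypothetical helper).

    Null hypothesis: word order is exchangeable, so the bigram occurs no more
    often than it would in a random shuffle of the same tokens.
    """
    rng = random.Random(seed)
    observed = count_bigram(tokens, first, second)
    shuffled = list(tokens)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_bigram(shuffled, first, second) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + at_least_as_extreme) / (1 + n_permutations)


# Toy corpus (made up): "latent variable" always appears as a unit.
toy_tokens = ["latent", "variable"] * 20 + ["model", "topic", "text", "corpus"] * 10
p_phrase = bigram_permutation_pvalue(toy_tokens, "latent", "variable")
p_absent = bigram_permutation_pvalue(toy_tokens, "topic", "latent")
```

A small p-value for a bigram suggests that its words co-occur more often than chance word order would explain. A bigram that never occurs gets a p-value of 1, since every shuffle is at least as extreme.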
 +
 +