Data Science Seminar
Hosted by the Department of Mathematical Sciences
RSVP at http://bit.ly/DS-TAE-RSVP.
We consider the properties and performance of word embedding techniques in the context of political science research. In particular, we explore key parameter choices (including context window length, embedding vector dimension, and the use of pre-trained versus locally fit variants) with respect to the efficiency and quality of the inferences these models make possible. Reassuringly, we show that results are generally robust to such choices for political corpora of various sizes and in various languages. Beyond reporting extensive technical findings, we provide a novel crowdsourced "Turing test"-style method for examining the relative performance of any two models that produce substantive, text-based outputs. Encouragingly, we show that popular, easily available pre-trained embeddings perform at a level close to, or surpassing, that of both human coders and more complicated locally fit models. For completeness, we provide best-practice advice for cases where local fitting is required.
Bio: Spirling is Professor of Politics and Data Science at New York University. He is the Deputy Director and the Director of Graduate Studies (MSDS) at the Center for Data Science, and Chair of the Executive Committee of the Moore-Sloan Data Science Environment. Spirling specializes in political methodology and legislative behavior, with an interest in the application of text-as-data/NLP, Bayesian statistics, machine learning, item response theory and generalized linear models in political science. His substantive field is comparative politics, and he focuses primarily on the United Kingdom. Spirling received his PhD from the University of Rochester, Department of Political Science, in 2008. From 2008 to 2015, he was an Assistant Professor and then the John L. Loeb Associate Professor of the Social Sciences in the Department of Government at Harvard University. He is the faculty coordinator for the NYU Text-as-Data speaker series.
Here is the paper Arthur Spirling is going to present.
Here is a FAQ that he suggested could be useful.
The Interdisciplinary Dean's Speaker Series in Data Sciences is supported by the:
For questions, contact David Clark (firstname.lastname@example.org) or Xingye Qiao (email@example.com).