Interdisciplinary Dean's Speaker Series in Data Science: Arthur Spirling, NYU

Day: Tuesday, November 19, 2019
Time: 10:00 AM to 11:30 AM
Where: UUW-325

Arthur Spirling, professor of politics and data science at New York University, will speak on "Word Embeddings: What works, what doesn't, and how to tell the difference for applied research" from 10-11:30 a.m. Tuesday, Nov. 19, in UUW-325.

We consider the properties and performance of word embedding techniques in the context of political science research. In particular, we explore key parameter choices (including context window length, embedding vector dimensions and the use of pre-trained vs. locally fit variants) with respect to the efficiency and quality of inferences possible with these models. Reassuringly, we show that results are generally robust to such choices for political corpora of various sizes and in various languages. Beyond reporting extensive technical findings, we provide a novel crowdsourced “Turing test”-style method for examining the relative performance of any two models that produce substantive, text-based outputs. Encouragingly, we show that popular, easily available pre-trained embeddings perform at a level close to, or surpassing, both human coders and more complicated locally fit models. For completeness, we provide best-practice advice for cases where local fitting is required.

RSVP at http://bit.ly/DS-TAE-RSVP.

The paper Spirling will present can be found at https://github.com/ArthurSpirling/EmbeddingsPaper/blob/master/Paper/Embeddings_SpirlingRodriguez.pdf

Spirling also suggested the following FAQ as useful background:

https://github.com/ArthurSpirling/EmbeddingsPaper/blob/master/Project_FAQ/faq.md

For questions, contact David Clark at dclark@binghamton.edu or Xingye Qiao at qiao@math.binghamton.edu.

