76 - Increasing In-Class Similarity by Retrofitting Embeddings with Demographics, with Dirk Hovy




NLP Highlights show

Summary: EMNLP 2018 paper by Dirk Hovy and Tommaso Fornaciari. https://www.semanticscholar.org/paper/Improving-Author-Attribute-Prediction-by-Linguistic-Hovy-Fornaciari/71aad8919c864f73108aafd8e926d44e9df51615 In this episode, Dirk Hovy talks about natural language as social phenomenon which can provide insights about those who generate it. For example, this paper uses retrofitted embeddings to improve on two tasks: predicting the gender and age group of a person based on their online reviews. In this approach, authors embeddings are first generated using Doc2Vec, then retrofitted such that authors with similar attributes are closer in the vector space. In order to estimate the retrofitted vectors for authors with unknown attributes, a linear transformation is learned which maps Doc2Vec vectors to the retrofitted vectors. Dirk also used a similar approach to encode geographic information to model regional linguistic variations, in another EMNLP 2018 paper with Christoph Purschke titled “Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting” [link: https://www.semanticscholar.org/paper/Capturing-Regional-Variation-with-Distributed-Place-Hovy-Purschke/6d9babd835d0cdaaf175f098bb4fd61fd75b1be0].