This paper proposed an extension of word2vec by adding document and topic vectors inspired by LDA (Latent Dirichlet Allocation). The model can discover a linear relationship between words. For example, “California + technology = Silicon Valley”. The topic is interpretable by collecting all nearby word vectors to the selected topic vector. This work boosts word2vec with topic modeling via training in the similar fashion to word2vec.

The key difference of LDA2Vec is its loss function. There are 2 loss functions: the first one is Skipgram Negative Sampling Loss which is similar to Word2Vec. It wants to maximize the probability of predicting a target word and non-target word (negative samples) given a context vector . This loss wants the model to distinguish a positive word (which related to the given context) from negative sampled words.

The innovation is the context vector . The intuition is that predict a nearby word given a pivot word also depends on the theme of the context. For example, if the document is about an airline when we want to predict nearby words given a word “Germany”, we will likely want to see word related to airlines but not country names. Thus, a context vector is a sum of a word vector and document vector. . Thus, LDA2vec attempts to capture both document-wide relationship and local interaction between words within its context window.

In order to learn a topic vector, the document is further decomposed as a linear combination of topic vectors. where is a probability of document j will be a topic k. Finally, the interpretability comes from a sparsity of topic assignment vector, . One way to enforce sparsity is to design the loss function as:

When , we encourage a topic assignment probability to put more mass on a small set of topics.

The results look interesting. This paper shows a simple way to combine topic modeling with word embedding. By embedding document vectors and topic vectors into the same semantic space as word vectors, we can learn a global semantic structure as well as word-level local interaction.

References:

[1] https://arxiv.org/pdf/1605.02019.pdf

[2] http://multithreaded.stitchfix.com/blog/2016/05/27/lda2vec/

[3] Skip-gram tutorial part1: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

[4] Skip-gram tutorial part2: http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/