Learning Deep Structured Semantic Models for Web Search using Clickthrough Data (CIKM’13)

When we compute the relevant score between a query and documents in the corpus, we want the higher score when the given query is relevant to the document. The vector-space model based on word matching may fail sometimes when both query and document do not share any common word. Hence, the latent semantic model could solve the vocabulary gap problem by giving a higher probability for a word that is semantically similar to words appears in the document.

Since this paper was published in 2013, the latent semantic model actually does not perform well compared to a simple heuristic vector-space model such as BM25. The actual word matching remains the most important indicator of relevancy.

Anyhow, this paper demonstrates that they can use a deep neural network to embed both query and documents to a semantic space. They assume that the relevant query and documents should be nearby in a semantic space. Although this assumption is valid, this model does not emphasize on true matching, which is very valuable information in my opinion.

The deep model architecture containing 3 main parts: word hashing layer, mapping layer which maps a document to semantic space, and relevance measurement layer which computes a cosine similarity between the given query and document. Then, the relevant score is computed through the softmax layer, normalized all cosine similarity.

The performance gain comes from the fact that the authors use supervised information (click-through data) to train the model. By maximizing the conditional probability of \log \prod_{(Q,D^+)} P(D^+|Q), the model will learn to map a relevant pair of query and documents to similar embedded vectors.

The word hashing layer is introduced in order to reduce the dimension of an input vector. If we use one-hot vector, the length of a vector is too long. Using a letter-trigram reduces the dimension significantly.

The obvious extension is to replace the transformation layer with CNNs [2] or RNNs so that the deep model can capture local structures in the documents. [2] shows that using a convolutional layer with max pooling will slightly improve the NDCG score. I don’t know if anyone has tried RNN yet.

But what I don’t feel comfortable with their model is that they treat a query as a document. I think this assumption is too rigid. To me, a query is more like a signal indicates a user information need, which is clearly not a document. I still prefer to treat a query based on a language model rather than a vector space model.


[1] Huang, P., He, X., Gao, J., Deng, L., Acero, A., and Heck, L. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM’13.

[2] Shen, Yelong, et al. “Learning semantic representations using convolutional neural networks for web search.” WWW’14