Memory Network

Today a colleague presented Memory Networks [1] at our research seminar. I want to summarize my understanding of this model and compare it with other deep sequential models such as LSTMs and RNNs.

In a question-answering (QA) problem, we give a question to a learned model, and the model outputs an answer based on the information and facts it was given during training. LSTMs and sequence-to-sequence models are popular choices because they can handle variable-length input, which is common in text, and because they mitigate the vanishing gradient problem.

The Memory Network extends these earlier models by adding supporting facts as additional inputs. Intuitively, the extra information should help the model answer the question better. The model first locates a few relevant facts (two sentences in this paper) and uses them as additional inputs.
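To make the fact-locating step concrete, here is a minimal sketch of picking two supporting facts greedily. The scoring function is an assumption of mine: it simply counts shared words, whereas the paper learns its scoring function. The names `tokenize`, `overlap_score`, and `select_supporting_facts` are hypothetical and exist only for this illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def overlap_score(query_words, fact):
    """Toy stand-in for a learned scorer: number of distinct shared words."""
    return len(set(query_words) & set(tokenize(fact)))


def select_supporting_facts(question, facts, k=2):
    """Greedily pick k facts; each pick is scored against the question
    plus the facts already chosen."""
    query_words = tokenize(question)
    remaining = list(facts)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda f: overlap_score(query_words, f))
        chosen.append(best)
        remaining.remove(best)
        # Fold the chosen fact into the query so the next pick can chain on it.
        query_words = query_words + tokenize(best)
    return chosen


facts = [
    "Joe went to the kitchen.",
    "Fred went to the garden.",
    "Joe picked up the milk.",
]
supporting = select_supporting_facts("Where is the milk?", facts)
```

In this toy run the first pick is the sentence about the milk, and the second pick is the sentence about Joe, because Joe only becomes relevant after the first fact is added to the query.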

Locating relevant facts is time-consuming, because every stored sentence has to be scored against the question. The authors therefore propose a few approximations to speed up the search. The first considers only sentences that share at least one word with the question. This is the fastest method, but it is less accurate. The second clusters the word embeddings and considers only sentences containing at least one word that falls in the same cluster as a word in the question. This method is much more accurate, but it is also slower.
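The two approximations can be sketched as candidate filters that run before any scoring. This is only an illustration under my own assumptions: the inverted-index layout, the function names, and the hand-written `word_to_cluster` table are hypothetical. In the paper the clusters come from clustering word embeddings, not from a hand-written dictionary.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def build_word_index(facts):
    """Map each word to the set of fact ids that contain it."""
    index = defaultdict(set)
    for i, fact in enumerate(facts):
        for word in tokenize(fact):
            index[word].add(i)
    return index


def candidates_by_word(question, word_index):
    """First approximation: facts sharing at least one word with the question."""
    found = set()
    for word in tokenize(question):
        found |= word_index.get(word, set())
    return sorted(found)


def build_cluster_index(facts, word_to_cluster):
    """Map each cluster id to the set of fact ids containing a word from it."""
    index = defaultdict(set)
    for i, fact in enumerate(facts):
        for word in tokenize(fact):
            if word in word_to_cluster:
                index[word_to_cluster[word]].add(i)
    return index


def candidates_by_cluster(question, cluster_index, word_to_cluster):
    """Second approximation: facts sharing at least one word cluster."""
    found = set()
    for word in tokenize(question):
        cluster = word_to_cluster.get(word)
        if cluster is not None:
            found |= cluster_index.get(cluster, set())
    return sorted(found)


facts = [
    "Joe went to the kitchen.",
    "Fred picked up the milk.",
    "Mary travelled to the office.",
]
# Hypothetical clusters: 0 = "take" verbs, 1 = "motion" verbs.
word_to_cluster = {"picked": 0, "grabbed": 0, "went": 1, "travelled": 1, "moved": 1}

word_index = build_word_index(facts)
cluster_index = build_cluster_index(facts, word_to_cluster)

by_word = candidates_by_word("Who moved?", word_index)
by_cluster = candidates_by_cluster("Who moved?", cluster_index, word_to_cluster)
```

For the question "Who moved?", the word filter finds nothing because no sentence contains "moved", while the cluster filter still returns the two sentences with motion verbs. That is the accuracy gain the second approximation buys, at the cost of a larger candidate set to score.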

There are a few more extensions, such as modelling the time order among sentences in order to answer trickier questions.

In summary, the Memory Network does not use all the facts; instead, it picks a few relevant ones. This hard selection is inefficient. If we instead assign a weight to each fact, we might be able to use more information to answer the given question. Follow-up work has exploited this idea and demonstrated better results on QA problems.
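As a rough illustration of the weighting idea, the sketch below turns per-fact relevance scores into weights with a softmax and then averages the fact vectors with those weights. The scores, the two-dimensional fact vectors, and the function names are all made up for the example; this is not the formulation of any particular follow-up paper.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_memory(fact_vectors, scores):
    """Blend all fact vectors, weighting each one by its softmax weight."""
    weights = softmax(scores)
    dim = len(fact_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, fact_vectors))
        for d in range(dim)
    ]


# Hypothetical relevance scores for three facts, most relevant first.
weights = softmax([2.0, 1.0, 0.1])

# With equal scores, every fact contributes equally to the summary vector.
summary = weighted_memory([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Unlike picking the top two facts, every fact contributes something here, and because the weights are a smooth function of the scores, the selection step can be trained with gradients like the rest of the model.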


[1] Weston, Jason, Sumit Chopra, and Antoine Bordes. “Memory networks.” arXiv preprint arXiv:1410.3916 (2014).