Deep learning has been applied to many information retrieval problems at SIGIR this year. One of the key ideas is to add non-linearity to existing linear models commonly found in IR.

Factorization machine is one of the linear models for a sparse data prediction. The model is composed of a linear regression terms and d-way interaction terms. Then, it is possible to add non-linear interaction to Factorization machine. Neural FM [1] uses a neural network to model a non-linear interaction between feature latent vectors and it demonstrates interesting results on recommendation tasks.

The model of Factorization machine is as follows:

This paper changes the interaction to be a non-linear function. The author introduces a bi-interaction layer, which is a pooling operation that converts embedding feature vectors to a single vector. Then, the result vector will be fed to a multi-layers neural network and the output from this operation will be an interaction score .

NFM is different from Wide&Deep model due to its pooling operation. The Wide&Deep model concatenates features together which does not account for feature interactions.

To train NFM, the author uses a square loss and SGD for parameter estimation. Dropout and batch normalization are utilized to avoid an overfitting.

The baselines are strong such as DeepCross and Wide&Deep models. The performance (RMSE) on personalized tag recommendation shows that NFM yields a much lower RMSE than other state-of-the-arts models. In conclusion, NFM models a higher-order and non-linear feature interaction that use a few parameters and does not require a deep structure.

**References:**

[1] Xiangnan He, Tat-Seng Chua, “Neural Factorization Machines for Sparse Predictive Analytics”, ACM SIGIR 2017.

http://www.comp.nus.edu.sg/~xiangnan/papers/sigir17-nfm.pdf