Neural Factorization Machines for Sparse Predictive Analytics (SIGIR17)

Deep learning has been applied to many information retrieval problems at SIGIR this year. One of the key ideas is to add non-linearity to existing linear models commonly found in IR.

The factorization machine (FM) is a linear model for prediction on sparse data. It combines linear regression terms with d-way feature interaction terms. A natural extension is to make the interaction non-linear. Neural FM (NFM) [1] uses a neural network to model non-linear interactions between feature latent vectors, and it shows promising results on recommendation tasks.

The factorization machine model is as follows, where f(\mathbf{x}) is the feature interaction term:

\hat y(\mathbf{x} ) = w_0 + \sum_{i=1}^n w_ix_i + f(\mathbf{x})
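
For reference, in the standard second-order (2-way) FM the interaction term sums the pairwise feature products, each weighted by the inner product of the two features' latent vectors. The symbols \mathbf{v}_i (latent vector of feature i) and k (its dimension) are notation assumed here, not taken from the text above:

f(\mathbf{x}) = \sum_{i=1}^n \sum_{j=i+1}^n \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j, \quad \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{l=1}^k v_{il} v_{jl}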

This paper replaces the interaction term f(\mathbf{x}) with a non-linear function. The authors introduce a Bi-Interaction layer, a pooling operation that converts the set of feature embedding vectors into a single vector. This vector is then fed into a multi-layer neural network, whose output is the interaction score f(\mathbf{x}).
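
The pooling step and the forward pass can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the argument layout (`hidden` as a list of weight/bias pairs, `h` as a final projection vector) and the choice of ReLU are assumptions made for the example. The pooling computes the sum over all feature pairs of the element-wise product of their scaled embeddings, using the usual "square of sum minus sum of squares" identity.

```python
import numpy as np

def bi_interaction(x, V):
    """Pool feature embeddings into one k-dimensional vector.

    Computes sum over i<j of (x_i * v_i) * (x_j * v_j), element-wise.
    x: (n,) feature vector; V: (n, k) embedding matrix, one row per feature.
    Uses 0.5 * ((sum_i x_i v_i)^2 - sum_i (x_i v_i)^2), which is O(n k)
    instead of the O(n^2 k) cost of enumerating every pair.
    """
    xv = x[:, None] * V                 # (n, k): scaled embeddings x_i * v_i
    square_of_sum = xv.sum(axis=0) ** 2
    sum_of_squares = (xv ** 2).sum(axis=0)
    return 0.5 * (square_of_sum - sum_of_squares)

def nfm_forward(x, w0, w, V, hidden, h):
    """Hypothetical NFM forward pass: bias + linear part + f(x).

    hidden: list of (W, b) pairs for the fully connected layers (assumed layout).
    h: projection vector mapping the last hidden vector to a scalar.
    """
    z = bi_interaction(x, V)
    for W, b in hidden:
        z = np.maximum(0.0, z @ W + b)  # ReLU is an assumed activation choice
    return w0 + w @ x + h @ z

# Toy usage with made-up sizes: 5 features, 3 latent dimensions, one hidden layer.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 2.0, 0.0, 1.0])
V = rng.normal(size=(5, 3))
hidden = [(rng.normal(size=(3, 4)), np.zeros(4))]
score = nfm_forward(x, 0.1, rng.normal(size=5), V, hidden, rng.normal(size=4))
```

With no hidden layers and `h` set to a vector of ones, this sketch reduces to the second-order FM, which is a useful sanity check when experimenting with it.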

NFM differs from the Wide&Deep model in its pooling operation. Wide&Deep concatenates the feature embedding vectors, which by itself does not encode any feature interactions.

To train NFM, the authors use a squared loss and SGD for parameter estimation. Dropout is used to reduce overfitting, and batch normalization to speed up and stabilize training.
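
The two simplest training ingredients can be written down directly. The sketch below takes the squared loss as the sum of squared errors over a batch and uses the common "inverted dropout" convention (rescaling kept units at training time); both the helper names and the rescaling convention are assumptions for illustration rather than details given above.

```python
import numpy as np

def squared_loss(y_pred, y_true):
    """Sum of squared errors between predictions and targets."""
    return float(np.sum((y_pred - y_true) ** 2))

def dropout(z, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors by 1 / (1 - rate); identity at evaluation time."""
    if not training or rate == 0.0:
        return z
    keep_mask = rng.random(z.shape) >= rate
    return z * keep_mask / (1.0 - rate)
```

In an NFM-style model, a function like `dropout` would typically be applied to the pooled vector and to the hidden layer outputs during training only.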

The baselines are strong, including the DeepCross and Wide&Deep models. On personalized tag recommendation, NFM yields a much lower RMSE than the other state-of-the-art models. In conclusion, NFM models higher-order, non-linear feature interactions with few parameters and without requiring a deep structure.


[1] Xiangnan He, Tat-Seng Chua, “Neural Factorization Machines for Sparse Predictive Analytics”, ACM SIGIR 2017.