Negative samples are essential for learning an effective collaborative filtering (CF) model. In an implicit-feedback CF problem, where we collect implicit signals such as a user's clicks or views, an unclicked or unviewed item can be either a positive or a negative sample. If we train a CF model without carefully selecting negative samples, the model will be biased, because treating all missing data as negative means that some unobserved positive samples are wrongly labeled as negative.
This classic paper (Pan et al., 2008) proposes two treatments of missing values. The first approach treats all missing values as negative samples, but avoids penalizing the model too heavily when it mispredicts one of them. It introduces a weight for each user-item pair in the training data: a positive sample has a weight of 1, while a negative sample has a much smaller weight. The smaller weight reflects our uncertainty about whether a given negative sample is actually a positive one.
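As a rough illustration of the weighting idea, the sketch below computes a weighted squared error on a toy matrix. The function name `weighted_squared_error`, the value `delta = 0.1`, and the toy data are assumptions made for illustration, and the regularization term is left out for brevity.

```python
import numpy as np

def weighted_squared_error(R, W, U, V):
    """Sum of W_ij * (R_ij - U_i . V_j)^2 over every user-item pair."""
    residual = R - U @ V.T
    return float(np.sum(W * residual ** 2))

# Toy 2x3 interaction matrix: 1 = observed click, 0 = missing (treated as negative).
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

delta = 0.1  # hypothetical confidence assigned to missing entries
W = np.where(R > 0, 1.0, delta)

# A model that predicts 1 everywhere: correct on positives, wrong on all missing entries.
U = np.ones((2, 1))
V = np.ones((3, 1))

loss = weighted_squared_error(R, W, U, V)
unweighted_loss = weighted_squared_error(R, np.ones_like(R), U, V)
```

With `delta = 0.1`, the four wrong predictions on missing entries cost 0.4 in total instead of 4.0, so the model is not pushed as hard to score every unobserved item as zero.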
The weight can be uniform across all missing data, or it can be user-based or item-based. User-based weighting assumes that the more items a user has already viewed, the more likely it is that an item this user has not viewed is a true negative. Item-based weighting, as defined in the paper, makes the weight proportional to the number of users who have not viewed the item: the fewer positive examples an item has, the more likely its missing entries are true negatives.
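The three weighting schemes can be sketched as follows. The proportionalities follow Pan et al. (negative weights proportional to the user's positive count for the user-oriented scheme, and to the number of users minus the item's positive count for the item-oriented scheme, so rarely viewed items get the larger weight). The function name `negative_weights`, the `delta` parameter, and the rescaling so that the largest negative weight equals `delta` are assumptions of this sketch, since a proportionality alone does not fix the scale.

```python
import numpy as np

def negative_weights(R, scheme="uniform", delta=0.1):
    """Return a weight matrix: 1 for positives, a smaller value for missing entries.

    R is a binary user-by-item matrix (1 = observed, 0 = missing).
    """
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    if scheme == "uniform":
        raw = np.ones((m, n))
    elif scheme == "user":
        # More positives for a user -> her missing entries are more likely negative.
        raw = np.repeat(R.sum(axis=1, keepdims=True), n, axis=1)
    elif scheme == "item":
        # Fewer positives for an item -> its missing entries are more likely negative.
        raw = np.repeat(m - R.sum(axis=0, keepdims=True), m, axis=0)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Assumption of this sketch: rescale so the largest negative weight is delta.
    peak = raw.max()
    scaled = delta * raw / peak if peak > 0 else np.zeros((m, n))
    return np.where(R > 0, 1.0, scaled)

R = np.array([[1, 1, 0],
              [1, 0, 0]])
W_uniform = negative_weights(R, "uniform")
W_user = negative_weights(R, "user")
W_item = negative_weights(R, "item")
```

In this toy matrix the first user has two positives and the second has one, so under the user-oriented scheme the first user's missing entry gets weight 0.1 and the second user's missing entries get 0.05.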
The scheme above yields a richer training set that includes negative samples, and thus improves the MAP of the CF model. However, because the model is trained with ALS over every user-item pair, including all the missing ones, it is computationally expensive. To reduce this cost, the paper's second approach uses a sampling-based scheme.
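To make the cost concrete, here is a minimal sketch of weighted ALS, assuming a dense weight matrix `W` and a plain ridge penalty `reg` (the paper's exact regularizer may differ). The function name `weighted_als` and all default values are illustrative assumptions. Every row update uses all items and every column update uses all users, so one sweep scales with the full matrix size rather than with the number of observed entries.

```python
import numpy as np

def weighted_als(R, W, rank=2, reg=0.1, n_iters=20, seed=0):
    """Alternately solve weighted ridge regressions for user and item factors."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):
            # Solve (V^T diag(W_i) V + reg I) u_i = V^T diag(W_i) R_i
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + ridge, Vw.T @ R[i])
        for j in range(n):
            # Solve (U^T diag(W_j) U + reg I) v_j = U^T diag(W_j) R_j
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + ridge, Uw.T @ R[:, j])
    return U, V

# Toy data with two user groups that view disjoint item groups.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
W = np.where(R > 0, 1.0, 0.1)  # uniform negative weight, chosen arbitrarily
U, V = weighted_als(R, W)
pred = U @ V.T
```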
We fix the number of sampled negatives to be relatively small and draw negative items under a uniform, user-based, or item-based assumption. We then train ALS on the positives plus the sampled negatives to reconstruct an approximate rating matrix. Because each sample covers only a small part of the missing data, we repeat the procedure many times to obtain many approximate rating matrices. Finally, we average these matrices to obtain the final predicted rating matrix.
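The sampling and averaging steps can be sketched as below. The sampling probabilities follow the forms given in Pan et al. (uniform, proportional to the user's positive count, or inversely proportional to the item's positive count). The helper names, the `fit` callable (a hypothetical trainer, for example ALS restricted to the masked entries, that returns a full predicted matrix), the default of one sampled negative per positive, and the clamping of empty item counts to 1 are all assumptions of this sketch.

```python
import numpy as np

def sampling_probs(R, scheme="uniform"):
    """Probability of drawing each missing entry as a negative sample."""
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    if scheme == "uniform":
        p = np.ones((m, n))
    elif scheme == "user":
        p = np.repeat(R.sum(axis=1, keepdims=True), n, axis=1)
    elif scheme == "item":
        # Assumption: clamp counts at 1 to avoid dividing by zero for unseen items.
        counts = np.maximum(R.sum(axis=0, keepdims=True), 1.0)
        p = np.repeat(1.0 / counts, m, axis=0)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    p[R > 0] = 0.0  # positives are never drawn as negatives
    total = p.sum()
    if total == 0:
        raise ValueError("no missing entry can be sampled")
    return p / total

def sample_training_mask(R, probs, n_neg, rng):
    """Boolean mask covering all positives plus n_neg sampled missing entries."""
    # Requires at least n_neg entries with non-zero probability.
    chosen = rng.choice(R.size, size=n_neg, replace=False, p=probs.ravel())
    mask = (np.asarray(R) > 0).ravel().copy()
    mask[chosen] = True
    return mask.reshape(R.shape)

def ensemble_predict(R, fit, n_rounds=10, n_neg=None, scheme="uniform", seed=0):
    """Average the matrices returned by fit(R, mask) over several negative samples."""
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)
    probs = sampling_probs(R, scheme)
    if n_neg is None:
        n_neg = int((R > 0).sum())  # assumption: one negative per positive
    total = np.zeros(R.shape)
    for _ in range(n_rounds):
        mask = sample_training_mask(R, probs, n_neg, rng)
        total += fit(R, mask)
    return total / n_rounds
```

Here `fit(R, mask)` is expected to train only on the entries where `mask` is true and to return a predicted matrix with the same shape as `R`. Each round is cheap because it sees only a small subset of the missing entries, and the averaging compensates for the variance that this subsampling introduces.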
The experimental results show that the user-based assumption performs slightly better than the uniform and item-based assumptions. The uniform assumption still performs about as well as the item-based one because most missing or unlabeled items are negative samples anyway.
Pan, Rong, et al. “One-Class Collaborative Filtering.” Eighth IEEE International Conference on Data Mining (ICDM ’08). IEEE, 2008.