Here is my quick survey on recent works of using deep learning to tackle collaborative filtering and cold start problems in recommender systems.
Pure Collaborative Filtering problem (No side-information)
I have read a few works that attempt to use deep learning to solve CF problems. In the Collaborative filtering problem, we want to infer latent representation of users and items from rating matrix. The best method that I have known is CF-NADE  and AutoRec . The first model is based on Auto-Regressive model (which is an extension of RBF ) while the second model is based on Autoencoder. I encounter RNN-based CF where we learn to predict the rating based on the seen rating. It seems to work well too. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems (CDAE)  is a denoise autoencoder to encode both item and user latent vectors. The user latent is used as a bias term.
The traditional method that performs well in this domain is SVD++ which is a matrix factorization that incorporating neighbor users implicitly. Probabilistic Matrix Factorization is bayesian approach to this problem and provides a better interpretation of the latent vectors.
I have not found any work that provides a new insight in CF problems except CF-NADE that uses completely different approach. AutoRec seems to work so well but the original paper is a short paper ( 2 pages) so I am not sure if it really works.
Content-based recommendation (with side-information)
A few works demonstrate that Autoencoder can be used to learn latent vectors from user or item contents. Collaborative deep learning (CDL)  uses stacked denoise autoencoder to learn item latent vectors and coupled with user latent vectors. Some works in Music recommendation simply uses deep learning to learn a better item latent vectors from music raw features [5, 6].
Deep Collaborative Filtering via Marginalized Denoising Auto-encoder  jointly learns user and item latent vectors from user and item contents. This model regularizes the latent vectors by adding reconstruction errors on rating matrix and try to reconstruct a user and item contents by projecting latent vectors back to the feature space. This additional loss terms could be a novel. However, this model put less weight on content reconstruction errors, hence, these extra hyper parameters add extra complexity to this model.
One question we need to ask if autoencoder is really a good approach to solve Collaborative filtering. One view of the autoencoder is a non-linear PCA. Can we elaborate this model?
 Deep content-based music recommendation
 Improving content-based and hybrid music recommendation using deep learning