Neural collaborative filtering (NCF), introduced by He et al. in 2017, generalises matrix factorisation by replacing its fixed inner-product scoring with a learned non-linear function. The motivating argument is that $\hat r_{ui} = q_i^\top p_u$ is a fixed bilinear form of the embeddings, which limits the interaction patterns a low-dimensional latent space can express; a learned MLP is, in principle, not bound by that constraint.
The basic NCF model embeds the user and item one-hot IDs into vectors $p_u, q_i \in \mathbb{R}^k$, concatenates them, and passes the result through an MLP terminating in a scalar:
$$\hat r_{ui} = \sigma\!\left(h^\top \phi_L(\cdots \phi_1([p_u\,\|\,q_i]))\right)$$
where each $\phi_l(z) = \mathrm{ReLU}(W_l z + b_l)$. With binary implicit feedback, the loss is binary cross-entropy:
$$\mathcal{L} = -\sum_{(u,i) \in \mathcal{K}^+ \cup \mathcal{K}^-} \left[\, y_{ui} \log \hat r_{ui} + (1 - y_{ui}) \log(1 - \hat r_{ui}) \,\right]$$
where positives $\mathcal{K}^+$ are observed interactions and negatives $\mathcal{K}^-$ are sampled unobserved pairs.
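A minimal PyTorch sketch of this branch may make the shapes concrete. The embedding size, tower widths, and example IDs below are illustrative assumptions, not the paper's configuration; the sigmoid is folded into the loss for numerical stability:

```python
import torch
import torch.nn as nn

class NCFMLP(nn.Module):
    """MLP branch of NCF (sketch). Sizes are illustrative, not the paper's."""
    def __init__(self, n_users, n_items, k=32, hidden=(64, 32, 16)):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)
        layers, dim = [], 2 * k          # input is the concatenation [p_u || q_i]
        for width in hidden:
            layers += [nn.Linear(dim, width), nn.ReLU()]
            dim = width
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(dim, 1)     # the h^T readout before the sigmoid

    def forward(self, users, items):
        z = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return self.out(self.mlp(z)).squeeze(-1)   # logits; sigma lives in the loss

# One observed positive plus two sampled negatives for the same user:
model = NCFMLP(n_users=1000, n_items=2000)
users  = torch.tensor([3, 3, 3])
items  = torch.tensor([12, 95, 407])
labels = torch.tensor([1.0, 0.0, 0.0])
loss = nn.functional.binary_cross_entropy_with_logits(model(users, items), labels)
```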
He et al.'s full model, NeuMF, combines two parallel branches:
- Generalised matrix factorisation (GMF): an element-wise product $p_u^G \odot q_i^G$ weighted by a learned scalar per dimension; with all weights equal to one, it reduces to the plain inner product.
- MLP: the concatenation-and-MLP branch above.
The outputs of the two branches' final layers are concatenated and passed through a single linear layer to produce $\hat r_{ui}$. The branches use separate embedding tables ($p_u^G \neq p_u^M$), so each can specialise: GMF captures linear similarity, the MLP non-linear interactions.
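Continuing the sketch above (same imports, same illustrative sizes), the two branches might be wired like this; the first $k$ weights of the fusion layer play the role of GMF's per-dimension scalars:

```python
class NeuMF(nn.Module):
    """NeuMF sketch: GMF and MLP branches with separate embedding tables,
    fused by one linear layer. Sizes are again illustrative assumptions."""
    def __init__(self, n_users, n_items, k=16, hidden=(64, 32, 16)):
        super().__init__()
        # Separate tables p^G, q^G and p^M, q^M so each branch can specialise.
        self.user_g = nn.Embedding(n_users, k)
        self.item_g = nn.Embedding(n_items, k)
        self.user_m = nn.Embedding(n_users, k)
        self.item_m = nn.Embedding(n_items, k)
        layers, dim = [], 2 * k
        for width in hidden:
            layers += [nn.Linear(dim, width), nn.ReLU()]
            dim = width
        self.mlp = nn.Sequential(*layers)
        # Fusion: the first k weights scale the GMF product per dimension;
        # setting them to one and the rest to zero recovers a plain dot product.
        self.out = nn.Linear(k + dim, 1)

    def forward(self, users, items):
        gmf = self.user_g(users) * self.item_g(items)            # p_u^G ⊙ q_i^G
        mlp_in = torch.cat([self.user_m(users), self.item_m(items)], dim=-1)
        fused = torch.cat([gmf, self.mlp(mlp_in)], dim=-1)
        return self.out(fused).squeeze(-1)                       # logit for sigma(.)
```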
NCF was a milestone because it showed a deep model beating the matrix-factorisation baselines of the day on the MovieLens and Pinterest benchmarks, and it offered an architectural template for blending classical recommender ideas with neural networks. Subsequent work, however, painted a more nuanced picture. Rendle et al.'s 2020 paper "Neural Collaborative Filtering vs. Matrix Factorization Revisited" showed that with careful hyperparameter tuning, plain matrix factorisation matched or beat NeuMF on those same benchmarks, and that a simple dot product matched the learned MLP similarity. The honest reading is that non-linearity helps when it captures real structure, such as content features, side information, or sequential context, but a deep MLP on top of two ID embeddings adds little beyond a well-tuned dot product.
NCF therefore lives on less as a specific architecture than as the conceptual bridge between classical collaborative filtering and the deep-learning era. Its descendants, two-tower retrievers, sequential transformers like SASRec and BERT4Rec, and graph-based models like LightGCN, all combine neural encoders with embedding-based scoring, and all owe a clear debt to NCF for showing that the recommender stack and the deep-learning stack belong together.
Related terms: Matrix Factorisation, Two-Tower Recommender, Sequential Recommendation
Discussed in:
- Chapter 11: CNNs, Recommender Systems