Wide & Deep Learning is a joint model architecture developed by Google for recommendation systems and search ranking. It combines a linear (wide) model with a deep neural network to achieve both memorization and generalization.

- Memorization: learning the frequent co-occurrence of items or features, capturing direct, explicit relationships in the historical data
- Generalization: transferring learned patterns to new, previously unseen feature combinations

## Architecture

* Wide component: a linear model over raw input features, one-hot encoded categorical features, and feature combinations (cross-product transformations that concatenate the values of categorical features)
* Deep component: a feed-forward neural network that takes dense embeddings of categorical features and numerical features as input, typically several fully connected layers with ReLU activations
* Joint training: the outputs of the wide and deep components are combined with a weighted sum into a final prediction
    - Both components are trained end-to-end with a single loss function (logistic loss for classification, squared loss for regression)

A Keras-style sketch of this architecture is given at the end of the note.

![[Pasted image 20250301171413.png]]

### Advantages

- Combines the memorization ability of linear models with the generalization ability of deep networks
- Handles both sparse and dense features well
- Scalable
- Flexible feature engineering

### Disadvantages

- Still requires manual feature engineering for the wide component
- Training can be computationally intensive
- May overfit

### Possible improvements

- DeepFM: replaces the wide component with a Factorization Machine combined with a deep network
- Deep & Cross Network: uses explicit feature-crossing layers instead of manual feature engineering

## Links

- [Original Paper](https://arxiv.org/abs/1606.07792)
- [TensorFlow Implementation](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/WideAndDeep)
- [Google AI Blog Post](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)
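
## Code sketch

A minimal TensorFlow/Keras sketch of the architecture described above. Feature sizes, vocabulary sizes, and input names are hypothetical placeholders, not taken from the paper. Note that the paper jointly trains the wide part with FTRL (with L1 regularization) and the deep part with AdaGrad; this sketch uses a single Adam optimizer for simplicity.

```python
import tensorflow as tf

# Hypothetical sizes for illustration only; real vocabularies and
# cross-product features come from the dataset at hand.
NUM_CROSS_FEATURES = 1000  # one-hot cross-product features (wide input)
VOCAB_SIZE = 10_000        # categorical vocabulary size (deep input)
EMBED_DIM = 32             # embedding dimension per categorical feature
NUM_DENSE = 8              # numerical features

# Wide component: one-hot cross-product features -> linear model (a logit).
wide_in = tf.keras.Input(shape=(NUM_CROSS_FEATURES,), name="cross_features")
wide_logit = tf.keras.layers.Dense(1)(wide_in)

# Deep component: categorical IDs -> dense embeddings, concatenated with
# numerical features and passed through fully connected ReLU layers.
cat_in = tf.keras.Input(shape=(1,), dtype="int32", name="category_id")
dense_in = tf.keras.Input(shape=(NUM_DENSE,), name="dense_features")
embed = tf.keras.layers.Flatten()(
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(cat_in)
)
deep = tf.keras.layers.Concatenate()([embed, dense_in])
for units in (256, 128, 64):
    deep = tf.keras.layers.Dense(units, activation="relu")(deep)
deep_logit = tf.keras.layers.Dense(1)(deep)

# Joint head: sum the wide and deep logits and apply a sigmoid; the whole
# model is trained end-to-end with a single logistic loss.
logit = tf.keras.layers.Add()([wide_logit, deep_logit])
output = tf.keras.layers.Activation("sigmoid")(logit)

model = tf.keras.Model(inputs=[wide_in, cat_in, dense_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Because both logits feed one loss, gradients flow into the wide and deep parts simultaneously (joint training), which is what distinguishes Wide & Deep from an ensemble of separately trained models.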