Wide & Deep Learning is a joint model architecture developed by Google for recommendation systems and search ranking. It combines a linear (wide) model with a deep neural network to achieve both memorization and generalization.
- Memorization: Learning the frequent co-occurrence of items or features to capture direct, explicit relationships in historical data
- Generalization: Transferring learned correlations to new, rarely or never observed feature combinations, which improves diversity in recommendations
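Memorization in the wide component is typically achieved with cross-product feature transformations. As a minimal sketch (the feature names below are illustrative, not from the source), a crossed feature fires only when all of its constituent categorical values co-occur:

```python
def cross_feature(*values):
    # Cross-product transformation: combine categorical feature values
    # into a single composite feature, which fires (value 1) only when
    # every constituent value is present in the example.
    return " AND ".join(values)

# Hypothetical crossed feature for an app-recommendation setting:
crossed = cross_feature("installed_app=netflix", "impression_app=hulu")
```

The linear wide model then learns one weight per crossed feature, directly memorizing how often that exact combination co-occurred with the label in the training data.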
Architecture
- Wide component: A linear model over raw input features, one-hot encoded categorical features, and cross-product feature transformations (crossing the values of categorical features to capture explicit interactions)
- Deep component: A feed-forward neural network that takes dense embeddings of categorical features and numerical features as input. It typically consists of multiple fully connected layers with ReLU activations.
- Joint training: The outputs (logits) of the wide component and the deep component are combined with a weighted sum, then passed through a sigmoid for the final prediction.
- The whole model is trained end-to-end with a single loss function (logistic loss for classification, squared loss for regression), so both components are optimized jointly rather than ensembled separately.
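The forward pass described above can be sketched in a few lines of numpy. This is a simplified illustration with made-up dimensions and random weights, not the original implementation; in practice the weights would be learned jointly by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative dimensions (assumptions, not from the source).
n_wide, n_deep, hidden = 8, 4, 16

# Wide component: linear weights over sparse/crossed features.
w_wide = rng.normal(size=n_wide)

# Deep component: two fully connected layers on dense inputs.
W1 = rng.normal(size=(n_deep, hidden))
W2 = rng.normal(size=(hidden, 1))
bias = 0.0

def wide_and_deep(x_wide, x_deep):
    wide_logit = x_wide @ w_wide              # memorization path
    h = relu(x_deep @ W1)                     # hidden ReLU layer
    deep_logit = (h @ W2).squeeze(-1)         # generalization path
    # Joint prediction: sum of both logits through a sigmoid.
    return sigmoid(wide_logit + deep_logit + bias)

p = wide_and_deep(rng.normal(size=n_wide), rng.normal(size=n_deep))
```

The key design choice is that the two logits are summed before the sigmoid, so a single logistic loss trains both components at once.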

Advantages
- Combines memorization abilities of linear models with generalization capabilities of deep networks
- Better handles both sparse and dense features
- Scalable
- Flexible feature engineering
Disadvantages
- Still requires some manual feature engineering for the wide component
- Training can be computationally intensive, and the model may overfit without regularization
Possible improvements
- DeepFM: Combines Factorization Machines with deep networks, replacing the wide component
- Deep & Cross Network: Uses explicit feature crossing layers instead of manual feature engineering
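To make the Deep & Cross idea concrete, each explicit crossing layer in DCN computes x_{l+1} = x_0 (w · x_l) + b + x_l, multiplying the original input x_0 by a learned scalar projection of the previous layer's output. A minimal sketch (variable names and values are illustrative):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    # One explicit feature-crossing layer (DCN-style):
    #   x_{l+1} = x0 * (w . xl) + b + xl
    # The residual term xl preserves lower-order interactions.
    return x0 * (xl @ w) + b + xl

# Tiny worked example with hand-picked values:
x0 = np.array([1.0, 2.0])   # original input vector
x1 = np.array([1.0, 0.0])   # output of the previous cross layer
w  = np.array([1.0, 1.0])   # learned weight vector (illustrative)
b  = np.array([0.0, 0.0])   # bias
x2 = cross_layer(x0, x1, w, b)
```

Stacking l such layers produces bounded-degree polynomial feature interactions up to order l+1 without any manual feature engineering.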