Recommendation system metrics are quantitative measures used to evaluate the performance and effectiveness of recommendation algorithms. These metrics help assess how well a system can predict user preferences, rank items, and provide valuable recommendations.
- Precision@k: measures the proportion of relevant items among the top-k recommendations. Useful when the user sees only a few recommendations and we want those few to be highly relevant.
- Recall@k: measures the proportion of all of a user's relevant items that appear in the top-k recommendations. Useful when we want to retrieve as many relevant items as possible from a large catalog.
- Hit Rate@k: measures the proportion of users for whom at least one relevant item appears in their top-k recommendations. Useful as a coarse indicator of overall system effectiveness, since it does not differentiate between one and multiple relevant recommendations.
- Mean Average Precision (MAP@k): calculates the mean of Average Precision (AP) across all users, where AP is the average of precision values at each relevant position in the ranked recommendations. Useful for evaluating ranked recommendations where the order matters.
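$$\mathrm{AP@}k = \frac{1}{m} \sum_{i=1}^{k} P(i) \cdot \mathrm{rel}(i)$$

Where:
- $m$ is the number of relevant items for the user
- $P(i)$ is the precision at position $i$
- $\mathrm{rel}(i)$ is an indicator function (1 if the item at position $i$ is relevant, 0 otherwise)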
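The per-user metrics described so far can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `ranked`/`relevant` inputs, and the `mean_over_users` helper are all assumptions made for the example. AP is normalized by the number of relevant items for the user, as in the definition above; some libraries normalize by `min(m, k)` instead.

```python
from typing import Callable, Dict, Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the top-k recommendations that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the user's relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1.0 if at least one relevant item is in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Sum of precision values at each relevant position, divided by m."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / position  # precision at this position
    return total / len(relevant)


def mean_over_users(
    metric: Callable[[Sequence[Hashable], Set[Hashable], int], float],
    recs: Dict[Hashable, Sequence[Hashable]],
    truth: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """Average a per-user metric over every user that has ground truth."""
    if not truth:
        return 0.0
    return sum(metric(recs.get(u, []), rel, k) for u, rel in truth.items()) / len(truth)


ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(ranked, relevant, 5))          # 0.4
print(hit_at_k(ranked, relevant, 5))                # 1.0
print(average_precision_at_k(ranked, relevant, 5))  # (1/1 + 2/3) / 3
```

Passing `average_precision_at_k` to `mean_over_users` gives MAP@k, and passing `hit_at_k` gives Hit Rate@k.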
- Normalized Discounted Cumulative Gain (NDCG@k): measures the quality of ranking by assigning higher weights to relevant items appearing higher in the list and normalizing by the ideal ranking. Penalizes relevant items appearing lower in the list.
- Mean Reciprocal Rank (MRR@k): measures the average of reciprocal ranks of the first relevant item across all users. Useful when the first good recommendation is most important (e.g., in search engines).
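As a rough sketch of the two rank-sensitive metrics above, the following assumes binary relevance (gain 1 for a relevant item, 0 otherwise) and the commonly used $1/\log_2(i+1)$ discount; graded relevance or a different discount would change the numbers. The function names are illustrative only.

```python
import math
from typing import Hashable, Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked[:k]]
    # The ideal ranking places every relevant item first (at most k of them).
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def reciprocal_rank_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1 / rank of the first relevant item in the top-k, or 0.0 if there is none."""
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


ranked = ["a", "b", "c"]
relevant = {"b", "c"}
print(ndcg_at_k(ranked, relevant, 3))             # below 1.0: "a" takes the top slot
print(reciprocal_rank_at_k(ranked, relevant, 3))  # 0.5
```

MRR@k is the mean of `reciprocal_rank_at_k` over all users.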
Additional Metrics
- Diversity: measures how diverse the recommended items are across various dimensions. Helps prevent the “filter bubble” phenomenon.
Intra-List Diversity
The average pairwise dissimilarity between items in a recommendation list $L$:

$$\mathrm{ILD}(L) = \frac{2}{|L|(|L|-1)} \sum_{i, j \in L,\; i < j} d(i, j)$$

Where $d(i, j)$ is the distance or dissimilarity between items $i$ and $j$.
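A minimal sketch of intra-list diversity follows. It assumes items are represented as feature vectors and uses cosine distance as the dissimilarity; both choices are assumptions for illustration, and any pairwise distance function can be passed in instead.

```python
import math
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def intra_list_diversity(items: Sequence[T], distance: Callable[[T, T], float]) -> float:
    """Mean distance over all unordered pairs of items in the list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical two-dimensional item embeddings: two identical items, one orthogonal.
recommended = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(intra_list_diversity(recommended, cosine_distance))  # 2 of 3 pairs are orthogonal
```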
- Novelty: measures how unusual or unfamiliar the recommended items are to users. Helps users discover new content beyond popular items.
- Serendipity: measures how unexpected yet relevant the recommendations are. Aims to delight users with discoveries they wouldn’t have found on their own.
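A common formulation for a recommendation list $L$:

$$\mathrm{Serendipity}(L) = \frac{1}{|L|} \sum_{i \in L} \mathrm{unexp}(i) \cdot \mathrm{rel}(i)$$

Where:
- $\mathrm{unexp}(i)$ is the unexpectedness of item $i$ (often calculated as dissimilarity from the user’s profile)
- $\mathrm{rel}(i)$ is the relevance of item $i$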
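Novelty and serendipity have several competing definitions, so the sketch below is only one possible reading. It assumes novelty is the mean self-information $-\log_2 p(i)$ of the recommended items, where $p(i)$ is the share of users who have interacted with item $i$, and it assumes serendipity is the mean of unexpectedness times binary relevance. The `popularity` mapping and the `unexpectedness` callable are hypothetical inputs supplied by the caller.

```python
import math
from typing import Callable, Dict, Hashable, Sequence, Set


def novelty(ranked: Sequence[Hashable], popularity: Dict[Hashable, float]) -> float:
    """Mean self-information of the recommended items.

    popularity maps each item to the share of users who interacted with it,
    a value in (0, 1]; rarer items contribute more.
    """
    if not ranked:
        return 0.0
    return sum(-math.log2(popularity[item]) for item in ranked) / len(ranked)


def serendipity(
    ranked: Sequence[Hashable],
    relevant: Set[Hashable],
    unexpectedness: Callable[[Hashable], float],
) -> float:
    """Mean of unexpectedness * relevance, with relevance as a 0/1 indicator."""
    if not ranked:
        return 0.0
    return sum(unexpectedness(item) for item in ranked if item in relevant) / len(ranked)


popularity = {"hit": 0.5, "niche": 0.125}
print(novelty(["hit", "niche"], popularity))  # (1 + 3) / 2 = 2.0

# Hypothetical unexpectedness scores, e.g. distance from the user's profile.
unexp = {"hit": 0.1, "niche": 0.9}
print(serendipity(["hit", "niche"], {"niche"}, unexp.get))  # 0.9 / 2 = 0.45
```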
- Coverage
Item Coverage
The proportion of all available items that are recommended to at least one user. Helps prevent the “long-tail” problem, where many items are never recommended.
User Coverage
The proportion of users who receive at least one recommendation.
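Both coverage measures reduce to set arithmetic over the recommendation lists. The sketch below assumes recommendations are stored as a mapping from user to a list of items; the `recs`, `catalog`, and `users` names are illustrative.

```python
from typing import Hashable, Mapping, Sequence, Set


def item_coverage(recs: Mapping[Hashable, Sequence[Hashable]], catalog: Set[Hashable]) -> float:
    """Share of catalog items that appear in at least one user's list."""
    if not catalog:
        return 0.0
    recommended = set()
    for items in recs.values():
        recommended.update(items)
    return len(recommended & catalog) / len(catalog)


def user_coverage(recs: Mapping[Hashable, Sequence[Hashable]], users: Set[Hashable]) -> float:
    """Share of users who receive at least one recommendation."""
    if not users:
        return 0.0
    covered = sum(1 for user in users if recs.get(user))
    return covered / len(users)


catalog = {"a", "b", "c", "d"}
users = {"u1", "u2", "u3", "u4"}
recs = {"u1": ["a", "b"], "u2": ["b"], "u3": []}  # u4 has no entry at all
print(item_coverage(recs, catalog))  # 0.5: only "a" and "b" are ever recommended
print(user_coverage(recs, users))    # 0.5: only u1 and u2 receive anything
```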
- Conversion Rate: the percentage of recommendations that lead to a desired action (e.g., click, purchase).
- Click-Through Rate (CTR): the ratio of clicks to impressions for recommended items.
- User Satisfaction: direct measurement of user satisfaction with recommendations, often collected through surveys or feedback mechanisms.
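The two online metrics above are simple ratios once the counts are available. A minimal sketch, assuming the counts have already been aggregated from an event log (the function and parameter names are illustrative):

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Ratio of clicks to impressions for recommended items."""
    return clicks / impressions if impressions else 0.0


def conversion_rate(conversions: int, recommendations: int) -> float:
    """Percentage of recommendations that led to the desired action."""
    return 100.0 * conversions / recommendations if recommendations else 0.0


print(click_through_rate(clicks=30, impressions=1000))       # 0.03
print(conversion_rate(conversions=5, recommendations=1000))  # 0.5 (percent)
```

Unlike the offline ranking metrics, these depend on live traffic, so they are usually compared between variants in an A/B test rather than read as absolute numbers.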