DSWoK — Data Science Well of Knowledge
#evaluation
14 notes · co-occurs with 7 tags · last updated May 18, 2026
Co-tags
#concept 8
#clustering 1
#unsupervised 1
#cv 1
#nlp 1
#recsys 1
#tabular-ml 1
Notes tagged #evaluation
01
AB Tests
A/B testing (online controlled experimentation) randomly assigns units to a treatment or a control variant and compares aggregate outcomes, attributing the observed difference to the change.
May 18, 2026
General ML
02
Bias-Variance Trade-off
The bias-variance trade-off describes the tension between a model’s ability to fit the training data (low bias) and the stability of its predictions across different training samples (low variance); together, the two determine how well the model generalizes to new, unseen data.
May 18, 2026
General ML
03
CUPED
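CUPED (Controlled-experiment Using Pre-Experiment Data) reduces the variance of an A/B test metric using pre-experiment data, so the test can detect smaller effects without needing more users.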
May 18, 2026
General ML
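As a rough illustration of the variance-reduction idea behind CUPED, the sketch below adjusts a metric with a single pre-experiment covariate. It assumes the common formulation `y_adj = y - theta * (x - mean(x))` with `theta = cov(x, y) / var(x)`; the function name `cuped_adjust` and the synthetic data are hypothetical, and a real experiment would typically estimate `theta` on data pooled across both variants.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Adjust metric y with pre-experiment covariate x (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance between the covariate and the metric.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Subtract the part of y that the centered covariate explains.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Synthetic data (an assumption for illustration): the in-experiment metric
# is the pre-experiment metric plus independent noise.
random.seed(0)
pre = [random.gauss(10, 2) for _ in range(1000)]
post = [xi + random.gauss(0, 1) for xi in pre]

adjusted = cuped_adjust(post, pre)
print("mean before / after:    ", mean(post), mean(adjusted))
print("variance before / after:", variance(post), variance(adjusted))
```

The adjustment leaves the mean unchanged and lowers the variance in proportion to how strongly the covariate correlates with the metric, which is what lets the test detect smaller effects at the same sample size.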
04
Time-series validation
Time-series validation is a type of validation that deserves separate treatment because of its variability and complexity.
May 18, 2026
General ML
05
Training-serving skew
Training-serving skew is a mismatch between the data representation used to train and evaluate a model and the representation available at serving time.
May 18, 2026
General ML
06
Validation
Model validation is the process of assessing how well a trained machine learning model performs on unseen data.
May 18, 2026
General ML
07
Calibration
A classifier is calibrated if its predicted probabilities match observed frequencies: among examples assigned a 0.7 score, roughly 70% should be positive.
May 18, 2026
Metrics and losses
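The calibration definition above can be checked numerically by binning predictions and comparing the mean score in each bin with the observed positive rate. This is a minimal sketch that assumes equal-width bins; the helper name `calibration_table` and the toy data are hypothetical.

```python
from collections import defaultdict


def calibration_table(probs, labels, n_bins=10):
    """Return (count, mean_score, positive_rate) per non-empty equal-width bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_score = sum(p for p, _ in items) / len(items)
        positive_rate = sum(y for _, y in items) / len(items)
        rows.append((len(items), mean_score, positive_rate))
    return rows


# Toy data: ten examples scored 0.7, seven of which are positive.
probs = [0.7] * 10
labels = [1] * 7 + [0] * 3

rows = calibration_table(probs, labels)
for count, mean_score, positive_rate in rows:
    print(count, round(mean_score, 3), round(positive_rate, 3))
```

In this toy case the mean score and the positive rate agree, so the bin is well calibrated; a large gap between the two in any bin would indicate over- or under-confidence.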
08
Clustering metrics
Clustering metrics are quantitative measures used to evaluate the performance and quality of clustering algorithms.
May 18, 2026
Metrics and losses
09
Computer vision metrics
Computer vision metrics evaluate models across detection, segmentation, generation, 3D, and pose estimation tasks.
May 18, 2026
Metrics and losses
10
Confusion matrix
A confusion matrix is a table that summarizes the performance of a classification model on a dataset with known true labels by counting each combination of predicted and actual class.
May 18, 2026
Metrics and losses
11
NLP metrics
NLP metrics evaluate tasks from machine translation and summarization to question answering and sequence labeling.
May 18, 2026
Metrics and losses
12
Recommendation system metrics
Recommendation system metrics are quantitative measures used to evaluate the performance and effectiveness of recommendation algorithms.
May 18, 2026
Metrics and losses
13
Regression metrics
Regression metrics are quantitative measures used to evaluate the performance of regression models, which predict continuous values rather than discrete classes.
May 18, 2026
Metrics and losses
14
F1 score
The F1 score is an evaluation metric for classification models that combines precision and recall into a single value, providing a balanced assessment of model performance.
May 18, 2026
Metrics and losses
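To make the combination of precision and recall concrete, the sketch below computes the F1 score as their harmonic mean for binary labels. The function name `precision_recall_f1` and the toy labels are hypothetical, and the zero-division fallbacks are one convention among several.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fall back to 0.0 when a denominator is zero (an assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 2 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 2/3 and recall is 1/2, giving an F1 of 4/7; because the harmonic mean is pulled toward the smaller value, a model cannot score well on F1 by maximizing only one of the two.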