DSWoK — Data Science Well of Knowledge

#evaluation

14 notes · co-occurs with 7 tags · last updated Jun 22, 2026

Co-tags: #concept (8) · #clustering (1) · #unsupervised (1) · #cv (1) · #nlp (1) · #recsys (1) · #tabular-ml (1)

Notes tagged #evaluation

A/B testing (online controlled experimentation) randomly assigns units to a treatment or a control variant and compares aggregate outcomes, attributing the observed difference to the change.
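As a rough illustration of "comparing aggregate outcomes", the sketch below runs a two-proportion z-test on conversion counts. The function name, the choice of a pooled z-test, and the example counts are assumptions made for illustration; the note itself may use a different test.

```python
import math

def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    rate_c = conv_control / n_control
    rate_t = conv_treatment / n_treatment
    diff = rate_t - rate_c
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        # Degenerate case: no variation at all, so no evidence of a difference.
        return diff, 0.0, 1.0
    z = diff / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p_value

# Hypothetical counts: 100/1000 conversions in control, 130/1000 in treatment.
diff, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

The z-test relies on a normal approximation, so it is only reasonable when both groups are fairly large.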

Bias-Variance Trade-off

The bias-variance trade-off is a fundamental concept in machine learning that describes the tension between a model's ability to fit the training data closely (low bias) and its stability across different training samples, which governs how well it generalizes to new, unseen data (low variance).

CUPED makes A/B tests detect smaller effects without needing more users.
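A minimal sketch of the usual CUPED adjustment, assuming each user has a pre-experiment covariate (for example, the same metric measured before the test started). The function name and the toy numbers are hypothetical. The adjusted metric keeps the same mean but has lower variance when the covariate is correlated with the outcome, which is what lets the test detect smaller effects.

```python
from statistics import fmean, pvariance

def cuped_adjust(metric, covariate):
    """Return metric values adjusted as y - theta * (x - mean(x))."""
    mean_x = fmean(covariate)
    mean_y = fmean(metric)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    # theta = cov(x, y) / var(x) minimizes the variance of the adjusted metric.
    theta = cov_xy / var_x if var_x > 0 else 0.0
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]

# Hypothetical data: pre-experiment and in-experiment values for five users.
pre = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [2.1, 3.9, 6.2, 7.8, 10.0]
adjusted = cuped_adjust(post, pre)

variance_before = pvariance(post)
variance_after = pvariance(adjusted)
```

In an actual experiment, theta is typically estimated once on the pooled data from both variants and then applied to each variant before comparing means.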

Time-series validation

Time-series validation is a type of model validation that deserves separate treatment because of its variability and complexity.

Training-serving skew

Training-serving skew is a mismatch between the data representation used to train and evaluate a model and the representation available at serving time.

Model validation is the process of assessing how well a trained machine learning model performs on unseen data.

A classifier is calibrated if its predicted probabilities match observed frequencies: among examples assigned a 0.7 score, roughly 70% should be positive.

Metrics and losses
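To make the calibration definition above concrete, here is a sketch that bins predictions by score and compares each bin's mean predicted probability with its observed positive rate, summarized as expected calibration error (ECE). The function name, the equal-width binning, and the default of 10 bins are assumptions for illustration.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and positive rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins over [0, 1]; a score of exactly 1.0 goes in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        positive_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - positive_rate)
    return ece

# Ten examples scored 0.7, of which seven are positive: well calibrated.
calibrated = expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3)
# Ten examples scored 0.9, none positive: badly overconfident.
overconfident = expected_calibration_error([0.9] * 10, [0] * 10)
```

An ECE near zero matches the 0.7 example in the note: the scores agree with the observed frequencies.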

Clustering metrics

Clustering metrics are quantitative measures used to evaluate the performance and quality of clustering algorithms.

Metrics and losses

Computer vision metrics

Computer vision metrics evaluate models across detection, segmentation, generation, 3D, and pose estimation tasks.

Metrics and losses

Confusion matrix

A confusion matrix is a table used to evaluate the performance of a classification model on a set of data for which the true labels are known.

Metrics and losses
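For the binary case, the confusion matrix described above reduces to four counts. The sketch below tallies them from true and predicted labels; the function name and the dictionary layout are illustrative choices, not taken from the note.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # predicted positive, actually positive
        elif pred == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # predicted negative, actually negative
    return counts

# Hypothetical labels for six examples.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
```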

NLP metrics evaluate tasks from machine translation and summarization to question answering and sequence labeling.

Metrics and losses

Recommendation system metrics

Recommendation system metrics are quantitative measures used to evaluate the performance and effectiveness of recommendation algorithms.

Metrics and losses

Regression metrics

Regression metrics are quantitative measures used to evaluate the performance of regression models, which predict continuous values rather than discrete classes.

Metrics and losses

The F1 score is an evaluation metric for classification models that combines precision and recall into a single value, providing a balanced assessment of model performance.

Metrics and losses
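The F1 score mentioned above is the harmonic mean of precision and recall. A minimal sketch computing it from raw counts follows; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: dominated by the smaller of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = f1_from_counts(tp=8, fp=2, fn=4)
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.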
