Clustering metrics are quantitative measures used to evaluate the performance and quality of clustering algorithms.

When to use which metric

Metric | When to use
------ | -----------
Silhouette | No ground truth; how well each point fits its cluster vs the next nearest.
Davies-Bouldin | No ground truth; lower = better separation.
Calinski-Harabasz | No ground truth; higher = better-defined clusters.
Dunn | No ground truth; tight clusters that are far apart.
Inertia | No ground truth; cluster compactness. Used in the elbow method for choosing the number of clusters k.
Adjusted Rand Index (ARI) | Ground truth available; similarity vs a random baseline.
Normalized Mutual Information (NMI) | Ground truth available; mutual information normalized by entropy.
Fowlkes-Mallows | Ground truth available; geometric mean of pairwise precision and recall.
Homogeneity / Completeness / V-measure | Ground truth available; whether each cluster holds a single class, whether each class stays in a single cluster, and the harmonic mean of the two.

Internal Metrics — Intrinsic (no ground truth)

These metrics evaluate clustering performance based only on the data and the clustering result, without requiring true labels.

Silhouette Coefficient

Measures how similar a point is to its own cluster compared with other clusters. Values range from −1 to 1: values near +1 indicate a point that is well matched to its own cluster and far from neighboring ones, values near 0 indicate a point close to the boundary between two clusters, and values near −1 indicate a point that is probably assigned to the wrong cluster. For a point $i$:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

Where:

  • $a(i)$ is the mean distance between point $i$ and all other points in the same cluster.
  • $b(i)$ is the mean distance between point $i$ and all points in the nearest neighboring cluster.

The Silhouette score for a clustering is the mean Silhouette coefficient over all $n$ samples:

$$S = \frac{1}{n} \sum_{i=1}^{n} s(i)$$
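The definition above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, Euclidean distance is assumed, and singleton clusters are given a score of 0 by convention. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
import math

def silhouette_samples(points, labels):
    """Per-point silhouette values s(i) = (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by len - 1 is safe)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def silhouette_score(points, labels):
    """Mean silhouette value over all points."""
    values = silhouette_samples(points, labels)
    return sum(values) / len(values)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # compact, well-separated clusters
print(silhouette_score(points, [0, 1, 0, 1]))  # deliberately poor assignment
```

The first labeling groups nearby points and scores close to 1; the second mixes the two groups and scores below 0.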

Davies-Bouldin Index

Measures the average similarity between each cluster and the cluster most similar to it, where similarity is the ratio of within-cluster distances to between-cluster distances. Lower values indicate better clustering (0 is the minimum).

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}}$$

Where:

  • $k$ is the number of clusters.
  • $s_i$ is the average distance of points in cluster $i$ to its centroid.
  • $d_{ij}$ is the distance between centroids of clusters $i$ and $j$.
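As a rough sketch of the quantities just listed (average scatter per cluster and centroid-to-centroid distance), the index can be computed as follows. The helper names are hypothetical and Euclidean distance is assumed.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def davies_bouldin(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of a cluster's points to its centroid
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case (largest) similarity ratio against any other cluster
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # small value: compact and far apart
print(davies_bouldin(points, [0, 1, 0, 1]))  # large value: overlapping clusters
```

The sketch assumes at least two clusters with distinct centroids; otherwise the ratio is undefined.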

Calinski-Harabasz Index (Variance Ratio Criterion)

Ratio of between-cluster variance to within-cluster variance. Higher values indicate better-defined clusters.

$$CH = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \times \frac{n - k}{k - 1}$$

Where:

  • $B_k$ is the between-cluster dispersion matrix.
  • $W_k$ is the within-cluster dispersion matrix.
  • $\operatorname{tr}(\cdot)$ is the trace of a matrix.
  • $n$ is the total number of samples.
  • $k$ is the number of clusters.
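The traces never require building the dispersion matrices explicitly: the trace of the between-cluster matrix is the size-weighted sum of squared distances from each cluster centroid to the overall centroid, and the trace of the within-cluster matrix is the sum of squared distances from each point to its own centroid. A minimal sketch under that identity, with hypothetical helper names:

```python
def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def calinski_harabasz(points, labels):
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = centroid(points)
    # Trace of the between-cluster dispersion matrix
    between = sum(
        len(pts) * sq_dist(centroid(pts), overall) for pts in clusters.values()
    )
    # Trace of the within-cluster dispersion matrix
    within = 0.0
    for pts in clusters.values():
        c = centroid(pts)
        within += sum(sq_dist(p, c) for p in pts)
    return (between / (k - 1)) / (within / (n - k))

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(calinski_harabasz(points, [0, 0, 1, 1]))
```

The sketch requires between 2 and n − 1 clusters and a nonzero within-cluster dispersion.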

Dunn Index

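Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. Higher values indicate better clustering.

$$D = \frac{\min_{i \neq j} d(C_i, C_j)}{\max_{l} \operatorname{diam}(C_l)}$$

Where:

  • $d(C_i, C_j)$ is the distance between clusters $C_i$ and $C_j$.
  • $\operatorname{diam}(C_l)$ is the diameter of cluster $C_l$ (maximum distance between any two points in the cluster).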
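The text leaves the inter-cluster distance unspecified, and several choices are in use. The sketch below assumes the closest pair of points drawn from two different clusters (single linkage) and Euclidean distance; the function name is hypothetical.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Assumed inter-cluster distance: closest pair across two clusters
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(groups, 2)
        for p in a
        for q in b
    )
    # Diameter: farthest pair of points inside one cluster
    max_diameter = max(
        math.dist(p, q) for pts in groups for p, q in combinations(pts, 2)
    )
    return min_between / max_diameter

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(dunn_index(points, [0, 0, 1, 1]))
```

The sketch assumes at least two clusters and at least one cluster containing two distinct points. Because it compares all pairs of points, the cost grows quadratically with the number of samples.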

Inertia (Within-cluster Sum of Squares)

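Sum of squared distances of samples to their closest cluster center. Lower values indicate more compact clusters.

$$\text{Inertia} = \sum_{i=1}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$

Where:

  • $x_i$ is the $i$-th data point.
  • $\mu_j$ is the center of cluster $j$.
  • $C$ is the set of all cluster centers.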
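Given a set of cluster centers, the sum is straightforward to compute directly. A minimal sketch with a hypothetical function name (k-means implementations typically report this value themselves, for example the `inertia_` attribute of scikit-learn's `KMeans`):

```python
def inertia(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers
        )
    return total

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers = [(0.0, 0.5), (10.0, 0.5)]
print(inertia(points, centers))  # each point is 0.5 from its center: 4 * 0.25
```

Inertia never increases when more centers are added, which is why it is read off a curve (the elbow method) rather than minimized outright.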

External Metrics — ground truth required

These metrics compare clustering results against known true labels.

Adjusted Rand Index (ARI)

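Measures the similarity between two clusterings, adjusted for chance. The score is at most 1 and can be negative: 1 = perfect agreement, values near 0 = what random assignment would be expected to produce, negative = worse than random.

$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$

Where:

  • $n_{ij}$ is the number of objects in both class $i$ and cluster $j$.
  • $a_i$ is the number of objects in class $i$.
  • $b_j$ is the number of objects in cluster $j$.
  • $n$ is the total number of objects.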
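The four counts listed above are all that is needed to compute the index. A minimal sketch using only the standard library; the function name is hypothetical, and returning 1.0 when the denominator vanishes (for example, when both labelings put everything in one cluster) is an assumed convention.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency cells
    class_counts = Counter(labels_true)                   # row sums
    cluster_counts = Counter(labels_pred)                 # column sums
    sum_cells = sum(comb(v, 2) for v in pair_counts.values())
    sum_rows = sum(comb(v, 2) for v in class_counts.values())
    sum_cols = sum(comb(v, 2) for v in cluster_counts.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # value expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, assumed convention
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, renamed labels
print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # every true pair is split
```

The first call shows that the score ignores how clusters are named; the second shows that a labeling can score below zero.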

Normalized Mutual Information (NMI)

Measures the mutual information between the clustering assignment and the ground truth, normalized by the average entropy of both. Values range from 0 to 1; 1 = perfect agreement.

$$NMI(U, V) = \frac{2\, I(U; V)}{H(U) + H(V)}$$

Where:

  • $I(U; V)$ is the mutual information between clusterings $U$ and $V$.
  • $H(U)$ and $H(V)$ are the entropies of $U$ and $V$.
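A minimal sketch of the computation, estimating the entropies and the mutual information from label counts. The function names are hypothetical, normalization by the arithmetic mean of the two entropies is assumed (matching the description above), and returning 1.0 when both entropies are zero is an assumed convention.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Mutual information between two labelings of the same points."""
    n = len(u)
    counts_u, counts_v = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(
        (n_ab / n) * log(n * n_ab / (counts_u[a] * counts_v[b]))
        for (a, b), n_ab in joint.items()
    )

def normalized_mutual_information(u, v):
    h_u, h_v = entropy(u), entropy(v)
    if h_u + h_v == 0:
        return 1.0  # both labelings have a single cluster, assumed convention
    return 2 * mutual_information(u, v) / (h_u + h_v)

truth = [0, 0, 1, 1]
print(normalized_mutual_information(truth, [1, 1, 0, 0]))  # identical partitions
print(normalized_mutual_information(truth, [0, 1, 0, 1]))  # independent partitions
```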

Fowlkes-Mallows Score

Geometric mean of pairwise precision and recall. Values range from 0 to 1; 1 = perfect agreement.

$$FMI = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}$$

Where TP, FP, and FN are derived from counting pairs of points:

  • True Positive (TP) — in the same cluster in both clusterings.
  • False Positive (FP) — same cluster in the predicted clustering but not in the ground truth.
  • False Negative (FN) — different clusters in the predicted clustering but in the same cluster in the ground truth.
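The pair counts above can be obtained by brute force over all pairs of points, which is quadratic in the number of samples but makes the definitions explicit. A minimal sketch with a hypothetical function name; returning 0.0 when there are no true-positive pairs is an assumed convention.

```python
import math
from itertools import combinations

def fowlkes_mallows(labels_true, labels_pred):
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair kept together in both
        elif same_pred:
            fp += 1  # together in the prediction only
        elif same_true:
            fn += 1  # together in the ground truth only
    if tp == 0:
        return 0.0  # assumed convention when no pair agrees
    return tp / math.sqrt((tp + fp) * (tp + fn))

truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(fowlkes_mallows(truth, pred))  # TP = 4, FP = 3, FN = 2
```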

Homogeneity, Completeness, and V-measure

  • Homogeneity — each cluster contains only members of a single class (values 0 to 1).
  • Completeness — all members of a class are assigned to the same cluster (values 0 to 1).
  • V-measure — harmonic mean of homogeneity and completeness (values 0 to 1).
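These three scores are commonly defined through conditional entropy: homogeneity as 1 − H(C|K)/H(C) and completeness as 1 − H(K|C)/H(K), where C denotes the true classes and K the predicted clusters. The sketch below assumes those definitions; the function names are hypothetical, and scoring 1.0 when the normalizing entropy is zero is an assumed convention.

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(x, given):
    """H(x | given), estimated from label co-occurrence counts."""
    n = len(x)
    joint = Counter(zip(x, given))
    given_counts = Counter(given)
    return -sum(
        (n_xg / n) * log(n_xg / given_counts[g])
        for (_, g), n_xg in joint.items()
    )

def homogeneity_completeness_v(labels_true, labels_pred):
    h_c, h_k = entropy(labels_true), entropy(labels_pred)
    homogeneity = (
        1.0 if h_c == 0
        else 1.0 - conditional_entropy(labels_true, labels_pred) / h_c
    )
    completeness = (
        1.0 if h_k == 0
        else 1.0 - conditional_entropy(labels_pred, labels_true) / h_k
    )
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v

# Class 1 is split across clusters 1 and 2: every cluster is pure
# (homogeneity 1), but the class is not kept together (completeness < 1).
print(homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 1, 2]))
```

Over-splitting clusters hurts completeness but not homogeneity, while merging classes does the opposite, which is why the harmonic mean is reported alongside the two components.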

Contingency Matrix

A table showing the distribution of data points across predicted clusters and true classes. Not a metric itself, but the foundation for many external metrics.
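A small sketch of how such a table can be built, with rows for true classes and columns for predicted clusters. The function name and the row/column orientation are assumptions for illustration.

```python
from collections import Counter

def contingency_matrix(labels_true, labels_pred):
    """Rows: true classes (sorted); columns: predicted clusters (sorted)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    counts = Counter(zip(labels_true, labels_pred))
    return [[counts[(c, k)] for k in clusters] for c in classes]

truth = ["a", "a", "a", "b", "b"]
pred = [0, 0, 1, 1, 1]
for row in contingency_matrix(truth, pred):
    print(row)
# [2, 1]
# [0, 2]
```

Row sums give the class sizes and column sums give the cluster sizes, which are the quantities that pair-counting and entropy-based external metrics are built from.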

Determining the Optimal Number of Clusters

  • Elbow Method — plot inertia against the number of clusters and look for the “elbow” point.
  • Silhouette Method — choose the number of clusters that maximizes the Silhouette score.
  • Gap Statistic — compare within-cluster dispersion to that expected under a null reference distribution.
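To illustrate the elbow method from the list above, the sketch below runs a toy one-dimensional k-means for several values of k and prints the resulting inertia. Everything here is an assumption made for illustration: the `kmeans_1d` helper is hypothetical, it uses a simple deterministic initialization instead of the randomized schemes real implementations use, and the data are made up to contain three obvious groups.

```python
def kmeans_1d(data, k, iterations=20):
    """Toy Lloyd's algorithm on 1-D data; returns (centers, inertia)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialization: k evenly spaced order statistics
    centers = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # Move each center to the mean of its group; keep it if the group is empty
        centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
    inertia = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, inertia

data = [0.8, 1.0, 1.2, 3.8, 4.0, 4.2, 8.8, 9.0, 9.2]
inertias = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Inertia falls sharply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters, matching how the data were constructed. On real data the bend is often less pronounced, which is one reason to cross-check with the Silhouette method or the gap statistic.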