Clustering metrics are quantitative measures used to evaluate the performance and quality of clustering algorithms.

Internal (Intrinsic) Metrics (No Ground Truth Required)

These metrics evaluate clustering performance based only on the data and the clustering result, without requiring true labels:

  1. Silhouette Coefficient: Measures how similar an object is to its own cluster compared to other clusters. Values range from -1 to 1: +1 indicates good separation and cohesion, 0 indicates a sample close to the decision boundary between two clusters, and -1 indicates poor separation (the sample is closer to points in a neighboring cluster).

For a point i, the coefficient is:

  s(i) = (b(i) - a(i)) / max(a(i), b(i))

Where:

  • a(i) is the mean distance between point i and all other points in the same cluster
  • b(i) is the mean distance between point i and all points in the nearest neighboring cluster

The Silhouette score for a clustering is the mean Silhouette coefficient over all N samples:

  S = (1/N) · Σ_i s(i)
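As a quick sketch (assuming scikit-learn is available; the three-blob toy dataset is made up for illustration), the score can be computed with `silhouette_score`:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative toy data: three well-separated Gaussian blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Mean Silhouette coefficient over all samples; well-separated blobs score near 1
score = silhouette_score(X, labels)
print(f"Silhouette score: {score:.3f}")
```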

  2. Davies-Bouldin Index: Measures the average similarity between clusters, where similarity is defined as the ratio between within-cluster distances and between-cluster distances. Lower values indicate better clustering (0 is the minimum).

  DB = (1/k) · Σ_{i=1..k} max_{j ≠ i} (s_i + s_j) / d_ij

Where:

  • k is the number of clusters
  • s_i is the average distance of all points in cluster i to their cluster centroid
  • d_ij is the distance between the centroids of clusters i and j
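A minimal sketch using scikit-learn's `davies_bouldin_score` (the toy data is assumed for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Lower is better; 0 is the theoretical minimum
dbi = davies_bouldin_score(X, labels)
print(f"Davies-Bouldin index: {dbi:.3f}")
```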
  3. Calinski-Harabasz Index (Variance Ratio Criterion): Ratio of between-cluster variance to within-cluster variance. Higher values indicate better-defined clusters.

  CH = [tr(B_k) / tr(W_k)] · [(N - k) / (k - 1)]

Where:

  • B_k is the between-cluster dispersion matrix
  • W_k is the within-cluster dispersion matrix
  • tr(·) is the trace of a matrix
  • N is the total number of samples
  • k is the number of clusters
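A hedged scikit-learn sketch, reusing the same kind of toy blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Higher is better; compact, well-separated clusters give large values
ch = calinski_harabasz_score(X, labels)
print(f"Calinski-Harabasz index: {ch:.1f}")
```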
  4. Dunn Index: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. Higher values indicate better clustering.

  D = min_{i ≠ j} d(C_i, C_j) / max_m diam(C_m)

Where:

  • d(C_i, C_j) is the distance between clusters C_i and C_j
  • diam(C_m) is the diameter of cluster C_m (the maximum distance between any two points in the cluster)
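The Dunn index has no scikit-learn implementation, so here is a NumPy/SciPy sketch; note that taking the single-linkage inter-cluster distance and the complete diameter is one common choice among several variants:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def dunn_index(X, labels):
    """Minimum inter-cluster distance divided by maximum intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Diameter of a cluster: maximum pairwise distance within it
    max_diam = max(pdist(c).max() for c in clusters if len(c) > 1)
    # Minimum distance between points of any two different clusters
    min_inter = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_inter / max_diam

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dunn = dunn_index(X, labels)
print(f"Dunn index: {dunn:.3f}")
```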
  5. Inertia (Within-cluster Sum of Squares): Sum of squared distances of samples to their closest cluster center. Lower values indicate more compact clusters.

  Inertia = Σ_{i=1..n} min_{μ_j ∈ C} ‖x_i - μ_j‖²

Where:

  • x_i is the i-th data point
  • μ_j is the center of cluster j
  • C is the set of all cluster centers
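Inertia is exposed directly by scikit-learn's KMeans as the `inertia_` attribute; the manual computation in this sketch (toy data assumed) just confirms the definition:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Sum of squared distances of each sample to its assigned cluster center
manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(f"inertia_ = {km.inertia_:.2f}, manual = {manual:.2f}")
```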

External Metrics (Ground Truth Required)

These metrics compare clustering results against known true labels:

  1. Adjusted Rand Index (ARI): Measures the similarity between two clusterings, adjusted for chance. Values range from -1 to 1, with 1 indicating perfect agreement, 0 indicating random assignment, and negative values indicating assignments worse than random.

  ARI = (RI - E[RI]) / (max(RI) - E[RI])

In terms of the contingency table, with C(m, 2) denoting the binomial coefficient "m choose 2":

  ARI = [Σ_ij C(n_ij, 2) - Σ_i C(a_i, 2) · Σ_j C(b_j, 2) / C(n, 2)] / [½ · (Σ_i C(a_i, 2) + Σ_j C(b_j, 2)) - Σ_i C(a_i, 2) · Σ_j C(b_j, 2) / C(n, 2)]

Where:

  • n_ij is the number of objects that are in both class i and cluster j
  • a_i is the number of objects in class i
  • b_j is the number of objects in cluster j
  • n is the total number of objects
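A small sketch with scikit-learn's `adjusted_rand_score`; note that ARI is invariant to permuting the cluster label names (the toy labels are made up):

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]  # same partition, different label names

# Identical partitions give 1.0 regardless of how clusters are named
ari = adjusted_rand_score(labels_true, labels_pred)
print(ari)  # 1.0
```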
  2. Normalized Mutual Information (NMI): Measures the mutual information between the clustering assignment and the ground truth, normalized by the average entropy of both. Values range from 0 to 1, with 1 indicating perfect agreement.

  NMI(U, V) = MI(U, V) / mean(H(U), H(V))

Where:

  • MI(U, V) is the mutual information between clusterings U and V
  • H(U) and H(V) are the entropies of U and V, respectively
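A sketch with scikit-learn's `normalized_mutual_info_score`, whose default normalization is the arithmetic mean of the two entropies (illustrative labels):

```python
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]  # same partition, relabeled

# Identical partitions give 1.0; unrelated ones approach 0
nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(nmi)
```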
  3. Fowlkes-Mallows Score: Geometric mean of pairwise precision and recall. Values range from 0 to 1, with 1 indicating perfect agreement.

  FMI = TP / √((TP + FP) · (TP + FN))

Where TP, FP, and FN are derived from counting pairs of points that are:

  • True Positive (TP): in the same cluster in both clusterings
  • False Positive (FP): in the same cluster in the predicted clustering but not in the ground truth
  • False Negative (FN): in different clusters in the predicted clustering but in the same cluster in the ground truth
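A minimal sketch with scikit-learn's `fowlkes_mallows_score` (toy labels assumed); like ARI, it ignores label names:

```python
from sklearn.metrics import fowlkes_mallows_score

labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]  # identical partition under relabeling

# All same-cluster pairs agree, so FP = FN = 0 and the score is 1
fmi = fowlkes_mallows_score(labels_true, labels_pred)
print(fmi)
```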
  4. Homogeneity, Completeness, and V-measure:
    • Homogeneity: Each cluster contains only members of a single class (values from 0 to 1)
    • Completeness: All members of a given class are assigned to the same cluster (values from 0 to 1)
    • V-measure: Harmonic mean of homogeneity and completeness (values from 0 to 1)
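scikit-learn computes all three at once via `homogeneity_completeness_v_measure` (the toy labels below are made up to show the asymmetry):

```python
from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]  # class 1 is split across two clusters

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
# Every predicted cluster is pure, so homogeneity is 1.0;
# class 1 is split, so completeness (and hence the V-measure) drops below 1.0
print(f"homogeneity={h:.3f} completeness={c:.3f} v_measure={v:.3f}")
```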
  5. Contingency Matrix: A table showing the distribution of data points across predicted clusters and true classes. Not a metric itself, but the foundation for many external metrics.
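It can be built with `contingency_matrix` from `sklearn.metrics.cluster` (illustrative labels):

```python
from sklearn.metrics.cluster import contingency_matrix

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]

# Rows correspond to true classes, columns to predicted clusters
cm = contingency_matrix(labels_true, labels_pred)
print(cm)
```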

Determining Optimal Number of Clusters

  • Elbow Method: Plot inertia against number of clusters and look for the “elbow” point
  • Silhouette Method: Choose the number of clusters that maximizes the Silhouette score
  • Gap Statistic: Compare within-cluster dispersion to that expected under a null reference distribution
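The first two approaches can be sketched as a scan over candidate k values (the 4-blob toy data and the k range are arbitrary choices for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=7)

inertias, sil = {}, {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    inertias[k] = km.inertia_    # elbow method: plot this and look for the bend
    sil[k] = silhouette_score(X, km.labels_)

best_k = max(sil, key=sil.get)   # silhouette method: take the argmax
print(f"best k by silhouette: {best_k}")
```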