Clustering metrics are quantitative measures used to evaluate the performance and quality of clustering algorithms.
Internal (Intrinsic) Metrics (No Ground Truth Required)
These metrics evaluate clustering performance based only on the data and the clustering result, without requiring true labels:
- Silhouette Coefficient: Measures how similar an object is to its own cluster compared to other clusters. Values range from -1 to 1: values near +1 indicate good separation and cohesion, values near 0 indicate that the sample lies close to the boundary between two clusters, and values near -1 indicate that the sample is closer to the points of a neighboring cluster than to its own and is probably misassigned. For a single point $i$:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
Where:
- $a(i)$ is the mean distance between point $i$ and all other points in the same cluster
- $b(i)$ is the mean distance between point $i$ and all points in the nearest neighboring cluster
The Silhouette score for a clustering is the mean Silhouette coefficient over all $n$ samples:
$$S = \frac{1}{n} \sum_{i=1}^{n} s(i)$$
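As a rough illustration, the per-point coefficient and its mean can be computed directly from the two definitions. This is a minimal sketch using only the standard library; the function names, the Euclidean distance, the score of 0 for singleton clusters, and the toy points are assumptions made for the example.

```python
import math

def silhouette_samples(points, labels):
    """Silhouette coefficient s(i) of every sample, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a(i): mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): mean distance to the nearest neighboring cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    samples = silhouette_samples(points, labels)
    return sum(samples) / len(samples)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster straddles both groups
print(f"good={good:.3f} bad={bad:.3f}")
```

The sketch is quadratic in the number of samples. For real data, a library routine such as scikit-learn's `silhouette_score` is the usual choice.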
- Davies-Bouldin Index: Measures the average similarity between each cluster and the cluster most similar to it, where similarity is defined as the ratio of within-cluster distances to between-cluster distances. Lower values indicate better clustering (0 is the minimum).
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d_{ij}}$$
Where:
- $k$ is the number of clusters
- $\sigma_i$ is the average distance of all points in cluster $i$ to their cluster centroid
- $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$
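The index can be sketched in a few lines: compute each centroid, the average distance of each cluster's points to it, and then the worst-case similarity ratio per cluster. The function name, Euclidean distance, and toy points are assumptions. The sketch does not guard against two clusters sharing the same centroid, which would divide by zero.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = list(clusters)
    if len(keys) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        key: tuple(sum(c) / len(pts) for c in zip(*pts))
        for key, pts in clusters.items()
    }
    # Average distance of the cluster's points to its centroid.
    scatter = {
        key: sum(math.dist(p, centroids[key]) for p in pts) / len(pts)
        for key, pts in clusters.items()
    }
    total = 0.0
    for i in keys:
        # Similarity of cluster i to its most similar other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
db = davies_bouldin(points, [0, 0, 1, 1])
print(db)
```

In this toy case each cluster has an average point-to-centroid distance of 0.5 and the centroids are 10 apart, so every ratio is (0.5 + 0.5) / 10.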
- Calinski-Harabasz Index (Variance Ratio Criterion): Ratio of between-cluster variance to within-cluster variance. Higher values indicate better-defined clusters.
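$$CH = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{n - k}{k - 1}$$
Where:
- $B_k$ is the between-cluster dispersion matrix
- $W_k$ is the within-cluster dispersion matrix
- $\mathrm{tr}(\cdot)$ is the trace of a matrix
- $n$ is the total number of samples
- $k$ is the number of clusters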
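The two traces never require building the dispersion matrices explicitly: the trace of the between-cluster matrix is the size-weighted sum of squared distances from each centroid to the overall mean, and the trace of the within-cluster matrix is the sum of squared distances from each point to its own centroid. A minimal sketch under those identities follows; the function name and toy data are assumptions, and the case of zero within-cluster dispersion is not handled.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index (higher is better)."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters <= n - 1")

    def mean(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    overall = mean(points)
    trace_between = 0.0
    trace_within = 0.0
    for pts in clusters.values():
        centroid = mean(pts)
        trace_between += len(pts) * sq_dist(centroid, overall)
        trace_within += sum(sq_dist(p, centroid) for p in pts)
    return (trace_between / trace_within) * (n - k) / (k - 1)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ch = calinski_harabasz(points, [0, 0, 1, 1])
print(ch)
```

Here the between-cluster trace is 100 and the within-cluster trace is 1, so the index is 100 × (4 − 2) / (2 − 1) = 200.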
- Dunn Index: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. Higher values indicate better clustering.
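$$D = \frac{\min_{i \neq j} d(i, j)}{\max_{1 \le m \le k} \operatorname{diam}(m)}$$
Where:
- $d(i, j)$ is the distance between clusters $i$ and $j$
- $\operatorname{diam}(m)$ is the diameter of cluster $m$ (maximum distance between any two points in the cluster)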
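The text does not fix how the distance between two clusters is measured. The sketch below assumes the closest pair of points across the two clusters (a single-linkage distance), which is one common choice among several. The function name and toy data are also assumptions.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Dunn index with single-linkage inter-cluster distance (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Smallest distance between points that belong to different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    # Largest diameter: the widest pairwise distance inside any one cluster.
    max_diameter = max(
        (math.dist(p, q) for pts in clusters.values() for p, q in combinations(pts, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point
    return min_between / max_diameter

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
dunn_good = dunn_index(points, [0, 0, 1, 1])
dunn_bad = dunn_index(points, [0, 1, 0, 1])
print(dunn_good, dunn_bad)
```

Because both the numerator and the denominator are extreme values, a single outlier can change the index sharply.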
- Inertia (Within-cluster Sum of Squares): Sum of squared distances of samples to their closest cluster center. Lower values indicate more compact clusters.
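$$\text{Inertia} = \sum_{i=1}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$
Where:
- $x_i$ is the $i$-th data point
- $\mu_j$ is the center of cluster $j$
- $C$ is the set of all cluster centers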
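Inertia depends only on the points and the set of centers, so it can be sketched as a single expression. The function name and the toy centers are assumptions made for the example.

```python
def inertia(points, centers):
    """Sum of squared Euclidean distances from each point to its closest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
two_centers = inertia(points, [(0.0, 0.5), (10.0, 0.5)])
one_center = inertia(points, [(5.0, 0.5)])
print(two_centers, one_center)
```

Inertia never increases when more centers are added (with each center placed optimally), so it cannot be used on its own to compare clusterings with different numbers of clusters.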
External Metrics (Ground Truth Required)
These metrics compare clustering results against known true labels:
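- Adjusted Rand Index (ARI): Measures the similarity between two clusterings, adjusted for chance. Values are at most 1, which indicates perfect agreement; values near 0 are expected for random assignment, and negative values (no lower than -0.5) indicate agreement worse than chance.
$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$
Where:
- $n_{ij}$ is the number of objects that are in both class $i$ and cluster $j$
- $a_i$ is the number of objects in class $i$
- $b_j$ is the number of objects in cluster $j$
- $n$ is the total number of objects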
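The pair-counting form of the index can be sketched with counters over the label pairs, the class labels, and the cluster labels. The function name is an assumption, and returning 1.0 in the degenerate case where the denominator vanishes is a convention chosen for this example.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pair counts over the contingency table."""
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two objects")
    # Pairs that share both a class and a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a class, and pairs that share a cluster.
    sum_classes = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_clusters = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_classes * sum_clusters / comb(n, 2)
    maximum = 0.5 * (sum_classes + sum_clusters)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything together
    return (sum_cells - expected) / (maximum - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, renamed labels
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every same-class pair is split
print(same, opposite)
```

The first call shows that the index ignores label names. The second call reaches a negative value because the predicted clustering separates every pair that the ground truth groups together.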
- Normalized Mutual Information (NMI): Measures the mutual information between the clustering assignment and the ground truth, normalized by the average entropy of both. Values range from 0 to 1, with 1 indicating perfect agreement.
$$NMI(U, V) = \frac{2\, I(U; V)}{H(U) + H(V)}$$
Where:
- $I(U; V)$ is the mutual information between clusterings $U$ and $V$
- $H(U)$ and $H(V)$ are the entropies of $U$ and $V$ respectively
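A minimal sketch of this normalization (mutual information divided by the arithmetic mean of the two entropies) follows. The function names are assumptions, natural logarithms are used, and returning 1.0 when both entropies are zero is a convention chosen for the example. Other normalizations (geometric mean, minimum, or maximum of the entropies) also appear in practice.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    return sum(
        (c / n) * log(n * c / (count_u[a] * count_v[b]))
        for (a, b), c in Counter(zip(u, v)).items()
    )

def normalized_mutual_info(u, v):
    """Mutual information normalized by the average of the two entropies."""
    total_entropy = entropy(u) + entropy(v)
    if total_entropy == 0:
        return 1.0  # both labelings consist of a single group
    return 2 * mutual_information(u, v) / total_entropy

nmi_same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])
nmi_indep = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])
print(nmi_same, nmi_indep)
```

Unlike ARI, this score is not adjusted for chance, so random labelings with many clusters tend to score above 0.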
- Fowlkes-Mallows Score: Geometric mean of pairwise precision and recall. Values range from 0 to 1, with 1 indicating perfect agreement.
$$FMI = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}$$
Where TP, FP, and FN are derived from counting pairs of points that are:
- True Positive (TP): in the same cluster in both clusterings
- False Positive (FP): in the same cluster in the predicted clustering but not in the ground truth
- False Negative (FN): in different clusters in the predicted clustering but in the same cluster in the ground truth
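The pair counts can be obtained without enumerating pairs: TP comes from the joint label counts, TP + FP from the predicted cluster sizes, and TP + FN from the true class sizes. The sketch below relies on those identities; the function names and toy labels are assumptions.

```python
from collections import Counter
from math import comb, sqrt

def pair_counts(labels_true, labels_pred):
    """Return (TP, FP, FN) counted over unordered pairs of points."""
    tp = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())  # TP + FP
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())  # TP + FN
    return tp, same_pred - tp, same_true - tp

def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    tp, fp, fn = pair_counts(labels_true, labels_pred)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
tp, fp, fn = pair_counts(labels_true, labels_pred)
fmi = fowlkes_mallows(labels_true, labels_pred)
print(tp, fp, fn, round(fmi, 4))
```

In this example 2 pairs are grouped together by both labelings, 1 pair is grouped only by the prediction, and 4 pairs are grouped only by the ground truth.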
- Homogeneity, Completeness, and V-measure:
  - Homogeneity: Measures whether each cluster contains only members of a single class (values from 0 to 1, with 1 meaning every cluster is pure)
  - Completeness: Measures whether all members of a given class are assigned to the same cluster (values from 0 to 1, with 1 meaning no class is split)
  - V-measure: Harmonic mean of homogeneity and completeness (values from 0 to 1)
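These three scores are commonly defined through conditional entropy: homogeneity is 1 − H(class | cluster) / H(class), and completeness is 1 − H(cluster | class) / H(cluster). The sketch below assumes that formulation, which the text above does not spell out. The function names and the conventions for zero-entropy labelings are also assumptions.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given) in nats."""
    n = len(target)
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g])
        for (_, g), c in Counter(zip(target, given)).items()
    )

def homogeneity_completeness_v(labels_true, labels_pred):
    h_class = entropy(labels_true)
    h_cluster = entropy(labels_pred)
    homogeneity = (
        1.0 if h_class == 0
        else 1.0 - conditional_entropy(labels_true, labels_pred) / h_class
    )
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - conditional_entropy(labels_pred, labels_true) / h_cluster
    )
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure

# Class 1 is split across clusters 1 and 2, but every cluster is pure.
h, c, v = homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 1, 2])
print(round(h, 3), round(c, 3), round(v, 3))
```

The example is homogeneous but not complete: splitting a class lowers completeness while leaving homogeneity at 1.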
- Contingency Matrix: A table showing the distribution of data points across predicted clusters and true classes. Not a metric itself, but the foundation for many external metrics.
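A contingency matrix is just a table of joint label counts. The sketch below assumes rows are true classes and columns are predicted clusters, each in sorted label order; the function name and toy labels are likewise assumptions.

```python
from collections import Counter

def contingency_matrix(labels_true, labels_pred):
    """Rows are true classes, columns are predicted clusters (sorted label order)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    counts = Counter(zip(labels_true, labels_pred))
    matrix = [[counts[(cls, clu)] for clu in clusters] for cls in classes]
    return classes, clusters, matrix

classes, clusters, matrix = contingency_matrix(
    ["a", "a", "a", "b", "b", "b"],
    [0, 0, 1, 1, 2, 2],
)
for cls, row in zip(classes, matrix):
    print(cls, row)
```

Row sums give the class sizes and column sums give the cluster sizes, which are the quantities that pair-counting and entropy-based external metrics are built from.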
Determining Optimal Number of Clusters
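- Elbow Method: Plot inertia against the number of clusters and look for the “elbow” point, where adding another cluster stops producing a large drop in inertia.
- Silhouette Method: Choose the number of clusters that maximizes the Silhouette score.
- Gap Statistic: Compare the within-cluster dispersion to that expected under a null reference distribution (data with no cluster structure), and favor numbers of clusters for which the observed dispersion falls well below the expected value.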
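Only the elbow method is sketched here, on a toy dataset of three well-separated blobs. The small k-means routine, its deterministic farthest-point seeding, and the "largest second difference" rule for locating the elbow are all assumptions made for illustration. In practice the elbow is usually judged visually from a plot. The silhouette method follows the same loop, replacing inertia with the mean Silhouette score and taking the maximum.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def kmeans_inertia(points, k, max_iter=100):
    """Inertia of a k-means fit (Lloyd's algorithm, farthest-point seeding)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Three 2x2 blobs anchored at (0, 0), (10, 0), and (0, 10).
points = [
    (x + dx, y + dy)
    for x, y in [(0, 0), (10, 0), (0, 10)]
    for dx in (0, 1)
    for dy in (0, 1)
]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}

# Heuristic elbow: the k where the drop in inertia slows down the most.
elbow = max(
    range(2, 5),
    key=lambda k: (inertias[k - 1] - inertias[k]) - (inertias[k] - inertias[k + 1]),
)
for k, value in inertias.items():
    print(k, round(value, 2))
print("elbow at k =", elbow)
```

Inertia keeps shrinking as k grows, so the useful signal is where the curve flattens, not where it is smallest.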