Fiveable

🤖 Statistical Prediction Unit 14 Review

14.3 Evaluation Metrics for Unsupervised Learning and Clustering

Written by the Fiveable Content Team • Last updated September 2025

Unsupervised learning evaluation metrics help us judge how well our clustering algorithms are performing without labeled data. These metrics fall into two categories: internal validation, which uses the data itself, and external validation, which compares results to known information.

Internal metrics like silhouette score and inertia measure cluster compactness and separation. External metrics like adjusted Rand index compare clustering results to known groupings. Understanding these metrics is crucial for selecting the best clustering approach for your data.

Internal Validation Metrics

Silhouette Score and Calinski-Harabasz Index

  • Silhouette Score measures how well an observation fits into its assigned cluster compared to other clusters
    • Calculates the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample
    • Silhouette coefficient for a sample is $\frac{b - a}{\max(a, b)}$
    • Ranges from -1 to 1, where a high value indicates that the object is well matched to its cluster and poorly matched to neighboring clusters
  • Calinski-Harabasz Index, also known as the Variance Ratio Criterion, evaluates cluster validity as the ratio of between-cluster dispersion to within-cluster dispersion
    • Defined as $\frac{SS_b / (k-1)}{SS_w / (n-k)}$, where $SS_b$ is the between-cluster sum of squares, $SS_w$ is the within-cluster sum of squares, $k$ is the number of clusters, and $n$ is the total number of observations
    • A higher Calinski-Harabasz score relates to a model with better defined clusters
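Both metrics are available in scikit-learn. The sketch below scores a K-means clustering on synthetic data; the blob parameters and random seeds are illustrative assumptions, not part of the study guide.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Toy data: three reasonably separated Gaussian blobs (illustrative choice)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

sil = silhouette_score(X, labels)        # in [-1, 1]; higher is better
ch = calinski_harabasz_score(X, labels)  # unbounded above; higher is better

print(f"silhouette = {sil:.3f}, calinski-harabasz = {ch:.1f}")
```

Note that neither metric needs ground-truth labels: both are computed from the data and the cluster assignments alone.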

Davies-Bouldin Index and Dunn Index

  • Davies-Bouldin Index measures the average similarity between each cluster and its most similar cluster
    • Calculates the ratio of within-cluster distances to between-cluster distances for each cluster pair
    • A lower Davies-Bouldin Index indicates better separation between the clusters and more compact clusters
    • Aims to minimize the average similarity between each cluster and its most similar cluster
  • Dunn Index assesses the compactness and separation of clusters
    • Defined as the ratio of the minimal inter-cluster distance to the maximal intra-cluster distance
    • A higher Dunn Index implies better clustering, as it indicates that the clusters are compact and well-separated
    • Sensitive to outliers as it only considers the maximum intra-cluster distance and minimum inter-cluster distance
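scikit-learn provides `davies_bouldin_score`, but it has no built-in Dunn Index, so the sketch below includes a minimal hand-rolled version (the `dunn_index` helper is our own illustrative implementation, using pairwise point distances for both the cluster diameters and the inter-cluster gaps).

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

def dunn_index(X, labels):
    """Minimal inter-cluster distance divided by maximal intra-cluster distance."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Maximal intra-cluster distance: the largest cluster "diameter"
    max_intra = max(pdist(c).max() for c in clusters if len(c) > 1)
    # Minimal inter-cluster distance: closest pair of points in different clusters
    min_inter = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_inter / max_intra

# Toy data and seed are illustrative assumptions
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

db = davies_bouldin_score(X, labels)  # lower is better
dunn = dunn_index(X, labels)          # higher is better
print(f"davies-bouldin = {db:.3f}, dunn = {dunn:.3f}")
```

Because the Dunn Index hinges on a single minimum and a single maximum, one outlier can drag it down sharply, which is the sensitivity the bullet above warns about.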

Inertia

  • Inertia, or within-cluster sum-of-squares (WSS), measures the compactness of the clustering
    • Calculated as the sum of squared distances of samples to their closest cluster center
    • A lower inertia indicates more compact clusters
    • Often used in combination with other metrics, such as the silhouette score, to determine the optimal number of clusters
    • Inertia decreases monotonically as the number of clusters increases, so it alone cannot determine the optimal number of clusters
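Because inertia only decreases as $k$ grows, the usual practice is the "elbow" heuristic: plot inertia against $k$ and pick the point where the drop flattens. A minimal sketch (data and range of $k$ are illustrative assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with four true clusters (illustrative choice)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=1)

# kmeans.inertia_ is the within-cluster sum of squared distances to centroids.
# It shrinks monotonically with k, so look for the elbow, not the minimum.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
    for k in range(1, 9)
]
print([round(v, 1) for v in inertias])
```

On data like this, the curve typically drops steeply up to $k = 4$ and flattens afterward, suggesting four clusters.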

External Validation Metrics

Adjusted Rand Index and Mutual Information Score

  • Adjusted Rand Index (ARI) measures the similarity between two clusterings, adjusting for chance groupings
    • Calculates the number of pairs of elements that are either in the same group or in different groups in both clusterings
    • Ranges from -1 to 1, where 1 indicates perfect agreement between the clusterings, 0 represents the expected score of random labelings, and negative values indicate less agreement than expected by chance
    • Adjusts the Rand Index to account for the expected similarity of random clusterings
  • Mutual Information Score quantifies the amount of information shared between two clusterings
    • Measures how much knowing one clustering reduces the uncertainty about the other
    • Ranges from 0 to $\min(H(U), H(V))$, where $U$ and $V$ are the two clusterings and $H(\cdot)$ is the entropy
    • A higher Mutual Information Score suggests a higher agreement between the clusterings
    • Can be normalized to adjust for the number of clusters and samples
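Both scores are in `sklearn.metrics`; the sketch below uses hand-picked toy label vectors (our own illustrative example) to show that ARI ignores the arbitrary naming of cluster labels and only compares the groupings themselves.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Identical grouping under different label names: still perfect agreement
pred_perfect = [2, 2, 2, 0, 0, 0, 1, 1, 1]
# One point assigned to the wrong group
pred_partial = [0, 0, 1, 1, 1, 1, 2, 2, 2]

ari_perfect = adjusted_rand_score(true_labels, pred_perfect)  # exactly 1.0
ari_partial = adjusted_rand_score(true_labels, pred_partial)
nmi_partial = normalized_mutual_info_score(true_labels, pred_partial)

print(f"ARI perfect = {ari_perfect}, ARI partial = {ari_partial:.3f}, "
      f"NMI partial = {nmi_partial:.3f}")
```

`normalized_mutual_info_score` is the normalized variant mentioned above: it rescales mutual information to the $[0, 1]$ range so clusterings with different numbers of clusters can be compared.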

Cophenetic Correlation Coefficient

  • Cophenetic Correlation Coefficient measures how faithfully a hierarchical clustering preserves the pairwise distances between the original data points
    • Compares the distances between samples in the original space to the distances between samples in the hierarchical clustering
    • Calculated as the Pearson correlation between the original distances and the cophenetic distances
    • Ranges from -1 to 1, where a value closer to 1 indicates that the hierarchical clustering accurately preserves the original distances
    • Helps to assess the quality of a hierarchical clustering and to compare different linkage methods (single, complete, average)
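SciPy's `cophenet` computes this directly from a linkage matrix and the original condensed distance matrix, which makes it easy to compare linkage methods, as suggested above. The data and seed are illustrative assumptions.

```python
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs

# Toy data (illustrative choice)
X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=7)
dists = pdist(X)  # condensed matrix of original pairwise distances

# Cophenetic correlation for each linkage method: the Pearson correlation
# between the original distances and the dendrogram's cophenetic distances.
ccc = {}
for method in ("single", "complete", "average"):
    Z = linkage(dists, method=method)
    ccc[method], _ = cophenet(Z, dists)
    print(f"{method}: {ccc[method]:.3f}")
```

The method with the coefficient closest to 1 preserves the original pairwise distances most faithfully; average linkage often scores well on this criterion.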