Rand Index (RI) vs. Adjusted Rand Index (ARI) in K-Means Clustering
Rand Index (RI)
3 min read · Nov 23, 2024
The Rand Index measures the similarity between two clusterings, here the predicted clusters and the true labels. It is the fraction of point pairs on which the two agree: pairs that both place in the same cluster, or that both place in different clusters. RI counts two kinds of agreement:
- True Positives (TP): Pairs of points that are in the same cluster in both the predicted and true labels.
- True Negatives (TN): Pairs of points that are in different clusters in both the predicted and true labels.
The Rand Index is defined as:
$$\mathrm{RI} = \frac{TP + TN}{TP + TN + FP + FN}$$
Where:
- FP: Pairs of points that are in the same cluster in the predicted labels but different clusters in the true labels.
- FN: Pairs of points that are in different clusters in the predicted labels but the same cluster in the true labels.
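To make the pair counting concrete, here is a minimal sketch in plain Python that walks over every pair of points and tallies the four cases defined above. The function names `pair_counts` and `rand_index` and the toy label lists are illustrative assumptions, not part of any library.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count TP, TN, FP, FN over all unordered pairs of points."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # together in both clusterings
        elif not same_pred and not same_true:
            tn += 1  # apart in both clusterings
        elif same_pred:
            fp += 1  # together in predicted, apart in true
        else:
            fn += 1  # apart in predicted, together in true
    return tp, tn, fp, fn


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two clusterings agree."""
    tp, tn, fp, fn = pair_counts(true_labels, pred_labels)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("need at least two points to form a pair")
    return (tp + tn) / total


# Toy data (made up for illustration): 6 points, 15 pairs in total.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]
print(pair_counts(true_labels, pred_labels))  # (2, 10, 2, 1)
print(rand_index(true_labels, pred_labels))   # 0.8
```

This loop is quadratic in the number of points, which is fine for a demonstration. For real datasets, scikit-learn's `sklearn.metrics.rand_score` computes the same quantity.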
The Rand Index is between 0 and 1, where:
- 1 indicates perfect agreement between the two clusterings.
- 0 indicates no agreement on any pair of points.
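Limitation: RI does not correct for random chance. When there are many clusters, most pairs of points fall in different clusters in both clusterings, so true negatives dominate the count and even a random assignment can score close to 1.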
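The inflation is easy to see in a small experiment. The sketch below scores a purely random guess against balanced true labels, once with 2 clusters and once with 20. The helper `ri_of_random_guess`, the sample size of 200 points, and the seed are arbitrary choices made for this illustration.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which labelings a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ri_of_random_guess(n_points, n_clusters, seed=0):
    """RI between balanced true labels and uniformly random predictions."""
    rng = random.Random(seed)
    true_labels = [i % n_clusters for i in range(n_points)]
    pred_labels = [rng.randrange(n_clusters) for _ in range(n_points)]
    return rand_index(true_labels, pred_labels)


ri_few = ri_of_random_guess(200, 2)    # random guess, 2 clusters
ri_many = ri_of_random_guess(200, 20)  # random guess, 20 clusters
print(f"2 clusters:  RI = {ri_few:.3f}")
print(f"20 clusters: RI = {ri_many:.3f}")
```

Under these assumptions the 2-cluster score should land near 0.5 and the 20-cluster score near 0.9, even though neither prediction carries any information about the true labels.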
Adjusted Rand Index (ARI)
The Adjusted Rand Index (ARI) is a corrected-for-chance version of the Rand Index. It accounts for the fact that random cluster assignments can lead to…