Clustering Algorithms — Interactive Comparison

Three classic unsupervised algorithms (K-Means, DBSCAN, HAC) compared on three benchmark 2D datasets where the "right" choice differs.

The point isn't that one algorithm is best; it's that matching the algorithm to the geometry of your data matters more than any amount of tuning. Switch algorithms on the spiral dataset and watch K-Means fail.

Scope: these are synthetic 2D toy datasets chosen because they make geometric trade-offs visible at a glance. Real-world clustering rarely looks this clean — the metrics shown (silhouette, Davies-Bouldin, Calinski-Harabasz) are useful relative comparisons here, less so as absolute production benchmarks.
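To make one of these metrics concrete, here is a minimal pure-Python sketch of the textbook silhouette definition: for each point, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b), averaged over all points. The function name `silhouette_score_sketch` and the four-point toy data are illustrative assumptions, not code from the project's pipeline.

```python
import math

def silhouette_score_sketch(points, labels):
    """Mean silhouette over all points (hypothetical helper, Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the sum includes p itself at distance 0, hence the n - 1 divisor)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest *other* cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (assumed for illustration): two tight pairs, far apart.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_score = silhouette_score_sketch(toy_points, [0, 0, 1, 1])  # pairs kept together
bad_score = silhouette_score_sketch(toy_points, [0, 1, 0, 1])   # pairs split apart
print(f"good labeling: {good_score:.3f}, bad labeling: {bad_score:.3f}")
```

The score only rewards compact, well-separated groups, which is why it works as a relative comparison between labelings of the same dataset rather than as an absolute benchmark.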

Full pipeline, tests, and notebooks on GitHub

Dataset

3 entwined spiral-shaped clusters (N=312). DBSCAN dominates here: centroid-based methods such as K-Means assume compact, roughly convex clusters, so the spiral geometry fools them.
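The contrast can be reproduced offline with a small standard-library sketch. Everything here is an assumption made for illustration: the spiral generator (noise-free, 104 points per arm to match N=312), the `eps` and `min_pts` values, the seed, and the minimal `dbscan`, `kmeans`, and `purity` helpers. None of it is the project's actual pipeline or data.

```python
import math
import random
from collections import Counter, deque

def make_spirals(arms=3, per_arm=104, turns=1.5, r0=2.0):
    """Noise-free Archimedean spiral arms (assumed shape); returns (points, labels)."""
    points, labels = [], []
    for a in range(arms):
        for i in range(per_arm):
            t = turns * 2 * math.pi * i / (per_arm - 1)
            r, angle = r0 + t, t + a * 2 * math.pi / arms
            points.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(a)
    return points, labels

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; label -1 means noise. O(n^2) neighbor search."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points keep expanding
                queue.extend(neighbors[j])
    return labels

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm with random initial centroids."""
    centroids = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels

def purity(truth, predicted):
    """Fraction of points in the majority true class of their cluster; noise counts as wrong."""
    hits = 0
    for c in set(predicted) - {-1}:
        members = [t for t, p in zip(truth, predicted) if p == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(truth)

points, truth = make_spirals()
db_labels = dbscan(points, eps=1.4, min_pts=3)
km_labels = kmeans(points, k=3)
db_purity = purity(truth, db_labels)
km_purity = purity(truth, km_labels)
print(f"DBSCAN  purity: {db_purity:.2f} ({len(set(db_labels))} clusters)")
print(f"K-Means purity: {km_purity:.2f}")
```

On this idealized data, DBSCAN follows each arm point to point because neighbors along an arm are closer than `eps` while the arms themselves are farther apart than `eps`. K-Means cannot match it: its clusters are convex regions around centroids, and a spiral arm that wraps around the others is not convex.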

Algorithm