Benchmarks#
The repository provides several benchmarking scripts under the
benchmark/ directory illustrating different aspects of the
Equilibrium K-Means implementation.
Available Scripts#
benchmark.pyMonte Carlo comparison of KMeans vs EKMeans on a highly imbalanced low-dimensional Gaussian mixture. Reports ARI and Silhouette distributions and shows final clustering results.
benchmark_alphaSweep.pySensitivity analysis scanning the
scaleparameter used in thealpha='dvariance'heuristic. Plots ARI and Silhouette versus scale alongside KMeans baselines.benchmark_minibatch_compare.pyContrasts full-batch EKMeans with two mini-batch regimes: cumulative (accumulation) and online (exponential moving average) updates. Reports timing, ARI, NMI, internal objective estimate, cluster size distribution and effective epochs/iterations.
benchmark_dirichlet_highdim.pyHigh-dimensional Dirichlet mixture benchmark generating imbalanced clusters with a controllable imbalance factor. Produces ARI, NMI, optional Silhouette (subsampled), SSE and timing statistics plus optional boxplots and 2D PCA projections.
benchmark_numba_ekm.pyMeasures wall-clock speed of EKMeans with and without numba JIT acceleration on a synthetic (optionally imbalanced) dataset and reports mean/std speed and approximate speedup factor.
Running Benchmarks#
Install optional speed extras if you want numba acceleration benchmarking:
pip install -e .[speed]
Then run any script, for example:
python benchmark/benchmark_alphaSweep.py
For reproducibility each script exposes its own random seed handling or uses fixed seeds within loops.