Benchmarks
The repository provides several benchmarking scripts under the benchmark/ directory, each illustrating a different aspect of the Equilibrium K-Means implementation.
Available Scripts
benchmark.py
Monte Carlo comparison of KMeans vs EKMeans on a highly imbalanced low-dimensional Gaussian mixture. Reports ARI and Silhouette distributions and shows final clustering results.
benchmark_alphaSweep.py
Sensitivity analysis scanning the scale parameter used in the alpha='dvariance' heuristic. Plots ARI and Silhouette versus scale alongside KMeans baselines.
benchmark_minibatch_compare.py
Contrasts full-batch EKMeans with two mini-batch regimes: cumulative (accumulation) and online (exponential moving average) updates. Reports timing, ARI, NMI, internal objective estimate, cluster size distribution and effective epochs/iterations.
benchmark_dirichlet_highdim.py
High-dimensional Dirichlet mixture benchmark generating imbalanced clusters with a controllable imbalance factor. Produces ARI, NMI, optional Silhouette (subsampled), SSE and timing statistics plus optional boxplots and 2D PCA projections.
benchmark_numba_ekm.py
Measures wall-clock speed of EKMeans with and without numba JIT acceleration on a synthetic (optionally imbalanced) dataset and reports mean/std speed and approximate speedup factor.
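The difference between the cumulative and online regimes mentioned for benchmark_minibatch_compare.py can be illustrated with the two centroid update rules below. This is a minimal sketch for a single centroid, not the package's actual code: the function names and the momentum parameter are hypothetical, and in a real clustering loop each rule would be applied per cluster to the points assigned to it.

```python
import numpy as np


def cumulative_update(center, count, batch):
    """Running mean: every point seen so far carries equal weight."""
    new_count = count + len(batch)
    # Equivalent to (count * center + batch.sum(axis=0)) / new_count.
    new_center = center + (batch.sum(axis=0) - len(batch) * center) / new_count
    return new_center, new_count


def online_update(center, batch, momentum=0.9):
    """Exponential moving average: older batches decay geometrically."""
    # `momentum` is an assumed name for the EMA decay factor.
    return momentum * center + (1.0 - momentum) * batch.mean(axis=0)


rng = np.random.default_rng(0)
points = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(1000, 2))
batches = np.array_split(points, 10)

cum_center, seen = np.zeros(2), 0
ema_center = batches[0].mean(axis=0)  # seed the EMA with the first batch
for batch in batches:
    cum_center, seen = cumulative_update(cum_center, seen, batch)
    ema_center = online_update(ema_center, batch)

print("cumulative:", cum_center)
print("online    :", ema_center)
```

After one pass the cumulative estimate equals the mean of all points seen, while the online estimate weights recent batches more heavily, which lets it track drifting data at the cost of extra variance.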
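The kind of measurement described for benchmark_numba_ekm.py (mean and standard deviation of wall-clock time, plus an approximate speedup) can be sketched with the standard library alone. The helper name time_callable and the two toy workloads are hypothetical stand-ins, not taken from the script. The untimed warm-up call matters for JIT-compiled code, because the first invocation of a numba-compiled function includes compilation time.

```python
import statistics
import time


def time_callable(fn, repeats=5, warmup=1):
    """Return (mean, std) wall-clock seconds over `repeats` timed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as JIT compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    std = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), std


def baseline():
    # Toy workload standing in for the slower code path.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def candidate():
    # Toy workload standing in for the accelerated code path.
    return sum(i * i for i in range(200_000))


base_mean, base_std = time_callable(baseline)
cand_mean, cand_std = time_callable(candidate)
print(f"baseline : {base_mean:.4f} s +/- {base_std:.4f}")
print(f"candidate: {cand_mean:.4f} s +/- {cand_std:.4f}")
print(f"approximate speedup: {base_mean / cand_mean:.2f}x")
```

The speedup is a ratio of means, so it is only approximate; with few repeats the standard deviations indicate how much to trust it.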
Running Benchmarks#
Install optional speed extras if you want numba acceleration benchmarking:
pip install -e ".[speed]"
Then run any script, for example:
python benchmark/benchmark_alphaSweep.py
For reproducibility, each script either exposes its own random seed handling or uses fixed seeds within its loops.
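As an illustration of fixed seeds inside a Monte Carlo loop, the sketch below repeats a scikit-learn KMeans baseline on an imbalanced Gaussian mixture and collects ARI and Silhouette per trial. The cluster sizes, centers, and the run_trials helper are assumptions made for this example and are not the settings used by benchmark.py. EKMeans is left out because its import path is not stated here; any estimator exposing fit_predict could be substituted for KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

CENTERS = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
SIZES = [1000, 50, 25]  # illustrative imbalance, not the script's settings


def run_trials(n_trials=5, base_seed=0):
    """Return per-trial ARI and Silhouette arrays for a KMeans baseline."""
    aris, sils = [], []
    for trial in range(n_trials):
        seed = base_seed + trial  # fixed per-trial seed makes reruns identical
        X, y = make_blobs(n_samples=SIZES, centers=CENTERS,
                          cluster_std=0.8, random_state=seed)
        labels = KMeans(n_clusters=len(SIZES), n_init=10,
                        random_state=seed).fit_predict(X)
        aris.append(adjusted_rand_score(y, labels))
        sils.append(silhouette_score(X, labels))
    return np.array(aris), np.array(sils)


aris, sils = run_trials()
print(f"ARI        mean={aris.mean():.3f} std={aris.std():.3f}")
print(f"Silhouette mean={sils.mean():.3f} std={sils.std():.3f}")
```

Deriving each trial's seed from a base seed keeps the trials distinct from one another while making the whole run repeatable.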