Benchmarks
==========

The repository provides several benchmarking scripts under the ``benchmark/``
directory illustrating different aspects of the Equilibrium K-Means
implementation.

Available Scripts
-----------------

``benchmark.py``
    Monte Carlo comparison of KMeans vs EKMeans on a highly imbalanced
    low-dimensional Gaussian mixture. Reports ARI and Silhouette distributions
    and shows final clustering results.

``benchmark_alphaSweep.py``
    Sensitivity analysis scanning the ``scale`` parameter used in the
    ``alpha='dvariance'`` heuristic. Plots ARI and Silhouette versus scale
    alongside KMeans baselines.

``benchmark_minibatch_compare.py``
    Contrasts full-batch EKMeans with two mini-batch regimes: cumulative
    (accumulation) and online (exponential moving average) updates. Reports
    timing, ARI, NMI, internal objective estimate, cluster size distribution
    and effective epochs/iterations.

``benchmark_dirichlet_highdim.py``
    High-dimensional Dirichlet mixture benchmark generating imbalanced clusters
    with a controllable imbalance factor. Produces ARI, NMI, optional
    Silhouette (subsampled), SSE and timing statistics plus optional boxplots
    and 2D PCA projections.

``benchmark_numba_ekm.py``
    Measures wall-clock speed of EKMeans with and without numba JIT
    acceleration on a synthetic (optionally imbalanced) dataset and reports
    mean/std speed and approximate speedup factor.

Running Benchmarks
------------------

Install optional speed extras if you want numba acceleration benchmarking:

.. code-block:: bash

    pip install -e .[speed]

Then run any script, for example:

.. code-block:: bash

    python benchmark/benchmark_alphaSweep.py

For reproducibility each script exposes its own random seed handling or uses
fixed seeds within loops.
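
As a rough illustration of the kind of summary ``benchmark_numba_ekm.py``
reports (mean/std wall-clock time and an approximate speedup factor), the
following standard-library sketch times two interchangeable callables. It is
not taken from the script itself: ``time_runs``, ``speedup`` and the two
placeholder workloads are hypothetical names introduced here, and in a real
comparison the callables would wrap EKMeans fits with and without numba.

.. code-block:: python

    import statistics
    import time


    def time_runs(fn, n_runs=5):
        """Call ``fn`` ``n_runs`` times; return (mean, std) of wall-clock seconds."""
        durations = []
        for _ in range(n_runs):
            start = time.perf_counter()
            fn()
            durations.append(time.perf_counter() - start)
        # Sample standard deviation needs at least two measurements.
        std = statistics.stdev(durations) if len(durations) > 1 else 0.0
        return statistics.mean(durations), std


    def speedup(baseline_mean, accelerated_mean):
        """Approximate speedup factor: baseline time divided by accelerated time."""
        return baseline_mean / accelerated_mean


    if __name__ == "__main__":
        # Placeholder workloads (assumptions), standing in for the two fits.
        def slow():
            return sum(i * i for i in range(200_000))

        def fast():
            return sum(i * i for i in range(50_000))

        slow_mean, slow_std = time_runs(slow)
        fast_mean, fast_std = time_runs(fast)
        print(f"baseline:    {slow_mean:.4f} s (std {slow_std:.4f})")
        print(f"accelerated: {fast_mean:.4f} s (std {fast_std:.4f})")
        print(f"approx. speedup: {speedup(slow_mean, fast_mean):.2f}x")

When timing JIT-compiled code, a common precaution is to make one untimed
warm-up call first so that compilation time is not counted in the measured
runs.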