sklekmeans.MiniBatchSSEKM#

class sklekmeans.MiniBatchSSEKM(n_clusters=8, *, metric='euclidean', alpha='dvariance', scale=2.0, theta='auto', batch_size=256, max_epochs=10, n_init=1, init='k-means++', init_size=None, shuffle=True, learning_rate=None, tol=0.0001, reassignment_ratio=0.0, reassign_patience=3, verbose=0, monitor_size=1024, print_every=1, use_numba=False, numba_threads=None, random_state=None)#

Mini-batch SSEKM.

Mini-batch optimisation of the semi-supervised equilibrium k-means objective. Supervision is provided via a prior matrix, using the prior_matrix keyword to fit() and prior_matrix_batch to partial_fit(). Labeled rows in the prior influence weights via the mixing factor theta.

Parameters:

n_clusters (int, default=8)
metric ({'euclidean', 'manhattan'}, default='euclidean')
alpha (float or {'dvariance'}, default='dvariance') – Equilibrium weighting parameter. 'dvariance' uses a subsample to estimate a heuristic value, which is then scaled by scale.
scale (float, default=2.0) – Scaling factor for the heuristic alpha.
theta (float or {'auto'}, default='auto') – Supervision strength. 'auto' sets theta = |N| / |S|. Numeric values are used directly in both the objective and the labeled-row weight update.
batch_size (int, default=256)
max_epochs (int, default=10)
n_init (int, default=1)
init ({'k-means', 'k-means++', 'random'} or ndarray, default='k-means++')
init_size (int or None, default=None)
shuffle (bool, default=True)
learning_rate (float or None, default=None)
tol (float, default=1e-4)
reassignment_ratio (float, default=0.0)
reassign_patience (int, default=3)
verbose (int, default=0)
monitor_size (int or None, default=1024)
print_every (int, default=1)
use_numba (bool, default=False)
numba_threads (int or None, default=None)
random_state (int or None, default=None)
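
The interplay of batch_size, max_epochs, shuffle and random_state can be pictured with a small dependency-free sketch. It assumes each epoch visits every sample once, in contiguous slices of a (possibly reshuffled) permutation; the helper name iter_minibatches and the seed argument are hypothetical and are not part of sklekmeans, whose internal batching may differ.

```python
import random


def iter_minibatches(n_samples, batch_size=256, max_epochs=10, shuffle=True, seed=None):
    """Yield (epoch, indices) pairs; every sample appears once per epoch."""
    rng = random.Random(seed)  # stands in for random_state (assumption)
    order = list(range(n_samples))
    for epoch in range(max_epochs):
        if shuffle:
            rng.shuffle(order)  # draw a fresh permutation for this epoch
        for start in range(0, n_samples, batch_size):
            # Slicing copies, so later shuffles do not alter yielded batches.
            yield epoch, order[start:start + batch_size]


# 10 samples with batch_size=4 give batches of 4, 4 and 2 in each epoch.
batches = list(iter_minibatches(10, batch_size=4, max_epochs=2, seed=0))
```

The last batch of an epoch is simply shorter when n_samples is not a multiple of batch_size.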

Attributes:

cluster_centers_ (ndarray of shape (n_clusters, n_features)) – Final centers after training.
labels_ (ndarray of shape (n_samples,)) – Hard assignment labels for the training data (available after fit()).
alpha_ (float) – Resolved alpha value.
theta_super_ (float) – Resolved supervision strength actually used: the value derived for theta='auto', or the numeric theta as given.
objective_approx_ (list of float) – Epoch-wise approximate objectives measured on a monitoring subset.
counts_ (ndarray of shape (n_clusters,)) – Accumulated batch weights per cluster (accumulation mode; present after fit()).
sums_ (ndarray of shape (n_clusters, n_features)) – Accumulated weighted sums per cluster (accumulation mode; present after fit()).
W_, U_ (ndarrays) – Final equilibrium weights and memberships for the full training data (set by fit()).
n_epochs_ (int) – Number of epochs run in the best initialisation.
n_features_in_ (int) – Number of features seen during the first call to fit() or partial_fit().
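
The accumulation-mode attributes counts_ and sums_ can be read as running sufficient statistics. The sketch below is an illustration only: it assumes that centers are recovered as sums divided by counts, which this page does not state explicitly, and the helper names accumulate_batch and centers_from_stats are hypothetical.

```python
def accumulate_batch(counts, sums, batch, weights):
    """Add one weighted batch to the running per-cluster statistics, in place."""
    for x, w_row in zip(batch, weights):
        for k, w in enumerate(w_row):
            counts[k] += w  # total weight received by cluster k
            for j, x_j in enumerate(x):
                sums[k][j] += w * x_j  # weighted feature sum for cluster k


def centers_from_stats(counts, sums, eps=1e-12):
    """Weighted means per cluster; clusters without mass get a zero vector."""
    return [[s / c if abs(c) > eps else 0.0 for s in row]
            for c, row in zip(counts, sums)]


counts = [0.0, 0.0]
sums = [[0.0, 0.0], [0.0, 0.0]]
# Two batches of two samples each, with one-hot weights for simplicity.
accumulate_batch(counts, sums, [[0.0, 0.0], [2.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]])
accumulate_batch(counts, sums, [[2.0, 0.0], [4.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]])
centers = centers_from_stats(counts, sums)
```

Keeping the statistics separate from the centers means later batches can be folded in without revisiting earlier data.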

Notes

Provide the full-dataset prior using prior_matrix to fit(), or mini-batch priors using prior_matrix_batch to partial_fit().
Unlabeled rows are all zeros; labeled rows are row-normalized when positive.
The monitoring objective stored in objective_approx_ includes the supervised term scaled by theta when a prior is provided.
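
The prior layout described in these notes can be sketched without any dependencies. The helpers build_prior_matrix and row_normalize are hypothetical, as is the convention of marking unlabeled samples with -1; "row-normalized when positive" is interpreted here as dividing each row with a positive sum by that sum, which is an assumption about the library's behaviour.

```python
def build_prior_matrix(labels, n_clusters, unlabeled=-1):
    """One-hot rows for labeled samples, all-zero rows for unlabeled ones."""
    rows = []
    for label in labels:
        row = [0.0] * n_clusters
        if label != unlabeled:
            row[label] = 1.0
        rows.append(row)
    return rows


def row_normalize(prior):
    """Scale rows with a positive sum to sum to 1; leave zero rows untouched."""
    out = []
    for row in prior:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


# Samples 0 and 2 are labeled; samples 1 and 3 carry no supervision.
prior = build_prior_matrix([0, -1, 1, -1], n_clusters=2)
# Soft, unnormalized evidence is rescaled; the zero row stays zero.
soft = row_normalize([[3.0, 1.0], [0.0, 0.0]])
```

A nested list like prior is array-like with shape (n_samples, n_clusters), the shape that prior_matrix is documented to take.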

>>> ssekm.cluster_centers_
array([[1.25126245, 0.55312346],
       [3.54580155, 3.51798824]])
>>> ssekm.predict([[0, 0], [4, 4]])
array([0, 1])

__init__(n_clusters=8, *, metric='euclidean', alpha='dvariance', scale=2.0, theta='auto', batch_size=256, max_epochs=10, n_init=1, init='k-means++', init_size=None, shuffle=True, learning_rate=None, tol=0.0001, reassignment_ratio=0.0, reassign_patience=3, verbose=0, monitor_size=1024, print_every=1, use_numba=False, numba_threads=None, random_state=None)#

Methods

`__init__`([n_clusters, metric, alpha, scale, ...])
`fit`(X[, y, prior_matrix, F])	Train the mini-batch semi-supervised estimator on the full dataset.
`fit_membership`(X[, y, prior_matrix, F])	Fit to `X` and return the final membership matrix for training data.
`fit_predict`(X[, y, prior_matrix, F])	Fit the model and return hard labels for X.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`membership`(X)	Soft membership (U) computed from distances using current `alpha_`.
`partial_fit`(X_batch[, y, ...])
`predict`(X)	Predict the closest cluster each sample in X belongs to.
`set_fit_request`(*[, F, prior_matrix])	Configure whether metadata should be requested to be passed to the `fit` method.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`set_partial_fit_request`(*[, F_batch, ...])	Configure whether metadata should be requested to be passed to the `partial_fit` method.
`transform`(X)	Transform X to a cluster-distance space (pairwise distances).

fit(X, y=None, *, prior_matrix=None, F=None)#

Train the mini-batch semi-supervised estimator on the full dataset.

Runs multiple epochs of mini-batch updates. Supervision can be provided via prior_matrix (preferred) or F; provide only one. Unlabeled rows are all zeros; labeled rows are row-normalized internally when positive.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Prior probability matrix for supervision.
F (array-like of shape (n_samples, n_clusters), optional) – Prior probability matrix for supervision.

Returns:

self – Fitted estimator.

Return type:

MiniBatchSSEKM
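
The "provide only one" rule for prior_matrix and F can be illustrated with a small guard. This is a sketch, not the library's code: the helper resolve_prior is hypothetical, and raising ValueError when both are given is an assumption about how the conflict is reported.

```python
def resolve_prior(prior_matrix=None, F=None):
    """Return whichever prior was supplied, rejecting ambiguous calls."""
    if prior_matrix is not None and F is not None:
        # Assumed behaviour: the two keywords are mutually exclusive.
        raise ValueError("Provide only one of prior_matrix or F.")
    return prior_matrix if prior_matrix is not None else F


chosen = resolve_prior(prior_matrix=[[1.0, 0.0]])
legacy = resolve_prior(F=[[0.0, 1.0]])
unsupervised = resolve_prior()  # no supervision supplied
```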

fit_membership(X, y=None, *, prior_matrix=None, F=None)#

Fit to X and return the final membership matrix for training data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
F (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.

Returns:

U – Membership matrix U_ computed on the training data.

Return type:

ndarray of shape (n_samples, n_clusters)

fit_predict(X, y=None, *, prior_matrix=None, F=None)#

Fit the model and return hard labels for X.

Performs full mini-batch training (up to max_epochs) and returns the predicted cluster index for each sample in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
F (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.

Returns:

labels – Hard cluster assignments for the input samples.

Return type:

ndarray of shape (n_samples,)

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

membership(X)#

Soft membership (U) computed from distances using current alpha_.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples for which to compute memberships.

Returns:

U – Row-stochastic membership matrix computed as normalized exp(-alpha * d^2_shift) per row.

Return type:

ndarray of shape (n_samples, n_clusters)
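
A row-wise normalization of exp(-alpha * d^2_shift) can be sketched as follows. Two assumptions are made: distances are Euclidean, and d^2_shift means the squared distance minus the smallest squared distance in the same row, a common stabilisation trick. The shift cancels in the normalization, so it changes only the numerics, not the result. The function name membership_sketch is hypothetical.

```python
import math


def membership_sketch(X, centers, alpha):
    """Row-stochastic memberships from shifted squared Euclidean distances."""
    U = []
    for x in X:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        d2_min = min(d2)  # shift so the largest exponent is exactly 0
        e = [math.exp(-alpha * (d - d2_min)) for d in d2]
        total = sum(e)
        U.append([v / total for v in e])
    return U


# The first sample sits on a center; the second is equidistant from both.
U = membership_sketch([[0.0, 0.0], [2.0, 2.0]],
                      [[0.0, 0.0], [4.0, 4.0]], alpha=0.5)
```

Each row sums to 1, and a sample equidistant from all centers receives equal membership in each.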

predict(X)#

Predict the closest cluster each sample in X belongs to.

Parameters:

X (array-like of shape (n_samples, n_features)) – New samples to assign.

Returns:

labels – Indices of the nearest centers under the configured metric.

Return type:

ndarray of shape (n_samples,)
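
Nearest-center assignment under the two documented metrics can be sketched in plain Python. The function predict_sketch is hypothetical and only mirrors the description above; the center values are copied from the cluster_centers_ output shown in the class example.

```python
def predict_sketch(X, centers, metric="euclidean"):
    """Index of the nearest center for each sample under the chosen metric."""
    def dist(x, c):
        if metric == "euclidean":
            return sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
        if metric == "manhattan":
            return sum(abs(a - b) for a, b in zip(x, c))
        raise ValueError(f"unsupported metric: {metric!r}")

    return [min(range(len(centers)), key=lambda k: dist(x, centers[k]))
            for x in X]


centers = [[1.25126245, 0.55312346], [3.54580155, 3.51798824]]
labels = predict_sketch([[0, 0], [4, 4]], centers)
labels_l1 = predict_sketch([[0, 0], [4, 4]], centers, metric="manhattan")
```

With these centers, both metrics give the same assignment as the predict output in the class example.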

set_fit_request(*, F: bool | None | str = '$UNCHANGED$', prior_matrix: bool | None | str = '$UNCHANGED$') → MiniBatchSSEKM#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

F (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for F parameter in fit.
prior_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for prior_matrix parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
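
The <component>__<parameter> naming convention can be illustrated with a toy splitter. This is not scikit-learn's implementation; the helper split_nested_params and the step name "cluster" are hypothetical and serve only to show how a double underscore separates a component from its parameter.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        name, sep, sub_key = key.partition("__")
        if sep:
            # Everything after the first '__' belongs to the component.
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": 1, "cluster__batch_size": 128, "cluster__tol": 1e-3}
)
```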

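set_partial_fit_request(*, F_batch: bool | None | str = '$UNCHANGED$', X_batch: bool | None | str = '$UNCHANGED$', prior_matrix_batch: bool | None | str = '$UNCHANGED$') → MiniBatchSSEKM#

Configure whether metadata should be requested to be passed to the partial_fit method.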

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

F_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for F_batch parameter in partial_fit.
X_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_batch parameter in partial_fit.
prior_matrix_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for prior_matrix_batch parameter in partial_fit.

Returns:

self – The updated object.

Return type:

object

transform(X)#

Transform X to a cluster-distance space (pairwise distances).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

distances – Pairwise distances to cluster_centers_ using the estimator’s metric.

Return type:

ndarray of shape (n_samples, n_clusters)
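
The cluster-distance space can be sketched as a plain distance matrix with one row per sample and one column per center. The function transform_sketch is hypothetical, and the sample points and centers are made up for illustration.

```python
import math


def transform_sketch(X, centers, metric="euclidean"):
    """Matrix of distances from each sample (row) to each center (column)."""
    if metric == "euclidean":
        def dist(x, c):
            return math.dist(x, c)
    elif metric == "manhattan":
        def dist(x, c):
            return sum(abs(a - b) for a, b in zip(x, c))
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    return [[dist(x, c) for c in centers] for x in X]


X = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0], [3.0, 0.0]]
d_l2 = transform_sketch(X, centers)
d_l1 = transform_sketch(X, centers, metric="manhattan")
```

Taking the index of the smallest entry in each row of such a matrix gives the hard assignment that predict describes.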
