sklekmeans.MiniBatchSSEKM
- class sklekmeans.MiniBatchSSEKM(n_clusters=8, *, metric='euclidean', alpha='dvariance', scale=2.0, theta='auto', batch_size=256, max_epochs=10, n_init=1, init='k-means++', init_size=None, shuffle=True, learning_rate=None, tol=0.0001, reassignment_ratio=0.0, reassign_patience=3, verbose=0, monitor_size=1024, print_every=1, use_numba=False, numba_threads=None, random_state=None)
Mini-batch SSEKM.
Mini-batch optimisation of the semi-supervised equilibrium k-means objective. Supervision is provided via a prior matrix, using the prior_matrix keyword to fit() and prior_matrix_batch to partial_fit(). Labeled rows in the prior influence weights via the mixing factor theta.
- Parameters:
- n_clusters : int, default=8
- metric : {'euclidean', 'manhattan'}, default='euclidean'
- alpha : float or {'dvariance'}, default='dvariance'
Equilibrium weighting parameter ('dvariance' uses a subsample to estimate a heuristic value scaled by scale).
- scale : float, default=2.0
Scaling factor for the heuristic alpha.
- theta : float or {'auto'}, default='auto'
Supervision strength. 'auto' sets theta = |N| / |S|. Numeric values are used directly in both the objective and the labeled-row weight update.
- batch_size : int, default=256
- max_epochs : int, default=10
- n_init : int, default=1
- init : {'k-means', 'k-means++', 'random'} or ndarray, default='k-means++'
- init_size : int or None, default=None
- shuffle : bool, default=True
- learning_rate : float or None, default=None
- tol : float, default=1e-4
- reassignment_ratio : float, default=0.0
- reassign_patience : int, default=3
- verbose : int, default=0
- monitor_size : int or None, default=1024
- print_every : int, default=1
- use_numba : bool, default=False
- numba_threads : int or None, default=None
- random_state : int or None, default=None
- Attributes:
- cluster_centers_ : ndarray of shape (n_clusters, n_features)
Final centers after training.
- labels_ : ndarray of shape (n_samples,)
Hard assignment labels for the training data (available after fit()).
- alpha_ : float
Resolved alpha value.
- theta_super_ : float
Resolved supervision strength used ('auto' or numeric).
- objective_approx_ : list of float
Epoch-wise approximate objectives measured on a monitoring subset.
- counts_ : ndarray of shape (n_clusters,)
Accumulated batch weights per cluster (accumulation mode; present after fit()).
- sums_ : ndarray of shape (n_clusters, n_features)
Accumulated weighted sums per cluster (accumulation mode; present after fit()).
- W_, U_ : ndarrays
Final equilibrium weights and memberships for the full training data (set by fit()).
- n_epochs_ : int
Number of epochs run in the best initialisation.
- n_features_in_ : int
Number of features seen during the first call to fit() or partial_fit().
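The counts_ and sums_ attributes suggest a running-total formulation in which a center can be recovered as an accumulated weighted sum divided by the accumulated weight. The sketch below illustrates that general idea with NumPy. It is an assumption about what "accumulation mode" means, not the library's actual update rule; accumulate_batch and centers_from_totals are hypothetical helpers, and the example uses hard 0/1 weights for simplicity.

```python
import numpy as np

def accumulate_batch(counts, sums, X_batch, W_batch):
    """Fold one batch into running per-cluster totals (illustrative only)."""
    # X_batch: (batch, n_features); W_batch: (batch, n_clusters)
    counts = counts + W_batch.sum(axis=0)
    sums = sums + W_batch.T @ X_batch
    return counts, sums

def centers_from_totals(counts, sums, eps=1e-12):
    # Assumes positive accumulated weight; eps only guards empty clusters.
    return sums / np.maximum(counts, eps)[:, None]

counts = np.zeros(2)
sums = np.zeros((2, 2))
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]])
W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Process the data in two batches of two rows each.
for start in range(0, len(X), 2):
    counts, sums = accumulate_batch(
        counts, sums, X[start:start + 2], W[start:start + 2]
    )

centers = centers_from_totals(counts, sums)
```

With these toy inputs the totals reproduce the per-cluster means, which is the property a running-sum scheme relies on.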
Notes
Provide the full-dataset prior using prior_matrix to fit(), or mini-batch priors using prior_matrix_batch to partial_fit(). Unlabeled rows are all zeros; labeled rows are row-normalized when positive.
The monitoring objective returned in objective_approx_ includes the supervised term scaled by theta when a prior is provided.
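The prior layout described in these notes can be sketched with NumPy: an all-zero row marks an unlabeled sample, and a labeled row carries a distribution over clusters. build_prior_matrix and row_normalize_positive are hypothetical helpers, and encoding unlabeled samples as -1 is an assumption of this sketch. The estimator is documented to normalize positive rows itself, so the explicit normalization is shown only to make the convention concrete.

```python
import numpy as np

def build_prior_matrix(labels, n_clusters):
    """Turn integer labels (-1 = unlabeled, an assumed convention) into a prior."""
    labels = np.asarray(labels)
    prior = np.zeros((labels.shape[0], n_clusters), dtype=float)
    labeled = labels >= 0
    # One-hot encode labeled rows; unlabeled rows stay all zeros.
    prior[np.flatnonzero(labeled), labels[labeled]] = 1.0
    return prior

def row_normalize_positive(prior):
    """Scale rows with a positive sum to sum to 1; leave all-zero rows untouched."""
    prior = np.asarray(prior, dtype=float)
    row_sums = prior.sum(axis=1, keepdims=True)
    out = prior.copy()
    np.divide(prior, row_sums, out=out, where=row_sums > 0)
    return out

prior = build_prior_matrix([0, -1, 1, -1], n_clusters=2)
soft = row_normalize_positive([[2.0, 2.0], [0.0, 0.0], [0.0, 3.0]])
```

A matrix built this way has shape (n_samples, n_clusters) and could be passed as prior_matrix to fit().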
>>> ssekm.cluster_centers_
array([[1.25126245, 0.55312346],
       [3.54580155, 3.51798824]])
>>> ssekm.predict([[0, 0], [4, 4]])
array([0, 1])
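When data is streamed through partial_fit(), each batch of samples needs the matching rows of the prior. The following is a minimal sketch of producing aligned batches. iter_batches is a hypothetical helper, not part of sklekmeans, and its shuffle only loosely mirrors the estimator's shuffle parameter.

```python
import numpy as np

def iter_batches(X, prior, batch_size, shuffle=True, seed=None):
    """Yield (X_batch, prior_batch) pairs whose rows stay aligned."""
    X = np.asarray(X)
    prior = np.asarray(prior)
    if X.shape[0] != prior.shape[0]:
        raise ValueError("X and prior must have the same number of rows")
    order = np.arange(X.shape[0])
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    # Index both arrays with the same row indices so labels follow samples.
    for start in range(0, order.size, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], prior[idx]

X = np.arange(10, dtype=float).reshape(5, 2)
prior = np.zeros((5, 2))
prior[0, 1] = 1.0  # only the first sample is labeled
batches = list(iter_batches(X, prior, batch_size=2, seed=0))
```

Each pair could then be supplied as partial_fit(X_batch, prior_matrix_batch=prior_batch).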
- __init__(n_clusters=8, *, metric='euclidean', alpha='dvariance', scale=2.0, theta='auto', batch_size=256, max_epochs=10, n_init=1, init='k-means++', init_size=None, shuffle=True, learning_rate=None, tol=0.0001, reassignment_ratio=0.0, reassign_patience=3, verbose=0, monitor_size=1024, print_every=1, use_numba=False, numba_threads=None, random_state=None)
Methods
- __init__([n_clusters, metric, alpha, scale, ...])
- fit(X[, y, prior_matrix, F]) – Train the mini-batch semi-supervised estimator on the full dataset.
- fit_membership(X[, y, prior_matrix, F]) – Fit to X and return the final membership matrix for training data.
- fit_predict(X[, y, prior_matrix, F]) – Fit the model and return hard labels for X.
- fit_transform(X[, y]) – Fit to data, then transform it.
- get_metadata_routing() – Get metadata routing of this object.
- get_params([deep]) – Get parameters for this estimator.
- membership(X) – Soft membership (U) computed from distances using current alpha_.
- partial_fit(X_batch[, y, ...])
- predict(X) – Predict the closest cluster each sample in X belongs to.
- set_fit_request(*[, F, prior_matrix]) – Configure whether metadata should be requested to be passed to the fit method.
- set_output(*[, transform]) – Set output container.
- set_params(**params) – Set the parameters of this estimator.
- set_partial_fit_request(*[, F_batch, ...]) – Configure whether metadata should be requested to be passed to the partial_fit method.
- transform(X) – Transform X to a cluster-distance space (pairwise distances).
- fit(X, y=None, *, prior_matrix=None, F=None)
Train the mini-batch semi-supervised estimator on the full dataset.
Runs multiple epochs of mini-batch updates. Supervision can be provided via prior_matrix (preferred) or F; provide only one. Unlabeled rows are all zeros; labeled rows are row-normalized internally when positive.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Prior probability matrix for supervision.
F (array-like of shape (n_samples, n_clusters), optional) – Prior probability matrix for supervision. Kept for backward compatibility; prefer prior_matrix and provide only one of the two.
- Returns:
self – Fitted estimator.
- Return type:
- fit_membership(X, y=None, *, prior_matrix=None, F=None)
Fit to X and return the final membership matrix for training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
F (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
- Returns:
U – Membership matrix U_ computed on the training data.
- Return type:
ndarray of shape (n_samples, n_clusters)
- fit_predict(X, y=None, *, prior_matrix=None, F=None)
Fit the model and return hard labels for X.
Performs full mini-batch training (up to max_epochs) and returns the predicted cluster index for each sample in X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (Ignored) – Present for API consistency.
prior_matrix (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
F (array-like of shape (n_samples, n_clusters), optional) – Supervision prior. Prefer prior_matrix; F is kept for backward compatibility. Provide only one of them.
- Returns:
labels – Hard cluster assignments for the input samples.
- Return type:
ndarray of shape (n_samples,)
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- membership(X)
Soft membership (U) computed from distances using current alpha_.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples for which to compute memberships.
- Returns:
U – Row-stochastic membership matrix computed as normalized exp(-alpha * d^2_shift) per row.
- Return type:
ndarray of shape (n_samples, n_clusters)
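The membership formula amounts to a row-wise softmax over negatively scaled squared distances. The sketch below assumes the Euclidean metric and assumes that d^2_shift means the squared distances with each row's minimum subtracted, a common stability trick that leaves the normalized result unchanged. The library's exact shift is not specified here, and soft_membership is a hypothetical stand-in rather than the library's implementation.

```python
import numpy as np

def soft_membership(X, centers, alpha):
    """Row-stochastic memberships from shifted squared Euclidean distances."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diff = X[:, None, :] - centers[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared distances
    # Assumed shift: subtract the per-row minimum so the largest exponent is 0.
    d2_shift = d2 - d2.min(axis=1, keepdims=True)
    E = np.exp(-alpha * d2_shift)
    return E / E.sum(axis=1, keepdims=True)

U = soft_membership(
    [[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]],
    [[0.0, 0.0], [4.0, 4.0]],
    alpha=0.5,
)
```

A point equidistant from both centers gets equal membership in each, while points sitting on a center are assigned almost entirely to it.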
- predict(X)
Predict the closest cluster each sample in X belongs to.
- Parameters:
X (array-like of shape (n_samples, n_features)) – New samples to assign.
- Returns:
labels – Indices of the nearest centers under the configured metric.
- Return type:
ndarray of shape (n_samples,)
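Nearest-center assignment can be sketched in a few lines of NumPy. The example assumes the Euclidean metric and reuses the center coordinates printed in the class example; nearest_center is a hypothetical stand-in for illustration, not the library's code.

```python
import numpy as np

def nearest_center(X, centers):
    """Return the index of the closest center for each row of X (Euclidean)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # dist has shape (n_samples, n_clusters).
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

centers = [[1.25126245, 0.55312346], [3.54580155, 3.51798824]]
labels = nearest_center([[0, 0], [4, 4]], centers)
```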
- set_fit_request(*, F: bool | None | str = '$UNCHANGED$', prior_matrix: bool | None | str = '$UNCHANGED$') → MiniBatchSSEKM
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to fit.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
F (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for F parameter in fit.
prior_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for prior_matrix parameter in fit.
- Returns:
self – The updated object.
- Return type:
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.
  - "default": Default output format of a transformer
  - "pandas": DataFrame output
  - "polars": Polars output
  - None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_partial_fit_request(*, F_batch: bool | None | str = '$UNCHANGED$', X_batch: bool | None | str = '$UNCHANGED$', prior_matrix_batch: bool | None | str = '$UNCHANGED$') → MiniBatchSSEKM
Configure whether metadata should be requested to be passed to the partial_fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
F_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for F_batch parameter in partial_fit.
X_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_batch parameter in partial_fit.
prior_matrix_batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for prior_matrix_batch parameter in partial_fit.
- Returns:
self – The updated object.
- Return type:
- transform(X)
Transform X to a cluster-distance space (pairwise distances).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to transform.
- Returns:
distances – Pairwise distances to cluster_centers_ using the estimator's metric.
- Return type:
ndarray of shape (n_samples, n_clusters)
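A cluster-distance transform maps each sample to one distance per center, so the output has n_clusters columns. The sketch below covers the two documented metric options. cluster_distances is a hypothetical helper, and whether the library returns plain or squared Euclidean distances is not stated here, so plain distances are an assumption.

```python
import numpy as np

def cluster_distances(X, centers, metric="euclidean"):
    """Pairwise distances from each sample to each center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diff = X[:, None, :] - centers[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unsupported metric: {metric!r}")

X = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0], [3.0, 0.0]]
D_euclidean = cluster_distances(X, centers)
D_manhattan = cluster_distances(X, centers, metric="manhattan")
```

Taking the argmin over each row of such a matrix gives the hard assignment that predict() is described as returning.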