torch_kmeans.clustering package

class torch_kmeans.clustering.ConstrainedKMeans(init_method: str = 'rnd', num_init: int = 8, max_iter: int = 100, distance: torch_kmeans.utils.distances.BaseDistance = torch_kmeans.utils.distances.LpDistance, p_norm: int = 2, tol: float = 0.0001, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, n_priority_trials_before_fall_back: int = 5, raise_infeasible: bool = True, **kwargs)

Bases: KMeans

Implements constrained k-means clustering. The priority-based assignment follows the method described in the paper below.

Paper:

Geetha, S., G. Poonthalir, and P. T. Vanathi. “Improved k-means algorithm for capacitated clustering problem.” INFOCOMP Journal of Computer Science 8.4 (2009)

Parameters
  • init_method (str) – Method to initialize cluster centers: [‘rnd’, ‘topk’, ‘k-means++’, ‘ckm++’] (default: ‘rnd’)

  • num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers (default: 8).

  • max_iter (int) – Maximum number of iterations (default: 100).

  • distance (BaseDistance) – batched distance evaluator (default: LpDistance).

  • p_norm (int) – norm for lp distance (default: 2).

  • tol (float) – Relative tolerance on the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence (default: 1e-4).

  • n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).

  • verbose (bool) – Verbosity flag to print additional info (default: True).

  • seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).

  • n_priority_trials_before_fall_back (int) – Number of attempts to assign samples to constrained clusters based on priority values. After that, the procedure falls back to assigning the node with the highest weight to a cluster that can still accommodate it, or to the dummy cluster otherwise (default: 5).

  • raise_infeasible (bool) – if set to False, will only display a warning instead of raising an error (default: True)

  • **kwargs – additional keyword arguments for the distance function.
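The fall-back step described for n_priority_trials_before_fall_back can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function name, the DUMMY label of -1, and the per-cluster capacity of 1.0 (matching weights normalized by capacity) are assumptions made for illustration, and the priority-based trials from the paper are not shown.

```python
import math

DUMMY = -1  # illustrative label for samples that fit in no cluster (assumption)

def fallback_assign(points, weights, centers, capacity=1.0):
    """Greedy fall-back: heaviest sample first, nearest cluster with room left."""
    remaining = [capacity] * len(centers)
    labels = [DUMMY] * len(points)
    # Visit samples from heaviest to lightest.
    order = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)
    for i in order:
        # Try clusters from nearest to farthest center.
        nearest_first = sorted(
            range(len(centers)), key=lambda c: math.dist(points[i], centers[c])
        )
        for c in nearest_first:
            if weights[i] <= remaining[c] + 1e-12:
                labels[i] = c
                remaining[c] -= weights[i]
                break  # sample placed; otherwise it keeps the DUMMY label
    return labels

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.5, 0.5)]
weights = [0.6, 0.6, 0.3, 0.3, 0.5]
centers = [(0.0, 0.0), (1.0, 1.0)]
labels = fallback_assign(points, weights, centers)
```

In this toy instance the total weight exceeds the total capacity, so the last sample ends up in the dummy cluster. That is the kind of infeasible situation raise_infeasible refers to.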

Initializes internal Module state, shared by both nn.Module and ScriptModule.

INIT_METHODS = ['rnd', 'k-means++', 'topk', 'ckm++']
NORM_METHODS = []
predict(x: Tensor, weights: Tensor, **kwargs) → LongTensor

Predict the closest cluster that each sample in x belongs to.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • weights (Tensor) – normalized weight for each sample (BS, N)

  • **kwargs – additional kwargs for assignment procedure

Returns

batch tensor of cluster labels for each sample (BS, N)

Return type

LongTensor

training: bool
class torch_kmeans.clustering.KMeans(init_method: str = 'rnd', num_init: int = 8, max_iter: int = 100, distance: torch_kmeans.utils.distances.BaseDistance = torch_kmeans.utils.distances.LpDistance, p_norm: int = 2, tol: float = 0.0001, normalize: Optional[Union[str, bool]] = None, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, **kwargs)

Bases: Module

Implements k-means clustering in terms of pytorch tensor operations which can be run on GPU. Supports batches of instances for use in batched training (e.g. for neural networks).

Partly based on ideas from:
Parameters
  • init_method (str) – Method to initialize cluster centers [‘rnd’, ‘k-means++’] (default: ‘rnd’)

  • num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers (default: 8).

  • max_iter (int) – Maximum number of iterations (default: 100).

  • distance (BaseDistance) – batched distance evaluator (default: LpDistance).

  • p_norm (int) – norm for lp distance (default: 2).

  • tol (float) – Relative tolerance on the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence (default: 1e-4).

  • normalize (Optional[Union[str, bool]]) – String id of the method used to normalize the input, one of [‘mean’, ‘minmax’, ‘unit’]. None disables normalization (default: None).

  • n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).

  • verbose (bool) – Verbosity flag to print additional info (default: True).

  • seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).

  • **kwargs – additional keyword arguments for the distance function.
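For readers unfamiliar with the ‘k-means++’ option of init_method, the following pure-Python sketch shows the standard k-means++ seeding idea, where each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It is a generic illustration under that assumption, not the library's batched tensor code. The function name is hypothetical, and the fixed seed only mirrors the role of the seed parameter.

```python
import math
import random

def kmeans_pp_init(points, k, seed=123):
    """Standard k-means++ seeding over a list of distinct points (illustrative)."""
    rng = random.Random(seed)  # fixed seed gives reproducible initial centers
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of every point to its nearest already chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Sample the next center proportionally to d2; chosen points have weight 0.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_init(pts, 3, seed=123)
```

Running the seeding several times with different seeds and keeping the best clustering is what num_init describes.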

Initializes internal Module state, shared by both nn.Module and ScriptModule.

INIT_METHODS = ['rnd', 'k-means++']
NORM_METHODS = ['mean', 'minmax', 'unit']
property is_fitted: bool

True if model was already fitted.

property num_clusters: Union[int, Tensor, Any]

Number of clusters in fitted model. Returns a tensor with possibly different numbers of clusters per instance for whole batch.

forward(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) → ClusterResult

torch.nn like forward pass.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )

  • centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)

  • **kwargs – additional kwargs for initialization or cluster procedure

Returns

ClusterResult tuple

Return type

ClusterResult

fit(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) → Module

Compute cluster centers and predict cluster index for each sample.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )

  • centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)

  • **kwargs – additional kwargs for initialization or cluster procedure

Returns

KMeans model

Return type

Module
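As a rough picture of what fitting involves, here is a minimal single-instance Lloyd iteration in pure Python with a stopping rule driven by max_iter and tol. It is a sketch, not the library's code. In particular, interpreting the relative tolerance as the Frobenius norm of the center shift divided by the norm of the previous centers is an assumption, and the helper names are hypothetical.

```python
import math

def frob(matrix):
    """Frobenius norm of a (K, D) matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def lloyd(points, centers, max_iter=100, tol=1e-4):
    """Plain Lloyd iterations; returns (centers, labels, iterations used)."""
    centers = [tuple(c) for c in centers]
    labels, n_iter = [], 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its center.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)
        # Assumed convergence test: relative Frobenius norm of the center shift.
        diff = [[a - b for a, b in zip(n, o)] for n, o in zip(new_centers, centers)]
        rel_shift = frob(diff) / max(frob(centers), 1e-12)
        centers = new_centers
        if rel_shift <= tol:
            break
    return centers, labels, n_iter

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
init = [(0.0, 0.0), (10.0, 10.0)]
final_centers, final_labels, used = lloyd(points, init)
```

Passing explicit initial centers, as done here with init, corresponds to the role of the optional centers argument.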

predict(x: Tensor, **kwargs) → LongTensor

Predict the closest cluster that each sample in x belongs to.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • **kwargs – additional kwargs for assignment procedure

Returns

batch tensor of cluster labels for each sample (BS, N)

Return type

LongTensor
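The shape contract of predict, (BS, N, D) inputs mapped to (BS, N) labels, can be sketched with nested lists standing in for tensors. This assumes Euclidean distance and per-instance centers of shape (BS, K, D). The function name is hypothetical and the code is not the library's implementation.

```python
import math

def predict_labels(x, centers):
    """x: nested list (BS, N, D); centers: nested list (BS, K, D) -> labels (BS, N)."""
    return [
        # For each instance, label every point with the index of its nearest center.
        [min(range(len(cs)), key=lambda k: math.dist(p, cs[k])) for p in pts]
        for pts, cs in zip(x, centers)
    ]

x = [
    [(0.0, 0.0), (5.0, 5.0), (4.0, 4.0)],   # instance 0: N = 3 points, D = 2
    [(1.0, 1.0), (9.0, 9.0), (2.0, 2.0)],   # instance 1
]
centers = [
    [(0.0, 0.0), (5.0, 5.0)],               # instance 0: K = 2 centers
    [(0.0, 0.0), (10.0, 10.0)],             # instance 1
]
batch_labels = predict_labels(x, centers)
```

Each instance in the batch is labeled against its own set of centers, so labels from different instances are not comparable with each other.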

fit_predict(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) → LongTensor

Compute cluster centers and predict cluster index for each sample.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )

  • centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)

  • **kwargs – additional kwargs for initialization or cluster procedure

Returns

batch tensor of cluster labels for each sample (BS, N)

Return type

LongTensor

training: bool
class torch_kmeans.clustering.SoftKMeans(init_method: str = 'rnd', num_init: int = 1, max_iter: int = 100, distance: torch_kmeans.utils.distances.BaseDistance = torch_kmeans.utils.distances.CosineSimilarity, p_norm: int = 1, normalize: str = 'unit', tol: float = 1e-05, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, temp: float = 5.0, **kwargs)

Bases: KMeans

Implements differentiable soft k-means clustering. Method adapted from https://github.com/bwilder0/clusternet to support batches.

Paper:

Wilder et al., “End to End Learning and Optimization on Graphs” (NeurIPS’2019)

Parameters
  • init_method (str) – Method to initialize cluster centers: [‘rnd’, ‘topk’] (default: ‘rnd’)

  • num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers. If > 1, the best configuration is selected before propagating through the fixpoint (default: 1).

  • max_iter (int) – Maximum number of iterations (default: 100).

  • distance (BaseDistance) – batched distance evaluator (default: CosineSimilarity).

  • p_norm (int) – norm for lp distance (default: 1).

  • normalize (str) – id of method to use to normalize input. (default: ‘unit’).

  • tol (float) – Relative tolerance on the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence (default: 1e-5).

  • n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).

  • verbose (bool) – Verbosity flag to print additional info (default: True).

  • seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).

  • temp (float) – temperature for soft cluster assignments (default: 5.0).

  • **kwargs – additional keyword arguments for the distance function.
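To give a feel for the temp parameter, the sketch below computes soft cluster assignments for one sample as a softmax over temperature-scaled cosine similarities. That this matches the exact form used by the library is an assumption based on the referenced clusternet approach. The function name is hypothetical.

```python
import math

def soft_assign(point, centers, temp=5.0):
    """Softmax over temp * cosine similarity between one point and each center."""
    def cosine(a, b):
        dot = sum(u * v for u, v in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    scores = [temp * cosine(point, c) for c in centers]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

centers = [(1.0, 0.0), (0.0, 1.0)]
sharp = soft_assign((1.0, 0.0), centers, temp=5.0)  # close to a hard assignment
soft = soft_assign((1.0, 0.0), centers, temp=0.5)   # much flatter distribution
```

A larger temperature pushes the assignment toward a one-hot vector, while a smaller one spreads probability mass over more clusters. Because the weights vary smoothly with the inputs, gradients can flow through the assignment.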

Initializes internal Module state, shared by both nn.Module and ScriptModule.

training: bool
class torch_kmeans.clustering.KNN(k: int, distance: torch_kmeans.utils.distances.BaseDistance = torch_kmeans.utils.distances.LpDistance, p_norm: int = 2, normalize: Optional[Union[str, bool]] = None, **kwargs)

Bases: Module

Implements k nearest neighbors in terms of pytorch tensor operations which can be run on GPU. Supports mini-batches of instances.

Parameters
  • k (int) – number of neighbors to consider

  • distance (BaseDistance) – batched distance evaluator (default: LpDistance).

  • p_norm (int) – norm for lp distance (default: 2).

  • normalize (Optional[Union[str, bool]]) – String id of the method used to normalize the input, one of [‘mean’, ‘minmax’, ‘unit’]. None disables normalization (default: None).

Initializes internal Module state, shared by both nn.Module and ScriptModule.

NORM_METHODS = ['mean', 'minmax', 'unit']
forward(x: Tensor, k: Optional[int] = None, same_source: bool = True) → KNeighbors

torch.nn like forward pass.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • k (Optional[int]) – optional number of neighbors to use

  • same_source (bool) – flag if each sample itself should be included as its own neighbor (default: True)

Returns

KNeighbors tuple

Return type

KNeighbors
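The effect of same_source, as described above, can be sketched for a single instance in pure Python. Returning a plain (indices, distances) pair is an assumption for illustration, since the fields of the KNeighbors tuple are not listed here. The function name is hypothetical and Euclidean distance is assumed.

```python
import math

def knn(points, k, same_source=True):
    """Return (indices, distances) of the k nearest neighbors of every point."""
    all_idx, all_dist = [], []
    for i, p in enumerate(points):
        # Candidate neighbors; the point itself is kept only if same_source is True.
        cand = [
            (math.dist(p, q), j)
            for j, q in enumerate(points)
            if same_source or j != i
        ]
        cand.sort()  # ascending by distance, ties broken by index
        nearest = cand[:k]
        all_idx.append([j for _, j in nearest])
        all_dist.append([d for d, _ in nearest])
    return all_idx, all_dist

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
idx_self, dist_self = knn(pts, k=2, same_source=True)
idx_other, dist_other = knn(pts, k=2, same_source=False)
```

With same_source=True the first neighbor of every point is the point itself at distance zero, so callers who want k genuine neighbors may need to request k + 1 in that mode.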

fit(x: Tensor, k: Optional[int] = None, **kwargs) → KNeighbors

Compute k nearest neighbors for each sample.

Parameters
  • x (Tensor) – input features/coordinates (BS, N, D)

  • k (Optional[int]) – optional number of neighbors to use

  • **kwargs – additional kwargs for fitting procedure

Returns

KNeighbors tuple

Return type

KNeighbors

training: bool

Submodules