turicreate.kmeans.KmeansModel.predict

KmeansModel.predict(dataset, output_type='cluster_id', verbose=True)

Return the predicted cluster label for each instance in the new ‘dataset’. K-means predictions are made by assigning each new instance to the closest cluster center.

Parameters:

dataset : SFrame: Dataset of new observations. Must include the features used for model training; additional columns are ignored.
output_type : {‘cluster_id’, ‘distance’}, optional: Form of the prediction. ‘cluster_id’ (the default) returns the cluster label assigned to each input instance, while ‘distance’ returns the Euclidean distance between the instance and its assigned cluster’s center.
verbose : bool, optional: If True, print progress updates to the screen.

Returns:

out : SArray: Model predictions. Depending on the specified output_type, either the assigned cluster label or the distance from each point to its closest cluster center. The predictions are in the same order as the rows of the input data.
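The assignment rule and the two output types described above can be illustrated with a minimal pure-Python sketch. This is not Turi Create's implementation: the function name `nearest_center_predict` and the hard-coded `centers` are hypothetical, chosen only to show how a cluster label and a Euclidean distance could be derived for each row while preserving input order.

```python
import math


def nearest_center_predict(points, centers, output_type="cluster_id"):
    """Illustrative nearest-center assignment (hypothetical helper, not the library API)."""
    if output_type not in ("cluster_id", "distance"):
        raise ValueError("output_type must be 'cluster_id' or 'distance'")
    results = []
    for point in points:
        # Euclidean distance from this point to every cluster center.
        dists = [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(point, center)))
            for center in centers
        ]
        # Index of the closest center serves as the cluster label.
        nearest = min(range(len(centers)), key=dists.__getitem__)
        results.append(nearest if output_type == "cluster_id" else dists[nearest])
    # One result per input row, in input order.
    return results


# Assumed cluster centers and new observations, for illustration only.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 0.0), (3.0, 4.0)]

labels = nearest_center_predict(points, centers)
distances = nearest_center_predict(points, centers, output_type="distance")
print(labels)     # [0, 1, 0]
print(distances)  # [1.0, 1.0, 5.0]
```

In the sketch, both calls return one value per input row, so the two results can be lined up element by element.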

See also

create

Examples

>>> sf = turicreate.SFrame({
...     'x1': [0.6777, -9.391, 7.0385, 2.2657, 7.7864, -10.16, -8.162,
...            8.8817, -9.525, -9.153, 2.0860, 7.6619, 6.5511, 2.7020],
...     'x2': [5.6110, 8.5139, 5.3913, 5.4743, 8.3606, 7.8843, 2.7305,
...            5.1679, 6.7231, 3.7051, 1.7682, 7.4608, 3.1270, 6.5624]})
...
>>> model = turicreate.kmeans.create(sf, num_clusters=3)
...
>>> sf_new = turicreate.SFrame({'x1': [-5.6584, -1.0167, -9.6181],
...                           'x2': [-6.3803, -3.7937, -1.1022]})
>>> clusters = model.predict(sf_new, output_type='cluster_id')
>>> print(clusters)
[1, 0, 1]