turicreate.kmeans.KmeansModel.predict

KmeansModel.predict(self, dataset, output_type='cluster_id', verbose=True)

Return the predicted cluster label for each instance in the new ‘dataset’. K-means predictions are made by assigning each new instance to the closest cluster center.

Parameters:
dataset : SFrame

Dataset of new observations. Must include the features used for model training; additional columns are ignored.

output_type : {‘cluster_id’, ‘distance’}, optional

Form of the prediction. ‘cluster_id’ (the default) returns the cluster label assigned to each input instance, while ‘distance’ returns the Euclidean distance between the instance and its assigned cluster’s center.

verbose : bool, optional

If True, print progress updates to the screen.

Returns:
out : SArray

Model predictions. Depending on the specified output_type, either the assigned cluster label or the distance from each point to its closest cluster center. The predictions are in the same order as the input data rows.

See also

create

Examples

>>> import turicreate
>>> sf = turicreate.SFrame({
...     'x1': [0.6777, -9.391, 7.0385, 2.2657, 7.7864, -10.16, -8.162,
...            8.8817, -9.525, -9.153, 2.0860, 7.6619, 6.5511, 2.7020],
...     'x2': [5.6110, 8.5139, 5.3913, 5.4743, 8.3606, 7.8843, 2.7305,
...            5.1679, 6.7231, 3.7051, 1.7682, 7.4608, 3.1270, 6.5624]})
>>> model = turicreate.kmeans.create(sf, num_clusters=3)
>>> sf_new = turicreate.SFrame({'x1': [-5.6584, -1.0167, -9.6181],
...                           'x2': [-6.3803, -3.7937, -1.1022]})
>>> clusters = model.predict(sf_new, output_type='cluster_id')
>>> print(clusters)
[1, 0, 1]
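
The assignment rule described above (each instance goes to the closest cluster center) can be illustrated without Turi Create. The following is a minimal pure-Python sketch, not the library's implementation: the function name predict_nearest, the two cluster centers, and the sample points are all hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import math

def predict_nearest(centers, points, output_type="cluster_id"):
    """Illustrative nearest-center assignment (hypothetical helper, not Turi Create code)."""
    if output_type not in ("cluster_id", "distance"):
        raise ValueError("output_type must be 'cluster_id' or 'distance'")
    results = []
    for point in points:
        # Euclidean distance from this point to every center.
        dists = [
            math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
            for center in centers
        ]
        # Index of the closest center; ties go to the lowest index.
        best = min(range(len(centers)), key=dists.__getitem__)
        results.append(best if output_type == "cluster_id" else dists[best])
    return results

# Hypothetical centers and new observations.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (7.0, 4.0), (3.0, 4.0)]

ids = predict_nearest(centers, points)                # [0, 1, 0]
dists = predict_nearest(centers, points, "distance")  # [1.0, 5.0, 5.0]
print(ids)
print(dists)
```

As with predict, the results are returned in the same order as the input rows, and the 'distance' output is the distance to the assigned (closest) center only.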