turicreate.nearest_neighbor_classifier.NearestNeighborClassifier.predict_topk

NearestNeighborClassifier.predict_topk(dataset, max_neighbors=10, radius=None, k=3, verbose=False)

Return the top-k most likely predictions for each observation in dataset. Predictions are returned as an SFrame with three columns: row_id, class, and probability.

Parameters:
dataset : SFrame

Dataset of new observations. Must include the features used for model training, but does not require a target column. Additional columns are ignored.

max_neighbors : int, optional

Maximum number of neighbors to consider for each point.

radius : float, optional

Maximum distance from each point to a neighbor in the reference dataset.

k : int, optional

Number of classes to return for each input example.

verbose : bool, optional

If True, print progress updates.

Returns:
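out : SFrame

SFrame with three columns: row_id, class, and probability. Each observation contributes at most k rows, one per predicted class.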

See also

create, classify, predict

Notes

  • If the ‘radius’ parameter is small, a query point may have no neighbors in the training dataset. Such a query is dropped from the SFrame returned by this method. If no query has any neighbors, the result is an empty SFrame.
  • If the target column in the training dataset has missing values, the resulting predictions will be ambiguous.
  • Ties between predicted classes are broken randomly.
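
The interaction of max_neighbors, radius, and k described above can be illustrated with a small pure-Python sketch. This is not Turi Create's implementation: the function name predict_topk_sketch, the Euclidean distance, and the use of neighbor vote fractions as probabilities are all assumptions made for illustration. Rows are plain dictionaries rather than an SFrame.

```python
import math
import random
from collections import Counter


def predict_topk_sketch(train_rows, query_rows, features, target,
                        max_neighbors=10, radius=None, k=3, seed=None):
    """Hypothetical helper: top-k classes by neighbor vote fraction."""
    rng = random.Random(seed)
    out = []
    for row_id, query in enumerate(query_rows):
        # Assumed metric: Euclidean distance over the listed features.
        scored = []
        for ref in train_rows:
            dist = math.sqrt(sum((query[f] - ref[f]) ** 2 for f in features))
            if radius is None or dist <= radius:
                scored.append((dist, ref[target]))
        # Keep at most max_neighbors of the closest reference rows.
        scored.sort(key=lambda pair: pair[0])
        neighbors = scored[:max_neighbors]
        if not neighbors:
            # No neighbor within the radius: the query produces no output rows.
            continue
        votes = Counter(label for _, label in neighbors)
        classes = list(votes)
        # Shuffle first so the stable sort below breaks ties in random order.
        rng.shuffle(classes)
        classes.sort(key=lambda c: -votes[c])
        for label in classes[:k]:
            out.append({"row_id": row_id, "class": label,
                        "probability": votes[label] / len(neighbors)})
    return out


train = [
    {"species": "cat", "height": 9, "weight": 13},
    {"species": "dog", "height": 25, "weight": 28},
    {"species": "fossa", "height": 20, "weight": 33},
    {"species": "dog", "height": 23, "weight": 22},
]
new = [{"height": 26, "weight": 25}, {"height": 19, "weight": 35}]
cols = ["height", "weight"]

# Two neighbors per query, no radius limit: three output rows in total.
topk = predict_topk_sketch(train, new, cols, "species", max_neighbors=2)

# A tight radius leaves query 0 without neighbors, so only row_id 1 remains.
tight = predict_topk_sketch(train, new, cols, "species",
                            max_neighbors=2, radius=3.0)
```

Under these assumptions, a query whose neighbors all fall outside the radius simply contributes no rows, so row_id values in the output may skip numbers. Comparing the set of returned row_id values against the input row indices is one way to find which queries were dropped.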

Examples

>>> sf_train = turicreate.SFrame({'species': ['cat', 'dog', 'fossa', 'dog'],
...                             'height': [9, 25, 20, 23],
...                             'weight': [13, 28, 33, 22]})
...
>>> sf_new = turicreate.SFrame({'height': [26, 19],
...                           'weight': [25, 35]})
...
>>> m = turicreate.nearest_neighbor_classifier.create(sf_train, target='species')
>>> ystar = m.predict_topk(sf_new, max_neighbors=2)
>>> print(ystar)
+--------+-------+-------------+
| row_id | class | probability |
+--------+-------+-------------+
|   0    |  dog  |     1.0     |
|   1    | fossa |     0.5     |
|   1    |  dog  |     0.5     |
+--------+-------+-------------+