turicreate.nearest_neighbor_classifier.NearestNeighborClassifier.predict_topk

NearestNeighborClassifier.predict_topk(dataset, max_neighbors=10, radius=None, k=3, verbose=False)

Return the top-k most likely predictions for each observation in dataset. Predictions are returned as an SFrame with three columns: row_id, class, and probability. Each observation contributes one row per predicted class, up to k rows.

Parameters:

dataset : SFrame
    Dataset of new observations. Must include the features used for model training, but does not require a target column. Additional columns are ignored.
max_neighbors : int, optional
    Maximum number of neighbors to consider for each point.
radius : float, optional
    Maximum distance from each point to a neighbor in the reference dataset.
k : int, optional
    Number of classes to return for each input example.

Returns:

out : SFrame
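
How max_neighbors, radius, and k interact can be sketched in plain Python. This is an illustration only, not Turi Create's implementation: the function name predict_topk_sketch is hypothetical, Euclidean distance is an assumption, and ties are resolved by neighbor order rather than randomly so that the output is reproducible.

```python
import math
from collections import Counter


def predict_topk_sketch(train_rows, train_labels, queries,
                        max_neighbors=10, radius=None, k=3):
    """Illustrative brute-force top-k voting; not Turi Create's code."""
    out = []
    for row_id, query in enumerate(queries):
        # Distance from the query to every reference point, nearest first.
        # Euclidean distance is an assumption made for this sketch.
        dists = sorted(
            ((math.dist(query, x), label)
             for x, label in zip(train_rows, train_labels)),
            key=lambda pair: pair[0],
        )
        # Keep at most max_neighbors of the nearest points...
        neighbors = dists[:max_neighbors]
        # ...and, if a radius is given, only those within that distance.
        if radius is not None:
            neighbors = [(d, lab) for d, lab in neighbors if d <= radius]
        if not neighbors:
            continue  # a query with no neighbors produces no output rows
        # Each neighbor votes for its class; probability is the vote share.
        votes = Counter(label for _, label in neighbors)
        total = sum(votes.values())
        for label, count in votes.most_common(k):
            out.append({"row_id": row_id, "class": label,
                        "probability": count / total})
    return out


# Same toy data as the Examples section: (height, weight) pairs.
train_rows = [(9, 13), (25, 28), (20, 33), (23, 22)]
train_labels = ["cat", "dog", "fossa", "dog"]
queries = [(26, 25), (19, 35)]

rows = predict_topk_sketch(train_rows, train_labels, queries, max_neighbors=2)
for row in rows:
    print(row)
```

Raising k returns more classes per query only if the retained neighbors span that many classes; k never adds classes that received no votes.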

See also

create, classify, predict

Notes

If the ‘radius’ parameter is small, a query point may have no neighbors in the training dataset. In that case, the query is dropped from the SFrame returned by this method. If no query has any neighbors, the result is an empty SFrame.
If the target column in the training dataset has missing values, predictions based on those neighbors will be ambiguous.
Ties between predicted classes are broken randomly.
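
Because queries without neighbors are dropped silently, it can help to check which ones are missing from the result. The sketch below uses a plain list of dicts as a stand-in for the returned SFrame. The helper dropped_row_ids is hypothetical, and it assumes that row_id is the zero-based position of the query in dataset, as the example output suggests.

```python
def dropped_row_ids(num_queries, topk_rows):
    """Return the query positions that have no rows in a top-k result."""
    # Collect every row_id that appears at least once in the output.
    present = {row["row_id"] for row in topk_rows}
    return [i for i in range(num_queries) if i not in present]


# Hypothetical output for three queries in which query 1 had no neighbor
# within the radius and was therefore dropped.
topk_rows = [
    {"row_id": 0, "class": "dog", "probability": 1.0},
    {"row_id": 2, "class": "fossa", "probability": 0.5},
    {"row_id": 2, "class": "dog", "probability": 0.5},
]

missing = dropped_row_ids(3, topk_rows)
print(missing)
```

An empty result reports every query as dropped, which matches the case where no query has neighbors.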

Examples

>>> sf_train = turicreate.SFrame({'species': ['cat', 'dog', 'fossa', 'dog'],
...                             'height': [9, 25, 20, 23],
...                             'weight': [13, 28, 33, 22]})
...
>>> sf_new = turicreate.SFrame({'height': [26, 19],
...                           'weight': [25, 35]})
...
>>> m = turicreate.nearest_neighbor_classifier.create(sf_train, target='species')
>>> ystar = m.predict_topk(sf_new, max_neighbors=2)
>>> print(ystar)
+--------+-------+-------------+
| row_id | class | probability |
+--------+-------+-------------+
|   0    |  dog  |     1.0     |
|   1    | fossa |     0.5     |
|   1    |  dog  |     0.5     |
+--------+-------+-------------+