turicreate.nearest_neighbor_classifier.create
turicreate.nearest_neighbor_classifier.create(dataset, target, features=None, distance=None, verbose=True)
Create a NearestNeighborClassifier model. This model predicts the class of a query instance by finding the most common class among the query’s nearest neighbors.
Parameters:
- dataset : SFrame
Dataset for training the model.
- target : str
Name of the column containing the target variable. The values in this column must be of string or integer type.
- features : list[str], optional
Names of the columns with features to use in comparing records. ‘None’ (the default) indicates that all columns except the target variable should be used. Please note: if distance is specified as a composite distance, then that parameter controls which features are used in the model. Each column can be one of the following types:
- Numeric: values of numeric type integer or float.
- Array: array of numeric (integer or float) values. Each array element is treated as a separate variable in the model.
- Dictionary: key-value pairs with numeric (integer or float) values. Each key indicates a separate variable in the model.
- String: string values.
- distance : str, function, or list[list], optional
Function to measure the distance between any two input data rows. This may be one of three types:
- String: the name of a standard distance function. One of ‘euclidean’, ‘squared_euclidean’, ‘manhattan’, ‘levenshtein’, ‘jaccard’, ‘weighted_jaccard’, ‘cosine’ or ‘transformed_dot_product’.
- Function: a function handle from the distances module.
- Composite distance: the weighted sum of several standard distance functions applied to various features. This is specified as a list of distance components, each of which is itself a list containing three items:
- list or tuple of feature names (str)
- standard distance name (str)
- scaling factor (int or float)
For more information about Turi Create distance functions, please see the distances module.
For sparse vectors, missing keys are assumed to have value 0.0.
If ‘distance’ is left unspecified or set to ‘auto’, a composite distance is constructed automatically based on feature types.
- verbose : bool, optional
If True, print progress updates and model details.
Returns:
- out : NearestNeighborClassifier
A trained model of type NearestNeighborClassifier.
References
- Wikipedia - nearest neighbors classifier
- Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning. Vol. 2. New York. Springer. pp. 463-481.
Examples
>>> import turicreate
>>> sf = turicreate.SFrame({'species': ['cat', 'dog', 'fossa', 'dog'],
...                        'height': [9, 25, 20, 23],
...                        'weight': [13, 28, 33, 22]})
...
>>> model = turicreate.nearest_neighbor_classifier.create(sf, target='species')
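The prediction rule described at the top of this page (take the most common class among the query's nearest neighbors) can be illustrated with a minimal pure-Python sketch. This is not Turi Create's implementation: the helper `knn_predict`, its parameter `k`, the use of Euclidean distance, and the query point are all assumptions made for illustration, using the same toy data as the example above.

```python
from collections import Counter
import math

# Toy training rows mirroring the SFrame above: (height, weight) -> species.
train = [
    ((9, 13), 'cat'),
    ((25, 28), 'dog'),
    ((20, 33), 'fossa'),
    ((23, 22), 'dog'),
]

def knn_predict(query, rows, k=3):
    """Hypothetical helper: majority label among the k rows nearest to `query`."""
    # Rank training rows by Euclidean distance to the query point.
    ranked = sorted(rows, key=lambda row: math.dist(query, row[0]))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# The three nearest rows to (24, 25) are two dogs and the fossa.
prediction = knn_predict((24, 25), train, k=3)
print(prediction)
```

With this toy data, the two 'dog' rows outvote the single 'fossa' row among the three nearest neighbors of the query.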
As with the nearest neighbors toolkit, the nearest neighbor classifier accepts composite distance functions.
>>> my_dist = [[('height', 'weight'), 'euclidean', 2.7],
...            [('height', 'weight'), 'manhattan', 1.6]]
...
>>> model = turicreate.nearest_neighbor_classifier.create(sf, target='species',
...                                                       distance=my_dist)
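To show what a composite distance of this shape evaluates to, here is a minimal pure-Python sketch of the "weighted sum of several standard distance functions" described under the distance parameter. It does not call Turi Create; the names `composite_distance`, `STANDARD`, `euclidean`, and `manhattan` are hypothetical helpers written for this illustration, and only the two distances used in the example are covered.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical lookup from distance name to function (illustration only).
STANDARD = {'euclidean': euclidean, 'manhattan': manhattan}

def composite_distance(spec, row_a, row_b):
    """Weighted sum of component distances, each over its own feature subset."""
    total = 0.0
    for features, name, weight in spec:
        a = [row_a[f] for f in features]
        b = [row_b[f] for f in features]
        total += weight * STANDARD[name](a, b)
    return total

# Same specification as in the example above.
my_dist = [[('height', 'weight'), 'euclidean', 2.7],
           [('height', 'weight'), 'manhattan', 1.6]]

cat = {'height': 9, 'weight': 13}
dog = {'height': 25, 'weight': 28}

# Equals 2.7 * euclidean + 1.6 * manhattan over (height, weight).
d = composite_distance(my_dist, cat, dog)
print(d)
```

Each component contributes its scaling factor times the named distance computed on the listed features only, which is why the composite distance, rather than the features parameter, determines which columns take part in the model.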