turicreate.nearest_neighbors.NearestNeighborsModel.similarity_graph

NearestNeighborsModel.similarity_graph(k=5, radius=None, include_self_edges=False, output_type='SGraph', verbose=True)

Construct the similarity graph on the reference dataset, which is already stored in the model. This is conceptually very similar to calling query with the reference set as the query data, but this method is optimized for the purpose, is syntactically simpler, and removes self-edges by default.

Parameters:

k : int, optional: Maximum number of neighbors to return for each point in the dataset. Setting this to None deactivates the constraint, so that all neighbors within radius of a given point are returned.
radius : float, optional: For a given point, only neighbors within this distance are returned. The default is None, in which case the k nearest neighbors are returned for each query point, regardless of distance.
include_self_edges : bool, optional: For most distance functions, each point in the model’s reference dataset is its own nearest neighbor. If this parameter is set to False, this result is ignored, and the nearest neighbors are returned excluding the point itself.
output_type : {‘SGraph’, ‘SFrame’}, optional: By default, the results are returned in the form of an SGraph, where each point in the reference dataset is a vertex and an edge A -> B indicates that vertex B is a nearest neighbor of vertex A. If ‘output_type’ is set to ‘SFrame’, the output is in the same form as the results of the ‘query’ method: an SFrame with columns indicating the query label (in this case the query data is the same as the reference data), reference label, distance between the two points, and the rank of the neighbor.
verbose : bool, optional: If True, print progress updates and model details.
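
The way k, radius, and include_self_edges combine can be illustrated with a brute-force sketch in plain Python. This is not how turicreate computes the graph; the function name similarity_edges, the tuple layout, and the choice to treat the radius cutoff as inclusive are assumptions made for illustration only.

```python
import math

def similarity_edges(points, k=5, radius=None, include_self_edges=False):
    """Brute-force sketch: return (src, dst, distance, rank) tuples."""
    edges = []
    for i, p in enumerate(points):
        candidates = []
        for j, q in enumerate(points):
            if i == j and not include_self_edges:
                continue  # drop the trivial self-match
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if radius is not None and d > radius:
                continue  # outside the distance cutoff (assumed inclusive)
            candidates.append((d, j))
        candidates.sort()  # closest first
        if k is not None:
            candidates = candidates[:k]  # keep at most k neighbors
        for rank, (d, j) in enumerate(candidates, start=1):
            edges.append((i, j, d, rank))
    return edges

points = [(0.98, 0.69), (0.62, 0.58), (0.11, 0.36)]
nearest = similarity_edges(points, k=1)               # one neighbor per point
within = similarity_edges(points, k=None, radius=0.5)  # distance cutoff only
```

With k=None and a radius, a point that has no neighbor inside the cutoff simply contributes no edges, which is why the number of edges per point can vary.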

Returns:

out : SFrame or SGraph: The type of the output object depends on the ‘output_type’ parameter. See the parameter description for more detail.

Notes

If both k and radius are set to None, each data point is matched to the entire dataset. If the reference dataset has $n$ rows, the output is an SFrame with $n^2$ rows (or an SGraph with $n^2$ edges).
For models created with the ‘lsh’ method, the output similarity graph may have fewer vertices than there are data points in the original reference set. Because LSH is an approximate method, a query point may have fewer than ‘k’ neighbors. If LSH returns no neighbors at all for a query and self-edges are excluded, the query point is omitted from the results.
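
When approximate results matter, it can help to check which points came back with fewer than k neighbors or with none at all. The sketch below works on a hypothetical list of (query label, reference label) pairs; the edge values and the helper name count_neighbors are made up for illustration and are not produced by turicreate.

```python
# Hypothetical results for a 4-point reference set queried with k=2.
# These pairs are invented for illustration only.
edges = [(0, 1), (0, 3), (1, 0), (3, 1)]
reference_labels = [0, 1, 2, 3]
k = 2

def count_neighbors(edges, labels):
    """Count how many neighbors were returned for each query label."""
    counts = {label: 0 for label in labels}
    for src, _dst in edges:
        counts[src] += 1
    return counts

counts = count_neighbors(edges, reference_labels)
# Points that received fewer than k neighbors.
fewer_than_k = [label for label in reference_labels if counts[label] < k]
# Points that never appear as a query label in the results.
no_neighbors = [label for label in reference_labels if counts[label] == 0]
print("fewer than k:", fewer_than_k)
print("no neighbors:", no_neighbors)
```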

Examples

First construct an SFrame and create a nearest neighbors model:

>>> import turicreate
>>> sf = turicreate.SFrame({'x1': [0.98, 0.62, 0.11],
...                         'x2': [0.69, 0.58, 0.36]})
>>> model = turicreate.nearest_neighbors.create(sf, distance='euclidean')

Unlike the query method, there is no need for a second dataset with similarity_graph.

>>> g = model.similarity_graph(k=1)  # an SGraph
>>> g.edges
+----------+----------+----------------+------+
| __src_id | __dst_id |    distance    | rank |
+----------+----------+----------------+------+
|    0     |    1     | 0.376430604494 |  1   |
|    2     |    1     | 0.55542776308  |  1   |
|    1     |    0     | 0.376430604494 |  1   |
+----------+----------+----------------+------+
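
The edges are directed: the table above contains 2 -> 1, but not 1 -> 2, because the nearest neighbor of point 1 is point 0. A short plain-Python sketch, using only the source and destination ids copied from the table, separates mutual pairs from one-way edges. The variable names are illustrative and not part of the turicreate API.

```python
# Edges copied from the table above as (source, destination) pairs.
edges = [(0, 1), (2, 1), (1, 0)]
edge_set = set(edges)

# A pair is mutual when the reverse edge is also present.
mutual = sorted({tuple(sorted(e)) for e in edges if (e[1], e[0]) in edge_set})
# An edge is one-way when its reverse is missing.
one_way = sorted(e for e in edges if (e[1], e[0]) not in edge_set)

print("mutual:", mutual)
print("one-way:", one_way)
```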