turicreate.nearest_neighbors.NearestNeighborsModel.similarity_graph
NearestNeighborsModel.similarity_graph(k=5, radius=None, include_self_edges=False, output_type='SGraph', verbose=True)

Construct the similarity graph on the reference dataset, which is already stored in the model. This is conceptually very similar to running query with the reference set as the query data, but this method is optimized for the purpose, is syntactically simpler, and removes self-edges by default (see include_self_edges).
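Conceptually, the method behaves like querying every reference point against the whole reference set and discarding each point's trivial match with itself. The pure-Python sketch below illustrates that idea with an exhaustive Euclidean search. It is not Turi Create's implementation. The function name `brute_force_similarity_graph` and the `(src, dst, distance, rank)` tuple layout are assumptions, chosen to mirror the `__src_id`, `__dst_id`, `distance`, and `rank` edge columns of the SGraph output.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[int, int, float, int]  # (src, dst, distance, rank)


def brute_force_similarity_graph(points: List[Point], k: int = 5) -> List[Edge]:
    """Hypothetical sketch: k-nearest-neighbor graph by exhaustive search.

    Every point is "queried" against the full reference set, and its match
    with itself is dropped, mimicking the default self-edge removal.
    """
    edges: List[Edge] = []
    for src, p in enumerate(points):
        # Distances from the query point to every *other* reference point.
        candidates = [
            (math.dist(p, q), dst) for dst, q in enumerate(points) if dst != src
        ]
        candidates.sort()  # nearest first; ties broken by the lower index
        for rank, (dist, dst) in enumerate(candidates[:k], start=1):
            edges.append((src, dst, dist, rank))
    return edges
```

For example, `brute_force_similarity_graph([(0.0,), (1.0,), (3.0,)], k=1)` links each 1-D point to its closest other point. This exhaustive search costs O(n²) distance evaluations, which is the work a dedicated method can try to organize more efficiently.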
Parameters:
- k : int, optional
Maximum number of neighbors to return for each point in the dataset. The default is 5. Setting this to None deactivates the constraint, so that all neighbors within radius of a given point are returned.
- radius : float, optional
For a given point, only neighbors within this distance are returned. The default is None, in which case the k nearest neighbors are returned for each query point, regardless of distance.
- include_self_edges : bool, optional
For most distance functions, each point in the model’s reference dataset is its own nearest neighbor. If this parameter is set to False (the default), these self-matches are dropped, and each point's nearest neighbors are returned excluding the point itself.
- output_type : {‘SGraph’, ‘SFrame’}, optional
By default, the results are returned in the form of an SGraph, where each point in the reference dataset is a vertex and an edge A -> B indicates that vertex B is a nearest neighbor of vertex A. If ‘output_type’ is set to ‘SFrame’, the output is in the same form as the results of the ‘query’ method: an SFrame with columns indicating the query label (in this case the query data is the same as the reference data), reference label, distance between the two points, and the rank of the neighbor.
- verbose : bool, optional
If True, print progress updates and model details.
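The way k, radius, and include_self_edges combine for a single query point can be read as a filtering step over that point's candidate neighbors. The helper below, `select_neighbors`, is hypothetical: it illustrates the documented semantics rather than reproducing Turi Create's code, and it assumes that both constraints must hold when both are set and that the radius bound is inclusive.

```python
from typing import List, Optional, Tuple


def select_neighbors(
    query_id: int,
    candidates: List[Tuple[int, float]],  # (reference_id, distance) pairs
    k: Optional[int] = 5,
    radius: Optional[float] = None,
    include_self_edges: bool = False,
) -> List[Tuple[int, float, int]]:
    """Hypothetical sketch: apply the k / radius / self-edge rules to one query.

    Returns (reference_id, distance, rank) triples, nearest first.
    """
    kept = sorted(candidates, key=lambda c: c[1])
    if not include_self_edges:
        # Drop the trivial match of the query point with itself.
        kept = [c for c in kept if c[0] != query_id]
    if radius is not None:
        # Assumption: a neighbor exactly at distance `radius` is kept.
        kept = [c for c in kept if c[1] <= radius]
    if k is not None:
        kept = kept[:k]
    # With both k and radius set to None, every remaining candidate survives.
    return [(ref, dist, rank) for rank, (ref, dist) in enumerate(kept, start=1)]
```

For instance, with k=None and radius=1.0, every candidate within distance 1.0 is returned no matter how many there are, while k=2 alone returns the two closest candidates regardless of distance.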
Returns:
- out : SFrame or SGraph
The type of the output object depends on the ‘output_type’ parameter. See the parameter description for more detail.
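Both output types carry the same information: each SFrame row (query, neighbor, distance, rank) corresponds to one directed edge A -> B in the SGraph. As an illustration, the hypothetical helper below turns query-style rows into an adjacency mapping from each source vertex A to its neighbors B. The row layout `(query_label, reference_label, distance, rank)` is an assumption based on the column description of output_type, not a confirmed SFrame schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row per (query, neighbor) pair, as in the SFrame output:
# (query_label, reference_label, distance, rank)
Row = Tuple[int, int, float, int]


def rows_to_adjacency(rows: List[Row]) -> Dict[int, List[Tuple[int, float]]]:
    """Hypothetical sketch: map each source vertex A to its neighbors B (A -> B)."""
    adjacency: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    # Visit rows grouped by query and in rank order, so lists are nearest first.
    for query, reference, distance, _rank in sorted(rows, key=lambda r: (r[0], r[3])):
        adjacency[query].append((reference, distance))
    return dict(adjacency)
```

Sorting by rank before grouping means the result does not depend on the order in which rows arrive.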
Notes
- If both k and radius are set to None, each data point is matched to the entire dataset. If the reference dataset has \(n\) rows, the output is an SFrame with \(n^2\) rows (or an SGraph with \(n^2\) edges).
- For models created with the ‘lsh’ method, the output similarity graph may have fewer vertices than there are data points in the original reference set. Because LSH is an approximate method, a query point may have fewer than ‘k’ neighbors. If LSH returns no neighbors at all for a query and self-edges are excluded, the query point is omitted from the results.
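The LSH caveat follows from the fact that a graph built from a list of edges only contains the vertices that appear as endpoints. A point that receives no neighbors and is not selected as anyone's neighbor leaves no trace. The short sketch below shows this effect with a hypothetical edge list. It illustrates the reasoning in the note and is not Turi Create's internal graph construction.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (src, dst): dst is a neighbor of src


def vertices_from_edges(edges: List[Edge]) -> Set[int]:
    """Collect every vertex that appears as an endpoint of some edge."""
    vertices: Set[int] = set()
    for src, dst in edges:
        vertices.update((src, dst))
    return vertices


# Hypothetical approximate search over 4 points: point 3 got no neighbors,
# self-edges are excluded, and no other point picked 3 as a neighbor.
approx_edges: List[Edge] = [(0, 1), (1, 0), (2, 1)]
missing = set(range(4)) - vertices_from_edges(approx_edges)
print(missing)  # {3}
```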
Examples
First construct an SFrame and create a nearest neighbors model:
>>> import turicreate
>>> sf = turicreate.SFrame({'x1': [0.98, 0.62, 0.11],
...                         'x2': [0.69, 0.58, 0.36]})
>>> model = turicreate.nearest_neighbors.create(sf, distance='euclidean')
Unlike the query method, similarity_graph does not need a second dataset.
>>> g = model.similarity_graph(k=1)  # an SGraph
>>> g.edges
+----------+----------+----------------+------+
| __src_id | __dst_id |    distance    | rank |
+----------+----------+----------------+------+
|    0     |    1     | 0.376430604494 |  1   |
|    2     |    1     | 0.55542776308  |  1   |
|    1     |    0     | 0.376430604494 |  1   |
+----------+----------+----------------+------+
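The distances in this table can be checked by hand: each is the Euclidean distance between two rows of sf, for example \(\sqrt{(0.98-0.62)^2 + (0.69-0.58)^2} = \sqrt{0.1417} \approx 0.3764\). The snippet below recomputes every point's single nearest neighbor with the standard library's `math.dist` (Python 3.8+), independently of Turi Create, mirroring k=1 with self-edges excluded.

```python
import math

# The three rows of sf as (x1, x2) points, indexed 0, 1, 2.
points = [(0.98, 0.69), (0.62, 0.58), (0.11, 0.36)]

nearest = []
for src, p in enumerate(points):
    # Closest *other* point, i.e. the rank-1 neighbor when self-edges are excluded.
    dist, dst = min((math.dist(p, q), j) for j, q in enumerate(points) if j != src)
    nearest.append((src, dst, dist))

for src, dst, dist in nearest:
    print(f"{src} -> {dst}: {dist:.12f}")
```

The edge order in an SGraph is not something to rely on, so the table's rows may appear in a different order than this loop prints them.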