|
| ~lsh_neighbors () |
|
void | init_options (const std::map< std::string, flexible_type > &_opts) override |
|
void | train (const sframe &X, const std::vector< flexible_type > &ref_labels, const std::vector< dist_component_type > &composite_distance_params, const std::map< std::string, flexible_type > &opts) override |
|
sframe | query (const v2::ml_data &mld_queries, const std::vector< flexible_type > &query_labels, const size_t k, const double radius, const bool include_self_edges) const override |
|
size_t | get_version () const override |
|
void | save_impl (turi::oarchive &oarc) const override |
|
void | load_version (turi::iarchive &iarc, size_t version) override |
|
LSH nearest neighbor class.
The intuition behind LSH-based indexes is to hash data points into buckets, such that similar points are more likely to be hashed to the same bucket than dissimilar ones. We could then find the approximate nearest neighbors of any point, simply by finding the bucket that it is hashed to.
It works as follows:
- Choose k hash functions h_1, h_2, ..., h_k from a uniform of some family of LSH functions. For any data point v, place v in the bucket with key g(v) = (h_1(v), h_2(v), ..., h_k(v)).
- Independently perform step 1 l times to construct l separate hash tables, with hash functions g_1, g_2, ..., g_l
You can set k and l by setting num_projections_per_table and num_tables respectively.
Definition at line 38 of file lsh_neighbors.hpp.
sframe turi::nearest_neighbors::lsh_neighbors::query |
( |
const v2::ml_data & |
mld_queries, |
|
|
const std::vector< flexible_type > & |
query_labels, |
|
|
const size_t |
k, |
|
|
const double |
radius, |
|
|
const bool |
include_self_edges |
|
) |
| const |
|
override |
Find neighbors of queries in a created LSH model.
For each query, the method keeps track of the current k-nearest neighbors in the LSH. At each node, the closest possible point in each child node to the query is computed, and if this distance is further than the current k'th nearest neighbor, that child node (and its descendants) is skpped in the traversal.
- Parameters
-
[in] | mld_queries | query data |
[in] | query_labels | sframe query labels |
[in] | k | size_t max number of neighbors to return for each query |
[in] | radius | double max distance for returned neighbors to each query |
[out] | ret | sframe SFrame with four columns: query label, reference label, distance, and rank. |
- Note
- Assumes that data is already in the right shape.