turicreate.toolkits.distances.weighted_jaccard

turicreate.toolkits.distances.weighted_jaccard(x, y)

Compute the weighted Jaccard distance between two dictionaries. Suppose $$K_x$$ and $$K_y$$ are the sets of keys of the two input dictionaries, and $$x_k$$ and $$y_k$$ are the values associated with key $$k$$ in the respective dictionaries. Typically these values are counts, e.g. of words or n-grams. The distance is defined as:

$D(x, y) = 1 - \frac{\sum_{k \in K_x \cup K_y} \min\{x_k, y_k\}} {\sum_{k \in K_x \cup K_y} \max\{x_k, y_k\}}$

Parameters:
x : dict
    First input dictionary.
y : dict
    Second input dictionary.

Returns:
out : float
    Weighted Jaccard distance between x and y.

Notes

• If a key is missing in one of the two dictionaries, it is assumed to have value 0.
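The formula and the missing-key rule can be illustrated with a minimal pure-Python sketch. This is not Turi Create's implementation; the function name `weighted_jaccard_sketch` is hypothetical, and returning 0.0 when the denominator is zero (for example, two empty dictionaries) is an assumption made here, not documented library behavior.

```python
def weighted_jaccard_sketch(x, y):
    """Illustrative weighted Jaccard distance between two dicts of numbers."""
    # Union of keys from both dictionaries.
    keys = set(x) | set(y)

    # A key missing from one dictionary is treated as having value 0.
    min_sum = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    max_sum = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)

    # Assumption (not from the library docs): define the distance as 0.0
    # when the denominator is zero, to avoid dividing by zero.
    if max_sum == 0:
        return 0.0

    return 1 - min_sum / max_sum


# Same inputs as the documented example; the exact value is 13/17.
print(weighted_jaccard_sketch({'a': 2, 'c': 4}, {'b': 3, 'c': 12}))
```

Using `dict.get(k, 0)` keeps the missing-key rule in one place, so both sums range over the same union of keys.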

References

• Weighted Jaccard distance: Chierichetti, F., et al. (2010) Finding the Jaccard Median. Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics.

Examples

>>> import turicreate as tc
>>> tc.distances.weighted_jaccard({'a': 2, 'c': 4},
...                               {'b': 3, 'c': 12})
0.7647058823529411
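
As a check on this result, the computation can be carried out by hand. The union of keys is {a, b, c}, and a key missing from one dictionary contributes the value 0:

$D(x, y) = 1 - \frac{\min\{2, 0\} + \min\{0, 3\} + \min\{4, 12\}} {\max\{2, 0\} + \max\{0, 3\} + \max\{4, 12\}} = 1 - \frac{4}{17} = \frac{13}{17} \approx 0.7647$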