turicreate.toolkits.distances.weighted_jaccard

turicreate.toolkits.distances.weighted_jaccard(x, y)
Compute the weighted Jaccard distance between two dictionaries. Suppose \(K_x\) and \(K_y\) are the sets of keys from the two input dictionaries, while \(x_k\) and \(y_k\) are the values associated with key \(k\) in the respective dictionaries. Typically these values are counts, e.g. of words or n-grams.
\[D(x, y) = 1 - \frac{\sum_{k \in K_x \cup K_y} \min\{x_k, y_k\}} {\sum_{k \in K_x \cup K_y} \max\{x_k, y_k\}}\]
Parameters:  x : dict
First input dictionary.
 y : dict
Second input dictionary.
Returns:  out : float
Weighted Jaccard distance between x and y.
Notes
 If a key is missing in one of the two dictionaries, it is assumed to have value 0.
References
 Weighted Jaccard distance: Chierichetti, F., et al. (2010) Finding the Jaccard Median. Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics.
Examples
>>> import turicreate as tc
>>> tc.distances.weighted_jaccard({'a': 2, 'c': 4},
...                               {'b': 3, 'c': 12})
0.7647058823529411
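
To make the formula and the missing-key rule concrete, the following is a minimal pure-Python sketch of the same computation. It is an illustration only, not Turi Create's implementation: the name `weighted_jaccard_sketch` is hypothetical, values are assumed to be non-negative numbers, and returning 0.0 when the denominator is zero is an assumption made here rather than documented library behavior.

```python
def weighted_jaccard_sketch(x, y):
    """Illustrative weighted Jaccard distance between two dicts of non-negative numbers."""
    # Union of keys K_x and K_y.
    keys = set(x) | set(y)

    # A key missing from one dictionary contributes the value 0 for that side.
    numerator = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    denominator = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)

    # Assumption: two empty (or all-zero) inputs are treated as identical.
    if denominator == 0:
        return 0.0

    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Same inputs as the example above.
    print(weighted_jaccard_sketch({'a': 2, 'c': 4}, {'b': 3, 'c': 12}))
```

For the inputs in the example, the key union is {a, b, c}. The minima are 0, 0 and 4 (sum 4) and the maxima are 2, 3 and 12 (sum 17), so the distance is 1 - 4/17 = 13/17, or approximately 0.7647, in agreement with the value shown above.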