turicreate.toolkits.distances.weighted_jaccard

turicreate.toolkits.distances.weighted_jaccard(x, y)

Compute the weighted Jaccard distance between two dictionaries. Suppose \(K_x\) and \(K_y\) are the sets of keys from the two input dictionaries, while \(x_k\) and \(y_k\) are the values associated with key \(k\) in the respective dictionaries. Typically these values are counts, e.g. of words or n-grams.

\[ D(x, y) = 1 - \frac{\sum_{k \in K_x \cup K_y} \min\{x_k, y_k\}}{\sum_{k \in K_x \cup K_y} \max\{x_k, y_k\}} \]
Parameters:
x : dict

First input dictionary.

y : dict

Second input dictionary.

Returns:
out : float

Weighted Jaccard distance between x and y.

Notes

  • If a key is missing in one of the two dictionaries, it is assumed to have value 0.
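To make the definition and the missing-key rule concrete, here is a minimal pure-Python sketch of the same computation. It is an illustration only, not Turi Create's implementation: the name `weighted_jaccard_sketch` is hypothetical, and raising an error when the denominator is zero is an assumption, since the documentation does not say what the library does for empty or all-zero inputs.

```python
def weighted_jaccard_sketch(x, y):
    """Illustrative weighted Jaccard distance between two dicts.

    Hypothetical helper for explanation; not part of turicreate.
    """
    # Union of keys; a key absent from one dict is treated as value 0.
    keys = set(x) | set(y)

    # Numerator: sum of per-key minima. Denominator: sum of per-key maxima.
    min_sum = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    max_sum = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)

    if max_sum == 0:
        # Assumption: the distance is undefined here, so fail loudly.
        raise ValueError("weighted Jaccard distance is undefined when all values are zero")

    return 1.0 - float(min_sum) / max_sum
```

With non-negative values, dictionaries that share no keys give a distance of 1, and identical dictionaries with at least one positive value give 0.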

References

  • Weighted Jaccard distance: Chierichetti, F., et al. (2010) Finding the Jaccard Median. Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics.

Examples

>>> import turicreate as tc
>>> tc.distances.weighted_jaccard({'a': 2, 'c': 4},
...                               {'b': 3, 'c': 12})
0.7647058823529411
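
This result can be checked by hand from the definition. The union of keys is {a, b, c}; treating missing keys as 0, the per-key minima are 0, 0 and 4, and the per-key maxima are 2, 3 and 12:

\[ D(x, y) = 1 - \frac{0 + 0 + 4}{2 + 3 + 12} = 1 - \frac{4}{17} = \frac{13}{17} \approx 0.7647 \]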