Turi Create
4.0
|
#include <ml/sketches/hyperloglog.hpp>
Public Member Functions | |
hyperloglog (size_t b=16) | |
Buckets. More... | |
template<typename T > | |
void | add (const T &t) |
void | combine (const hyperloglog &other) |
double | error_bound () |
double | estimate () |
An implementation of the hyperloglog sketch for estimating the number of unique elements in a datastream.
Implements the hyperloglog sketch algorithm as described in: Philippe Flajolet, Eric Fusy, Olivier Gandouet and Frederic Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Conference on Analysis of Algorithms (AofA) 2007.
with further reference from: Stefan Heule, Marc Nunkesser and Alexander Hall. HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm. Proceedings of the EDBT 2013 Conference.
Usage is simple.
Definition at line 43 of file hyperloglog.hpp.
|
inlineexplicit |
Buckets.
Constructs a hyperloglog sketch using 2^b buckets. The resultant hyperloglog datastructure will require 2^b bytes of memory. b must be at least 4. (i.e. 2^4 buckets).
Definition at line 56 of file hyperloglog.hpp.
|
inline |
Adds an arbitrary object to be counted. Any object type can be used, and there are no restrictions as long as std::hash<T> can be used to obtain a hash value.
Definition at line 81 of file hyperloglog.hpp.
|
inline |
Merge two hyperloglog datastructures. The two data structures must be constructed with the same number of buckets. Combining of two sketches constructed on two disjoint data streams produces identical results to generating one sketch on the both data streams.
Definition at line 99 of file hyperloglog.hpp.
|
inline |
Returns the standard error of the estimate.
Definition at line 110 of file hyperloglog.hpp.
|
inline |
Returns the estimate of the number of unique items.
Definition at line 117 of file hyperloglog.hpp.