Turi Create  4.0
turi::sketches::hyperloglog Class Reference

#include <ml/sketches/hyperloglog.hpp>

Public Member Functions

 hyperloglog (size_t b=16)
 Buckets. More...
 
template<typename T >
void add (const T &t)
 
void combine (const hyperloglog &other)
 
double error_bound ()
 
double estimate ()
 

Detailed Description

An implementation of the hyperloglog sketch for estimating the number of unique elements in a datastream.

Implements the hyperloglog sketch algorithm as described in: Philippe Flajolet, Eric Fusy, Olivier Gandouet and Frederic Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Conference on Analysis of Algorithms (AofA) 2007.

with further reference from: Stefan Heule, Marc Nunkesser and Alexander Hall. HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm. Proceedings of the EDBT 2013 Conference.

Usage is simple.

// repeatedly call
hll.add(stuff) // this is a templatized function.
// will accept anything it can hash using std::hash.
hll.estimate() // will return an estimate of the number of unique element
hll.error_bound() // will return the standard deviation on the estimate

Definition at line 43 of file hyperloglog.hpp.

Constructor & Destructor Documentation

◆ hyperloglog()

turi::sketches::hyperloglog::hyperloglog ( size_t  b = 16)
inlineexplicit

Buckets.

Constructs a hyperloglog sketch using 2^b buckets. The resultant hyperloglog datastructure will require 2^b bytes of memory. b must be at least 4. (i.e. 2^4 buckets).

Definition at line 56 of file hyperloglog.hpp.

Member Function Documentation

◆ add()

template<typename T >
void turi::sketches::hyperloglog::add ( const T &  t)
inline

Adds an arbitrary object to be counted. Any object type can be used, and there are no restrictions as long as std::hash<T> can be used to obtain a hash value.

Definition at line 81 of file hyperloglog.hpp.

◆ combine()

void turi::sketches::hyperloglog::combine ( const hyperloglog other)
inline

Merge two hyperloglog datastructures. The two data structures must be constructed with the same number of buckets. Combining of two sketches constructed on two disjoint data streams produces identical results to generating one sketch on the both data streams.

Definition at line 99 of file hyperloglog.hpp.

◆ error_bound()

double turi::sketches::hyperloglog::error_bound ( )
inline

Returns the standard error of the estimate.

Definition at line 110 of file hyperloglog.hpp.

◆ estimate()

double turi::sketches::hyperloglog::estimate ( )
inline

Returns the estimate of the number of unique items.

Definition at line 117 of file hyperloglog.hpp.


The documentation for this class was generated from the following file: