Turi Create  4.0
turi::sketches::space_saving< T > Class Template Reference

#include <ml/sketches/space_saving.hpp>

Public Member Functions

 space_saving (double epsilon=0.0001)
 
void initialize (double epsilon=0.0001)
 
void clear ()
 
void add (const T &t, size_t count=1)
 
size_t size () const
 
std::vector< std::pair< T, size_t > > frequent_items () const
 
std::vector< std::pair< T, size_t > > guaranteed_frequent_items () const
 
template<typename U >
std::enable_if< std::is_convertible< U, T >::value, void >::type combine (const space_saving< U > &other)
 

Detailed Description

template<typename T>
class turi::sketches::space_saving< T >

This class implements the Space Saving Sketch as described in Ahmed Metwally † Divyakant Agrawal Amr El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams.

It provides an efficient one pass scan of all the data and provides an estimate all the frequently occuring elements, with guarantees that all elements with occurances >= N will be reported.

// repeatedly call
ss.add(stuff);
// will return an array containing all the elements tracked
// not all elements may be truly frequent items
ss.frequent_items()
// will return an array containing all the elements tracked which are
// guaranteed to have occurances >= \epsilon N

Definition at line 42 of file space_saving.hpp.

Constructor & Destructor Documentation

◆ space_saving()

template<typename T>
turi::sketches::space_saving< T >::space_saving ( double  epsilon = 0.0001)
inline

Constructs a save saving sketch using 1 / epsilon buckets. The resultant hyperloglog datastructure will 1 / epsilon memory, and guarantees that all elements with occurances >= N will be reported.

Definition at line 49 of file space_saving.hpp.

Member Function Documentation

◆ add()

template<typename T>
void turi::sketches::space_saving< T >::add ( const T &  t,
size_t  count = 1 
)
inline

Adds an item with a specified count to the sketch.

Definition at line 85 of file space_saving.hpp.

◆ clear()

template<typename T>
void turi::sketches::space_saving< T >::clear ( )
inline

Clears everything out.

Definition at line 72 of file space_saving.hpp.

◆ combine()

template<typename T>
template<typename U >
std::enable_if<std::is_convertible<U, T>::value, void>::type turi::sketches::space_saving< T >::combine ( const space_saving< U > &  other)
inline

Merges a second space saving sketch into the current sketch

Definition at line 160 of file space_saving.hpp.

◆ frequent_items()

template<typename T>
std::vector<std::pair<T, size_t> > turi::sketches::space_saving< T >::frequent_items ( ) const
inline

Returns all the elements tracked by the sketch as well as an estimated count. The estimated can be a large overestimate.

Definition at line 100 of file space_saving.hpp.

◆ guaranteed_frequent_items()

template<typename T>
std::vector<std::pair<T, size_t> > turi::sketches::space_saving< T >::guaranteed_frequent_items ( ) const
inline

Returns all the elements tracked by the sketch as well as an estimated count. All elements returned are guaranteed to have occurance >= epsilon * m_size

Definition at line 131 of file space_saving.hpp.

◆ initialize()

template<typename T>
void turi::sketches::space_saving< T >::initialize ( double  epsilon = 0.0001)
inline

Initalizes a save saving sketch using 1 / epsilon buckets. The resultant hyperloglog datastructure will use O(1 / epsilon) memory, and guarantees that all elements with occurances >= N will be reported.

Definition at line 61 of file space_saving.hpp.

◆ size()

template<typename T>
size_t turi::sketches::space_saving< T >::size ( ) const
inline

Returns the number of elements inserted into the sketch.

Definition at line 92 of file space_saving.hpp.


The documentation for this class was generated from the following file: