Turi Create
4.0
|
#include <ml/sketches/streaming_quantile_sketch.hpp>
Public Member Functions | |
streaming_quantile_sketch (double epsilon=0.005, const Comparator &comparator=Comparator()) | |
void | init (double epsilon, const Comparator &comparator) |
size_t | sketch_size () |
size_t | memory_usage () |
void | add (T t) |
void | finalize () |
size_t | size () const |
T | query (size_t rank) |
T | query_quantile (double quantile) |
T | fast_query (size_t rank) |
T | fast_query_quantile (double quantile) |
void | substream_finalize () |
void | combine (streaming_quantile_sketch other) |
void | combine_finalize () |
This class implements the streaming quantile sketching datastructure as described in Qi Zhang and Wei Wang. A Fast Algorithm for Approximate Quantiles in High Speed Data Streams. SSDBM 2007.
This sketch is basically a streaming version of the quantile_sketch
It constructs a low memory sketch from a stream of unknown length. To summarize the basic idea, it draws samples from the stream and maintains for each sample a lower bound and an upper bound on its rank in the stream.
Usage:
To query, two family of functions are provided.
The regular query functions take linear time in the size of the sketch, and have accuracy guarantees
The fast query functions take logarithmic time in the size of the sketch, and have no accuracy guarantees.
The sketch structure also support parallel sketching:
Example: If I have 16 data streams, I can construct 16 streaming_quantile_sketches:
Each sketch can then be ran on each data stream independently. Then all can be combined into one. Note that the process for doing this on the streaming_quantile_sketch is somewhat more complicated. It involves a partial finalization of each substream (sub_stream_finalize() ), before combining. And after combining, a different finalization call must be made (combine_finalize())
Type | to contain in the sketch. type must be less-than-comparable, constructible, copy constructible, assignable. |
We create a sequence of fixed length sketches, with each subsequent sketch sized to sketch twice the amount of data as the previous one. Each sketch is built to have a final finalized error of / 3.
If no sub-streams are used, the finalize() phase will merge all sketches and recompress to introduce an additional error of (2 * / 3)
If sub-streams are used, the sub-stream finalize will merge all the sub-stream sketches to introduce an additional error of ( / 3). Then each of the substream sketches are merged into the final sketch and compressed again to introduce an additional error of ( / 3)
Definition at line 22 of file quantile_sketch.hpp.
|
inlineexplicit |
Constructs a quantile_sketch with a desired target number of elements, and a desired accuracy. See init() for details.
Definition at line 110 of file streaming_quantile_sketch.hpp.
|
inline |
Inserts a value into the sketch. Not safe to use in parallel.
Definition at line 163 of file streaming_quantile_sketch.hpp.
|
inline |
Merges two substream_finalized quantile_sketch into one quantile sketch. Both this and the other quantile sketch must be substream_finalized(). This allows two quantile sketches to be generated on two disjoint streams of data, and then combined at the end.
Example: If I have 16 data streams, I can construct 16 quantile_sketches,
Each sketch can then be ran on each data stream independently. Then all can be combined into one:
Definition at line 377 of file streaming_quantile_sketch.hpp.
|
inline |
Stops combining, and finishes the final sketch making it available for querying.
Definition at line 390 of file streaming_quantile_sketch.hpp.
|
inline |
Quickly queries for the value at rank k of all the elements inserted. Assuming N elements are inserted. Rank 0 will be the minimum value, rank N-1, the maximum value.
Note that this rank is in relation to the total number of elements inserted. For instance,
sketch.fast_query(9999) will be the maximum.
Unlike query(), this function is never guaranteed to provide an epsilon error bound. Instead, this function assumes that each element in the sketch has a point estimate of its rank (the center of its rmin/rmax range), and looks for the element closest to the desired rank.
Definition at line 286 of file streaming_quantile_sketch.hpp.
|
inline |
Quickly returns the value at a quantile. For instance:
quantile values below 0 will be interpreted as 0 (hence the minimum). quantile values above 1 will be interpreted as 1 (hence the maximum).
Unlike query_quantile(), this function is never guaranteed to provide an epsilon error bound. Instead, this function assumes that each element in the sketch has a point estimate of its rank (the center of its rmin/rmax range), and looks for the element closest to the desired rank.
Definition at line 311 of file streaming_quantile_sketch.hpp.
|
inline |
Finalizes the sketch datastructure. Must be called before queries are made. Once finalize() is called, elements may no longer be inserted.
Definition at line 182 of file streaming_quantile_sketch.hpp.
|
inline |
Reinitializes the quantile sketch, resetting it. epsilon is the desired accuracy of the quantiles.
epsilon | The desired accuracy |
Definition at line 121 of file streaming_quantile_sketch.hpp.
|
inline |
Returns the approximate current memory usage of the sketch in bytes.
Definition at line 151 of file streaming_quantile_sketch.hpp.
|
inline |
Queries for the value at rank k of all the elements inserted. Assuming N elements are inserted. Rank 0 will be the minimum value, rank N-1, the maximum value.
Note that this rank is in relation to the total number of elements inserted. For instance,
sketch.query(9999) will be the maximum.
Definition at line 234 of file streaming_quantile_sketch.hpp.
|
inline |
Returns the value at a quantile. For instance:
quantile values below 0 will be interpreted as 0 (hence the minimum). quantile values above 1 will be interpreted as 1 (hence the maximum).
Definition at line 254 of file streaming_quantile_sketch.hpp.
|
inline |
Returns the number of elements inserted into the sketch.
Definition at line 207 of file streaming_quantile_sketch.hpp.
|
inline |
Returns the number of elements stored in the sketch
Definition at line 138 of file streaming_quantile_sketch.hpp.
|
inline |
Finalizes the sketch datastructure, in preparation for combining with other streaming quantile sketches. Once substream_finalize() is called, elements may no longer be inserted.
Definition at line 323 of file streaming_quantile_sketch.hpp.