Turi Create
4.0
#include <ml/sketches/unity_sketch.hpp>
Public Member Functions | |
void | construct_from_sarray (std::shared_ptr< unity_sarray_base > uarray, bool background=false, const std::vector< flexible_type > &keys={}) |
bool | sketch_ready () |
size_t | num_elements_processed () |
double | get_quantile (double quantile) |
double | frequency_count (flexible_type value) |
std::vector< std::pair< flexible_type, size_t > > | frequent_items () |
double | num_unique () |
std::map< flexible_type, std::shared_ptr< unity_sketch_base > > | element_sub_sketch (const std::vector< flexible_type > &keys) |
std::shared_ptr< unity_sketch_base > | element_length_summary () |
std::shared_ptr< unity_sketch_base > | element_summary () |
std::shared_ptr< unity_sketch_base > | dict_key_summary () |
std::shared_ptr< unity_sketch_base > | dict_value_summary () |
double | mean () |
double | max () |
double | min () |
double | numeric_epsilon () |
double | sum () |
double | var () |
size_t | size () |
size_t | num_undefined () |
void | cancel () |
Provides a query interface to a collection of statistics about an SArray, accumulated via various sketching methods. The unity_sketch object contains a summary of a single SArray (a column of an SFrame). It holds sketched statistics about the SArray which can be queried efficiently.
The sketch computation is fast, with complexity approximately linear in the length of the SArray. Once the sketch is built, every query function on it returns nearly instantly.
The sketch's contents vary depending on whether the SArray is numeric, non-numeric (string), or a container type (list, vector/array, dict, recursive).
The sketch is essentially a union of a collection of sketches, and what is available depends on the value type of the SArray. For numeric types (int, float):
The following information is provided exactly:
And the following information is provided approximately:
For SArray of type recursive/dict/array, additional sketch information is available:
For SArray of type list, there is a sketch summary over all values inside the list elements. The summary flattens all list values and sketches the flattened values. Each value in a list is cast to string for the summary. The summary can be retrieved by calling element_summary().
For SArray of type array (vector), there is a sketch summary over all values inside the vector elements. The summary flattens all vector values and sketches the flattened values. The summary can be retrieved by calling element_summary().
For SArray of type dict, additional sketch summaries over the keys and over the values are provided. They can be retrieved by calling dict_key_summary() and dict_value_summary().
For SArray of type dict, the user can also pass a list of dictionary keys to the sketch_summary function, which produces one sub sketch for each of the keys. For example:
    >>> sketch = sa.sketch_summary(sub_sketch_keys=["a", "b"])
The sub summaries may then be retrieved by:
    >>> sketch.element_sub_sketch()
Or, for a subset of the keys:
    >>> sketch.element_sub_sketch(["key1", "key2"])
Similarly, for SArray of type vector (array), the user can pass a list of integers, each an index into the vector, to get one sub sketch per index. For example:
    >>> sketch = sa.sketch_summary(sub_sketch_keys=[1,3,5])
The sub summaries may then be retrieved by:
    >>> sketch.element_sub_sketch()
Or, for a subset of the keys:
    >>> sketch.element_sub_sketch([1,3])
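The per-key sub sketch idea can be illustrated with a small standalone sketch. It does not use the Turi classes: KeySummary and sub_summaries are hypothetical names, and a plain count and mean stand in for the full per-key sketch. It only shows the shape of the result, one summary per requested key, built from the rows that contain that key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-key sketch: only a count and a running sum.
struct KeySummary {
  std::size_t count = 0;
  double sum = 0.0;
  double mean() const { return count == 0 ? 0.0 : sum / count; }
};

// Builds one summary per requested key over a column of dictionaries.
// Rows that do not contain a key simply do not contribute to that key.
std::map<std::string, KeySummary> sub_summaries(
    const std::vector<std::map<std::string, double>>& rows,
    const std::vector<std::string>& keys) {
  std::map<std::string, KeySummary> out;
  for (const auto& key : keys) out[key];  // ensure an entry for every key
  for (const auto& row : rows) {
    for (const auto& key : keys) {
      auto it = row.find(key);
      if (it == row.end()) continue;
      KeySummary& summary = out[key];
      ++summary.count;
      summary.sum += it->second;
    }
  }
  return out;
}
```

Keys that were not requested never appear in the result, which mirrors the way the sub sketches are limited to the keys passed at construction time.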
Definition at line 136 of file unity_sketch.hpp.
void turi::unity_sketch::cancel | ( | ) |
Cancels any ongoing sketch computation.
void turi::unity_sketch::construct_from_sarray | ( | std::shared_ptr< unity_sarray_base > | uarray, |
bool | background = false, |
const std::vector< flexible_type > & | keys = {} |
) |
Generates all the sketch statistics from an input SArray. If background is true, the sketch will be constructed in the background. While the sketch is being constructed in a background thread, queries can be executed on the sketch, but none of the quality guarantees will apply.
std::shared_ptr<unity_sketch_base> turi::unity_sketch::dict_key_summary | ( | ) |
For SArray of dictionary type, returns the sketch summary for the dictionary keys. A key is only counted if it can be converted to string.
std::shared_ptr<unity_sketch_base> turi::unity_sketch::dict_value_summary | ( | ) |
For SArray of dictionary type, returns the sketch summary for the dictionary values. A value is only counted if it can be converted to float.
std::shared_ptr<unity_sketch_base> turi::unity_sketch::element_length_summary | ( | ) |
Returns the element length sketch summary if the SArray is of list/vector/dict type. Raises an exception otherwise.
std::map<flexible_type, std::shared_ptr<unity_sketch_base> > turi::unity_sketch::element_sub_sketch | ( | const std::vector< flexible_type > & | keys | ) |
Returns the sketch summary for each given key in a dictionary SArray sketch, or for each given index in a vector SArray sketch.
keys | each entry is either an index into the vector or a key in the dictionary |
std::shared_ptr<unity_sketch_base> turi::unity_sketch::element_summary | ( | ) |
For SArray of array/list (recursive) type, returns the sketch summary for the element values. The summary only covers elements that can be converted to string. Elements that cannot be converted to string are ignored.
double turi::unity_sketch::frequency_count | ( | flexible_type | value | ) |
Returns a sketched estimate of the number of occurrences of a given element. This estimate is based on the count sketch. The element must be of the same type as the input SArray; an exception is thrown otherwise.
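As a rough illustration of how a count sketch produces such an estimate, here is a minimal standalone version. It is not the Turi implementation: the class name, the row and width defaults, and the cs_mix hash (a splitmix64-style mixer) are all assumptions made for this example. Each row hashes the item to one counter and one sign, and the estimate is the median of the signed counters across rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// splitmix64-style mixing function, used here as a stand-in hash (assumption).
inline std::uint64_t cs_mix(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

class CountSketch {
 public:
  explicit CountSketch(std::size_t rows = 5, std::size_t width = 1024)
      : rows_(std::max<std::size_t>(rows, 1)),
        width_(std::max<std::size_t>(width, 1)),
        table_(rows_ * width_, 0) {}

  // Adds `count` occurrences of `item` to one signed counter per row.
  void add(std::uint64_t item, std::int64_t count = 1) {
    for (std::size_t r = 0; r < rows_; ++r) {
      const std::uint64_t h = cs_mix(cs_mix(item) + r);
      table_[r * width_ + h % width_] += sign(h) * count;
    }
  }

  // Median over rows of the signed counter the item maps to.
  std::int64_t estimate(std::uint64_t item) const {
    std::vector<std::int64_t> per_row;
    per_row.reserve(rows_);
    for (std::size_t r = 0; r < rows_; ++r) {
      const std::uint64_t h = cs_mix(cs_mix(item) + r);
      per_row.push_back(sign(h) * table_[r * width_ + h % width_]);
    }
    auto mid = per_row.begin() + per_row.size() / 2;
    std::nth_element(per_row.begin(), mid, per_row.end());
    return *mid;
  }

 private:
  // The top hash bit picks the sign, the low bits pick the column.
  static std::int64_t sign(std::uint64_t h) { return (h >> 63) ? 1 : -1; }

  std::size_t rows_;
  std::size_t width_;
  std::vector<std::int64_t> table_;
};
```

Because colliding items contribute with random signs, their errors tend to cancel, and taking the median across rows discards the rows with unlucky collisions.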
std::vector<std::pair<flexible_type, size_t> > turi::unity_sketch::frequent_items | ( | ) |
Returns a sketched estimate of the most frequent elements in the SArray based on the SpaceSaving sketch. The only guarantee is that every element which appears in more than 0.01% (0.0001) of the rows of the array will appear in the set of returned elements. Other elements may also appear in the result. The item counts are estimated using the CountSketch.
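The guarantee above follows from how SpaceSaving works: with k counters over n rows, any item occurring more than n/k times is always monitored. The following is a minimal sketch of that algorithm, not the Turi implementation. The class and method names are hypothetical, and the linear scan for the minimum counter is chosen for brevity rather than speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class SpaceSaving {
 public:
  explicit SpaceSaving(std::size_t capacity)
      : capacity_(std::max<std::size_t>(capacity, 1)) {}

  void add(const std::string& item) {
    auto it = counts_.find(item);
    if (it != counts_.end()) {  // already monitored: just count it
      ++it->second;
      return;
    }
    if (counts_.size() < capacity_) {  // free counter available
      counts_[item] = 1;
      return;
    }
    // Table full: evict the smallest counter; the new item inherits it plus one.
    auto min_it = counts_.begin();
    for (auto j = counts_.begin(); j != counts_.end(); ++j) {
      if (j->second < min_it->second) min_it = j;
    }
    const std::size_t inherited = min_it->second + 1;
    counts_.erase(min_it);
    counts_[item] = inherited;
  }

  // Estimated count (never below the true count for a monitored item), or 0.
  std::size_t estimate(const std::string& item) const {
    auto it = counts_.find(item);
    return it == counts_.end() ? 0 : it->second;
  }

  // All monitored items with their estimated counts, in no particular order.
  std::vector<std::pair<std::string, std::size_t>> items() const {
    std::vector<std::pair<std::string, std::size_t>> out(counts_.begin(),
                                                         counts_.end());
    return out;
  }

 private:
  std::size_t capacity_;
  std::unordered_map<std::string, std::size_t> counts_;
};
```

Under this scheme a 0.0001 threshold would correspond to on the order of 10,000 counters, though the counter budget actually used by the library is not stated here.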
double turi::unity_sketch::get_quantile | ( | double | quantile | ) |
Returns a sketched estimate of the value at a particular quantile between 0.0 and 1.0. The quantile is guaranteed to be accurate within 1%: meaning that if you ask for the 0.55 quantile, the returned value is guaranteed to be between the true 0.54 quantile and the true 0.56 quantile. The quantiles are only defined for numeric arrays and this function will throw an exception if called on a sketch constructed for a non-numeric column.
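To make the 1% guarantee concrete, the check below tests whether an estimate lies between the exact (q - 0.01) and (q + 0.01) quantiles of the data. This is a standalone sketch: exact_quantile and within_quantile_guarantee are hypothetical helpers, and the floor-of-rank convention for the exact quantile is an assumption of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact value at quantile q of a sorted, non-empty vector.
// Uses index floor(q * (n - 1)); this convention is an assumption.
double exact_quantile(const std::vector<double>& sorted, double q) {
  q = std::min(1.0, std::max(0.0, q));
  const std::size_t idx =
      static_cast<std::size_t>(std::floor(q * (sorted.size() - 1)));
  return sorted[idx];
}

// True if `estimate` lies between the exact (q - eps) and (q + eps) quantiles.
bool within_quantile_guarantee(std::vector<double> values, double q,
                               double estimate, double eps = 0.01) {
  if (values.empty()) return false;  // quantiles are undefined on empty input
  std::sort(values.begin(), values.end());
  return exact_quantile(values, q - eps) <= estimate &&
         estimate <= exact_quantile(values, q + eps);
}
```

For the values 0 through 1000, an estimate of 550 for the 0.55 quantile passes the check, while 500 or 600 does not.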
double turi::unity_sketch::max | ( | ) |
inline |
Returns the max of the values in the sarray. Returns NaN on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 254 of file unity_sketch.hpp.
double turi::unity_sketch::mean | ( | ) |
inline |
Returns the mean of the values in the sarray. Returns 0 on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 242 of file unity_sketch.hpp.
double turi::unity_sketch::min | ( | ) |
inline |
Returns the min of the values in the sarray. Returns NaN on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 266 of file unity_sketch.hpp.
size_t turi::unity_sketch::num_elements_processed | ( | ) |
Returns the number of elements processed by the sketch so far. If the sketch is constructed with background == false, this will always return the number of elements of the array. If the sketch is constructed using a background thread, this may return a value between 0 and the length of the array.
size_t turi::unity_sketch::num_undefined | ( | ) |
inline |
Returns the number of undefined elements in the input SArray.
Definition at line 319 of file unity_sketch.hpp.
double turi::unity_sketch::num_unique | ( | ) |
Returns a sketched estimate of the number of unique values in the SArray based on the HyperLogLog sketch.
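For readers unfamiliar with HyperLogLog, a compact standalone version is sketched below. It is not the Turi implementation: the register count (4096), the hll_hash mixer (splitmix64-style), and the class name are assumptions for this example. Each value is hashed, the top bits select a register, and the register keeps the largest position of the first set bit seen in the remaining bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// splitmix64-style mixing function, used here as a stand-in hash (assumption).
inline std::uint64_t hll_hash(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

class HyperLogLog {
 public:
  HyperLogLog() : registers_(std::size_t{1} << kBits, 0) {}

  void add(std::uint64_t value) {
    const std::uint64_t h = hll_hash(value);
    const std::size_t index = static_cast<std::size_t>(h >> (64 - kBits));
    std::uint64_t rest = h << kBits;  // the remaining 64 - kBits hash bits
    // Rank = 1-based position of the first set bit in `rest`.
    unsigned rank = 1;
    while (rank <= 64 - kBits && (rest & (1ULL << 63)) == 0) {
      ++rank;
      rest <<= 1;
    }
    if (rank > registers_[index]) {
      registers_[index] = static_cast<std::uint8_t>(rank);
    }
  }

  double estimate() const {
    const double m = static_cast<double>(registers_.size());
    double inverse_sum = 0.0;
    std::size_t zeros = 0;
    for (std::uint8_t r : registers_) {
      inverse_sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^(-r)
      if (r == 0) ++zeros;
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);  // valid for m >= 128
    double e = alpha * m * m / inverse_sum;
    // Small-range correction: fall back to linear counting.
    if (e <= 2.5 * m && zeros > 0) e = m * std::log(m / zeros);
    return e;
  }

 private:
  static constexpr unsigned kBits = 12;  // 2^12 = 4096 registers
  std::vector<std::uint8_t> registers_;
};
```

With 4096 registers the typical relative error is about 1.04 / sqrt(4096), roughly 1.6%. Duplicates never change a register, which is why the result estimates the number of distinct values only.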
double turi::unity_sketch::numeric_epsilon | ( | ) |
inline |
Returns the epsilon value used by the numeric sketch. Returns NaN on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 278 of file unity_sketch.hpp.
size_t turi::unity_sketch::size | ( | ) |
inline |
Returns the number of elements in the input SArray.
Definition at line 312 of file unity_sketch.hpp.
bool turi::unity_sketch::sketch_ready | ( | ) |
Returns true if the sketch is complete. If the sketch is constructed with background == false, this will always return true. Otherwise, the sketch is constructed using a background thread and this will return false until the sketch is ready.
double turi::unity_sketch::sum | ( | ) |
inline |
Returns the sum of the values in the sarray. Returns 0 on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 290 of file unity_sketch.hpp.
double turi::unity_sketch::var | ( | ) |
inline |
Returns the variance of the values in the sarray. Returns 0 on an empty array. Throws an exception if called on an sarray with non-numeric type.
Definition at line 301 of file unity_sketch.hpp.
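The exact statistics (size, num_undefined, min, max, mean, sum, var) can all be accumulated in a single pass, which is consistent with the linear construction cost described earlier. The standalone sketch below shows one way to do that with Welford's update. It is not the Turi implementation: ExactStats and summarize are hypothetical names, NaN stands in for an undefined value, and the population variance (dividing by n) is an assumption. The empty-array results follow the conventions documented above: 0 for mean, sum and var, NaN for min and max.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for the exactly computed statistics.
struct ExactStats {
  std::size_t size = 0;
  std::size_t num_undefined = 0;
  double min = std::numeric_limits<double>::quiet_NaN();
  double max = std::numeric_limits<double>::quiet_NaN();
  double sum = 0.0;
  double mean = 0.0;
  double var = 0.0;
};

// One pass over the values; NaN entries are treated as undefined (assumption).
ExactStats summarize(const std::vector<double>& values) {
  ExactStats s;
  std::size_t n = 0;  // number of defined values seen so far
  double m2 = 0.0;    // running sum of squared deviations from the mean
  for (double v : values) {
    ++s.size;
    if (std::isnan(v)) {
      ++s.num_undefined;
      continue;
    }
    ++n;
    if (n == 1 || v < s.min) s.min = v;
    if (n == 1 || v > s.max) s.max = v;
    s.sum += v;
    const double delta = v - s.mean;  // Welford's update
    s.mean += delta / static_cast<double>(n);
    m2 += delta * (v - s.mean);
  }
  if (n > 0) s.var = m2 / static_cast<double>(n);  // population variance
  return s;
}
```

Welford's update avoids the cancellation that the textbook formula E[x^2] - E[x]^2 can suffer when the values are large relative to their spread.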