Turi Create
4.0
Namespaces | |
turi::groupby_aggregate_impl | |
turi::groupby_operators | |
turi::rolling_aggregate | |
Classes | |
class | turi::group_aggregate_value |
class | turi::hash_bucket< T > |
class | turi::hash_bucket_container< T > |
Functions | |
std::shared_ptr< group_aggregate_value > | turi::get_builtin_group_aggregator (const std::string &) |
sframe | turi::group (sframe sframe_in, std::string key_column) |
sframe | turi::groupby_aggregate (const sframe &source, const std::vector< std::string > &keys, const std::vector< std::string > &group_output_columns, const std::vector< std::pair< std::vector< std::string >, std::shared_ptr< group_aggregate_value >>> &groups, size_t max_buffer_size=SFRAME_GROUPBY_BUFFER_NUM_ROWS) |
std::shared_ptr< sarray< flexible_type > > | turi::rolling_aggregate::rolling_apply (const sarray< flexible_type > &input, std::shared_ptr< group_aggregate_value > agg_op, ssize_t window_start, ssize_t window_end, size_t min_observations) |
template<typename Iterator > | |
flexible_type | turi::rolling_aggregate::full_window_aggregate (std::shared_ptr< group_aggregate_value > agg_op, Iterator first, Iterator last) |
Aggregate functions. | |
template<typename Iterator > | |
bool | turi::rolling_aggregate::has_min_observations (size_t min_observations, Iterator first, Iterator last) |
Hash function.
This allows us to add groupby_element to an std::unordered_set
std::shared_ptr<group_aggregate_value> turi::get_builtin_group_aggregator | ( | const std::string & | ) |
Helper function to convert string aggregator name into builtin aggregator value.
Implementation is in groupby_operators.hpp
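A name-to-aggregator lookup of this kind can be sketched as a small factory table. The sketch below is stand-alone and illustrative only: toy_aggregator, toy_sum, toy_count, get_toy_aggregator and the names "sum" and "count" are hypothetical, and none of them is the Turi Create interface or its list of builtin names.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical, simplified aggregator interface (not turi::group_aggregate_value).
struct toy_aggregator {
  virtual ~toy_aggregator() = default;
  virtual void add(double value) = 0;  // feed one value into the aggregate
  virtual double emit() const = 0;     // read the aggregated result
};

struct toy_sum : toy_aggregator {
  double total = 0.0;
  void add(double value) override { total += value; }
  double emit() const override { return total; }
};

struct toy_count : toy_aggregator {
  double n = 0.0;
  void add(double) override { n += 1.0; }  // the value itself is ignored
  double emit() const override { return n; }
};

// Maps a string name to a freshly constructed aggregator.
// The names "sum" and "count" are assumptions made for this sketch.
inline std::shared_ptr<toy_aggregator> get_toy_aggregator(const std::string& name) {
  using factory = std::function<std::shared_ptr<toy_aggregator>()>;
  static const std::unordered_map<std::string, factory> registry = {
      {"sum", [] { return std::make_shared<toy_sum>(); }},
      {"count", [] { return std::make_shared<toy_count>(); }},
  };
  auto it = registry.find(name);
  if (it == registry.end()) {
    throw std::invalid_argument("unknown aggregator: " + name);
  }
  return it->second();  // each call returns a new, independent instance
}
```

Returning a new instance per call keeps the state of one aggregation separate from the next. How the real helper reports an unknown name is not stated here, so the exception is an assumption of the sketch.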
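sframe turi::group | ( | sframe | sframe_in, |
std::string | key_column | ||
) |
Group the sframe rows by the key_column.
The result is like that of a sort in that rows sharing a key value are placed together, but the groups are not guaranteed to come out in key order.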
sframe turi::groupby_aggregate | ( | const sframe & | source, |
const std::vector< std::string > & | keys, | ||
const std::vector< std::string > & | group_output_columns, | ||
const std::vector< std::pair< std::vector< std::string >, std::shared_ptr< group_aggregate_value >>> & | groups, | ||
size_t | max_buffer_size = SFRAME_GROUPBY_BUFFER_NUM_ROWS | ||
) |
Groupby Aggregate function for an SFrame. Given the source SFrame, this function performs a group-by aggregate, using one or more columns to define the group key and a descriptor for how to aggregate the other, non-key columns.
For instance, given an SFrame:
user_id | movie_id | rating | time |
5 | 10 | 1 | 4pm |
5 | 15 | 2 | 1pm |
6 | 12 | 1 | 2pm |
7 | 13 | 1 | 3am |
a call that groups on user_id, counts movie_id, and sums rating will generate one group per user_id value and, within each group, the count of movie_id values and the sum of the ratings:
user_id | Count of movie_id | Sum of rating |
5 | 2 | 3 |
6 | 1 | 1 |
7 | 1 | 1 |
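The example above can be reproduced with a stand-alone sketch that does not use the Turi Create types at all. It only illustrates the semantics (group on user_id, count movie_id, sum rating). rating_row, group_result and count_and_sum_by_user are hypothetical names, and the std::map used here says nothing about how groupby_aggregate buffers or orders its groups.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One input row of the example table. The time column is left out
// because the example does not aggregate it.
struct rating_row {
  int user_id;
  int movie_id;
  int rating;
};

// Per-group output: "Count of movie_id" and "Sum of rating".
struct group_result {
  std::size_t movie_count = 0;
  long rating_sum = 0;
};

// Group by user_id; within each group count the rows and sum the ratings.
inline std::map<int, group_result> count_and_sum_by_user(
    const std::vector<rating_row>& rows) {
  std::map<int, group_result> groups;
  for (const rating_row& row : rows) {
    // operator[] creates a zero-initialised entry the first time a key is seen.
    group_result& g = groups[row.user_id];
    g.movie_count += 1;
    g.rating_sum += row.rating;
  }
  return groups;
}
```

Fed the four rows of the example, this yields three groups: user 5 with count 2 and sum 3, and users 6 and 7 with count 1 and sum 1 each. std::map happens to return the groups in key order; that ordering is a property of the sketch only.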
See groupby_aggregate_operators for operators that have been implemented.
A group is a pair of column name and aggregation operator. The column name can be any existing column in the table; there is no restriction, so you can group on user_id and also aggregate on user_id, though the result is typically not very meaningful. The empty string "" is also defined as a special column name, in which case the aggregator will be sent a flexible type of type FLEX_UNDEFINED for every row (this is useful for COUNT).
source | The input SFrame to group |
keys | An array of column names to generate the group on |
group_output_columns | The output column names for each aggregate. This must be the same length as the 'groups' parameter. Output column names must be unique and must not collide with the names of the key columns. Empty entries are assigned a name automatically. |
groups | A collection of {column_names, group operator} pairs describing the aggregates to generate. You can have multiple aggregators for each set of columns. You do not need every column in the source to be represented. This must be the same length as the 'group_output_columns' parameter. |
max_buffer_size | The maximum size of intermediate aggregation buffers |
bool turi::rolling_aggregate::has_min_observations | ( | size_t | min_observations, |
Iterator | first, | ||
Iterator | last | ||
) |
Scans the current window to check for the number of non-NULL values.
Returns true if the number of non-NULL values is >= min_observations, false otherwise.
Definition at line 84 of file rolling_aggregate.hpp.
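The check described above amounts to counting non-NULL values over an iterator range. The following is a minimal stand-alone sketch, not the code in rolling_aggregate.hpp: the cell struct is a hypothetical stand-in for flexible_type, with is_null playing the role of a NULL value, and the early exit is a choice made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for flexible_type: is_null marks a missing value.
struct cell {
  bool is_null;
  double value;
};

// Returns true if [first, last) holds at least min_observations non-NULL
// values, false otherwise.
template <typename Iterator>
bool has_min_observations_sketch(std::size_t min_observations,
                                 Iterator first, Iterator last) {
  std::size_t observed = 0;
  for (; first != last; ++first) {
    if (!first->is_null) ++observed;
    // Stop scanning as soon as the threshold is reached.
    if (observed >= min_observations) return true;
  }
  // Reached only if the threshold was never met inside the loop
  // (or the range was empty, where a threshold of 0 is still satisfied).
  return observed >= min_observations;
}
```

For a window holding the values 1, NULL, 3, the sketch returns true for a threshold of 2 and false for a threshold of 3.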
std::shared_ptr<sarray<flexible_type> > turi::rolling_aggregate::rolling_apply | ( | const sarray< flexible_type > & | input, |
std::shared_ptr< group_aggregate_value > | agg_op, | ||
ssize_t | window_start, | ||
ssize_t | window_end, | ||
size_t | min_observations | ||
) |
Apply an aggregate function over a moving window.
input | The input SArray (expected to be materialized) |
agg_op | The aggregator. These classes are the same as used by groupby. |
window_start | The start of the moving window relative to the current value being calculated, inclusive. For example, 2 values behind the current would be -2, and 0 indicates that the start of the window is the current value. |
window_end | The end of the moving window relative to the current value being calculated, inclusive. Must be greater than window_start. For example, 0 would indicate that the current value is the end of the window, and 2 would indicate that the window ends at 2 data values after the current. |
min_observations | The minimum allowed number of non-NULL values in the moving window for the emitted value to be non-NULL. size_t(-1) indicates that all values must be non-NULL. |
Returns an SArray of the same length as the input, with a type that matches the type output by the aggregation function.
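The window parameters can be illustrated with a stand-alone rolling sum. This is a sketch under stated assumptions, not the rolling_apply implementation: cell is a hypothetical stand-in for flexible_type, the aggregator is hard-coded to a sum, window slots that fall outside the input are treated as missing, and the sketch rejects only window_start > window_end.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for flexible_type: is_null marks a missing value.
struct cell {
  bool is_null;
  double value;
};

// Rolling sum over the inclusive window [i + window_start, i + window_end]
// for every index i of the input.
inline std::vector<cell> rolling_sum_sketch(const std::vector<cell>& input,
                                            long window_start, long window_end,
                                            std::size_t min_observations) {
  if (window_start > window_end) {
    throw std::invalid_argument("window_start must not exceed window_end");
  }
  const std::size_t window_size =
      static_cast<std::size_t>(window_end - window_start) + 1;
  // size_t(-1) means every slot of the window must be non-NULL.
  if (min_observations == static_cast<std::size_t>(-1)) {
    min_observations = window_size;
  }
  const long n = static_cast<long>(input.size());
  std::vector<cell> output;
  output.reserve(input.size());
  for (long i = 0; i < n; ++i) {
    std::size_t observed = 0;
    double total = 0.0;
    for (long j = i + window_start; j <= i + window_end; ++j) {
      if (j < 0 || j >= n) continue;  // assumption: out-of-range slots are missing
      const cell& c = input[static_cast<std::size_t>(j)];
      if (c.is_null) continue;
      ++observed;
      total += c.value;
    }
    // Emit NULL when the window holds too few non-NULL values.
    if (observed >= min_observations) {
      output.push_back(cell{false, total});
    } else {
      output.push_back(cell{true, 0.0});
    }
  }
  return output;
}
```

With the input 1, 2, 3, 4, 5, window_start = -2, window_end = 0 and min_observations = size_t(-1), the sketch emits NULL, NULL, 6, 9, 12. Lowering min_observations to 2 turns the second output into 3, because the window ending at the second element then holds enough values.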
Throws an exception if: