Turi Create  4.0
turi::groupby_aggregate_impl::group_aggregate_container Class Reference

#include <core/storage/sframe_data/groupby_aggregate_impl.hpp>

Public Member Functions

 group_aggregate_container (size_t max_buffer_size, size_t num_segments)
 
 group_aggregate_container (const group_aggregate_container &other)=delete
 Deleted copy constructor.
 
group_aggregate_containeroperator= (const group_aggregate_container &other)=delete
 Deleted assignment operator.
 
void define_group (std::vector< size_t > column_numbers, std::shared_ptr< group_aggregate_value > aggregator)
 
void add (const std::vector< flexible_type > &val, size_t num_keys)
 Add a new element to the container.
 
void add (const sframe_rows::row &val, size_t num_keys)
 Add a new element to the container.
 
void group_and_write (sframe &out)
 Sort all elements in the container and writes to the output.
 
void init_tls ()
 init tls data members
 

Detailed Description

This maintains the complete aggregation result for an SFrame.

This class essentially implements the entire groupby aggregation algorithm. Aggregation groups are defined using define_group. After which, rows are inserted using the add method.

num_segments is the maximum degree of parallelism permissible. This is the number of segments of the output SFrame.

Keys are first hashed into a segment, and each segment is then executed independently. For each segment, rows are aggregated in memory as much as possible until we exceed the max_buffer_size number of keys stored in memory. If more keys are needed the existing keys are all sorted and flushed out to disk. This process repeats until all data is read.

Then when group_and_write is called, a k-way merge is performed across all the sorted ranges of keys on disk to write the final output.

Definition at line 196 of file groupby_aggregate_impl.hpp.

Constructor & Destructor Documentation

◆ group_aggregate_container()

turi::groupby_aggregate_impl::group_aggregate_container::group_aggregate_container ( size_t  max_buffer_size,
size_t  num_segments 
)
Parameters
max_buffer_sizeMaximum number of aggregators to store in memory per segment
num_segmentsMaximum degree of parallelism

Member Function Documentation

◆ define_group()

void turi::groupby_aggregate_impl::group_aggregate_container::define_group ( std::vector< size_t >  column_numbers,
std::shared_ptr< group_aggregate_value aggregator 
)

Adds a new group operation which groups the values of a column


The documentation for this class was generated from the following file: