Turi Create  4.0
turi::hash_bucket_container< T > Class Template Reference

#include <core/storage/sframe_data/groupby.hpp>

Public Member Functions

 hash_bucket_container (size_t num_buckets, comparator_type comparator=std::less< value_type >())
 Constructs a container with num_buckets buckets and a comparator for sorting the values.
 

Detailed Description

template<typename T>
class turi::hash_bucket_container< T >

A container for a collection of "hash_bucket"s. Each hash_bucket stores its values in sorted order. If every element is added to the bucket selected by its hash value, then all elements in the container end up partially sorted, that is, grouped.
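The grouping property described above can be illustrated with a minimal standalone sketch. It does not use the Turi classes: row_type and group_by_hash are hypothetical names introduced only for this illustration, and plain std::vector buckets stand in for hash_bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row: (key, payload).
using row_type = std::pair<std::string, int>;

// Distributes rows into num_buckets buckets by the hash of the key, then
// sorts each bucket by key. Rows with equal keys always land in the same
// bucket and are adjacent after sorting, so the data is grouped even though
// there is no global order across buckets.
std::vector<std::vector<row_type>> group_by_hash(
    const std::vector<row_type>& rows, std::size_t num_buckets) {
  assert(num_buckets > 0);
  std::vector<std::vector<row_type>> buckets(num_buckets);
  std::hash<std::string> hasher;
  for (const auto& row : rows) {
    buckets[hasher(row.first) % num_buckets].push_back(row);
  }
  for (auto& bucket : buckets) {
    std::stable_sort(bucket.begin(), bucket.end(),
                     [](const row_type& a, const row_type& b) {
                       return a.first < b.first;
                     });
  }
  return buckets;
}
```

Which bucket a given key lands in depends on the hash function, so only the grouping (not the bucket assignment) should be relied upon.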

Below is an example of using it to group an sframe by its first column.

// Each row is a vector of flexible_type; rows are grouped by column 0.
typedef std::vector<flexible_type> value_type;
sframe sf = ...;
hash_bucket_container<value_type> hash_container(
  sf.num_segments(),
  [](const value_type& a, const value_type& b) { return a[0] < b[0]; }
);
parallel_for(0, sf.num_segments(), [&](size_t i) {
  auto iter = sf.get_reader().begin(i);
  auto end = sf.get_reader().end(i);
  while (iter != end) {
    // Choose the bucket from the hash of the first column.
    size_t hash = (*iter)[0].hash();
    hash_container.add(*iter, hash % hash_container.num_buckets());
    ++iter;
  }
});
sframe outsf;
hash_container.sort_and_write(outsf);

Each hash_bucket has an in-memory buffer and is backed by an sarray segment. When the buffer is full, it is sorted and written into the sarray segment as a sorted chunk.

The sort_and_write function then merges the sorted chunks and writes the result out to a new sarray or sframe.
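The buffer, flush, and merge steps described above follow the usual external merge sort pattern. The following is a rough in-memory sketch of that pattern only, not the Turi implementation: sorted_chunk_bucket, sort_and_merge, and num_chunks are hypothetical names, int values replace rows, and std::vector chunks stand in for the sarray segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model of a single bucket with a bounded buffer.
class sorted_chunk_bucket {
 public:
  explicit sorted_chunk_bucket(std::size_t buffer_capacity)
      : capacity_(buffer_capacity) {
    assert(buffer_capacity > 0);
  }

  // Buffers a value; a full buffer is sorted and stored as one chunk.
  void add(int value) {
    buffer_.push_back(value);
    if (buffer_.size() >= capacity_) flush();
  }

  // Flushes the remaining buffer, then k-way merges all sorted chunks.
  std::vector<int> sort_and_merge() {
    flush();
    struct cursor { int value; std::size_t chunk; std::size_t offset; };
    auto greater = [](const cursor& a, const cursor& b) {
      return a.value > b.value;
    };
    // Min-heap holding the current front element of every chunk.
    std::priority_queue<cursor, std::vector<cursor>, decltype(greater)>
        heap(greater);
    for (std::size_t c = 0; c < chunks_.size(); ++c) {
      heap.push(cursor{chunks_[c][0], c, 0});  // chunks are never empty
    }
    std::vector<int> out;
    while (!heap.empty()) {
      cursor top = heap.top();
      heap.pop();
      out.push_back(top.value);
      std::size_t next = top.offset + 1;
      if (next < chunks_[top.chunk].size()) {
        heap.push(cursor{chunks_[top.chunk][next], top.chunk, next});
      }
    }
    return out;
  }

  std::size_t num_chunks() const { return chunks_.size(); }

 private:
  // Sorts the buffer and appends it to the chunk list.
  void flush() {
    if (buffer_.empty()) return;
    std::sort(buffer_.begin(), buffer_.end());
    chunks_.push_back(std::move(buffer_));
    buffer_.clear();  // leave the moved-from buffer in a known empty state
  }

  std::size_t capacity_;
  std::vector<int> buffer_;
  std::vector<std::vector<int>> chunks_;
};
```

Because every chunk is already sorted, the merge only needs to hold one element per chunk in the heap at a time, which is what makes the approach suitable for data that does not fit in memory.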

Definition at line 75 of file groupby.hpp.


The documentation for this class was generated from the following file: