Turi Create  4.0
turi::sarray_reader< T > Class Template Reference (abstract)

#include <core/storage/sframe_data/sarray_reader.hpp>

Public Types

typedef sarray_iterator< T > iterator
 The iterator type returned by begin() and end().
 
typedef iterator::value_type value_type
 The value type the sarray stores.
 

Public Member Functions

 sarray_reader ()=default
 
 sarray_reader (const sarray_reader &other)=delete
 Deleted copy constructor.
 
sarray_reader & operator= (const sarray_reader &other)=delete
 Deleted assignment operator.
 
void init (const sarray< T > &array, size_t num_segments=(size_t)(-1))
 
void init (const sarray< T > &array, const std::vector< size_t > &segment_lengths)
 
size_t num_segments () const
 
size_t segment_length (size_t segment) const
 
std::string get_index_file () const
 
std::vector< std::string > get_file_names () const
 
bool get_metadata (std::string key, std::string &val) const
 
std::pair< bool, std::string > get_metadata (std::string key) const
 
size_t size () const
 
iterator begin (size_t segmentid) const
 
iterator end (size_t segmentid) const
 
size_t read_rows (size_t row_start, size_t row_end, std::vector< T > &out_obj)
 
size_t read_rows (size_t row_start, size_t row_end, sframe_rows &out_obj)
 
void reset_iterators ()
 
flex_type_enum get_type () const
 
virtual size_t read_rows (size_t row_start, size_t row_end, std::vector< typename sarray_iterator< T > ::value_type > &out_obj)=0
 

Detailed Description

template<typename T>
class turi::sarray_reader< T >

The SArray reader provides a reading interface to an immutable, on-disk sequence of objects of type T.

The SArray is an immutable sequence of objects of type T, and is internally represented as a collection of files. The sequence is cut up into a collection of segments (not necessarily of equal length), where each segment covers a disjoint subset of the sequence. Each segment can then be read in parallel.
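A minimal sketch of reading segments in parallel, assuming only the num_segments()/begin()/end() shape described on this page. The `mock_reader` type and the `parallel_sum` helper are hypothetical stand-ins that keep the example self-contained without Turi Create; whether a real sarray_reader tolerates concurrent begin() calls on different segments is an assumption to verify against the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical in-memory stand-in for sarray_reader<int>: it exposes the same
// num_segments()/begin()/end() shape, but each segment is a std::vector.
struct mock_reader {
  std::vector<std::vector<int>> segments;
  std::size_t num_segments() const { return segments.size(); }
  std::vector<int>::const_iterator begin(std::size_t seg) const { return segments.at(seg).begin(); }
  std::vector<int>::const_iterator end(std::size_t seg) const { return segments.at(seg).end(); }
};

// Sums every element by reading each segment on its own thread. Each thread
// writes only to its own slot of `partial`, so no locking is needed.
template <typename Reader>
long long parallel_sum(const Reader& reader) {
  std::vector<long long> partial(reader.num_segments(), 0);
  std::vector<std::thread> workers;
  for (std::size_t seg = 0; seg < reader.num_segments(); ++seg) {
    workers.emplace_back([&reader, &partial, seg] {
      for (auto it = reader.begin(seg); it != reader.end(seg); ++it) {
        partial[seg] += *it;
      }
    });
  }
  for (auto& w : workers) w.join();  // wait for every segment to finish
  return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For example, a mock_reader holding the segments {1, 2, 3}, {4, 5}, {} and {6} yields 21.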

To read from an sarray<T>, use sarray::get_reader():

auto reader = array.get_reader();

reader will be of type sarray_reader<T>.

The reader then provides input iterators over each segment via the begin() and end() functions.

Definition at line 33 of file gl_sarray.hpp.

Constructor & Destructor Documentation

◆ sarray_reader()

template<typename T>
turi::sarray_reader< T >::sarray_reader ( )
default

Default constructor. Does nothing; call init() before using the reader.

Member Function Documentation

◆ begin()

template<typename T>
iterator turi::sarray_reader< T >::begin ( size_t  segmentid) const
inline, virtual

Returns the begin iterator of the segment. The iterator (sarray_iterator) is an input iterator with value_type T. See end() to get the end iterator of the segment.

The iterator is invalid once the originating sarray is destroyed. Accessing the iterator after the sarray is destroyed is undefined behavior.

// example to print segment 1 to screen
// (assumes reader is an initialized sarray_reader<T> and T is printable)
auto iter = reader.begin(1);
auto enditer = reader.end(1);
while (iter != enditer) {
  std::cout << *iter << "\n";
  ++iter;
}

Throws an exception if the sarray is invalid (i.e., there is an error reading files). segmentid must also be a valid segment ID; otherwise an exception is thrown.

Implements turi::siterable< sarray_iterator< T > >.

Definition at line 396 of file sarray_reader.hpp.

◆ end()

template<typename T>
iterator turi::sarray_reader< T >::end ( size_t  segmentid) const
inline, virtual

Returns the end iterator of the segment. The iterator (sarray_iterator) is an input iterator with value_type T. See begin() to get the begin iterator of the segment.

The iterator is invalid once the originating sarray is destroyed. Accessing the iterator after the sarray is destroyed is undefined behavior.

// example to print segment 1 to screen
// (assumes reader is an initialized sarray_reader<T> and T is printable)
auto iter = reader.begin(1);
auto enditer = reader.end(1);
while (iter != enditer) {
  std::cout << *iter << "\n";
  ++iter;
}

Throws an exception if the sarray is invalid (i.e., there is an error reading files). segmentid must also be a valid segment ID; otherwise an exception is thrown.

Implements turi::siterable< sarray_iterator< T > >.

Definition at line 429 of file sarray_reader.hpp.

◆ get_file_names()

template<typename T>
std::vector<std::string> turi::sarray_reader< T >::get_file_names ( ) const
inline

Returns the collection of files storing the sarray. For instance: [file_prefix].sidx, [file_prefix].0001, etc.

Definition at line 327 of file sarray_reader.hpp.

◆ get_index_file()

template<typename T>
std::string turi::sarray_reader< T >::get_index_file ( ) const
inline

Returns the file prefix of the sarray (the parameter given at construction).

Definition at line 317 of file sarray_reader.hpp.

◆ get_metadata() [1/2]

template<typename T>
bool turi::sarray_reader< T >::get_metadata ( std::string  key,
std::string &  val 
) const
inline

Reads the metadata value associated with key, storing it in val. Returns true on success and false on failure.

Definition at line 336 of file sarray_reader.hpp.

◆ get_metadata() [2/2]

template<typename T>
std::pair<bool, std::string> turi::sarray_reader< T >::get_metadata ( std::string  key) const
inline

Reads the value of a key associated with the sarray. Returns a pair of (true, value) on success, and (false, empty_string) on failure.

Definition at line 349 of file sarray_reader.hpp.
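The two get_metadata() overloads report the same information in different shapes. The sketch below shows one way the pair form can be layered on the out-parameter form; `metadata_store` is a hypothetical stand-in backed by a std::map, not the reader's actual storage, and leaving val untouched on failure is a choice of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the key/value metadata attached to an sarray.
struct metadata_store {
  std::map<std::string, std::string> entries;

  // Out-parameter form: writes val only when the key exists.
  bool get_metadata(std::string key, std::string& val) const {
    auto it = entries.find(key);
    if (it == entries.end()) return false;
    val = it->second;
    return true;
  }

  // Pair form, built on the out-parameter form:
  // (true, value) on success, (false, "") on failure.
  std::pair<bool, std::string> get_metadata(std::string key) const {
    std::string val;
    bool ok = get_metadata(key, val);
    return {ok, ok ? val : std::string()};
  }
};
```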

◆ get_type()

template<typename T>
flex_type_enum turi::sarray_reader< T >::get_type ( ) const
inline

Returns the type of the SArray (as set by swriter<flexible_type>::set_type). If the type of the SArray was not set, this returns flex_type_enum::UNDEFINED, in which case each row can be of arbitrary type.

This function should only be used for sarray<flexible_type> and will fail fatally otherwise.

Definition at line 497 of file sarray_reader.hpp.

◆ init() [1/2]

template<typename T>
void turi::sarray_reader< T >::init ( const sarray< T > &  array,
size_t  num_segments = (size_t)(-1) 
)
inline

Initializes the reader to read from an existing sarray. If the index file cannot be opened, an exception is thrown.

Parameters
array: The array to read.
num_segments: If num_segments == (size_t)(-1), the original file segmentation is used. Otherwise, the array is cut into num_segments logical segments, with the rows distributed uniformly among them.

Definition at line 227 of file sarray_reader.hpp.
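One plausible way to distribute the rows uniformly is to give every logical segment total_rows / num_segments rows and hand out the remainder one row at a time. The helper below is hypothetical and only illustrates that scheme; the reader's actual split may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split total_rows into num_segments logical segments
// whose lengths differ by at most one row. The first
// (total_rows % num_segments) segments each get one extra row.
std::vector<std::size_t> uniform_segment_lengths(std::size_t total_rows,
                                                 std::size_t num_segments) {
  assert(num_segments > 0);  // precondition: at least one segment
  std::vector<std::size_t> lengths(num_segments, total_rows / num_segments);
  for (std::size_t i = 0; i < total_rows % num_segments; ++i) {
    ++lengths[i];
  }
  return lengths;
}
```

With 10 rows and 3 segments this gives lengths 4, 3 and 3; with 2 rows and 4 segments the last two segments are empty.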

◆ init() [2/2]

template<typename T>
void turi::sarray_reader< T >::init ( const sarray< T > &  array,
const std::vector< size_t > &  segment_lengths 
)
inline

Initializes the reader to read from an existing sarray, using the segmentation given by segment_lengths. If the index file cannot be opened, an exception is thrown. If the segment lengths do not sum to the length of the sarray, an exception is thrown.

Parameters
array: The array to read.
segment_lengths: The length of each segment. The lengths must sum to the length of the array.

Definition at line 271 of file sarray_reader.hpp.
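This overload requires the segment lengths to sum to the array length. A hypothetical helper that performs that check and converts the lengths into half-open row ranges might look like this; the choice of std::invalid_argument is an assumption, since the documentation only says that an exception is thrown.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the documented check: the lengths must sum to
// the array size, otherwise an exception is thrown. On success it returns the
// half-open row range [start, end) covered by each segment.
std::vector<std::pair<std::size_t, std::size_t>> segment_ranges(
    std::size_t array_size, const std::vector<std::size_t>& segment_lengths) {
  std::size_t total = std::accumulate(segment_lengths.begin(),
                                      segment_lengths.end(), std::size_t{0});
  if (total != array_size) {
    throw std::invalid_argument("segment lengths do not sum to the array size");
  }
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  std::size_t start = 0;
  for (std::size_t len : segment_lengths) {
    ranges.emplace_back(start, start + len);  // empty segments are allowed
    start += len;
  }
  return ranges;
}
```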

◆ num_segments()

template<typename T>
size_t turi::sarray_reader< T >::num_segments ( ) const
inline, virtual

Returns the number of segments in the collection. Throws an exception if the sarray is invalid (i.e., there is an error reading files).

Implements turi::siterable< sarray_iterator< T > >.

Definition at line 299 of file sarray_reader.hpp.

◆ read_rows() [1/2]

virtual size_t turi::siterable< sarray_iterator< T > >::read_rows ( size_t  row_start,
size_t  row_end,
std::vector< typename sarray_iterator< T > ::value_type > &  out_obj 
)
pure virtual, inherited

Reads a collection of rows, storing the result in out_obj. This function is independent of the begin/end iterator functions and can be called at any time. It may also be called concurrently.

Parameters
row_start: The first row to read.
row_end: One past the last row to read (i.e., EXCLUSIVE). row_end can be beyond the end of the array, in which case fewer rows are read.
out_obj: The output array.
Returns
The actual number of rows read, or (size_t)(-1) on failure.
Note
This function is not always efficient; different file format implementations have different performance characteristics.

◆ read_rows() [2/2]

template<typename T>
size_t turi::sarray_reader< T >::read_rows ( size_t  row_start,
size_t  row_end,
std::vector< T > &  out_obj 
)
inline

Reads a collection of rows, storing the result in out_obj. This function is independent of the begin/end iterator functions and can be called at any time. It may also be called concurrently.

Parameters
row_start: The first row to read.
row_end: One past the last row to read (i.e., EXCLUSIVE). row_end can be beyond the end of the array, in which case fewer rows are read.
out_obj: The output array.
Returns
The actual number of rows read, or (size_t)(-1) on failure.
Note
This function is not always efficient; different file format implementations have different performance characteristics.

Definition at line 451 of file sarray_reader.hpp.
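The row-range contract of read_rows() (half-open [row_start, row_end), with row_end allowed to run past the end) can be modeled over a plain std::vector. `read_rows_sketch` is hypothetical; in particular, overwriting out_obj and returning 0 when row_start lies past the end are assumptions of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory model of the documented read_rows contract: rows are
// read from the half-open range [row_start, row_end), row_end is clamped to
// the array length, and the number of rows actually read is returned.
// Assumption: out_obj is overwritten with exactly the rows read.
template <typename T>
std::size_t read_rows_sketch(const std::vector<T>& data, std::size_t row_start,
                             std::size_t row_end, std::vector<T>& out_obj) {
  std::size_t end = std::min(row_end, data.size());
  std::size_t start = std::min(row_start, end);  // empty read if start is past end
  out_obj.assign(data.begin() + static_cast<std::ptrdiff_t>(start),
                 data.begin() + static_cast<std::ptrdiff_t>(end));
  return end - start;
}
```

For data {10, 20, 30, 40, 50}, reading rows [3, 100) returns 2 and leaves {40, 50} in out_obj.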

◆ reset_iterators()

template<typename T>
void turi::sarray_reader< T >::reset_iterators ( )
inline, virtual

Resets all the file handles. All existing iterators are invalidated.

Implements turi::siterable< sarray_iterator< T > >.

Definition at line 481 of file sarray_reader.hpp.

◆ segment_length()

template<typename T>
size_t turi::sarray_reader< T >::segment_length ( size_t  segment) const
inline, virtual

Returns the number of rows in the segment. Throws an exception if the sarray is invalid (i.e., there is an error reading files).

Implements turi::siterable< sarray_iterator< T > >.

Definition at line 309 of file sarray_reader.hpp.

◆ size()

template<typename T>
size_t turi::sarray_reader< T >::size ( ) const
inline

Returns the number of elements in the SArray.

Definition at line 363 of file sarray_reader.hpp.
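Because the segments are disjoint and together cover the sequence, the lengths reported by segment_length() should add up to size(). A hypothetical generic check, plus a fixed stand-in type (`fixed_reader`, not part of the library) to exercise it, could be written as follows.

```cpp
#include <cassert>
#include <cstddef>

// Works with any type exposing num_segments(), segment_length(size_t) and
// size(). Returns true when the per-segment row counts add up to the total.
template <typename Reader>
bool segments_cover_array(const Reader& reader) {
  std::size_t total = 0;
  for (std::size_t seg = 0; seg < reader.num_segments(); ++seg) {
    total += reader.segment_length(seg);
  }
  return total == reader.size();
}

// Hypothetical stand-in used only to exercise the helper without Turi Create.
struct fixed_reader {
  std::size_t lengths[3] = {4, 3, 3};
  std::size_t num_segments() const { return 3; }
  std::size_t segment_length(std::size_t seg) const { return lengths[seg]; }
  std::size_t size() const { return 10; }
};
```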

