Turi Create  4.0
turi::sarray< T > Class Template Reference

#include <core/storage/sframe_data/sarray.hpp>

Public Types

typedef sarray_reader< T > reader_type
 The reader type.
 
typedef swriter_impl::output_iterator< T > iterator
 The iterator type which get_output_iterator returns.
 
typedef T value_type
 The type contained in the sarray.
 

Public Member Functions

 sarray ()=default
 
 sarray (sarray &&other)
 Move constructor.
 
 sarray (const sarray &other)
 Copy constructor.
 
sarray & operator= (const sarray &other)
 Assignment operator.
 
sarray & operator= (sarray &&other)
 
 sarray (std::string sidx_or_directory)
 
 sarray (const flexible_type &value, size_t size, size_t num_segments=SFRAME_DEFAULT_NUM_SEGMENTS, flex_type_enum type=flex_type_enum::UNDEFINED)
 
void open_for_read (index_file_information info)
 
void open_for_read (std::string sidx_file)
 
void open_for_write (size_t num_segments=SFRAME_DEFAULT_NUM_SEGMENTS, bool disable_padding=false)
 
void open_for_write (std::string sidx_file, size_t num_segments=SFRAME_DEFAULT_NUM_SEGMENTS)
 
bool is_opened_for_read () const
 
bool is_opened_for_write () const
 
std::string get_index_file () const
 
sarray_group_format_writer< T > * get_writer ()
 
bool get_metadata (std::string key, std::string &val) const
 
std::pair< bool, std::string > get_metadata (std::string key) const
 
size_t size () const
 
std::unique_ptr< reader_type > get_reader () const
 
std::unique_ptr< reader_type > get_reader (size_t num_segments) const
 
std::unique_ptr< reader_type > get_reader (const std::vector< size_t > &segment_lengths) const
 
size_t num_segments () const
 
size_t segment_length (size_t i) const
 
const index_file_information get_index_info () const
 
sarray append (const sarray &other) const
 
std::shared_ptr< sarray > clone (size_t nsegments=0) const
 
void save (oarchive &oarc) const
 
void load (iarchive &iarc)
 
void try_compact ()
 
bool set_num_segments (size_t numseg)
 
iterator get_output_iterator (size_t segmentid)
 
void close ()
 
bool set_metadata (std::string key, std::string val)
 
flex_type_enum get_type () const
 
void set_type (flex_type_enum type)
 
void set_segment (size_t segmentid, const std::string &segment_file, size_t segment_size)
 
void save (std::string index_file) const
 
template<>
sarray< flexible_type >::iterator get_output_iterator (size_t segmentid)
 Gets an output iterator to the specified segment.
 

Detailed Description

template<typename T>
class turi::sarray< T >

The SArray represents an immutable, on-disk sequence of objects of type T.

The SArray is an immutable sequence of objects of type T and is internally represented as a collection of files. The sequence is cut up into a collection of segments (not necessarily of equal length), where each segment covers a disjoint subset of the sequence. Each segment can then be read in parallel. An SArray is referenced on disk by a single ".sidx" index file, which lists the file names, one file for each segment.
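To make the segment layout concrete, here is a minimal in-memory sketch. toy_sarray and parallel_sum are hypothetical names used only for illustration: real segments live in files listed by the .sidx index, not in std::vector, and this is not the sarray_reader API. It shows the sequence as a concatenation of unequal segments, with one thread reading each segment.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical in-memory stand-in for an SArray: the logical sequence is the
// concatenation of its segments, which may have different lengths.
struct toy_sarray {
  std::vector<std::vector<int>> segments;

  std::size_t size() const {
    std::size_t n = 0;
    for (const auto& seg : segments) n += seg.size();
    return n;
  }
};

// Sums the whole sequence by giving each segment its own thread.
// Segments are disjoint, so each thread writes only its own partial result.
long long parallel_sum(const toy_sarray& arr) {
  std::vector<long long> partial(arr.segments.size(), 0);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < arr.segments.size(); ++i) {
    workers.emplace_back([&arr, &partial, i] {
      const auto& seg = arr.segments[i];
      partial[i] = std::accumulate(seg.begin(), seg.end(), 0LL);
    });
  }
  for (auto& t : workers) t.join();
  return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```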

The SArray is write-once, read-many. The SArray can be opened for writing once, after which it is read-only.
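The write-once, read-many rule can be pictured as a small state machine. The sketch below is hypothetical (write_once_lifecycle and array_state are illustrative names, not library types); it only mirrors the documented transitions: uninitialized, then open for writing, then read-only after close().

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical lifecycle model of a write-once, read-many array.
// Not the library's implementation; it only mirrors the documented rules.
enum class array_state { uninitialized, writing, readable };

class write_once_lifecycle {
 public:
  void open_for_write() {
    if (state_ != array_state::uninitialized)
      throw std::logic_error("array already initialized");
    state_ = array_state::writing;
  }

  void close() {
    if (state_ != array_state::writing)
      throw std::logic_error("array is not open for writing");
    state_ = array_state::readable;  // read-only from now on
  }

  bool is_opened_for_write() const { return state_ == array_state::writing; }
  bool is_opened_for_read() const { return state_ == array_state::readable; }

 private:
  array_state state_ = array_state::uninitialized;
};
```

Attempting to reopen a closed array for writing throws, which is how this sketch enforces "write-once".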

To open an existing sarray on disk for reading:

sarray<int> array;
array.open_for_read("test.sidx");

Note that the type of the array on disk is NOT checked (though it probably should be).

To open an sarray for writing:

sarray<int> array;
array.open_for_write(); // create an sarray backed by temporary files;
                        // the temporary files are deleted when array goes out of scope
sarray<int> array2;
array2.open_for_write("test.sidx"); // create an sarray backed by real files
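The cleanup of temporary files when the array goes out of scope follows the usual RAII pattern. The sketch below is hypothetical (scoped_temp_file is an illustrative name, and the file name is chosen by the caller); it only shows the create-in-constructor, delete-in-destructor idea, not how sarray actually manages its temporary files.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII holder: creates a file with the given name under the
// system temp directory and deletes it in the destructor.
class scoped_temp_file {
 public:
  explicit scoped_temp_file(const std::string& name)
      : path_(fs::temp_directory_path() / name) {
    std::ofstream out(path_);  // opening for output creates an empty file
  }

  ~scoped_temp_file() {
    std::error_code ec;
    fs::remove(path_, ec);  // never throw from a destructor; ignore errors
  }

  scoped_temp_file(const scoped_temp_file&) = delete;
  scoped_temp_file& operator=(const scoped_temp_file&) = delete;

  const fs::path& path() const { return path_; }

 private:
  fs::path path_;
};
```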

When the array is opened for writing, it can be written into by calling get_output_iterator() to obtain an output iterator into each segment.

// Gets the output iterator for segment 3 (segment IDs start at 0)
auto iter = array.get_output_iterator(3);
// writes the value 5 into the segment
*iter = 5;
++iter;
// when done, close the array for writing
array.close();

The get_output_iterator() function can be called concurrently, but an individual output iterator must not be used from multiple threads at once. After close() is called, the sarray becomes read-only, which is equivalent to having called array.open_for_read(...).
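The intended concurrency pattern is one iterator per segment, each owned by a single thread. The sketch below models segments as std::vector and uses std::back_inserter in place of get_output_iterator(); write_segments_concurrently is a hypothetical name, and the values written are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical stand-in: one std::vector per segment. Handing each thread its
// own back_insert_iterator mimics calling get_output_iterator(i) once per
// segment; no iterator is ever shared between threads.
std::vector<std::vector<int>> write_segments_concurrently(std::size_t num_segments,
                                                          int values_per_segment) {
  std::vector<std::vector<int>> segments(num_segments);  // never resized below
  std::vector<std::thread> writers;
  for (std::size_t s = 0; s < num_segments; ++s) {
    auto out = std::back_inserter(segments[s]);  // one iterator per segment
    writers.emplace_back([out, s, values_per_segment]() mutable {
      for (int v = 0; v < values_per_segment; ++v) {
        *out = static_cast<int>(s) * 100 + v;
        ++out;
      }
    });
  }
  for (auto& t : writers) t.join();  // analogous to close(): all writes done
  return segments;
}
```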

To read from the sarray, get_reader() is used.

auto reader = array.get_reader();

Each reader provides read access to the SArray. Multiple readers can be obtained, as each has its own distinct file handles which are closed as the reader goes out of scope. See the documentation for sarray_reader for details.

The sarray<flexible_type> has additional capabilities.

  • It has the functions set_type() and get_type() to set the run-time type of the stored values.
  • Writes to an sarray<flexible_type> are type-checked: each written value must either match the type set by set_type() or be UNDEFINED.
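The type-checking rule for sarray<flexible_type> can be illustrated with std::variant. This is a hypothetical model: toy_value is not flexible_type, std::monostate stands in for UNDEFINED, and typed_column_writer rejects mismatched values rather than casting them, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical dynamically typed value: std::monostate plays the role of UNDEFINED.
using toy_value = std::variant<std::monostate, long long, double, std::string>;

class typed_column_writer {
 public:
  // type_index follows toy_value's alternatives; 0 (monostate) means "no type set".
  void set_type(std::size_t type_index) { type_index_ = type_index; }

  // Accepts a value if no type is set, if the value is UNDEFINED,
  // or if the value's alternative matches the configured type.
  void write(const toy_value& v) {
    bool undefined = std::holds_alternative<std::monostate>(v);
    if (type_index_ != 0 && !undefined && v.index() != type_index_)
      throw std::invalid_argument("value does not match the column type");
    rows_.push_back(v);
  }

  std::size_t size() const { return rows_.size(); }

 private:
  std::size_t type_index_ = 0;
  std::vector<toy_value> rows_;
};
```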
Note
get_output_iterator() is the only function guaranteed to be safe to call concurrently. No other mutating function is guaranteed to be thread-safe.

Definition at line 30 of file gl_sarray.hpp.

Constructor & Destructor Documentation

◆ sarray() [1/3]

template<typename T>
turi::sarray< T >::sarray ( )
default

Default constructor; does nothing. Call open_for_read() or open_for_write() after construction to read or create an sarray.

◆ sarray() [2/3]

template<typename T>
turi::sarray< T >::sarray ( std::string  sidx_or_directory)
inline explicit

Attempts to construct an sarray that reads from the given index file or directory. If the index cannot be opened, an exception is thrown.

Definition at line 198 of file sarray.hpp.

◆ sarray() [3/3]

template<typename T>
turi::sarray< T >::sarray ( const flexible_type &  value,
size_t  size,
size_t  num_segments = SFRAME_DEFAULT_NUM_SEGMENTS,
flex_type_enum  type = flex_type_enum::UNDEFINED 
)
inline

Creates an sarray of the given size in which every element equals value.

Definition at line 205 of file sarray.hpp.

Member Function Documentation

◆ append()

template<typename T>
sarray turi::sarray< T >::append ( const sarray< T > &  other) const
inline

Appends another SArray of the same type to the current SArray and returns the result as a new sarray, without destroying the other array. Both SArrays may be empty, but neither may be open for writing.
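A sketch of the append semantics, assuming an array can be described by its list of segment files and their lengths. Whether the real append shares segment files or copies data is not stated here; toy_index and append are hypothetical names. The point is that the result is a new description and both inputs are left unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index description: the files backing each segment and their lengths.
struct toy_index {
  std::vector<std::string> segment_files;
  std::vector<std::size_t> segment_sizes;

  std::size_t size() const {
    std::size_t n = 0;
    for (auto s : segment_sizes) n += s;
    return n;
  }
};

// Returns a new index whose sequence is `a` followed by `b`.
// Both inputs are taken by const reference and left unchanged.
toy_index append(const toy_index& a, const toy_index& b) {
  toy_index out = a;
  out.segment_files.insert(out.segment_files.end(),
                           b.segment_files.begin(), b.segment_files.end());
  out.segment_sizes.insert(out.segment_sizes.end(),
                           b.segment_sizes.begin(), b.segment_sizes.end());
  return out;
}
```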

Definition at line 458 of file sarray.hpp.

◆ clone()

template<typename T>
std::shared_ptr<sarray> turi::sarray< T >::clone ( size_t  nsegments = 0) const
inline

Return a new sarray that contains a copy of the data in the current array.

Definition at line 488 of file sarray.hpp.

◆ close()

template<typename T>
void turi::sarray< T >::close ( )
inline virtual

Closes the array. The array must first be opened for writing. close() also implicitly closes all segments. After the writer is closed, no segments can be written. Only after the array is closed does it become readable through get_reader().

Implements turi::swriter_base< swriter_impl::output_iterator< T > >.

Definition at line 605 of file sarray.hpp.

◆ get_index_file()

template<typename T>
std::string turi::sarray< T >::get_index_file ( ) const
inline

Return the location of the index file of the sarray

Definition at line 340 of file sarray.hpp.

◆ get_index_info()

template<typename T>
const index_file_information turi::sarray< T >::get_index_info ( ) const
inline

Returns all the index information of the array.

Definition at line 448 of file sarray.hpp.

◆ get_metadata() [1/2]

template<typename T>
bool turi::sarray< T >::get_metadata ( std::string  key,
std::string &  val 
) const
inline

Reads the value of a key associated with the sarray. Returns true on success, false on failure.

Definition at line 359 of file sarray.hpp.

◆ get_metadata() [2/2]

template<typename T>
std::pair<bool, std::string> turi::sarray< T >::get_metadata ( std::string  key) const
inline

Reads the value of a key associated with the sarray. Returns a pair of (true, value) on success, and (false, empty_string) on failure.
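The two get_metadata() overloads differ only in how they report a missing key. The hypothetical toy_metadata class below mirrors both styles over a std::map; it is an illustration of the documented return conventions, not the sarray storage format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical metadata store illustrating both lookup styles.
class toy_metadata {
 public:
  void set(const std::string& key, const std::string& val) { kv_[key] = val; }

  // Out-parameter style: returns false (and leaves val untouched) if key is absent.
  bool get(const std::string& key, std::string& val) const {
    auto it = kv_.find(key);
    if (it == kv_.end()) return false;
    val = it->second;
    return true;
  }

  // Pair style: (true, value) on success, (false, "") on failure.
  std::pair<bool, std::string> get(const std::string& key) const {
    std::string val;
    bool found = get(key, val);
    return {found, found ? val : std::string()};
  }

 private:
  std::map<std::string, std::string> kv_;
};
```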

Definition at line 370 of file sarray.hpp.

◆ get_output_iterator()

template<typename T >
sarray< T >::iterator turi::sarray< T >::get_output_iterator ( size_t  segmentid)
inline virtual

Returns an output iterator that can be used to write data to the segment. The array must first be opened for writing. The returned iterator (of type iterator) is an output iterator with value_type T.

The iterator is invalid once the segment is closed (See close). Accessing the iterator after the writer is destroyed is undefined behavior.

// Example: write a small array to segment 1.
// Assume sw is of type sarray<int> and is opened for writing.
auto iter = sw.get_output_iterator(1);
std::vector<int> vals{1, 2, 3};
for (int i : vals) {
  *iter = i;
  ++iter;
}

Throws an exception if the array is invalid (for instance, if there was an error opening or writing files). segmentid must also be a valid segment ID; otherwise an exception is thrown.

When T is a flexible_type, the output iterator performs type checking.

Implements turi::swriter_base< swriter_impl::output_iterator< T > >.

Definition at line 756 of file sarray.hpp.

◆ get_reader() [1/3]

template<typename T>
std::unique_ptr<reader_type> turi::sarray< T >::get_reader ( ) const
inline

Gets an sarray reader object using the segmentation produced by the actual file segments on disk.

Definition at line 396 of file sarray.hpp.

◆ get_reader() [2/3]

template<typename T>
std::unique_ptr<reader_type> turi::sarray< T >::get_reader ( size_t  num_segments) const
inline

Gets an sarray reader object with num_segments logical segments.
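One plausible way to form num_segments logical segments over an array of a given length is an even split. The helper below is hypothetical (even_segment_lengths is an illustrative name), and the layout get_reader(num_segments) actually chooses may differ; it only shows lengths that differ by at most one and sum to the total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `total` rows into `num_segments` logical
// segments whose lengths differ by at most one.
std::vector<std::size_t> even_segment_lengths(std::size_t total, std::size_t num_segments) {
  assert(num_segments > 0);
  std::vector<std::size_t> lengths(num_segments, total / num_segments);
  // Hand the remainder out one row at a time to the leading segments.
  for (std::size_t i = 0; i < total % num_segments; ++i) ++lengths[i];
  return lengths;
}
```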

Definition at line 407 of file sarray.hpp.

◆ get_reader() [3/3]

template<typename T>
std::unique_ptr<reader_type> turi::sarray< T >::get_reader ( const std::vector< size_t > &  segment_lengths) const
inline

Gets an sarray reader object with a custom segment layout. The entries of segment_lengths must sum to the length of the array.
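A custom layout maps naturally onto half-open row ranges via a running prefix sum, and the sum constraint can be checked at the end. The helper below is hypothetical (segment_ranges is an illustrative name, not part of sarray_reader); zero-length segments are allowed in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: converts a custom segment layout into half-open
// [begin, end) row ranges, rejecting layouts that do not cover the array exactly.
std::vector<std::pair<std::size_t, std::size_t>>
segment_ranges(const std::vector<std::size_t>& segment_lengths, std::size_t array_size) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  std::size_t begin = 0;
  for (std::size_t len : segment_lengths) {
    ranges.emplace_back(begin, begin + len);
    begin += len;
  }
  if (begin != array_size)
    throw std::invalid_argument("segment_lengths must sum to the array size");
  return ranges;
}
```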

Definition at line 419 of file sarray.hpp.

◆ get_type()

template<typename T>
flex_type_enum turi::sarray< T >::get_type ( ) const
inline

Returns the type of the SArray (as set by swriter<flexible_type>::set_type). If the type of the SArray was not set, this returns flex_type_enum::UNDEFINED, in which case each row can be of arbitrary type.

This function should only be used for sarray<flexible_type> and will fail fatally otherwise.

Definition at line 643 of file sarray.hpp.

◆ get_writer()

template<typename T>
sarray_group_format_writer<T>* turi::sarray< T >::get_writer ( )
inline

Return the underlying writer of the sarray

Definition at line 348 of file sarray.hpp.

◆ is_opened_for_read()

template<typename T>
bool turi::sarray< T >::is_opened_for_read ( ) const
inline

Returns true if the array is opened for reading, i.e. get_reader() will succeed.

Definition at line 324 of file sarray.hpp.

◆ is_opened_for_write()

template<typename T>
bool turi::sarray< T >::is_opened_for_write ( ) const
inline

Returns true if the array is opened for writing, i.e. get_output_iterator() will succeed.

Definition at line 333 of file sarray.hpp.

◆ load()

template<typename T>
void turi::sarray< T >::load ( iarchive &  iarc)
inline

SArray deserializer. iarc must be associated with a directory. Loads from the next prefix inside the directory.

Definition at line 523 of file sarray.hpp.

◆ num_segments()

template<typename T>
size_t turi::sarray< T >::num_segments ( ) const
inline virtual

Return the number of segments in the array.

Implements turi::swriter_base< swriter_impl::output_iterator< T > >.

Definition at line 431 of file sarray.hpp.

◆ open_for_read() [1/2]

template<typename T>
void turi::sarray< T >::open_for_read ( index_file_information  info)
inline

Initializes the SArray with an index info. If the SArray is already initialized, this throws an exception.

Definition at line 232 of file sarray.hpp.

◆ open_for_read() [2/2]

template<typename T>
void turi::sarray< T >::open_for_read ( std::string  sidx_file)
inline

Initializes the SArray with an index file. If the SArray is already initialized, this throws an exception.

Definition at line 253 of file sarray.hpp.

◆ open_for_write() [1/2]

template<typename T>
void turi::sarray< T >::open_for_write ( size_t  num_segments = SFRAME_DEFAULT_NUM_SEGMENTS,
bool  disable_padding = false 
)
inline

Opens the array for writing, backed by an arbitrary temporary file. The array must not already be initialized.

Parameters
num_segments  The number of segments in the array

Definition at line 280 of file sarray.hpp.

◆ open_for_write() [2/2]

template<typename T>
void turi::sarray< T >::open_for_write ( std::string  sidx_file,
size_t  num_segments = SFRAME_DEFAULT_NUM_SEGMENTS 
)
inline

Opens the array for writing at a location on disk. The array must not already be initialized.

Parameters
sidx_file  The index file to write. All data files of the array will be written to the same location as sidx_file. Must end in ".sidx". (To write to an arbitrary temporary file instead, use the overload without sidx_file.)
num_segments  The number of segments in the array

Definition at line 307 of file sarray.hpp.

◆ operator=()

template<typename T>
sarray& turi::sarray< T >::operator= ( sarray< T > &&  other)
inline

Move assignment. Moves other into this. Afterwards, other is cleared as if it were a newly constructed sarray object.
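The "cleared as if newly constructed" guarantee is stronger than the default for standard types, whose moved-from state is only valid but unspecified, so the move assignment has to reset the source explicitly. The toy_handle class below is a hypothetical illustration of that pattern, not the sarray implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle illustrating the contract: after a move,
// the source behaves like a freshly default-constructed object.
class toy_handle {
 public:
  toy_handle() = default;
  toy_handle(std::string index_file, std::vector<std::size_t> sizes)
      : index_file_(std::move(index_file)), sizes_(std::move(sizes)) {}

  toy_handle(toy_handle&& other) noexcept { *this = std::move(other); }

  toy_handle& operator=(toy_handle&& other) noexcept {
    if (this != &other) {
      index_file_ = std::move(other.index_file_);
      sizes_ = std::move(other.sizes_);
      other.clear();  // moved-from std types are only "valid but unspecified"
    }
    return *this;
  }

  void clear() {
    index_file_.clear();
    sizes_.clear();
  }
  bool empty() const { return index_file_.empty() && sizes_.empty(); }
  const std::string& index_file() const { return index_file_; }

 private:
  std::string index_file_;
  std::vector<std::size_t> sizes_;
};
```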

Definition at line 173 of file sarray.hpp.

◆ save() [1/2]

template<typename T>
void turi::sarray< T >::save ( oarchive &  oarc) const
inline

SArray serializer. oarc must be associated with a directory. Saves into a prefix inside the directory.

Definition at line 514 of file sarray.hpp.

◆ save() [2/2]

template<typename T>
void turi::sarray< T >::save ( std::string  index_file) const
inline

Saves a copy of the current sarray into a different location. Does not modify the current sarray.

Definition at line 693 of file sarray.hpp.

◆ segment_length()

template<typename T>
size_t turi::sarray< T >::segment_length ( size_t  i) const
inline

Return the length of segment i in the array.

Definition at line 440 of file sarray.hpp.

◆ set_metadata()

template<typename T>
bool turi::sarray< T >::set_metadata ( std::string  key,
std::string  val 
)
inline

Adds metadata to the array. The array must first be opened for writing.

Definition at line 619 of file sarray.hpp.

◆ set_num_segments()

template<typename T>
bool turi::sarray< T >::set_num_segments ( size_t  numseg)
inline virtual

Sets the number of segments in the output. The array must first be opened for writing. Any writes that occurred before this call are lost. Returns true on success, false on failure.

Implements turi::swriter_base< swriter_impl::output_iterator< T > >.

Definition at line 552 of file sarray.hpp.

◆ set_segment()

template<typename T>
void turi::sarray< T >::set_segment ( size_t  segmentid,
const std::string &  segment_file,
size_t  segment_size 
)
inline

Sets the writer's index_info for a given segment. Use this when the segment data itself was written by some other means.

Definition at line 681 of file sarray.hpp.

◆ set_type()

template<typename T>
void turi::sarray< T >::set_type ( flex_type_enum  type)
inline

Sets the type that stored flexible_type values take when written. All writes will be cast to this type.

This function should only be used for sarray<flexible_type> and will fail fatally otherwise.

Definition at line 664 of file sarray.hpp.

◆ size()

template<typename T>
size_t turi::sarray< T >::size ( ) const
inline

Returns the number of elements in the SArray

Definition at line 382 of file sarray.hpp.

◆ try_compact()

template<typename T>
void turi::sarray< T >::try_compact ( )
inline

Attempts to compact if the number of segments in the SArray exceeds SFRAME_COMPACTION_THRESHOLD.
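A hedged sketch of threshold-triggered compaction. The merge policy (pairwise merging of adjacent segments) and the threshold passed in are assumptions; the real SFRAME_COMPACTION_THRESHOLD value and the library's strategy may differ. try_compact_sketch is a hypothetical name. It preserves element order and does nothing when the segment count is already within the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical compaction: if there are more segments than `threshold`,
// merge adjacent segments pairwise until the count is at most `threshold`.
// Element order is preserved.
void try_compact_sketch(std::vector<std::vector<int>>& segments, std::size_t threshold) {
  assert(threshold > 0);
  while (segments.size() > threshold) {
    std::vector<std::vector<int>> merged;
    for (std::size_t i = 0; i < segments.size(); i += 2) {
      std::vector<int> seg = std::move(segments[i]);
      if (i + 1 < segments.size())
        seg.insert(seg.end(), segments[i + 1].begin(), segments[i + 1].end());
      merged.push_back(std::move(seg));
    }
    segments = std::move(merged);  // count roughly halves on each pass
  }
}
```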

Definition at line 532 of file sarray.hpp.

