Turi Create 4.0
SFrame Internal

Namespaces

 turi::v2_block_impl
 
 turi::sframe_saving_impl
 

Classes

class turi::sarray_format_reader_common_base< T >
 
class turi::sarray_group_format_writer< T >
 
class turi::sarray_format_reader_v2< T >
 
class turi::sarray_group_format_writer_v2< T >
 
struct turi::index_file_information
 
struct turi::group_index_file_information
 
class turi::sarray_sorted_buffer< T >
 
struct turi::sframe_saving_impl::column_blocks
 
class turi::siterable< Iterator >
 
class turi::swriter_base< Iterator >
 
class turi::unfair_lock
 

Functions

index_file_information turi::read_index_file (std::string index_file)
 
group_index_file_information turi::read_array_group_index_file (std::string group_index_file)
 
void turi::write_array_group_index_file (std::string group_index_file, const group_index_file_information &info)
 
std::pair< std::string, size_t > turi::parse_v2_segment_filename (std::string fname)
 
void turi::sframe_saving_impl::advance_column_blocks_to_next_block (v2_block_impl::block_manager &block_manager, column_blocks &block)
 

Detailed Description

A more flexible SFrame creator.

The user can finely control the ratio of certain categories of data, for instance 70% 1s and 30% 0s.

One way to achieve this is to record, inside the functor, how many values of each category have already been generated.

Parameters
[in] column_names  Column names.
[in] column_types  Column types.
[in] nrows  Number of rows.
[in] next_row  Callable, equivalent to std::function<std::vector<flexible_type>(size_t)>. If next_row is a function, it should be thread-safe. If next_row is a functor, it is not required to be thread-safe.
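A minimal sketch of the count-recording idea described above. It assumes a row is a std::vector<int> instead of std::vector<flexible_type> so the example is self-contained, and the class name ratio_row_generator is hypothetical, not part of Turi Create. Because it mutates internal counters, it is the kind of functor that, per the note on next_row, is not required to be thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor (not part of Turi Create): yields single-column rows
// so that, after n calls, exactly ceil(ones_percent * n / 100) values are 1
// and the rest are 0. The running counts live inside the functor.
class ratio_row_generator {
 public:
  explicit ratio_row_generator(std::size_t ones_percent)
      : ones_percent_(ones_percent) {
    assert(ones_percent <= 100);
  }

  // Same shape as std::function<std::vector<T>(size_t)>, with T = int
  // standing in for flexible_type. The row index is unused here.
  std::vector<int> operator()(std::size_t /*row_index*/) {
    const std::size_t generated = ones_ + zeros_;
    // Emit a 1 whenever the 1s produced so far fall short of ones_percent%
    // of the rows produced including this one; otherwise emit a 0.
    if (ones_ * 100 < ones_percent_ * (generated + 1)) {
      ++ones_;
      return {1};
    }
    ++zeros_;
    return {0};
  }

  std::size_t ones() const { return ones_; }
  std::size_t zeros() const { return zeros_; }

 private:
  std::size_t ones_percent_;
  std::size_t ones_ = 0;
  std::size_t zeros_ = 0;
};
```

Calling ratio_row_generator(70) for 100 rows yields 70 ones and 30 zeros. If the creator happens to copy the functor (for example once per worker), each copy would track its own counts, so the ratio would hold per copy rather than globally.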

Creates a random SFrame for testing purposes. The column_type_info string gives the types of the columns.

Parameters
[in] n_rows  The number of observations to run the timing on.
[in] column_type_info  A string with each character denoting one type of column. The legend is as follows:

  • n: numeric column.
  • b: categorical column with 2 categories.
  • z: categorical column with 5 categories.
  • Z: categorical column with 10 categories.
  • c: categorical column with 100 categories.
  • C: categorical column with 1000000 categories.
  • s: categorical column with short string keys and 1000 categories.
  • S: categorical column with short string keys and 100000 categories.
  • v: numeric vector with 10 elements.
  • V: numeric vector with 1000 elements.
  • u: categorical set with up to 10 elements.
  • U: categorical set with up to 1000 elements.
  • d: dictionary with 10 entries.
  • D: dictionary with 100 entries.
  • 1: 1d ndarray of dimension 10.
  • 2: 2d ndarray of dimension 4x3.
  • 3: 3d ndarray of dimension 4x3x2.
  • 4: 4d ndarray of dimension 4x3x2x2.
  • A: 3d ndarray of dimension 4x3x2, with randomized non-canonical striding.

Parameters
[in] create_target_column  If true, also create a random target column called "target".
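Each character of column_type_info maps to exactly one column, so "nbV" requests three columns: numeric, 2-category categorical, and a 1000-element numeric vector. The sketch below expands such a string using the legend above and rejects unknown codes. The function name describe_column_types is hypothetical and not part of Turi Create; how the real generator handles an unknown code is not stated here.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expands a column_type_info string into one
// description per column, following the documented legend.
// Throws std::invalid_argument for a character the legend does not define.
std::vector<std::string> describe_column_types(const std::string& column_type_info) {
  static const std::map<char, std::string> legend = {
      {'n', "numeric"},
      {'b', "categorical, 2 categories"},
      {'z', "categorical, 5 categories"},
      {'Z', "categorical, 10 categories"},
      {'c', "categorical, 100 categories"},
      {'C', "categorical, 1000000 categories"},
      {'s', "categorical short strings, 1000 categories"},
      {'S', "categorical short strings, 100000 categories"},
      {'v', "numeric vector, 10 elements"},
      {'V', "numeric vector, 1000 elements"},
      {'u', "categorical set, up to 10 elements"},
      {'U', "categorical set, up to 1000 elements"},
      {'d', "dictionary, 10 entries"},
      {'D', "dictionary, 100 entries"},
      {'1', "1d ndarray 10"},
      {'2', "2d ndarray 4x3"},
      {'3', "3d ndarray 4x3x2"},
      {'4', "4d ndarray 4x3x2x2"},
      {'A', "3d ndarray 4x3x2, non-canonical striding"},
  };
  std::vector<std::string> columns;
  columns.reserve(column_type_info.size());
  for (char code : column_type_info) {
    auto it = legend.find(code);
    if (it == legend.end()) {
      throw std::invalid_argument(std::string("unknown column type code: ") + code);
    }
    columns.push_back(it->second);  // one column per character
  }
  return columns;
}
```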

Function Documentation

◆ advance_column_blocks_to_next_block()

void turi::sframe_saving_impl::advance_column_blocks_to_next_block (v2_block_impl::block_manager &block_manager, column_blocks &block)

Advances the column block to the next block.

◆ parse_v2_segment_filename()

std::pair<std::string, size_t> turi::parse_v2_segment_filename (std::string fname)

Splits a filename of the form [filename]:N into the pair {filename, N}. If fname is not of that form, or cannot be interpreted as that form, {fname, 0} is returned.
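A minimal sketch of the documented splitting behavior, not the Turi Create source. It assumes the suffix is taken after the last ':' and must be a non-empty run of decimal digits, so URLs such as "s3://..." or "hdfs://host:9000/..." fall through to {fname, 0}. The name parse_segment_suffix is hypothetical, and overflow of very long digit runs is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Sketch: split "name:N" into {name, N}. If there is no ':' or the text
// after the last ':' is not a non-empty run of digits, return {fname, 0}.
std::pair<std::string, std::size_t> parse_segment_suffix(const std::string& fname) {
  const std::size_t colon = fname.find_last_of(':');
  if (colon == std::string::npos || colon + 1 == fname.size()) {
    return {fname, 0};
  }
  std::size_t n = 0;
  for (std::size_t i = colon + 1; i < fname.size(); ++i) {
    const unsigned char ch = static_cast<unsigned char>(fname[i]);
    if (!std::isdigit(ch)) return {fname, 0};  // not of the form name:N
    n = n * 10 + static_cast<std::size_t>(ch - '0');
  }
  return {fname.substr(0, colon), n};
}
```

Note that "data.sidx" and "data.sidx:0" both yield column 0 under this contract, so a caller that must tell them apart needs to check for the suffix separately.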

◆ read_array_group_index_file()

group_index_file_information turi::read_array_group_index_file (std::string group_index_file)

Reads an sarray group index file from disk. Raises an exception on failure.

An array_group is a group of sarrays in a single collection of files.

◆ read_index_file()

index_file_information turi::read_index_file (std::string index_file)

Reads an sarray index file from disk. This automatically adapts to the v1 and v2 index file formats:

  • If index_file is "xxx.sidx" and is in v1 format, it is read as normal.
  • If index_file is "xxx.sidx" and is in v2 format (i.e. an array group), the first column (column 0) of the group is returned.
  • If index_file is "xxx.sidx:n" and is in v2 format (i.e. an array group), column n of the group is returned.

All other conditions fail. Raises an exception on failure.
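The three accepted cases reduce to a small decision rule. The sketch below assumes the on-disk format has already been detected and is passed in as a hypothetical index_format enum; the names index_format, column_suffix and select_column are illustrative and not part of Turi Create, and std::runtime_error stands in for whatever exception the real function raises. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the format detected when the file is opened.
enum class index_format { v1, v2 };

// Returns the column index from a trailing ":n", or std::nullopt if absent.
std::optional<std::size_t> column_suffix(const std::string& index_file) {
  const std::size_t colon = index_file.find_last_of(':');
  if (colon == std::string::npos || colon + 1 == index_file.size()) return std::nullopt;
  std::size_t n = 0;
  for (std::size_t i = colon + 1; i < index_file.size(); ++i) {
    const unsigned char ch = static_cast<unsigned char>(index_file[i]);
    if (!std::isdigit(ch)) return std::nullopt;
    n = n * 10 + static_cast<std::size_t>(ch - '0');
  }
  return n;
}

// Applies the documented cases; any other combination throws.
std::size_t select_column(const std::string& index_file, index_format fmt) {
  const std::optional<std::size_t> suffix = column_suffix(index_file);
  if (fmt == index_format::v1) {
    // A v1 file holds a single sarray, so a ":n" suffix is not accepted.
    if (suffix) throw std::runtime_error("v1 index file cannot select a column: " + index_file);
    return 0;
  }
  // v2 array group: column n if given, otherwise column 0.
  return suffix.value_or(0);
}
```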

This function also automatically de-relativizes the sframe_index_file_information::column_files to get absolute paths.

◆ write_array_group_index_file()

void turi::write_array_group_index_file (std::string group_index_file, const group_index_file_information &info)

Writes an sarray v2 index file to disk. Raises an exception on failure.

This function also automatically relativizes the sframe_index_file_information::column_files to get relative paths when writing to disk.
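The relativize-on-write and de-relativize-on-read steps described for write_array_group_index_file and read_index_file can be sketched with std::filesystem. This assumes column file paths are stored relative to the directory containing the index file; the helpers relativize and derelativize are hypothetical, and the sketch ignores remote URLs (s3://, hdfs://), which std::filesystem does not treat as paths in any meaningful way.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumption: stored column paths are relative to the index file's directory.

// On write: express an absolute column path relative to the index directory.
std::string relativize(const std::string& index_file, const std::string& column_file) {
  const fs::path base = fs::path(index_file).parent_path();
  return fs::path(column_file).lexically_relative(base).generic_string();
}

// On read: resolve a stored path against the index directory.
// Paths that are already absolute are returned unchanged.
std::string derelativize(const std::string& index_file, const std::string& stored) {
  const fs::path p(stored);
  if (p.is_absolute()) return p.generic_string();
  return (fs::path(index_file).parent_path() / p).lexically_normal().generic_string();
}
```

Storing relative paths is what lets a saved SFrame directory be moved or copied as a unit: under this assumption, derelativize(idx, relativize(idx, path)) recovers the original absolute path for POSIX-style paths.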