Turi Create
4.0
Namespaces
turi::v2_block_impl
turi::sframe_saving_impl

Functions
index_file_information turi::read_index_file(std::string index_file)
group_index_file_information turi::read_array_group_index_file(std::string group_index_file)
void turi::write_array_group_index_file(std::string group_index_file, const group_index_file_information &info)
std::pair<std::string, size_t> turi::parse_v2_segment_filename(std::string fname)
void turi::sframe_saving_impl::advance_column_blocks_to_next_block(v2_block_impl::block_manager &block_manager, column_blocks &block)
A more flexible SFrame creator.
The user can finely control the proportion of each category of data, for instance 70% 1s and 30% 0s.
One way to achieve this is to have the next_row functor record how many rows of each category it has already generated.
[in] column_names: Column names.
[in] column_types: Column types.
[in] nrows: Number of rows.
[in] next_row: Callable, equivalent to std::function<std::vector<flexible_type>(size_t)>. If next_row is a plain function, it should be thread-safe. If next_row is a functor, it is not required to be thread-safe.
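The counting approach described above can be sketched as a stateful functor. This is a standalone illustration, not Turi code: the class name RatioRowGenerator is hypothetical, and each row is a single int column instead of a std::vector<flexible_type>, so the example compiles without Turi headers. The functor stores a quota per category plus the number of rows already generated for each, and weights its random choice by the remaining quota.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful next_row functor. Each call returns
// one row; here a row is a single int column holding a category id, rather
// than the std::vector<flexible_type> the real API expects.
class RatioRowGenerator {
 public:
  // quotas[k] is the exact number of rows that should carry category k.
  RatioRowGenerator(std::vector<std::size_t> quotas, unsigned seed)
      : quotas_(std::move(quotas)), generated_(quotas_.size(), 0), rng_(seed) {}

  // Mutates internal state, so it is not thread-safe (the documentation
  // allows this for functors).
  std::vector<int> operator()(std::size_t /*row_index*/) {
    // Weight each category by how many rows it still needs.
    std::vector<double> remaining(quotas_.size());
    double total = 0.0;
    for (std::size_t k = 0; k < quotas_.size(); ++k) {
      remaining[k] = static_cast<double>(quotas_[k] - generated_[k]);
      total += remaining[k];
    }
    if (total == 0.0) throw std::out_of_range("all category quotas are used up");

    // Categories with no remaining quota have weight 0 and are never picked.
    std::discrete_distribution<std::size_t> pick(remaining.begin(), remaining.end());
    const std::size_t category = pick(rng_);
    ++generated_[category];  // record the count of rows already generated
    return {static_cast<int>(category)};
  }

  const std::vector<std::size_t>& generated() const { return generated_; }

 private:
  std::vector<std::size_t> quotas_;     // target count per category
  std::vector<std::size_t> generated_;  // rows produced so far per category
  std::mt19937 rng_;
};
```

For nrows = 10 with 70% 1s and 30% 0s, one would construct RatioRowGenerator gen({3, 7}, seed) and call gen(i) for each row. Weighting by remaining quota amounts to sampling without replacement, so the row order is random while the final counts are exactly 3 and 7.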
Creates a random SFrame for testing purposes. The column_type_info string gives the types of the columns.
[in] n_rows: The number of observations (rows) to generate.
[in] column_type_info: A string in which each character denotes the type of one column. The legend is as follows:
- n: numeric column.
- b: categorical column with 2 categories.
- z: categorical column with 5 categories.
- Z: categorical column with 10 categories.
- c: categorical column with 100 categories.
- C: categorical column with 1000000 categories.
- s: categorical column with short string keys and 1000 categories.
- S: categorical column with short string keys and 100000 categories.
- v: numeric vector with 10 elements.
- V: numeric vector with 1000 elements.
- u: categorical set with up to 10 elements.
- U: categorical set with up to 1000 elements.
- d: dictionary with 10 entries.
- D: dictionary with 100 entries.
- 1: 1d ndarray of dimension 10.
- 2: 2d ndarray of dimension 4x3.
- 3: 3d ndarray of dimension 4x3x2.
- 4: 4d ndarray of dimension 4x3x2x2.
- A: 3d ndarray of dimension 4x3x2, with randomized non-canonical striding.
[in] create_target_column: If true, also create a random target column called "target".
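Because each character of column_type_info requests exactly one column, the string length determines the number of generated columns, plus one if create_target_column is set. The sketch below only decodes the legend into descriptions; describe_column_types is a hypothetical helper, not part of Turi, and it does not generate any data. The type of the "target" column is not stated here, so the sketch only names it.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode column_type_info one character per column,
// rejecting characters that are not in the documented legend.
std::vector<std::string> describe_column_types(const std::string& column_type_info,
                                               bool create_target_column) {
  static const std::map<char, std::string> legend = {
      {'n', "numeric"},
      {'b', "categorical, 2 categories"},
      {'z', "categorical, 5 categories"},
      {'Z', "categorical, 10 categories"},
      {'c', "categorical, 100 categories"},
      {'C', "categorical, 1000000 categories"},
      {'s', "categorical, short string keys, 1000 categories"},
      {'S', "categorical, short string keys, 100000 categories"},
      {'v', "numeric vector, 10 elements"},
      {'V', "numeric vector, 1000 elements"},
      {'u', "categorical set, up to 10 elements"},
      {'U', "categorical set, up to 1000 elements"},
      {'d', "dictionary, 10 entries"},
      {'D', "dictionary, 100 entries"},
      {'1', "1d ndarray, 10"},
      {'2', "2d ndarray, 4x3"},
      {'3', "3d ndarray, 4x3x2"},
      {'4', "4d ndarray, 4x3x2x2"},
      {'A', "3d ndarray, 4x3x2, non-canonical striding"},
  };
  std::vector<std::string> out;
  out.reserve(column_type_info.size() + 1);
  for (char t : column_type_info) {
    auto it = legend.find(t);
    if (it == legend.end()) {
      throw std::invalid_argument(std::string("unknown column type: ") + t);
    }
    out.push_back(it->second);
  }
  // The optional extra column requested by create_target_column.
  if (create_target_column) out.push_back("target");
  return out;
}
```

For example, "nbV" with create_target_column set would describe four columns: numeric, binary categorical, a 1000-element vector, and "target".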
void turi::sframe_saving_impl::advance_column_blocks_to_next_block(v2_block_impl::block_manager &block_manager, column_blocks &block)
Advances the column block to the next block.
std::pair<std::string, size_t> turi::parse_v2_segment_filename(std::string fname)
Splits a filename of the form [filename]:N into the pair {filename, N}. If fname is not of that form, or cannot be interpreted as that form, {fname, 0} is returned.
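A minimal sketch of the documented splitting rule follows. split_segment_filename is a hypothetical re-implementation, not Turi's code, and two details are assumptions: it splits at the last ':' (so URL schemes such as "s3://" do not confuse it), and it accepts only a non-empty run of decimal digits as N.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch of the documented behavior: "[filename]:N" -> {filename, N},
// anything else -> {fname, 0}. Splitting at the LAST ':' is an assumption.
std::pair<std::string, std::size_t> split_segment_filename(const std::string& fname) {
  const std::size_t colon = fname.find_last_of(':');
  if (colon == std::string::npos) return {fname, 0};

  // The suffix after the colon must be a non-empty run of decimal digits.
  const std::string suffix = fname.substr(colon + 1);
  if (suffix.empty()) return {fname, 0};
  for (char ch : suffix) {
    if (!std::isdigit(static_cast<unsigned char>(ch))) return {fname, 0};
  }
  try {
    return {fname.substr(0, colon), static_cast<std::size_t>(std::stoull(suffix))};
  } catch (const std::out_of_range&) {
    return {fname, 0};  // too large to be a segment index
  }
}
```

With this rule, "s3://bucket/column_0:7" yields {"s3://bucket/column_0", 7}, while "s3://bucket/column_0" is returned unchanged with 0 because "//bucket/column_0" is not a number.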
group_index_file_information turi::read_array_group_index_file(std::string group_index_file)
Reads an sarray group index file from disk. Raises an exception on failure.
An array_group is a group of sarrays in a single collection of files.
index_file_information turi::read_index_file(std::string index_file)
Reads an sarray index file from disk. It automatically handles both the v1 and v2 index file formats.
This function also automatically de-relativizes the sframe_index_file_information::column_files entries into absolute paths.
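De-relativization can be pictured as resolving each stored column-file name against the directory that holds the index file. The sketch below is an assumption about that rule, not Turi's implementation: make_absolute is a hypothetical name, names starting with '/' or containing "://" are treated as already absolute, and only '/' separators are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: resolve a column-file name read from an index file against
// the directory containing that index file.
std::string make_absolute(const std::string& index_file, const std::string& column_file) {
  // Already absolute (local path or URL such as s3://...): keep as-is.
  if (!column_file.empty() && column_file[0] == '/') return column_file;
  if (column_file.find("://") != std::string::npos) return column_file;

  // Directory part of the index file path, including the trailing '/'.
  const std::size_t slash = index_file.find_last_of('/');
  if (slash == std::string::npos) return column_file;  // no directory component
  return index_file.substr(0, slash + 1) + column_file;
}
```

A segment suffix such as ":2" is simply carried along, since only the directory prefix is added.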
void turi::write_array_group_index_file(std::string group_index_file, const group_index_file_information &info)
Writes an sarray group index file (v2 format) to disk. Raises an exception on failure.
This function also automatically relativizes the sframe_index_file_information::column_files entries, so that relative paths are written to disk.
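Relativization can be pictured as stripping the index file's directory from each column-file path that lies under it, so that the saved files can be moved together as a unit. The sketch below is a hypothetical helper (make_relative), not Turi's implementation; it assumes '/' separators and leaves paths outside that directory untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: rewrite an absolute column-file path as a path relative to
// the directory containing the index file, when it lies under that directory.
std::string make_relative(const std::string& index_file, const std::string& column_file) {
  const std::size_t slash = index_file.find_last_of('/');
  if (slash == std::string::npos) return column_file;  // no directory component
  const std::string dir = index_file.substr(0, slash + 1);  // keeps trailing '/'

  // Strip the directory prefix only when column_file starts with it.
  if (column_file.compare(0, dir.size(), dir) == 0) return column_file.substr(dir.size());
  return column_file;  // located elsewhere: keep the absolute path
}
```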