Turi Create 4.0
CSV Parsing and Writing

Classes

struct turi::csv_line_tokenizer
class turi::csv_writer
struct turi::dataframe_t
class turi::dataframe_row_iterator
struct turi::csv_file_handling_options

Functions

void turi::parallel_dataframe_iterate (const dataframe_t &df, std::function< void(dataframe_row_iterator &iter, size_t startrow, size_t endrow)> partialrowfn)
std::istream & turi::eol_safe_getline (std::istream &is, std::string &t)
std::map< std::string, std::shared_ptr< sarray< flexible_type > > > turi::parse_csvs_to_sframe (const std::string &url, csv_line_tokenizer &tokenizer, csv_file_handling_options options, sframe &frame, std::string frame_sidx_file="")

Function Documentation

◆ eol_safe_getline()

std::istream& turi::eol_safe_getline(std::istream& is, std::string& t)

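A replacement for std::getline that recognizes all three common line terminators, \r, \n, and \r\n. A \r\n pair is consumed as a single line break, so no trailing \r is left on the line, and a lone \r also ends a line.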

◆ parallel_dataframe_iterate()

void turi::parallel_dataframe_iterate(const dataframe_t& df,
    std::function<void(dataframe_row_iterator& iter, size_t startrow, size_t endrow)> partialrowfn)

Splits the rows of df into ranges and calls partialrowfn once per range, in parallel. Each call receives its own new dataframe_row_iterator together with startrow and endrow, the bounds of the rows that call is meant to process.
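The sketch below models only the range-splitting part of this contract; the dataframe_row_iterator argument is left out. The helper name parallel_row_ranges, the half-open [startrow, endrow) convention, and the one-thread-per-range scheduling are all assumptions for illustration, not Turi's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: split rows [0, nrows) into up to `nthreads` contiguous,
// half-open ranges and run fn(startrow, endrow) for each range on its own thread.
void parallel_row_ranges(std::size_t nrows, std::size_t nthreads,
                         const std::function<void(std::size_t, std::size_t)>& fn) {
  if (nthreads == 0) nthreads = 1;
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < nthreads; ++i) {
    // Even split: range sizes differ by at most one row.
    std::size_t start = nrows * i / nthreads;
    std::size_t end = nrows * (i + 1) / nthreads;
    if (start == end) continue;  // more threads than rows: skip empty ranges
    workers.emplace_back(fn, start, end);
  }
  for (auto& w : workers) w.join();  // wait until every range is processed
}
```

Giving each worker a contiguous range lets its callback walk the rows sequentially with a single iterator, which is presumably why the real signature passes startrow and endrow rather than individual rows.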

◆ parse_csvs_to_sframe()

std::map<std::string, std::shared_ptr<sarray<flexible_type>>> turi::parse_csvs_to_sframe(const std::string& url,
    csv_line_tokenizer& tokenizer,
    csv_file_handling_options options,
    sframe& frame,
    std::string frame_sidx_file = "")

Parses a CSV file, or every file matched by a glob, into an SFrame.

Parameters
    url              Path or glob pattern of the file(s) to read.
    tokenizer        CSV tokenization options.
    options          Other file-handling options.
    frame            The returned sframe object. This should be an uninitialized sframe.
    frame_sidx_file  Location to save the result. Optional; defaults to the cache.
Returns
    A map from filename to an sarray<flexible_type> of string type, in which each row holds one line of that file that failed to parse. The map is filled only if options.store_errors is true.
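To illustrate the shape of the returned error map, here is a self-contained toy that mirrors the contract: failed lines are grouped per filename, and nothing is recorded unless store_errors is set. Everything in it is hypothetical. toy_options, count_fields, and collect_bad_lines are invented names; std::vector<std::string> stands in for sarray<flexible_type>; and "failed to parse" is approximated as "field count differs from the header", ignoring quoting. It does not reflect how Turi's tokenizer decides that a line is bad.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical options: only the store_errors flag from the real API is modeled.
struct toy_options {
  bool store_errors = false;
};

// Count comma-separated fields (no quote handling: a deliberate simplification).
std::size_t count_fields(const std::string& line) {
  std::size_t n = 1;
  for (char c : line) if (c == ',') ++n;
  return n;
}

// files: filename -> file contents. Returns filename -> lines that "failed to parse".
std::map<std::string, std::vector<std::string>> collect_bad_lines(
    const std::map<std::string, std::string>& files, const toy_options& options) {
  std::map<std::string, std::vector<std::string>> errors;
  for (const auto& file : files) {
    std::istringstream in(file.second);
    std::string header, line;
    if (!std::getline(in, header)) continue;  // empty file: nothing to check
    const std::size_t expected = count_fields(header);
    while (std::getline(in, line)) {
      // Only record failures when the caller asked for them.
      if (count_fields(line) != expected && options.store_errors) {
        errors[file.first].push_back(line);
      }
    }
  }
  return errors;
}
```

Files with no bad lines get no entry at all, and with store_errors left false the returned map is always empty.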