Turi Create  4.0
turi::dataframe_t Struct Reference

#include <core/storage/sframe_data/dataframe.hpp>

Public Member Functions

void read_csv (const std::string &path, char delimiter, bool use_header)
 
size_t nrows () const
 
bool empty () const
 
void set_type (std::string key, flex_type_enum type)
 
size_t ncols () const
 Returns the number of columns in the dataframe.
 
bool contains (std::string key) const
 
bool contains_nan (std::string key) const
 
std::pair< flex_type_enum, std::vector< flexible_type > & > operator[] (std::string key)
 
std::pair< flex_type_enum, const std::vector< flexible_type > & > operator[] (std::string key) const
 
void print () const
 
void set_column (std::string key, const std::vector< flexible_type > &val, flex_type_enum type)
 
void set_column (std::string key, std::vector< flexible_type > &&val, flex_type_enum type)
 
void remove_column (std::string key)
 
void save (oarchive &oarc) const
 Serializer.
 
void load (iarchive &iarc)
 Deserializer.
 
void clear ()
 Clears the contents of the dataframe.
 

Public Attributes

std::vector< std::string > names
 A vector storing the name of columns.
 
std::map< std::string, flex_type_enum > types
 A map from the column name to the type of the column.
 
std::map< std::string, std::vector< flexible_type > > values
 

Detailed Description

Type that represents a Pandas-like dataframe: an in-memory, column-wise representation of a table. The dataframe_t is simply a map from column name to a column of records, where every column has the same length and all values within a column have the same type. This is also the type used for transferring data between pandas dataframe objects and C++.

Each cell in the dataframe is represented by a flexible_type object. While this technically allows every cell to be of an arbitrary type, we do not permit that behavior: we require and assume that every cell in a column is of the same type. The exception is empty cells (NaNs in Pandas), which are of type UNDEFINED.

Definition at line 39 of file dataframe.hpp.
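The layout and invariants described above can be sketched with standard containers only. This is a minimal illustration, not the Turi Create implementation: `cell_type`, `cell`, `mini_dataframe`, `is_well_formed` and `make_example` are hypothetical stand-ins for `flex_type_enum`, `flexible_type` and `dataframe_t`, and the real struct does not necessarily provide such a check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for flex_type_enum and flexible_type.
enum class cell_type { FLOAT, STRING, UNDEFINED };

struct cell {
  cell_type type;
  double number;     // payload for FLOAT cells
  std::string text;  // payload for STRING cells
};

// Mirrors the three public attributes listed on this page.
struct mini_dataframe {
  std::vector<std::string> names;
  std::map<std::string, cell_type> types;
  std::map<std::string, std::vector<cell>> values;
};

// Returns true if every named column has a type and values, all columns
// have the same length, and every cell matches its column type or is
// UNDEFINED (an empty cell).
bool is_well_formed(const mini_dataframe& df) {
  std::size_t expected_rows = 0;
  bool first = true;
  for (const std::string& name : df.names) {
    auto type_it = df.types.find(name);
    auto col_it = df.values.find(name);
    if (type_it == df.types.end() || col_it == df.values.end()) return false;
    if (first) {
      expected_rows = col_it->second.size();
      first = false;
    } else if (col_it->second.size() != expected_rows) {
      return false;
    }
    for (const cell& c : col_it->second) {
      if (c.type != cell_type::UNDEFINED && c.type != type_it->second) {
        return false;
      }
    }
  }
  return true;
}

// Builds a two-column, two-row frame with one empty cell.
mini_dataframe make_example() {
  mini_dataframe df;
  df.names.push_back("id");
  df.names.push_back("score");
  df.types["id"] = cell_type::STRING;
  df.types["score"] = cell_type::FLOAT;
  df.values["id"].push_back(cell{cell_type::STRING, 0.0, "a"});
  df.values["id"].push_back(cell{cell_type::STRING, 0.0, "b"});
  df.values["score"].push_back(cell{cell_type::FLOAT, 1.5, ""});
  df.values["score"].push_back(cell{cell_type::UNDEFINED, 0.0, ""});  // empty cell
  return df;
}
```

In this sketch an UNDEFINED cell is accepted in any column, while a ragged column or a cell of the wrong type makes the frame ill-formed.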

Member Function Documentation

◆ contains()

bool turi::dataframe_t::contains ( std::string  key) const
inline

Returns true if the dataframe contains a column with the given name.

Definition at line 94 of file dataframe.hpp.

◆ contains_nan()

bool turi::dataframe_t::contains_nan ( std::string  key) const
inline

Returns true if the column contains an undefined flexible_type value.

Definition at line 101 of file dataframe.hpp.

◆ empty()

bool turi::dataframe_t::empty ( ) const
inline

Returns true if the dataframe is empty.

Definition at line 77 of file dataframe.hpp.

◆ nrows()

size_t turi::dataframe_t::nrows ( ) const
inline

Returns the number of rows in the dataframe.

Definition at line 69 of file dataframe.hpp.

◆ operator[]() [1/2]

std::pair< flex_type_enum, std::vector<flexible_type>&> turi::dataframe_t::operator[] ( std::string  key)
inline

Column index operator. Can be used to extract a column from the dataframe. Returns a pair of (type, reference to column).

Definition at line 120 of file dataframe.hpp.

◆ operator[]() [2/2]

std::pair< flex_type_enum, const std::vector<flexible_type>&> turi::dataframe_t::operator[] ( std::string  key) const
inline

Const column index operator. Can be used to extract a column from the dataframe. Returns a pair of (type, const reference to column).

Definition at line 129 of file dataframe.hpp.
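Because the second element of the returned pair is a reference, the non-const overload lets a caller modify a column in place. The sketch below illustrates that shape with standard types only. `col_type`, `mini_frame` and `double_first_value` are hypothetical names, the column holds plain `double` values instead of `flexible_type`, and the use of `at()` (which throws `std::out_of_range` for a missing key) is an assumption of this sketch, since this page does not state how the real operator treats a missing column.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for flex_type_enum.
enum class col_type { INTEGER, FLOAT };

struct mini_frame {
  std::map<std::string, col_type> types;
  std::map<std::string, std::vector<double>> values;

  // Mutable access: the pair carries a reference to the stored column.
  std::pair<col_type, std::vector<double>&> operator[](const std::string& key) {
    return std::pair<col_type, std::vector<double>&>(types.at(key),
                                                     values.at(key));
  }

  // Read-only access: the pair carries a const reference.
  std::pair<col_type, const std::vector<double>&> operator[](
      const std::string& key) const {
    return std::pair<col_type, const std::vector<double>&>(types.at(key),
                                                           values.at(key));
  }
};

// Doubles the first value of a column through the returned reference and
// returns the value now stored in the frame.
double double_first_value(mini_frame& df, const std::string& key) {
  auto column = df[key];    // copies the pair, not the column
  column.second[0] *= 2.0;  // writes through to df.values[key]
  return df.values.at(key)[0];
}
```

Copying the pair does not copy the column, so the write in `double_first_value` is visible in the frame itself.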

◆ print()

void turi::dataframe_t::print ( ) const

Prints the contents of the dataframe to std::cerr.

◆ read_csv()

void turi::dataframe_t::read_csv ( const std::string &  path,
char  delimiter,
bool  use_header 
)

Fill the dataframe with the content from a csv file.

Parameters

◆ remove_column()

void turi::dataframe_t::remove_column ( std::string  key)

Removes the column with the given name.

◆ set_column() [1/2]

void turi::dataframe_t::set_column ( std::string  key,
const std::vector< flexible_type > &  val,
flex_type_enum  type 
)

Sets the value of a column of the dataframe.

◆ set_column() [2/2]

void turi::dataframe_t::set_column ( std::string  key,
std::vector< flexible_type > &&  val,
flex_type_enum  type 
)

Sets the value of a column of the dataframe, consuming the vector value.
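The difference between the two overloads can be illustrated with a small stand-in. This is a sketch under simplifying assumptions: `mini_frame` is a hypothetical type, its columns hold plain `double` values, and the `type` argument of the real `set_column` is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct mini_frame {
  std::map<std::string, std::vector<double>> values;

  // Copying overload: the caller keeps its own vector.
  void set_column(const std::string& key, const std::vector<double>& val) {
    values[key] = val;
  }

  // Consuming overload: the column takes over the caller's buffer.
  void set_column(const std::string& key, std::vector<double>&& val) {
    values[key] = std::move(val);
  }
};

// Stores the same data twice, once by copy and once by move, and returns
// the total number of values held by the frame afterwards.
std::size_t store_twice(mini_frame& df, std::vector<double> data) {
  df.set_column("copied", data);            // data is still usable here
  df.set_column("moved", std::move(data));  // data must not be relied on now
  return df.values["copied"].size() + df.values["moved"].size();
}
```

The consuming form avoids duplicating a large column when the caller no longer needs the source vector.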

◆ set_type()

void turi::dataframe_t::set_type ( std::string  key,
flex_type_enum  type 
)

Convert the values in the column into the specified type. Throws an exception if the column is not found, or the conversion cannot be made.
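The contract above (convert a whole column, fail if the column is missing or a value cannot be converted) can be sketched for one concrete case, string to floating point. `to_float_column` is a hypothetical helper written with the standard library only. The exception types it throws are assumptions of this sketch, because this page does not say which exceptions `set_type` raises.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of Turi Create: converts one string column
// to doubles, or throws if the column is missing or a value is not numeric.
std::vector<double> to_float_column(
    const std::map<std::string, std::vector<std::string>>& columns,
    const std::string& key) {
  auto it = columns.find(key);
  if (it == columns.end()) {
    throw std::out_of_range("no such column: " + key);
  }
  std::vector<double> converted;
  converted.reserve(it->second.size());
  for (const std::string& text : it->second) {
    std::size_t parsed = 0;
    // std::stod throws std::invalid_argument if no number can be parsed.
    double value = std::stod(text, &parsed);
    if (parsed != text.size()) {
      // Reject trailing characters such as "12abc".
      throw std::invalid_argument("cannot convert value: " + text);
    }
    converted.push_back(value);
  }
  return converted;
}
```

Converting into a fresh vector and only then replacing the column would keep the original data intact when a conversion fails part-way through.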

Member Data Documentation

◆ values

std::map<std::string, std::vector<flexible_type> > turi::dataframe_t::values

A map from the column name to the values of the column. Every column must have the same length, and all values within a column must be of the same type. The UNDEFINED type is an exception to the rule and may be used anywhere to designate an empty entry.

Definition at line 51 of file dataframe.hpp.


The documentation for this struct was generated from the following file: