Turi Create 4.0
turi::v2::ml_data_internal::column_indexer Class Reference (abstract)

#include <toolkits/ml_data_2/indexing/column_indexer.hpp>

Public Member Functions

 column_indexer ()
 
virtual void initialize ()=0
 
virtual size_t map_value_to_index (size_t thread_idx, const flexible_type &feature)=0
 
virtual size_t immutable_map_value_to_index (const flexible_type &feature) const =0
 
virtual void insert_values_into_index (const std::vector< flexible_type > &features)
 
virtual void finalize ()=0
 
virtual flexible_type map_index_to_value (size_t idx) const
 
virtual std::set< flex_type_enum > extract_key_types () const
 
virtual size_t indexed_column_size () const =0
 
virtual size_t get_version () const =0
 
virtual void save_impl (turi::oarchive &oarc) const =0
 
virtual void load_version (turi::iarchive &iarc, size_t version)=0
 
virtual std::function< flexible_type(const flexible_type &)> deindexing_lambda () const =0
 
virtual std::function< flexible_type(const flexible_type &)> indexing_lambda () const =0
 
virtual std::shared_ptr< column_indexer > create_cleared_copy () const =0
 
virtual void set_values (std::vector< flexible_type > &&values)=0
 

Static Public Member Functions

static std::shared_ptr< column_indexer > factory_create (const std::map< std::string, variant_type > &creation_options)
 

Public Attributes

std::string column_name
 
ml_column_mode mode
 
flex_type_enum original_column_type
 
std::map< std::string, flexible_type > options
 

Detailed Description


column_metadata contains the metadata concerning the indexing of a single column of an SFrame. A collection of these per-column objects is all the metadata required by the ml_data container.

Definition at line 27 of file column_indexer.hpp.
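The relationship described above (one indexer per column, and the set of indexers forming the container's metadata) can be sketched as follows. This is a minimal, self-contained illustration, not the Turi implementation: std::string stands in for flexible_type, and indexer_sketch, unique_indexer_sketch, metadata_sketch and build_metadata are hypothetical names that mirror only a few members of column_indexer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for turi::flexible_type: plain strings only.
using value_t = std::string;

// Minimal abstract interface modeled on a few column_indexer members.
class indexer_sketch {
 public:
  virtual ~indexer_sketch() = default;
  virtual std::size_t map_value_to_index(const value_t& v) = 0;
  virtual std::size_t indexed_column_size() const = 0;

  std::string column_name;  // mirrors the column_name attribute
};

// Hypothetical concrete subclass: assigns dense indices in first-seen order.
class unique_indexer_sketch : public indexer_sketch {
 public:
  std::size_t map_value_to_index(const value_t& v) override {
    auto it = index_.find(v);
    if (it != index_.end()) return it->second;  // already seen
    std::size_t idx = values_.size();           // next unused index
    index_.emplace(v, idx);
    values_.push_back(v);
    return idx;
  }
  std::size_t indexed_column_size() const override { return values_.size(); }

 private:
  std::unordered_map<value_t, std::size_t> index_;
  std::vector<value_t> values_;
};

// One indexer per column; together they form the dataset-level metadata.
using metadata_sketch = std::map<std::string, std::shared_ptr<indexer_sketch>>;

metadata_sketch build_metadata(
    const std::map<std::string, std::vector<value_t>>& columns) {
  metadata_sketch md;
  for (const auto& [name, values] : columns) {
    auto ix = std::make_shared<unique_indexer_sketch>();
    ix->column_name = name;
    for (const auto& v : values) ix->map_value_to_index(v);
    md[name] = ix;
  }
  return md;
}
```

Holding the indexers through std::shared_ptr to the abstract base mirrors the way factory_create and create_cleared_copy return std::shared_ptr<column_indexer>, so callers never need to know the concrete subclass.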

Constructor & Destructor Documentation

◆ column_indexer()

turi::v2::ml_data_internal::column_indexer::column_indexer ( )
inline

Default constructor.

Definition at line 33 of file column_indexer.hpp.

Member Function Documentation

◆ create_cleared_copy()

virtual std::shared_ptr<column_indexer> turi::v2::ml_data_internal::column_indexer::create_cleared_copy ( ) const
pure virtual

Create a copy with the index cleared.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

◆ deindexing_lambda()

virtual std::function<flexible_type(const flexible_type&)> turi::v2::ml_data_internal::column_indexer::deindexing_lambda ( ) const
pure virtual

Returns a function that can be used to deindex a column, i.e. to map index values back to the original values.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
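A deindexing function of this kind can be sketched as a closure over the stored values. The sketch below is an assumption-laden illustration, not the real signature: the real lambda has type std::function<flexible_type(const flexible_type&)>, whereas here an index is a plain std::size_t and a value is a std::string. make_deindexing_lambda is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Simplified signature: index in, original value out.
using deindex_fn = std::function<std::string(std::size_t)>;

// Returns a callable that maps an index back to its original value.
// The values are captured through a shared_ptr so the callable stays
// valid even if the object that created it is destroyed first.
deindex_fn make_deindexing_lambda(
    std::shared_ptr<const std::vector<std::string>> values) {
  return [values](std::size_t idx) -> std::string {
    assert(idx < values->size());  // caller error; checked in debug builds only
    return (*values)[idx];
  };
}
```

Capturing by shared_ptr rather than by reference is the design point: the returned function may outlive the indexer, for example when it is applied lazily to an SFrame column.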

◆ extract_key_types()

virtual std::set<flex_type_enum> turi::v2::ml_data_internal::column_indexer::extract_key_types ( ) const
inline virtual

Calculates the types of the values held in the index. These may differ from original_column_type: if original_column_type is a DICT or LIST, this returns the types of the contained values rather than the container type. If the values are inconsistent, an error is raised.

This is useful when metadata built from a dictionary column is also used to map simple categorical variables.

Reimplemented in turi::v2::ml_data_internal::column_unique_indexer.

Definition at line 108 of file column_indexer.hpp.
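The core idea, collecting the distinct types of the stored values rather than reporting the column's container type, can be sketched as below. Everything here is a stand-in: std::variant replaces flexible_type, type_tag replaces flex_type_enum, and extract_key_types_sketch is hypothetical. It assumes that for a DICT or LIST column the index stores the individual keys or elements. The sketch deliberately omits the error check, since the exact meaning of "inconsistent" is not spelled out above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for flexible_type and flex_type_enum.
using value_t = std::variant<std::int64_t, double, std::string>;
enum class type_tag { INTEGER, FLOAT, STRING };

// Maps the active variant alternative to its type tag.
type_tag tag_of(const value_t& v) {
  switch (v.index()) {
    case 0: return type_tag::INTEGER;
    case 1: return type_tag::FLOAT;
    default: return type_tag::STRING;
  }
}

// Collects the distinct types of the values stored in an index.
std::set<type_tag> extract_key_types_sketch(const std::vector<value_t>& indexed_values) {
  std::set<type_tag> types;
  for (const auto& v : indexed_values) types.insert(tag_of(v));
  return types;
}
```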

◆ factory_create()

static std::shared_ptr<column_indexer> turi::v2::ml_data_internal::column_indexer::factory_create ( const std::map< std::string, variant_type > &  creation_options)
static

Factory method that loads and instantiates the appropriate column_indexer subclass.

◆ finalize()

virtual void turi::v2::ml_data_internal::column_indexer::finalize ( )
pure virtual

Call this when all calls to map_value_to_index are completed.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

◆ get_version()

virtual size_t turi::v2::ml_data_internal::column_indexer::get_version ( ) const
pure virtual

Returns the current version used for the serialization.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

◆ immutable_map_value_to_index()

virtual size_t turi::v2::ml_data_internal::column_indexer::immutable_map_value_to_index ( const flexible_type & feature) const
pure virtual

Returns the index associated with the "feature" value.

Note
Only used if is_categorical is true.

If the value in the feature column was already seen, then the index already associated with that value is returned. If not, size_t(-1) is returned.

Parameters
[in] feature The value in the feature column to map to the index.
Returns
The index associated with the given value, or size_t(-1) if the value is not present in the index.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
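The read-only lookup contract (return the existing index, or size_t(-1) without inserting) can be sketched as a plain function. This is an illustration only: std::string replaces flexible_type, a std::unordered_map replaces whatever structure column_unique_indexer actually uses, and immutable_lookup is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Read-only lookup: never inserts; unseen values map to size_t(-1).
// Taking the map by const reference makes it impossible to grow the index here.
std::size_t immutable_lookup(const std::unordered_map<std::string, std::size_t>& index,
                             const std::string& feature) {
  auto it = index.find(feature);
  return it == index.end() ? static_cast<std::size_t>(-1) : it->second;
}
```

Callers should compare against static_cast<std::size_t>(-1) explicitly, since that sentinel is also a valid size_t value.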

◆ indexed_column_size()

virtual size_t turi::v2::ml_data_internal::column_indexer::indexed_column_size ( ) const
pure virtual

Returns the size of the column – e.g. the number of distinct categories, or the size of the hash space. Only called if the column is indeed indexed, i.e. if mode_is_indexed(mode) is true.

Categorical: number of unique categories.

Returns
Column size.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

◆ indexing_lambda()

virtual std::function<flexible_type(const flexible_type&)> turi::v2::ml_data_internal::column_indexer::indexing_lambda ( ) const
pure virtual

Returns a function that can be used to index a column, i.e. to map values to their indices.

Does not add any new index values.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
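A non-mutating indexing function can be sketched as a closure over a read-only view of the index, then applied element by element to a column. This is a hypothetical sketch: std::string replaces flexible_type, make_indexing_lambda and index_column are invented names, and returning size_t(-1) for unseen values is an assumption borrowed from immutable_map_value_to_index, not documented behavior of indexing_lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using index_map = std::unordered_map<std::string, std::size_t>;
using index_fn = std::function<std::size_t(const std::string&)>;

// Returns a callable that maps a value to its index without adding entries.
index_fn make_indexing_lambda(std::shared_ptr<const index_map> index) {
  return [index](const std::string& v) -> std::size_t {
    auto it = index->find(v);
    return it == index->end() ? static_cast<std::size_t>(-1) : it->second;
  };
}

// Applies the indexing function to every value of a column.
std::vector<std::size_t> index_column(const std::vector<std::string>& column,
                                      const index_fn& f) {
  std::vector<std::size_t> out;
  out.reserve(column.size());
  for (const auto& v : column) out.push_back(f(v));
  return out;
}
```

Because the closure only holds a pointer to a const map, applying it to new data (for example at prediction time) cannot change the index.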

◆ initialize()

virtual void turi::v2::ml_data_internal::column_indexer::initialize ( )
pure virtual

Initializes the index mapping. Some internal structures used for parallel indexing must be set up before map_value_to_index can be called. Call this before looping over map_value_to_index, then call finalize() when done.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
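The initialize, map_value_to_index, finalize protocol can be sketched as a small state machine. This sketch is not the Turi implementation: std::string replaces flexible_type, a single mutex replaces whatever per-thread structures the real class sets up, thread_idx is accepted but ignored, and lifecycle_indexer is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

class lifecycle_indexer {
 public:
  // Must be called before any map_value_to_index call.
  void initialize() {
    std::lock_guard<std::mutex> lock(mu_);
    state_ = state::indexing;
  }

  // Returns the existing index for v, or assigns the next free one.
  std::size_t map_value_to_index(std::size_t /*thread_idx*/, const std::string& v) {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ != state::indexing) throw std::logic_error("call initialize() first");
    auto [it, inserted] = index_.emplace(v, values_.size());
    if (inserted) values_.push_back(v);
    return it->second;
  }

  // Called once all map_value_to_index calls have completed.
  void finalize() {
    std::lock_guard<std::mutex> lock(mu_);
    state_ = state::finalized;
  }

  // Not locked: intended to be called after finalize().
  std::size_t indexed_column_size() const { return values_.size(); }

 private:
  enum class state { created, indexing, finalized };
  state state_ = state::created;
  std::mutex mu_;
  std::unordered_map<std::string, std::size_t> index_;
  std::vector<std::string> values_;
};
```

Usage follows the documented order: initialize(), then any number of map_value_to_index calls, then finalize(); calling map_value_to_index outside that window throws in this sketch.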

◆ insert_values_into_index()

virtual void turi::v2::ml_data_internal::column_indexer::insert_values_into_index ( const std::vector< flexible_type > &  features)
inline virtual

Some of the ml_data tests currently depend on the order of insertion into the index, which is now done in parallel and is therefore not deterministic. This function lets the user remove that randomness by inserting all values in a specified order.

NOTE: This function is not thread safe; only call it from one thread.

Reimplemented in turi::v2::ml_data_internal::column_unique_indexer.

Definition at line 81 of file column_indexer.hpp.
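Single-threaded, in-order insertion makes each value's index predictable: it is the value's position among first occurrences. The sketch below illustrates that property under stated assumptions: std::string replaces flexible_type and ordered_index is a hypothetical struct, not column_unique_indexer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct ordered_index {
  std::unordered_map<std::string, std::size_t> index;
  std::vector<std::string> values;

  // Inserts values in the caller's order; duplicates keep their first index.
  // Not thread-safe, matching the note above.
  void insert_values_into_index(const std::vector<std::string>& features) {
    for (const auto& f : features) {
      if (index.emplace(f, values.size()).second) values.push_back(f);
    }
  }
};
```

For example, inserting {"c", "a", "c", "b"} always yields c to 0, a to 1 and b to 2, which is the determinism the tests rely on.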

◆ load_version()

virtual void turi::v2::ml_data_internal::column_indexer::load_version ( turi::iarchive & iarc, size_t version )
pure virtual

Load the object.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

◆ map_index_to_value()

virtual flexible_type turi::v2::ml_data_internal::column_indexer::map_index_to_value ( size_t  idx) const
inline virtual

Returns the feature "value" associated with an index.

Note
Only used if is_categorical is true.
Parameters
[in] idx The index to map back to its feature value.

Reimplemented in turi::v2::ml_data_internal::column_unique_indexer.

Definition at line 94 of file column_indexer.hpp.

◆ map_value_to_index()

virtual size_t turi::v2::ml_data_internal::column_indexer::map_value_to_index ( size_t thread_idx, const flexible_type & feature )
pure virtual

Returns the index associated with the "feature" value.

Note
Only used if is_categorical is true.

If the value in the feature column was already seen, then the index already associated with that value is returned. If not, a new unique index is added and associated with this feature value.

This method is completely thread-safe and is meant to be called by multiple threads in contention.

Parameters
[in] feature The value in the feature column to map to the index.
Returns
An index (possibly new) associated with the given value.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
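A minimal way to satisfy the thread-safety contract is a single mutex around the lookup-or-insert step. This is a hypothetical sketch, not how column_unique_indexer is implemented: std::string replaces flexible_type, thread_idx is ignored, and concurrent_indexer and index_in_parallel are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

class concurrent_indexer {
 public:
  // Lookup-or-insert under one lock; new values get the next dense index.
  std::size_t map_value_to_index(std::size_t /*thread_idx*/, const std::string& v) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = index_.emplace(v, index_.size()).first;  // size taken before insertion
    return it->second;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mu_);
    return index_.size();
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, std::size_t> index_;
};

// Runs num_threads workers that all index the same vocabulary in contention.
// Each worker writes only to its own results slot, so no extra locking is needed.
std::vector<std::vector<std::size_t>> index_in_parallel(
    concurrent_indexer& ix, const std::vector<std::string>& vocab, std::size_t num_threads) {
  std::vector<std::vector<std::size_t>> results(num_threads);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (const auto& v : vocab) results[t].push_back(ix.map_value_to_index(t, v));
    });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

Every thread sees the same index for the same value, but which value receives which index depends on scheduling; this is the nondeterminism that insert_values_into_index exists to remove.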

◆ save_impl()

virtual void turi::v2::ml_data_internal::column_indexer::save_impl ( turi::oarchive & oarc) const
pure virtual

Serialize the object (save).

Implemented in turi::v2::ml_data_internal::column_unique_indexer.
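The get_version, save_impl and load_version members suggest a versioned serialization pattern. The sketch below assumes that some wrapper stores get_version() ahead of the data so that load_version receives it; that pairing is an assumption, not documented here. std::ostream and std::istream replace turi::oarchive and turi::iarchive, the line-based text format is invented, and versioned_values is a hypothetical class.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

class versioned_values {
 public:
  std::vector<std::string> values;

  std::size_t get_version() const { return 1; }

  // Writes the payload only; values must not contain newlines in this format.
  void save_impl(std::ostream& os) const {
    os << values.size() << '\n';
    for (const auto& v : values) os << v << '\n';
  }

  // Reads a payload written by save_impl for the given version.
  void load_version(std::istream& is, std::size_t version) {
    if (version != 1) throw std::runtime_error("unsupported version");
    std::size_t n = 0;
    is >> n;
    is.ignore();  // skip the newline after the count
    values.clear();
    for (std::size_t i = 0; i < n; ++i) {
      std::string v;
      std::getline(is, v);
      values.push_back(v);
    }
  }

  // Assumed wrapper: version first, then the payload.
  void save(std::ostream& os) const {
    os << get_version() << '\n';
    save_impl(os);
  }

  void load(std::istream& is) {
    std::size_t version = 0;
    is >> version;
    load_version(is, version);
  }
};
```

Passing the stored version into load_version lets a newer build keep reading older archives by branching on the version number.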

◆ set_values()

virtual void turi::v2::ml_data_internal::column_indexer::set_values ( std::vector< flexible_type > &&  values)
pure virtual

Set data directly.

Implemented in turi::v2::ml_data_internal::column_unique_indexer.

Member Data Documentation

◆ column_name

std::string turi::v2::ml_data_internal::column_indexer::column_name

The name of the column.

Definition at line 174 of file column_indexer.hpp.

◆ mode

ml_column_mode turi::v2::ml_data_internal::column_indexer::mode

The mode of the column.

Definition at line 178 of file column_indexer.hpp.

◆ options

std::map<std::string, flexible_type> turi::v2::ml_data_internal::column_indexer::options

A map of the options passed in to ml_data. May include options for the indexers.

Definition at line 187 of file column_indexer.hpp.

◆ original_column_type

flex_type_enum turi::v2::ml_data_internal::column_indexer::original_column_type

The original type of the column.

Definition at line 182 of file column_indexer.hpp.

