Turi Create  4.0
turi::ml_data_internal::column_indexer Class Reference

#include <ml/ml_data/column_indexer.hpp>

Public Member Functions

 column_indexer (std::string column_name, ml_column_mode mode, flex_type_enum original_column_type)
 
 column_indexer (const column_indexer &)=delete
 
void initialize ()
 
size_t map_value_to_index (size_t thread_idx, const flexible_type &feature) GL_HOT
 
size_t immutable_map_value_to_index (const flexible_type &feature) const
 
void insert_values_into_index (const std::vector< flexible_type > &features)
 
void finalize ()
 
const flexible_type & map_index_to_value (size_t idx) const
 
std::set< flex_type_enum > extract_key_types () const
 
size_t indexed_column_size () const
 
size_t get_version () const
 
void save_impl (turi::oarchive &oarc) const
 
void load_version (turi::iarchive &iarc, size_t version)
 
std::vector< flexible_type > reset_and_return_values ()
 
void set_indices (std::vector< flexible_type > &&values)
 
void debug_check_is_internally_consistent () const
 

Detailed Description

column_metadata contains the metadata concerning the indexing of a single column of an SFrame. A collection of these per-column metadata objects is all the metadata required by the ml_data container.

Definition at line 44 of file column_indexer.hpp.

Constructor & Destructor Documentation

◆ column_indexer() [1/2]

turi::ml_data_internal::column_indexer::column_indexer ( std::string  column_name,
ml_column_mode  mode,
flex_type_enum  original_column_type 
)

Constructor; does nothing.

◆ column_indexer() [2/2]

turi::ml_data_internal::column_indexer::column_indexer ( const column_indexer & )
delete

Copy constructor (deleted): we don't want to risk making copies of this object.

Member Function Documentation

◆ debug_check_is_internally_consistent()

void turi::ml_data_internal::column_indexer::debug_check_is_internally_consistent ( ) const

Checks that the indices are equal across machines.

◆ extract_key_types()

std::set<flex_type_enum> turi::ml_data_internal::column_indexer::extract_key_types ( ) const

Calculates the type of the values held in the index. This may be different from original_column_type – if the original_column_type is a DICT or LIST, this will return a set of the actual types present. If the values are inconsistent, then an error is raised.

This method is useful when a metadata built with a dictionary is also used to map simple categorical variables.
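The idea of returning "a set of the actual types present" can be sketched as follows. This is a minimal illustration, not the Turi Create implementation: toy_type and toy_value are hypothetical stand-ins for flex_type_enum and flexible_type, and collect_key_types is an invented name. The error raised for inconsistent values is not modeled, because the conditions for it are not spelled out here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: toy_type mimics flex_type_enum and toy_value
// mimics flexible_type. Neither is part of Turi Create.
enum class toy_type { INTEGER, STRING };

struct toy_value {
  toy_type type;
  long as_int;         // meaningful when type == INTEGER
  std::string as_str;  // meaningful when type == STRING
};

// Collects the distinct types found among the values held in the index.
std::set<toy_type> collect_key_types(const std::vector<toy_value>& indexed_values) {
  std::set<toy_type> types;
  for (const toy_value& v : indexed_values) {
    types.insert(v.type);
  }
  return types;
}
```

With this sketch, a dictionary column whose keys are all strings yields a one-element set, which is what lets the same index also serve a plain string categorical column.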

◆ finalize()

void turi::ml_data_internal::column_indexer::finalize ( )

Call this when all calls to map_value_to_index are completed.

◆ get_version()

size_t turi::ml_data_internal::column_indexer::get_version ( ) const

Returns the current version used for the serialization.

◆ immutable_map_value_to_index()

size_t turi::ml_data_internal::column_indexer::immutable_map_value_to_index ( const flexible_type &  feature) const
inline

Returns the index associated with the "feature" value.

Note
Only used if is_categorical is true.

If the value in the feature column was already seen, then the index already associated with that value is returned. If not, size_t(-1) is returned.

Parameters
[in] feature  The value in the feature column to map to the index.
Returns
The index associated with the given value, or size_t(-1) if the value is not present in the index.

Definition at line 143 of file column_indexer.hpp.
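The read-only behavior described above (return the stored index, or size_t(-1) for an unseen value, without ever modifying the index) can be sketched with a standard hash map. The names lookup_table, NOT_FOUND, and lookup_index are hypothetical, and std::string stands in for flexible_type; this is an illustration under those assumptions, not the library's code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the index; std::string replaces flexible_type.
typedef std::unordered_map<std::string, std::size_t> lookup_table;

// Sentinel meaning "value not present", following the size_t(-1) convention.
const std::size_t NOT_FOUND = static_cast<std::size_t>(-1);

// Read-only lookup: the table is taken by const reference and never modified.
std::size_t lookup_index(const lookup_table& table, const std::string& feature) {
  lookup_table::const_iterator it = table.find(feature);
  return it == table.end() ? NOT_FOUND : it->second;
}
```

Callers are expected to compare the result against size_t(-1) before using it as an index.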

◆ indexed_column_size()

size_t turi::ml_data_internal::column_indexer::indexed_column_size ( ) const
inline

Returns the size of the column.

Numeric: 1
Categorical: number of unique categories
Vector: size of the vector

Returns
Column size.

Definition at line 242 of file column_indexer.hpp.
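The per-mode sizes listed above amount to a small dispatch on the column mode. The sketch below assumes a hypothetical toy_mode enum and toy_column struct; the real ml_column_mode has its own enumerators, which are not listed on this page.

```cpp
#include <cstddef>

// Hypothetical mode enum; not the real ml_column_mode.
enum class toy_mode { NUMERIC, CATEGORICAL, VECTOR };

// Hypothetical column description holding just what the size rule needs.
struct toy_column {
  toy_mode mode;
  std::size_t num_categories;  // used when mode == CATEGORICAL
  std::size_t vector_length;   // used when mode == VECTOR
};

// Applies the documented rule: numeric -> 1, categorical -> number of
// unique categories, vector -> length of the vector.
std::size_t column_size(const toy_column& c) {
  switch (c.mode) {
    case toy_mode::NUMERIC:     return 1;
    case toy_mode::CATEGORICAL: return c.num_categories;
    case toy_mode::VECTOR:      return c.vector_length;
  }
  return 0;  // not reached for the three modes above
}
```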

◆ initialize()

void turi::ml_data_internal::column_indexer::initialize ( )

Initializes the index mapping. Some internal state used for parallel indexing must be set up before map_value_to_index works. Call this before looping over map_value_to_index, then call finalize() when done.
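The required call order (initialize, then the mapping loop, then finalize) can be sketched with a small stand-in class. toy_indexer and index_column are hypothetical names, std::string replaces flexible_type, and the single boolean flag is only a placeholder for whatever internal setup the real class performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for column_indexer; not the Turi Create class.
class toy_indexer {
 public:
  void initialize() { accepting_ = true; }

  // Returns the existing index for `value`, or assigns the next free one.
  std::size_t map_value_to_index(const std::string& value) {
    assert(accepting_ && "call initialize() before map_value_to_index()");
    auto result = index_.emplace(value, index_.size());
    return result.first->second;
  }

  void finalize() { accepting_ = false; }

  std::size_t size() const { return index_.size(); }

 private:
  bool accepting_ = false;
  std::unordered_map<std::string, std::size_t> index_;
};

// Indexes every value in `column` in the initialize / map / finalize order
// and returns the number of distinct values seen.
std::size_t index_column(toy_indexer& indexer,
                         const std::vector<std::string>& column) {
  indexer.initialize();
  for (const std::string& v : column) {
    indexer.map_value_to_index(v);
  }
  indexer.finalize();
  return indexer.size();
}
```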

◆ insert_values_into_index()

void turi::ml_data_internal::column_indexer::insert_values_into_index ( const std::vector< flexible_type > &  features)

Some of the ml_data tests currently depend on the order of insertion into the index, which is now done in parallel and thus not deterministic. This function allows the user to remove that randomness by inserting all indices in a specified order.

NOTE: This function is not thread safe; only call it from one thread.
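A minimal sketch of why single-threaded insertion in a fixed order gives reproducible indices: the i-th distinct value in the input always receives index i. ordered_index and insert_values_in_order are hypothetical names, and std::string stands in for flexible_type.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the indexer's two maps.
struct ordered_index {
  std::unordered_map<std::string, std::size_t> value_to_index;
  std::vector<std::string> index_to_value;
};

// Inserts values one at a time from a single thread. Duplicates are
// skipped, so the i-th distinct value in `features` receives index i.
void insert_values_in_order(ordered_index& idx,
                            const std::vector<std::string>& features) {
  for (const std::string& f : features) {
    if (idx.value_to_index.count(f) == 0) {
      idx.value_to_index[f] = idx.index_to_value.size();
      idx.index_to_value.push_back(f);
    }
  }
}
```

A test that needs stable indices would call the insertion function once with its expected ordering before any parallel indexing runs.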

◆ load_version()

void turi::ml_data_internal::column_indexer::load_version ( turi::iarchive &  iarc,
size_t  version 
)

Load the object.

◆ map_index_to_value()

const flexible_type& turi::ml_data_internal::column_indexer::map_index_to_value ( size_t  idx) const
inline

Returns the feature "value" associated with an index.

Note
Only used if is_categorical is true.

Definition at line 208 of file column_indexer.hpp.

◆ map_value_to_index()

size_t turi::ml_data_internal::column_indexer::map_value_to_index ( size_t  thread_idx,
const flexible_type &  feature 
)
inline

Returns the index associated with the "feature" value.

Note
Only used if is_categorical is true.

If the value in the feature column was already seen, then the index already associated with that value is returned. If not, a new unique index is added and associated with this feature value.

This method is completely threadsafe and is meant to be called by multiple threads in contention.

Parameters
[in] feature  The value in the feature column to map to the index.
Returns
An index (possibly new) associated with the given value.

Definition at line 84 of file column_indexer.hpp.
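The find-or-insert contract above can be illustrated with a single mutex guarding both maps. This is a deliberately simple substitute: this page does not describe how the real class achieves thread safety, and a production design would likely avoid one global lock. locked_indexer is a hypothetical name, std::string stands in for flexible_type, and thread_idx is accepted only to mirror the documented signature.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for column_indexer; not the Turi Create class.
class locked_indexer {
 public:
  // Returns the index already assigned to `feature`, or assigns the next
  // unused index. thread_idx is unused in this simplified sketch.
  std::size_t map_value_to_index(std::size_t thread_idx,
                                 const std::string& feature) {
    (void)thread_idx;
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = value_to_index_.find(feature);
    if (it != value_to_index_.end()) {
      return it->second;
    }
    const std::size_t new_index = index_to_value_.size();
    value_to_index_.emplace(feature, new_index);
    index_to_value_.push_back(feature);
    return new_index;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return index_to_value_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::size_t> value_to_index_;
  std::vector<std::string> index_to_value_;
};
```

Because the lookup and the insertion happen under the same lock, two threads that submit the same unseen value at the same moment still receive the same index. Which value gets which index depends on arrival order, which is why insertion order is not deterministic under parallel use.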

◆ reset_and_return_values()

std::vector<flexible_type> turi::ml_data_internal::column_indexer::reset_and_return_values ( )

Purges and returns all the values. The result is an indexer that contains no values, but metadata such as name, mode, and type is preserved.

◆ save_impl()

void turi::ml_data_internal::column_indexer::save_impl ( turi::oarchive &  oarc) const

Serialize the object (save).

◆ set_indices()

void turi::ml_data_internal::column_indexer::set_indices ( std::vector< flexible_type > &&  values)

Sets the indices and creates all the index maps.
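How set_indices and reset_and_return_values might pair up as a round trip can be sketched as below. value_store and both free functions are hypothetical stand-ins that only reuse the documented method names; std::string replaces flexible_type, and the assumption that position i in the vector becomes index i is this sketch's, not something stated on this page.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the indexer's value storage.
struct value_store {
  std::unordered_map<std::string, std::size_t> value_to_index;
  std::vector<std::string> index_to_value;
};

// Takes ownership of `values` and rebuilds the reverse map so that
// values[i] maps to index i (an assumption of this sketch).
void set_indices(value_store& store, std::vector<std::string>&& values) {
  store.index_to_value = std::move(values);
  store.value_to_index.clear();
  for (std::size_t i = 0; i < store.index_to_value.size(); ++i) {
    store.value_to_index[store.index_to_value[i]] = i;
  }
}

// Hands the stored values back to the caller and leaves the store empty.
std::vector<std::string> reset_and_return_values(value_store& store) {
  std::vector<std::string> out = std::move(store.index_to_value);
  store.index_to_value.clear();
  store.value_to_index.clear();
  return out;
}
```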


The documentation for this class was generated from the following file: