Turi Create  4.0
turi::v2::ml_data_iterator_base Class Reference

#include <toolkits/ml_data_2/iterators/ml_data_iterator_base.hpp>

Public Member Functions

virtual ~ml_data_iterator_base ()
 
virtual void reset ()
 Resets the iterator to the start of the sframes in ml_data.
 
virtual bool done () const
 Returns true if the iteration is done, false otherwise.
 
size_t row_index () const
 
size_t unsliced_row_index () const
 Returns the absolute row index.
 
template<typename Entry >
GL_HOT_INLINE void fill_observation (std::vector< Entry > &x) const
 
void fill_untranslated_values (std::vector< flexible_type > &x) const GL_HOT_INLINE_FLATTEN
 
void fill_observation (SparseVector &x) const GL_HOT_INLINE_FLATTEN
 
void fill_observation (DenseVector &x) const GL_HOT_INLINE_FLATTEN
 
template<typename DenseRowXpr >
GL_HOT_INLINE_FLATTEN void fill_eigen_row (DenseRowXpr &&x) const
 
void fill_observation (composite_row_container &crc) GL_HOT_INLINE_FLATTEN
 
double target_value () const GL_HOT_INLINE_FLATTEN
 
size_t target_index () const GL_HOT_INLINE_FLATTEN
 
ml_data_row_reference get_reference () const
 
const ml_data & ml_data_source () const
 
ml_data_internal::entry_value _raw_row_entry (size_t raw_index) const GL_HOT_INLINE_FLATTEN
 

Protected Member Functions

ml_data_internal::entry_value_iterator current_data_iter () const GL_HOT_INLINE_FLATTEN
 
size_t current_block_row_index () const GL_HOT_INLINE_FLATTEN
 
void advance_row () GL_HOT_INLINE_FLATTEN
 
void setup_block_containing_current_row_index () GL_HOT_NOINLINE
 Loads the block containing the row index row_index.
 
void load_next_block () GL_HOT_NOINLINE
 

Protected Attributes

bool add_side_information = false
 
size_t iter_row_index_start = -1
 
size_t iter_row_index_end = -1
 
size_t current_row_index = -1
 
size_t current_block_index = -1
 
size_t current_in_block_index
 
size_t global_row_start
 
size_t max_row_size
 
size_t num_dimensions
 

Detailed Description

A simple iterator over the ml_data class. It is a convenience structure that keeps track of the iteration state the toolkits need.

Definition at line 38 of file ml_data_iterator_base.hpp.

Constructor & Destructor Documentation

◆ ~ml_data_iterator_base()

virtual turi::v2::ml_data_iterator_base::~ml_data_iterator_base ( )
inline virtual

Virtual destructor. It is needed so that derived iterators are destroyed correctly through a base-class pointer.

Definition at line 62 of file ml_data_iterator_base.hpp.

Member Function Documentation

◆ _raw_row_entry()

ml_data_internal::entry_value turi::v2::ml_data_iterator_base::_raw_row_entry ( size_t  raw_index) const
inline

Return the raw value of the internal row storage. Used by some of the internal ml_data processing routines.

Definition at line 346 of file ml_data_iterator_base.hpp.

◆ advance_row()

void turi::v2::ml_data_iterator_base::advance_row ( )
inline protected

Advance to the next row.

Definition at line 430 of file ml_data_iterator_base.hpp.

◆ current_block_row_index()

size_t turi::v2::ml_data_iterator_base::current_block_row_index ( ) const
inline protected

Return the index of the current row within the currently loaded block.

Definition at line 417 of file ml_data_iterator_base.hpp.

◆ current_data_iter()

ml_data_internal::entry_value_iterator turi::v2::ml_data_iterator_base::current_data_iter ( ) const
inline protected

Return a pointer to the current location in the data.

Definition at line 407 of file ml_data_iterator_base.hpp.

◆ fill_eigen_row()

template<typename DenseRowXpr >
GL_HOT_INLINE_FLATTEN void turi::v2::ml_data_iterator_base::fill_eigen_row ( DenseRowXpr &&  x) const
inline

Fill a row of an Eigen Dense Vector, from the current location in the iteration.

Note
The 0th category is used as a reference category.

Example:

Eigen::MatrixXd X;

...

it.fill_eigen_row(X.row(row_idx));


Parameters
[in,out] x  An Eigen row expression.

Definition at line 269 of file ml_data_iterator_base.hpp.

◆ fill_observation() [1/4]

template<typename Entry >
GL_HOT_INLINE void turi::v2::ml_data_iterator_base::fill_observation ( std::vector< Entry > &  x) const
inline

Fill an observation vector of ml_data_entry structs, each a (column_index, index, value) triple, from the current location in the iteration. For each column type:

Categorical: Returns (col_id, v, 1)
Numeric    : Returns (col_id, 0, v)
Vector     : Returns (col_id, i, v) for each (i,v) in vector.

Example use is given by the following code:

std::vector<ml_data_entry> x;

for(ml_data_iterator it(data); !it.done(); ++it) {
  it.fill_observation(x);
  double y = it.target_value();
  ...
}

Definition at line 107 of file ml_data_iterator_base.hpp.
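
The per-column encoding described above can be illustrated without the library. The following is a minimal sketch, not the Turi Create implementation: the `entry` and `cell` structs, the `column_kind` enum, the `make_*` helpers, and `fill_entries` are all hypothetical names introduced for illustration. It only shows how categorical, numeric, and vector columns map to (column_index, index, value) triples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ml_data_entry: (column_index, index, value).
struct entry {
  std::size_t column_index;
  std::size_t index;
  double value;
};

// Hypothetical column kinds, mirroring the three cases listed above.
enum class column_kind { categorical, numeric, vector };

// One cell of a row; only the field matching `kind` is meaningful.
struct cell {
  column_kind kind = column_kind::numeric;
  std::size_t category = 0;    // used when kind == categorical
  double number = 0.0;         // used when kind == numeric
  std::vector<double> values;  // used when kind == vector
};

cell make_categorical(std::size_t category) {
  cell c;
  c.kind = column_kind::categorical;
  c.category = category;
  return c;
}

cell make_numeric(double number) {
  cell c;
  c.kind = column_kind::numeric;
  c.number = number;
  return c;
}

cell make_vector(std::vector<double> values) {
  cell c;
  c.kind = column_kind::vector;
  c.values = values;
  return c;
}

// Clears x, then appends one entry per unpacked feature of `row`.
void fill_entries(const std::vector<cell>& row, std::vector<entry>& x) {
  x.clear();
  for (std::size_t col = 0; col < row.size(); ++col) {
    const cell& c = row[col];
    switch (c.kind) {
      case column_kind::categorical:
        x.push_back({col, c.category, 1.0});  // (col_id, v, 1)
        break;
      case column_kind::numeric:
        x.push_back({col, 0u, c.number});     // (col_id, 0, v)
        break;
      case column_kind::vector:
        for (std::size_t i = 0; i < c.values.size(); ++i)
          x.push_back({col, i, c.values[i]}); // (col_id, i, v)
        break;
    }
  }
}
```

Clearing and refilling a caller-owned vector, as in this sketch, lets the same buffer be reused across rows.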

◆ fill_observation() [2/4]

void turi::v2::ml_data_iterator_base::fill_observation ( SparseVector &  x) const
inline

Fill an observation vector, represented as an Eigen Sparse Vector, from the current location in the iteration.

Note
A reference category is used in this version of the function.
For performance reasons, this function does not check for new categories during predict time. That must be checked externally.

This function fills x with a flattened version of the vector produced by the std::vector overload of fill_observation.

Example

Warning
This only works when the SFrame is "mapped" to integer keys.

For a dataset with a 3 column SFrame

Row 1: 1.0 0(categorical) <9.1, 2.4>
Row 2: 2.0 1(categorical) <1.0, 4.5>

with index = {1,2,2}

the SparseVector format would return

Row 1: <(0, 1.0), (1, 1), (3, 9.1), (4, 2.4)>
Row 2: <(0, 2.0), (2, 1), (3, 1.0), (4, 4.5)>

Note
The '0'th category is used as reference.
Parameters
[in,out] x  The sparse vector to fill with the observation.

Definition at line 184 of file ml_data_iterator_base.hpp.
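
The index layout in the example above can be reproduced with a short standalone sketch. This is an illustration under stated assumptions, not the library code: `entry`, `column_offsets`, and `flatten` are hypothetical names, a vector of (index, value) pairs stands in for the Eigen sparse vector, and no reference category is dropped, so the output matches the positions shown in the example rows.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ml_data_entry: (column_index, index, value).
struct entry {
  std::size_t column_index;
  std::size_t index;
  double value;
};

// offsets[c] is the sum of the sizes of all columns before column c.
std::vector<std::size_t> column_offsets(
    const std::vector<std::size_t>& column_sizes) {
  std::vector<std::size_t> offsets(column_sizes.size(), 0);
  for (std::size_t c = 1; c < column_sizes.size(); ++c)
    offsets[c] = offsets[c - 1] + column_sizes[c - 1];
  return offsets;
}

// Maps each (column_index, index, value) triple to a single global
// position: offset of the column plus the index within the column.
std::vector<std::pair<std::size_t, double>> flatten(
    const std::vector<entry>& x,
    const std::vector<std::size_t>& column_sizes) {
  const std::vector<std::size_t> offsets = column_offsets(column_sizes);
  std::vector<std::pair<std::size_t, double>> out;
  out.reserve(x.size());
  for (const entry& e : x) {
    assert(e.column_index < column_sizes.size());
    assert(e.index < column_sizes[e.column_index]);
    out.emplace_back(offsets[e.column_index] + e.index, e.value);
  }
  return out;
}
```

With column sizes {1, 2, 2}, the column offsets are {0, 1, 3}, which is why the vector column's two values land at positions 3 and 4.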

◆ fill_observation() [3/4]

void turi::v2::ml_data_iterator_base::fill_observation ( DenseVector &  x) const
inline

Fill an observation vector, represented as an Eigen Dense Vector, from the current location in the iteration.

Note
The 0th category is used as a reference category.
For performance reasons, this function does not check for new categories during predict time. That must be checked externally.

This function fills x with a flattened version of the vector produced by the std::vector overload of fill_observation.

Example

Warning
This only works when the SFrame is "mapped" to integer keys.

For a dataset with a 3 column SFrame

Row 1: 1.0 0(categorical) <9.1, 2.4>
Row 2: 2.0 1(categorical) <1.0, 4.5>

with index = {1,2,2}

the DenseVector format would return

Row 1: <1.0, 0, 1, 9.1, 2.4>
Row 2: <2.0, 1, 0, 1.0, 4.5>

Parameters
[in,out] x  The dense vector to fill with the observation.

Definition at line 232 of file ml_data_iterator_base.hpp.

◆ fill_observation() [4/4]

void turi::v2::ml_data_iterator_base::fill_observation ( composite_row_container &  crc)
inline

Fill a composite row container. The composite row container must have its specification set; this specification is used to then fill the observation.

Definition at line 285 of file ml_data_iterator_base.hpp.

◆ fill_untranslated_values()

void turi::v2::ml_data_iterator_base::fill_untranslated_values ( std::vector< flexible_type > &  x) const
inline

Fill an observation vector with the untranslated columns, if any have been specified at setup time. These columns are simply mapped back to their sarray counterparts.

The metadata surrounding the original column indices are

Definition at line 133 of file ml_data_iterator_base.hpp.

◆ get_reference()

ml_data_row_reference turi::v2::ml_data_iterator_base::get_reference ( ) const
inline

Return a row reference instead of the actual observation. The row reference can be used to fill the observation vectors just like the iterator can, and can easily be passed around by value.

Definition at line 323 of file ml_data_iterator_base.hpp.

◆ load_next_block()

void turi::v2::ml_data_iterator_base::load_next_block ( )
protected

Loads the next block, resetting all the values so iteration will be supported over the next row.
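
The interplay between advance_row() and load_next_block() can be sketched with a toy cursor over block-structured storage. This is a simplified illustration, not the actual implementation: `block` and `block_cursor` are hypothetical, each block is just an in-memory vector holding one value per row, and empty blocks are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block: a run of consecutive rows, one value per row.
using block = std::vector<double>;

// Toy cursor that walks rows block by block. The caller must keep
// `blocks` alive for as long as the cursor is used.
class block_cursor {
 public:
  explicit block_cursor(const std::vector<block>& blocks) : blocks_(blocks) {
    skip_empty_blocks();
  }

  bool done() const { return block_index_ >= blocks_.size(); }

  double value() const {
    assert(!done());
    return blocks_[block_index_][in_block_index_];
  }

  // Number of rows visited so far.
  std::size_t row_index() const { return row_index_; }

  // Moves to the next row, switching blocks when the current one ends.
  void advance_row() {
    assert(!done());
    ++row_index_;
    ++in_block_index_;
    if (in_block_index_ == blocks_[block_index_].size()) load_next_block();
  }

 private:
  // Resets the in-block position so iteration continues in the next block.
  void load_next_block() {
    ++block_index_;
    in_block_index_ = 0;
    skip_empty_blocks();
  }

  void skip_empty_blocks() {
    while (block_index_ < blocks_.size() && blocks_[block_index_].empty())
      ++block_index_;
  }

  const std::vector<block>& blocks_;
  std::size_t block_index_ = 0;
  std::size_t in_block_index_ = 0;
  std::size_t row_index_ = 0;
};
```

Keeping the block switch out of the per-row fast path is consistent with the GL_HOT_NOINLINE annotation on load_next_block() in the member list, although the real loading logic is more involved than this sketch.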

◆ ml_data_source()

const ml_data& turi::v2::ml_data_iterator_base::ml_data_source ( ) const
inline

Return the data this iterator is working with.

Definition at line 339 of file ml_data_iterator_base.hpp.

◆ row_index()

size_t turi::v2::ml_data_iterator_base::row_index ( ) const
inline

Returns the current index of the sframe row, respecting all slicing operations on the original ml_data.

Definition at line 81 of file ml_data_iterator_base.hpp.
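
The difference between row_index() and unsliced_row_index() can be pictured with a small sketch. It assumes, without confirmation from this page, that the absolute index is the slice-relative index plus the slice's starting offset. The `sliced_cursor` struct is hypothetical and only borrows the member names documented here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cursor over a slice of a larger data set.
struct sliced_cursor {
  std::size_t global_row_start;   // absolute index of the slice's first row
  std::size_t current_row_index;  // position relative to the slice

  // Index within the slice.
  std::size_t row_index() const { return current_row_index; }

  // Assumed relationship: absolute index = slice start + relative index.
  std::size_t unsliced_row_index() const {
    return global_row_start + current_row_index;
  }
};
```

Under this assumption, the sixth row of a slice that starts at absolute row 100 has row_index() 5 and unsliced_row_index() 105.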

◆ target_index()

size_t turi::v2::ml_data_iterator_base::target_index ( ) const
inline

Returns the current categorical target index, if present, or 0 if not present.

Definition at line 309 of file ml_data_iterator_base.hpp.

◆ target_value()

double turi::v2::ml_data_iterator_base::target_value ( ) const
inline

Returns the current target value, if present, or 1 if not present. If the target column is categorical, use target_index() instead.

Definition at line 298 of file ml_data_iterator_base.hpp.

Member Data Documentation

◆ add_side_information

bool turi::v2::ml_data_iterator_base::add_side_information = false
protected

The options used for this iterator.

Definition at line 366 of file ml_data_iterator_base.hpp.

◆ current_block_index

size_t turi::v2::ml_data_iterator_base::current_block_index = -1
protected

Index of the currently loaded block.

Definition at line 375 of file ml_data_iterator_base.hpp.

◆ current_in_block_index

size_t turi::v2::ml_data_iterator_base::current_in_block_index
protected

The current index pointed to inside the block.

Definition at line 379 of file ml_data_iterator_base.hpp.

◆ current_row_index

size_t turi::v2::ml_data_iterator_base::current_row_index = -1
protected

Current row index for this iterator.

Definition at line 374 of file ml_data_iterator_base.hpp.

◆ global_row_start

size_t turi::v2::ml_data_iterator_base::global_row_start
protected

The absolute values of the global row starting locations.

Definition at line 383 of file ml_data_iterator_base.hpp.

◆ iter_row_index_end

size_t turi::v2::ml_data_iterator_base::iter_row_index_end = -1
protected

Ending row index for this iterator.

Definition at line 373 of file ml_data_iterator_base.hpp.

◆ iter_row_index_start

size_t turi::v2::ml_data_iterator_base::iter_row_index_start = -1
protected

Starting row index for this iterator.

Definition at line 372 of file ml_data_iterator_base.hpp.

◆ max_row_size

size_t turi::v2::ml_data_iterator_base::max_row_size
protected

The maximum row size across all rows in the given ml_data object. Each row's size is defined to be the number of unpacked features in that row. For example, this is useful when one needs to preallocate a vector to be the largest size needed for any row that will be given by this iterator.

Definition at line 391 of file ml_data_iterator_base.hpp.
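
The preallocation use case mentioned above can be sketched as follows. This is an illustrative example only: rows are modeled as plain vectors of doubles, and `max_row_size` and `sum_all_features` are hypothetical free functions rather than library calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Largest number of features in any row.
std::size_t max_row_size(const std::vector<std::vector<double>>& rows) {
  std::size_t m = 0;
  for (const auto& r : rows) m = std::max(m, r.size());
  return m;
}

// Processes every row through one buffer that is reserved once, so no
// row can force a reallocation inside the loop.
double sum_all_features(const std::vector<std::vector<double>>& rows) {
  std::vector<double> x;
  x.reserve(max_row_size(rows));
  const std::size_t reserved = x.capacity();

  double total = 0.0;
  for (const auto& r : rows) {
    x.assign(r.begin(), r.end());     // refill the shared buffer
    assert(x.capacity() == reserved); // capacity never had to grow
    for (double v : x) total += v;
  }
  return total;
}
```

Reserving to the maximum row size up front trades a small amount of memory for an allocation-free inner loop.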

◆ num_dimensions

size_t turi::v2::ml_data_iterator_base::num_dimensions
protected

The total sum of column sizes.

Definition at line 395 of file ml_data_iterator_base.hpp.


The documentation for this class was generated from the following file: