Turi Create  4.0
turi::v2::ml_data_block_iterator Class Reference final

#include <toolkits/ml_data_2/iterators/ml_data_block_iterator.hpp>

Public Member Functions

bool is_start_of_new_block () const
 
const ml_data_block_iterator & operator++ () GL_HOT_INLINE_FLATTEN
 
void reset ()
 
bool done () const GL_HOT_INLINE_FLATTEN
 
size_t row_index () const
 
size_t unsliced_row_index () const
 Returns the absolute row index.
 
template<typename Entry >
GL_HOT_INLINE void fill_observation (std::vector< Entry > &x) const
 
void fill_observation (SparseVector &x) const GL_HOT_INLINE_FLATTEN
 
void fill_observation (DenseVector &x) const GL_HOT_INLINE_FLATTEN
 
void fill_observation (composite_row_container &crc) GL_HOT_INLINE_FLATTEN
 
void fill_untranslated_values (std::vector< flexible_type > &x) const GL_HOT_INLINE_FLATTEN
 
template<typename DenseRowXpr >
GL_HOT_INLINE_FLATTEN void fill_eigen_row (DenseRowXpr &&x) const
 
double target_value () const GL_HOT_INLINE_FLATTEN
 
size_t target_index () const GL_HOT_INLINE_FLATTEN
 
ml_data_row_reference get_reference () const
 
const ml_data & ml_data_source () const
 
ml_data_internal::entry_value _raw_row_entry (size_t raw_index) const GL_HOT_INLINE_FLATTEN
 

Protected Member Functions

ml_data_internal::entry_value_iterator current_data_iter () const GL_HOT_INLINE_FLATTEN
 
size_t current_block_row_index () const GL_HOT_INLINE_FLATTEN
 
void advance_row () GL_HOT_INLINE_FLATTEN
 
void setup_block_containing_current_row_index () GL_HOT_NOINLINE
 Loads the block containing the row index row_index.
 
void load_next_block () GL_HOT_NOINLINE
 

Protected Attributes

bool add_side_information = false
 
size_t iter_row_index_start = -1
 
size_t iter_row_index_end = -1
 
size_t current_row_index = -1
 
size_t current_block_index = -1
 
size_t current_in_block_index
 
size_t global_row_start
 
size_t max_row_size
 
size_t num_dimensions
 

Detailed Description

This iterator behaves like the regular ml_data_iterator class, but it also lets the user iterate over blocks of rows. Here, a block is a run of consecutive rows that share the same first value.

The ml_data_block_iterator does this by providing two additional functionalities beyond ml_data_iterator:

  1. is_start_of_new_block() returns true only if the first value in the current row differs from the first value in the previous row. (It is also true at the starting bound of iteration.) Thus the user knows when to switch to a new block.
  2. If the iteration range is split across threads, i.e. num_threads > 1, then the effective bounds of each individual iterator's partition always fall on the boundaries between blocks. Thus parallel iteration never splits a block between two threads.

In all other respects, this iterator behaves just like ml_data_iterator.
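
The partitioning guarantee in point 2 can be illustrated with a minimal sketch. It is not the Turi Create implementation: snap_to_block_start and thread_bounds are hypothetical helpers, and the data is reduced to a vector holding the first value of each row. The sketch splits the rows evenly and then moves every split point forward to the next block start, so no block straddles two ranges; the real iterator may choose its bounds differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Turi Create: move `pos` forward until it
// sits on the first row of a block (or on the end of the data).
// first_values[i] stands in for the first value of row i.
inline std::size_t snap_to_block_start(
    const std::vector<std::size_t>& first_values, std::size_t pos) {
  const std::size_t n = first_values.size();
  assert(pos <= n);
  while (pos > 0 && pos < n && first_values[pos] == first_values[pos - 1]) {
    ++pos;
  }
  return pos;
}

// Hypothetical helper: half-open row range [begin, end) for one thread.
// The even split is computed first, then both ends are snapped forward to
// block starts, so adjacent threads share the same snapped boundary.
inline std::pair<std::size_t, std::size_t> thread_bounds(
    const std::vector<std::size_t>& first_values,
    std::size_t thread_idx, std::size_t num_threads) {
  assert(num_threads > 0 && thread_idx < num_threads);
  const std::size_t n = first_values.size();
  const std::size_t raw_begin = (thread_idx * n) / num_threads;
  const std::size_t raw_end = ((thread_idx + 1) * n) / num_threads;
  return {snap_to_block_start(first_values, raw_begin),
          snap_to_block_start(first_values, raw_end)};
}
```

With first values {7, 7, 7, 7, 8, 9} and two threads, the even split point 3 falls inside the block of 7s and is moved forward to 4, giving the ranges [0, 4) and [4, 6).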

Definition at line 36 of file ml_data_block_iterator.hpp.

Member Function Documentation

◆ _raw_row_entry()

ml_data_internal::entry_value turi::v2::ml_data_iterator_base::_raw_row_entry ( size_t  raw_index) const
inline inherited

Return the raw value of the internal row storage. Used by some of the internal ml_data processing routines.

Definition at line 346 of file ml_data_iterator_base.hpp.

◆ advance_row()

void turi::v2::ml_data_iterator_base::advance_row ( )
inline protected inherited

Advance to the next row.

Definition at line 430 of file ml_data_iterator_base.hpp.

◆ current_block_row_index()

size_t turi::v2::ml_data_iterator_base::current_block_row_index ( ) const
inline protected inherited

Return the index of the current row within the currently loaded block.

Definition at line 417 of file ml_data_iterator_base.hpp.

◆ current_data_iter()

ml_data_internal::entry_value_iterator turi::v2::ml_data_iterator_base::current_data_iter ( ) const
inline protected inherited

Return a pointer to the current location in the data.

Definition at line 407 of file ml_data_iterator_base.hpp.

◆ done()

bool turi::v2::ml_data_block_iterator::done ( ) const
inline virtual

Returns true if we are done with the iteration range of the current iterator and false otherwise.

Reimplemented from turi::v2::ml_data_iterator_base.

Definition at line 85 of file ml_data_block_iterator.hpp.

◆ fill_eigen_row()

template<typename DenseRowXpr >
GL_HOT_INLINE_FLATTEN void turi::v2::ml_data_iterator_base::fill_eigen_row ( DenseRowXpr &&  x) const
inline inherited

Fill a row of an Eigen Dense Vector, from the current location in the iteration.

Note
The 0th category is used as a reference category.

Example:

Eigen::MatrixXd X;

...

it.fill_eigen_row(X.row(row_idx));


Parameters
[in,out]  x  An Eigen row expression.

Definition at line 269 of file ml_data_iterator_base.hpp.

◆ fill_observation() [1/4]

template<typename Entry >
GL_HOT_INLINE void turi::v2::ml_data_iterator_base::fill_observation ( std::vector< Entry > &  x) const
inline inherited

Fill an observation vector of ml_data_entry structs, i.e. (column_index, index, value) triples, from the current location in the iteration. For each column:

  Categorical: Returns (col_id, v, 1)
  Numeric    : Returns (col_id, 0, v)
  Vector     : Returns (col_id, i, v) for each (i,v) in vector.

Example use is given by the following code:

std::vector<ml_data_entry> x;

for(ml_data_iterator it(data); !it.done(); ++it) {
  it.fill_observation(x);
  double y = it.target_value();
  ...
}

Definition at line 107 of file ml_data_iterator_base.hpp.
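
The three per-column encoding rules for this overload can be sketched without the library. Everything below is an assumption made for illustration: entry stands in for ml_data_entry, and column_kind, column_value and fill_entries are hypothetical names that do not exist in Turi Create. Categorical values are assumed to be already mapped to integer keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ml_data_entry; the field names follow the
// (column_index, index, value) description and are otherwise an assumption.
struct entry {
  std::size_t column_index;
  std::size_t index;
  double value;
};

// Hypothetical column model used only for this illustration.
enum class column_kind { categorical, numeric, vector };

struct column_value {
  column_kind kind;
  std::size_t category;        // categorical: the mapped integer key
  std::vector<double> values;  // numeric: exactly one value; vector: all values
};

// Replace the contents of x with the entries for one row.
inline void fill_entries(const std::vector<column_value>& row,
                         std::vector<entry>& x) {
  x.clear();
  for (std::size_t col = 0; col < row.size(); ++col) {
    const column_value& c = row[col];
    switch (c.kind) {
      case column_kind::categorical:  // (col_id, v, 1)
        x.push_back(entry{col, c.category, 1.0});
        break;
      case column_kind::numeric:      // (col_id, 0, v)
        assert(c.values.size() == 1);
        x.push_back(entry{col, 0, c.values[0]});
        break;
      case column_kind::vector:       // (col_id, i, v) for each (i, v)
        for (std::size_t i = 0; i < c.values.size(); ++i) {
          x.push_back(entry{col, i, c.values[i]});
        }
        break;
    }
  }
}
```

A row with a numeric 2.0, a categorical key 1 and the vector <1.0, 4.5> yields the four entries (0, 0, 2.0), (1, 1, 1), (2, 0, 1.0) and (2, 1, 4.5).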

◆ fill_observation() [2/4]

void turi::v2::ml_data_iterator_base::fill_observation ( SparseVector &  x) const
inline inherited

Fill an observation vector, represented as an Eigen Sparse Vector, from the current location in the iteration.

Note
A reference category is used in this version of the function.
For performance reasons, this function does not check for new categories during predict time. That must be checked externally.

This function fills x with a flattened version of the vector produced by the std::vector<Entry> overload of fill_observation.

Example

Warning
This only works when the SFrame is "mapped" to integer keys.

For a dataset with a 3 column SFrame

Row 1: 1.0 0(categorical) <9.1, 2.4>
Row 2: 2.0 1(categorical) <1.0, 4.5>

with index = {1,2,2}

the SparseVector format would return

Row 1: < (0, 1.0), (1, 1), (3, 9.1), (4, 2.4) >
Row 2: < (0, 2.0), (2, 1), (3, 1.0), (4, 4.5) >

Note
The '0'th category is used as reference.
Parameters
[in,out]  x  Data containing everything!

Definition at line 184 of file ml_data_iterator_base.hpp.
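
The index arithmetic behind the SparseVector example can be sketched as follows. This is an illustration under stated assumptions, not the library code: entry stands in for ml_data_entry, flatten_sparse is a hypothetical helper, and plain (index, value) pairs are used in place of an Eigen sparse vector. It reproduces the worked example as printed and does not attempt to model reference-category handling or the check for new categories.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for ml_data_entry (an assumption, as is every name in this sketch).
struct entry {
  std::size_t column_index;
  std::size_t index;
  double value;
};

// Hypothetical helper: map (column_index, index, value) entries to
// (global_index, value) pairs. column_sizes[c] is the number of slots that
// column c occupies, i.e. the "index = {1,2,2}" list in the example.
inline std::vector<std::pair<std::size_t, double>> flatten_sparse(
    const std::vector<entry>& row,
    const std::vector<std::size_t>& column_sizes) {
  // offsets[c] = sum of the sizes of all columns before c.
  std::vector<std::size_t> offsets(column_sizes.size(), 0);
  for (std::size_t c = 1; c < column_sizes.size(); ++c) {
    offsets[c] = offsets[c - 1] + column_sizes[c - 1];
  }

  std::vector<std::pair<std::size_t, double>> out;
  out.reserve(row.size());
  for (const entry& e : row) {
    assert(e.column_index < column_sizes.size());
    assert(e.index < column_sizes[e.column_index]);
    out.emplace_back(offsets[e.column_index] + e.index, e.value);
  }
  return out;
}
```

For Row 2 of the example, the entries (0, 0, 2.0), (1, 1, 1), (2, 0, 1.0), (2, 1, 4.5) with column sizes {1, 2, 2} map to the global indices 0, 2, 3 and 4, matching the SparseVector row shown above.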

◆ fill_observation() [3/4]

void turi::v2::ml_data_iterator_base::fill_observation ( DenseVector &  x) const
inline inherited

Fill an observation vector, represented as an Eigen Dense Vector, from the current location in the iteration.

Note
The 0th category is used as a reference category.
For performance reasons, this function does not check for new categories during predict time. That must be checked externally.

This function fills x with a flattened version of the vector produced by the std::vector<Entry> overload of fill_observation.

Example

Warning
This only works when the SFrame is "mapped" to integer keys.

For a dataset with a 3 column SFrame

Row 1: 1.0 0(categorical) <9.1, 2.4>
Row 2: 2.0 1(categorical) <1.0, 4.5>

with index = {1,2,2}

the DenseVector format would return

Row 1: <1.0, 0, 1, 9.1, 2.4>
Row 2: <2.0, 1, 0, 1.0, 4.5>

Parameters
[in,out]  x  Data containing everything!

Definition at line 232 of file ml_data_iterator_base.hpp.

◆ fill_observation() [4/4]

void turi::v2::ml_data_iterator_base::fill_observation ( composite_row_container &  crc)
inline inherited

Fill a composite row container. The composite row container must have its specification set; this specification is used to then fill the observation.

Definition at line 285 of file ml_data_iterator_base.hpp.

◆ fill_untranslated_values()

void turi::v2::ml_data_iterator_base::fill_untranslated_values ( std::vector< flexible_type > &  x) const
inline inherited

Fill an observation vector with the untranslated columns, if any have been specified at setup time. These columns are simply mapped back to their sarray counterparts.

The metadata surrounding the original column indices are

Definition at line 133 of file ml_data_iterator_base.hpp.

◆ get_reference()

ml_data_row_reference turi::v2::ml_data_iterator_base::get_reference ( ) const
inline inherited

Return a row reference instead of the actual observation. The row reference can be used to fill the observation vectors just like the iterator can, and can easily be passed around by value.

Definition at line 323 of file ml_data_iterator_base.hpp.

◆ is_start_of_new_block()

bool turi::v2::ml_data_block_iterator::is_start_of_new_block ( ) const
inline

This function returns true if the current observation is the start of a new block.

Definition at line 44 of file ml_data_block_iterator.hpp.
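
A typical use of is_start_of_new_block() is to start fresh per-block state whenever it returns true. The sketch below shows that pattern with a hypothetical stand-in class, block_cursor, that models only the first value of each row; it is not the Turi Create iterator, and block_sizes is likewise an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the iterator; only the first value of each row
// is modelled. The referenced vector must outlive the cursor.
class block_cursor {
 public:
  explicit block_cursor(const std::vector<std::size_t>& first_values)
      : first_values_(first_values) {}

  bool done() const { return pos_ >= first_values_.size(); }

  // True at the start of iteration and whenever the first value changes.
  bool is_start_of_new_block() const {
    assert(!done());
    return pos_ == 0 || first_values_[pos_] != first_values_[pos_ - 1];
  }

  const block_cursor& operator++() {
    ++pos_;
    return *this;
  }

 private:
  const std::vector<std::size_t>& first_values_;
  std::size_t pos_ = 0;
};

// Count the rows in each block by opening a new counter at every block start.
inline std::vector<std::size_t> block_sizes(
    const std::vector<std::size_t>& first_values) {
  std::vector<std::size_t> sizes;
  for (block_cursor it(first_values); !it.done(); ++it) {
    if (it.is_start_of_new_block()) sizes.push_back(0);
    ++sizes.back();
  }
  return sizes;
}
```

Because the first row always starts a block, sizes is never empty when it is incremented. For first values {3, 3, 5, 5, 5, 9} the result is {2, 3, 1}.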

◆ load_next_block()

void turi::v2::ml_data_iterator_base::load_next_block ( )
protected inherited

Loads the next block, resetting all the values so iteration will be supported over the next row.

◆ ml_data_source()

const ml_data& turi::v2::ml_data_iterator_base::ml_data_source ( ) const
inline inherited

Return the data this iterator is working with.

Definition at line 339 of file ml_data_iterator_base.hpp.

◆ operator++()

const ml_data_block_iterator& turi::v2::ml_data_block_iterator::operator++ ( )
inline

Advance the iterator to the next row.

Definition at line 50 of file ml_data_block_iterator.hpp.

◆ reset()

void turi::v2::ml_data_block_iterator::reset ( )
virtual

Resets the iterator to the start of the sframes in ml_data.

Reimplemented from turi::v2::ml_data_iterator_base.

◆ row_index()

size_t turi::v2::ml_data_iterator_base::row_index ( ) const
inline inherited

Returns the current index of the sframe row, respecting all slicing operations on the original ml_data.

Definition at line 81 of file ml_data_iterator_base.hpp.

◆ target_index()

size_t turi::v2::ml_data_iterator_base::target_index ( ) const
inline inherited

Returns the current categorical target index, if present, or 0 if not present.

Definition at line 309 of file ml_data_iterator_base.hpp.

◆ target_value()

double turi::v2::ml_data_iterator_base::target_value ( ) const
inline inherited

Returns the current target value, if present, or 1 if not present. If the target column is categorical, use target_index() instead.

Definition at line 298 of file ml_data_iterator_base.hpp.

Member Data Documentation

◆ add_side_information

bool turi::v2::ml_data_iterator_base::add_side_information = false
protected inherited

The options used for this iterator.

Definition at line 366 of file ml_data_iterator_base.hpp.

◆ current_block_index

size_t turi::v2::ml_data_iterator_base::current_block_index = -1
protected inherited

Index of the currently loaded block.

Definition at line 375 of file ml_data_iterator_base.hpp.

◆ current_in_block_index

size_t turi::v2::ml_data_iterator_base::current_in_block_index
protected inherited

The current index pointed to inside the block.

Definition at line 379 of file ml_data_iterator_base.hpp.

◆ current_row_index

size_t turi::v2::ml_data_iterator_base::current_row_index = -1
protected inherited

Current row index for this iterator.

Definition at line 374 of file ml_data_iterator_base.hpp.

◆ global_row_start

size_t turi::v2::ml_data_iterator_base::global_row_start
protected inherited

The absolute values of the global row starting locations.

Definition at line 383 of file ml_data_iterator_base.hpp.

◆ iter_row_index_end

size_t turi::v2::ml_data_iterator_base::iter_row_index_end = -1
protected inherited

Ending row index for this iterator.

Definition at line 373 of file ml_data_iterator_base.hpp.

◆ iter_row_index_start

size_t turi::v2::ml_data_iterator_base::iter_row_index_start = -1
protected inherited

Starting row index for this iterator.

Definition at line 372 of file ml_data_iterator_base.hpp.

◆ max_row_size

size_t turi::v2::ml_data_iterator_base::max_row_size
protected inherited

The maximum row size across all rows in the given ml_data object. Each row's size is defined to be the number of unpacked features in that row. For example, this is useful when one needs to preallocate a vector to be the largest size needed for any row that will be given by this iterator.

Definition at line 391 of file ml_data_iterator_base.hpp.
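
The preallocation use case mentioned for max_row_size can be sketched as follows. The names are hypothetical and the rows are plain vectors rather than ml_data; the point is only that reserving the maximum row size once lets a single buffer be reused for every row without reallocating.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the largest number of unpacked features in any row.
// Each inner vector stands in for one row's unpacked features.
inline std::size_t compute_max_row_size(
    const std::vector<std::vector<double>>& rows) {
  std::size_t m = 0;
  for (const auto& r : rows) m = std::max(m, r.size());
  return m;
}

// Sum every feature value, reusing one buffer for all rows. Reserving
// max_row_size up front means refilling the buffer never reallocates.
inline double sum_all_features(const std::vector<std::vector<double>>& rows) {
  const std::size_t max_row_size = compute_max_row_size(rows);
  std::vector<double> buffer;
  buffer.reserve(max_row_size);
  const std::size_t reserved = buffer.capacity();

  double total = 0.0;
  for (const auto& r : rows) {
    // Stands in for filling the buffer from the iterator's current row.
    buffer.clear();
    buffer.insert(buffer.end(), r.begin(), r.end());
    assert(buffer.capacity() == reserved);  // storage was reused
    for (double v : buffer) total += v;
  }
  return total;
}
```

For the rows {1, 2}, {3} and {4, 5, 6}, the maximum row size is 3 and the buffer is allocated once for all three rows.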

◆ num_dimensions

size_t turi::v2::ml_data_iterator_base::num_dimensions
protected inherited

The total sum of column sizes.

Definition at line 395 of file ml_data_iterator_base.hpp.


The documentation for this class was generated from the following file: