Turi Create  4.0
turi::supervised::standardization_interface Class Reference (abstract)

#include <toolkits/supervised_learning/standardization-inl.hpp>

Public Member Functions

virtual ~standardization_interface ()=default
 
 standardization_interface ()=default
 
virtual void transform (DenseVector &point) const =0
 
virtual void inverse_transform (DenseVector &point) const =0
 
virtual void inverse_transform (SparseVector &point) const =0
 
virtual void transform (SparseVector &point) const =0
 
virtual void save (turi::oarchive &oarc) const =0
 
virtual void load (turi::iarchive &iarc)=0
 
size_t get_total_size () const
 

Protected Attributes

size_t total_size
 

Detailed Description

Interface for affine transformation of data for machine learning and optimization purposes.

Background: Feature Scaling

Feature scaling standardizes data for supervised learning methods. Because raw feature values can span very different ranges, the objective functions of some machine learning algorithms do not behave properly without normalization. The range of every feature should therefore be normalized so that each feature contributes approximately equally.

What we need for a standardization scheme.

The standardization interface lets you implement different data standardization methods without affecting much of the code base.

Each standardization scheme requires the following methods:

*) Construction based on metadata: Given a complete metadata object, we can construct the standardization object.

*) Transform: Perform a transformation from the original space to the standardized space.

*) Inverse-Transform: Perform a transformation from the standardized space to the original space.
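The three requirements above can be sketched with a reduced stand-in for the interface. This is a minimal illustration, not the Turi Create implementation: `DenseVector` is aliased to `std::vector<double>`, the class names `standardization_sketch` and `column_rescaling_sketch` are hypothetical, the metadata object is replaced by a plain vector of per-column scales, and the sparse overloads and serialization methods are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the library's DenseVector type (assumption for this sketch).
using DenseVector = std::vector<double>;

// Hypothetical, reduced version of the interface: dense overloads only.
class standardization_sketch {
 public:
  virtual ~standardization_sketch() = default;
  virtual void transform(DenseVector& point) const = 0;
  virtual void inverse_transform(DenseVector& point) const = 0;
  std::size_t get_total_size() const { return total_size; }

 protected:
  std::size_t total_size = 0;
};

// Hypothetical scheme: divide each coordinate by a fixed per-column scale.
// The scales stand in for whatever the real metadata object would provide.
class column_rescaling_sketch : public standardization_sketch {
 public:
  explicit column_rescaling_sketch(DenseVector scale)
      : scale_(std::move(scale)) {
    for (double s : scale_) assert(s > 0.0);
    total_size = scale_.size();
  }

  // Original space -> standardized space, in place.
  void transform(DenseVector& point) const override {
    assert(point.size() == total_size);
    for (std::size_t i = 0; i < point.size(); ++i) point[i] /= scale_[i];
  }

  // Standardized space -> original space, in place.
  void inverse_transform(DenseVector& point) const override {
    assert(point.size() == total_size);
    for (std::size_t i = 0; i < point.size(); ++i) point[i] *= scale_[i];
  }

 private:
  DenseVector scale_;
};
```

Both methods modify the point in place, matching the `[in,out]` parameter direction documented for `transform` and `inverse_transform`, so applying one after the other should recover the original point up to floating-point rounding.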

Comparison of various methods for standardization

1) Norm-Rescaling: Given a column of data x, the norm re-scaling changes the column to: x' = x / ||x||

where ||x|| can be the L1, L2, or L-Inf norm.

PROS: Sparsity preserving. CONS: May not be the right thing to do for regularized problems.

2) Mean-Stdev: Given a column of data x, mean-stdev standardization changes the column to: x' = (x - mean(x)) / stdev(x)

PROS: Statistically well documented. CONS: Sparsity breaking.

3) Min-Max: Given a column of data x, min-max scaling changes the column to: x' = (x - min(x)) / (max(x) - min(x))

PROS: Well documented for SVM. CONS: Sparsity breaking.
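To make the sparsity trade-off concrete, the three formulas can be sketched as free functions over a single column. This is an illustration only: the function names and the `Column` alias are hypothetical, the norm is fixed to L2, and the standard deviation is the population form (dividing by n), which the text above does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Column = std::vector<double>;  // hypothetical alias for one feature column

// Norm-rescaling with the L2 norm: x' = x / ||x||_2. Zeros stay zero.
Column l2_rescale(const Column& x) {
  double sum_sq = 0.0;
  for (double v : x) sum_sq += v * v;
  const double norm = std::sqrt(sum_sq);
  assert(norm > 0.0);
  Column out(x);
  for (double& v : out) v /= norm;
  return out;
}

// Mean-stdev: x' = (x - mean) / stdev, using the population stdev (assumption).
// Zeros generally become nonzero because the mean is subtracted.
Column mean_stdev(const Column& x) {
  assert(!x.empty());
  const double n = static_cast<double>(x.size());
  double mean = 0.0;
  for (double v : x) mean += v;
  mean /= n;
  double var = 0.0;
  for (double v : x) var += (v - mean) * (v - mean);
  const double stdev = std::sqrt(var / n);
  assert(stdev > 0.0);
  Column out(x);
  for (double& v : out) v = (v - mean) / stdev;
  return out;
}

// Min-max: x' = (x - min) / (max - min). Zeros move unless min(x) == 0.
Column min_max(const Column& x) {
  assert(!x.empty());
  const auto mm = std::minmax_element(x.begin(), x.end());
  const double lo = *mm.first;
  const double hi = *mm.second;
  assert(hi > lo);
  Column out(x);
  for (double& v : out) v = (v - lo) / (hi - lo);
  return out;
}
```

Only the first function is a pure per-column scaling, so it is the only one that maps a zero entry to zero for every input column; the other two shift the data first, which is what breaks sparsity.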

Note
The important part is for us to get something that helps with numerical issues and is sparsity preserving. The interface here allows us to try many things and see what works best.

Definition at line 95 of file standardization-inl.hpp.

Constructor & Destructor Documentation

◆ ~standardization_interface()

virtual turi::supervised::standardization_interface::~standardization_interface ( )
virtual default

Default destructor.

◆ standardization_interface()

turi::supervised::standardization_interface::standardization_interface ( )
default

Default constructor.

Member Function Documentation

◆ get_total_size()

size_t turi::supervised::standardization_interface::get_total_size ( ) const
inline

Return the total size of all the variables in the space.

Parameters
[out] total_size  Size of all the variables in the space.
Note
This is the sum of the sizes of the individual features that created this object. They are:

Numeric : 1
Categorical : # Unique categories
Vector : Size of the vector
CategoricalVector : # Unique categories
Dictionary : # Keys

For reference encoding, subtract 1 from the Categorical and Categorical-Vector types.
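The size rule in this note can be sketched as a small function. The names `feature_kind`, `feature_desc`, and `total_size_sketch` are hypothetical and are not part of the Turi Create API; the sketch only mirrors the per-type sizes listed above and the reference-encoding adjustment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one input feature.
enum class feature_kind {
  numeric,
  categorical,
  vector,
  categorical_vector,
  dictionary
};

struct feature_desc {
  feature_kind kind;
  std::size_t count;  // unique categories, vector length, or keys; unused for numeric
};

// Sum the per-feature sizes. With reference encoding, categorical and
// categorical-vector features each contribute one column fewer.
std::size_t total_size_sketch(const std::vector<feature_desc>& features,
                              bool reference_encoding) {
  std::size_t total = 0;
  for (const feature_desc& f : features) {
    switch (f.kind) {
      case feature_kind::numeric:
        total += 1;
        break;
      case feature_kind::categorical:
      case feature_kind::categorical_vector:
        assert(!reference_encoding || f.count > 0);
        total += reference_encoding ? f.count - 1 : f.count;
        break;
      case feature_kind::vector:
      case feature_kind::dictionary:
        total += f.count;
        break;
    }
  }
  return total;
}
```

For example, one numeric feature, one categorical feature with 3 categories, one vector of length 4, and one dictionary with 2 keys give 1 + 3 + 4 + 2 = 10 columns, or 9 with reference encoding.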

Returns
Column size.

Definition at line 191 of file standardization-inl.hpp.

◆ inverse_transform() [1/2]

virtual void turi::supervised::standardization_interface::inverse_transform ( DenseVector &  point) const
pure virtual

Inverse transform a point from the standardized space to the original space.

Parameters
[in,out] point  (DenseVector) Point to be transformed.

Implemented in turi::supervised::l2_rescaling.

◆ inverse_transform() [2/2]

virtual void turi::supervised::standardization_interface::inverse_transform ( SparseVector &  point) const
pure virtual

Inverse transform a point from the standardized space to the original space.

Parameters
[in,out] point  (SparseVector) Point to be transformed.

Implemented in turi::supervised::l2_rescaling.

◆ load()

virtual void turi::supervised::standardization_interface::load ( turi::iarchive &  iarc)
pure virtual

Serialization – Load object

Load this class from a Turi iarc object.

Parameters
[in] iarc  Turi iarc object

Implemented in turi::supervised::l2_rescaling.

◆ save()

virtual void turi::supervised::standardization_interface::save ( turi::oarchive &  oarc) const
pure virtual

Serialization – Save object

Save this class to a Turi oarc object.

Parameters
[in] oarc  Turi oarc object

Implemented in turi::supervised::l2_rescaling.

◆ transform() [1/2]

virtual void turi::supervised::standardization_interface::transform ( DenseVector &  point) const
pure virtual

Transform a point from the original space to the standardized space.

Parameters
[in,out] point  (DenseVector) Point to be transformed.

Implemented in turi::supervised::l2_rescaling.

◆ transform() [2/2]

virtual void turi::supervised::standardization_interface::transform ( SparseVector &  point) const
pure virtual

Transform a point from the original space to the standardized space.

Parameters
[in,out] point  (SparseVector) Point to be transformed.

Implemented in turi::supervised::l2_rescaling.

Member Data Documentation

◆ total_size

size_t turi::supervised::standardization_interface::total_size
protected

Total size of all the variables in the space, as returned by get_total_size().

Definition at line 99 of file standardization-inl.hpp.


The documentation for this class was generated from the following file: