turicreate.boosted_trees_classifier.BoostedTreesClassifier.extract_features

BoostedTreesClassifier.extract_features(dataset, missing_value_action='auto')

For each example in the dataset, extract the leaf indices of each tree as features.

For multiclass classification, each leaf index consists of #num_class numbers.

The returned feature vectors can be used as input to train another supervised learning model, such as a LogisticClassifier or an SVMClassifier.
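To make the idea concrete, here is a minimal pure-Python sketch of leaf-index feature extraction. It assumes a hypothetical toy tree format (nested dicts with `feature`, `threshold`, `left`, `right` and `leaf` keys) that is not Turi Create's internal representation, and the helper names `leaf_index`, `extract_leaf_features` and `one_hot` are illustrative only. Each example is routed down every tree and the id of the leaf it reaches becomes one feature; the optional one-hot step is the common way such leaf ids are fed to a linear model like a logistic classifier.

```python
from array import array

# Hypothetical toy tree format (not Turi Create's internal representation):
# an internal node has "feature", "threshold", "left", "right"; a leaf has "leaf".
def leaf_index(tree, example):
    """Route one example down a tree; return the id of the leaf it reaches."""
    node = tree
    while "leaf" not in node:
        go_left = example[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def extract_leaf_features(trees, example):
    """One leaf index per tree, packed like one element of the returned SArray."""
    return array("d", (leaf_index(tree, example) for tree in trees))


def one_hot(leaf_vector):
    """Sparse {"tree<i>_leaf<j>": 1} encoding for a downstream linear model."""
    return {f"tree{i}_leaf{int(j)}": 1 for i, j in enumerate(leaf_vector)}


# Two small hand-built trees over hypothetical "size" and "bath" features.
tree_a = {"feature": "size", "threshold": 1500,
          "left": {"leaf": 1}, "right": {"leaf": 2}}
tree_b = {"feature": "bath", "threshold": 2,
          "left": {"leaf": 1},
          "right": {"feature": "size", "threshold": 3000,
                    "left": {"leaf": 3}, "right": {"leaf": 4}}}

example = {"size": 2000, "bath": 3}
features = extract_leaf_features([tree_a, tree_b], example)
print(features)           # array('d', [2.0, 3.0])
print(one_hot(features))  # {'tree0_leaf2': 1, 'tree1_leaf3': 1}
```

In a multiclass model the same routing applies; the sketch simply treats the trees as one flat list, so it makes no assumption about how Turi Create orders the #num_class numbers per leaf index.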

Parameters:
dataset : SFrame

Dataset of new observations. Must include columns with the same names as the features used for model training, but does not require a target column. Additional columns are ignored.

missing_value_action : str, optional

Action to perform when missing values are encountered. This can be one of:

  • ‘auto’: Choose a model-dependent missing value policy.
  • ‘impute’: Proceed with evaluation by filling in the missing values with the mean of the training data. Missing values are also imputed if an entire column of data is missing during evaluation.
  • ‘none’: Treat missing values as is. The model must be able to handle missing values.
  • ‘error’: Do not proceed with prediction; terminate with an error message.
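The explicit policies above can be illustrated for a single numeric column with the following sketch. It is not Turi Create's implementation: the function name `apply_missing_value_action` is hypothetical, missing values are assumed to be represented as `None`, an entirely absent column is assumed to be passed as `None`, and ‘auto’ is left out because its behavior is model dependent.

```python
from statistics import mean


def apply_missing_value_action(column, train_mean, n_rows, action="impute"):
    """Hypothetical sketch of the 'impute', 'none' and 'error' policies.

    column: list of numbers with None for missing entries, or None when the
    whole column is absent from the dataset (both are assumptions).
    """
    if column is None:
        # Treat an absent column as a column whose every value is missing.
        column = [None] * n_rows
    if action == "impute":
        # Fill each missing entry with the mean of the training data.
        return [train_mean if v is None else v for v in column]
    if action == "none":
        # Pass missing entries through; the model must handle them itself.
        return list(column)
    if action == "error":
        # Refuse to continue if any value is missing.
        if any(v is None for v in column):
            raise ValueError("missing value encountered")
        return list(column)
    raise ValueError(f"unsupported action: {action!r}")


train_sizes = [1000.0, 2000.0, 3000.0]
train_mean = mean(train_sizes)  # mean of the training column: 2000.0
print(apply_missing_value_action([1500.0, None], train_mean, 2))
print(apply_missing_value_action(None, train_mean, 2))
```

The first call fills the single gap with the training mean; the second shows the whole-column case, where every row receives the mean.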
Returns:
out : SArray

An SArray of dtype array.array containing extracted features.

Examples

>>> import turicreate
>>> data = turicreate.SFrame(
...     'https://static.turi.com/datasets/regression/houses.csv')
>>> # Regression tree models (here `model` is a trained regression tree model)
>>> data['regression_tree_features'] = model.extract_features(data)
>>> # Classification tree models (here `model` is a trained tree classifier)
>>> data['classification_tree_features'] = model.extract_features(data)