turicreate.decision_tree_regression.create

turicreate.decision_tree_regression.create(dataset, target, features=None, validation_set='auto', max_depth=6, min_loss_reduction=0.0, min_child_weight=0.1, verbose=True, random_seed=None, metric='auto', **kwargs)

Create a DecisionTreeRegression to predict a scalar target variable using one or more features. In addition to standard numeric and categorical types, features can also be extracted automatically from list- or dictionary-type SFrame columns.

Parameters:
dataset : SFrame

A training dataset containing feature columns and a target column. The target column must be numeric (int or float).

target : str

The name of the column in dataset that is the prediction target. This column must have a numeric type.

features : list[str], optional

A list of column names of the features used for training the model. Defaults to None, which uses all columns.

validation_set : SFrame, optional

A validation set used to monitor validation metrics as training progresses. Defaults to 'auto'.

max_depth : float, optional

Maximum depth of a tree. Must be at least 1.

min_loss_reduction : float, optional (non-negative)

Minimum loss reduction required to split a node further during the tree learning phase. Larger (more positive) values can help prevent overfitting by avoiding splits that do not sufficiently reduce the loss function.

min_child_weight : float, optional (non-negative)

Controls the minimum weight of each leaf node. Larger values result in more conservative tree learning and help prevent overfitting. Formally, this is the minimum sum of instance weights (hessians) in each node. If a split would produce a leaf node whose sum of instance weights is less than min_child_weight, the algorithm stops partitioning that node.

verbose : boolean, optional

If True, print progress information during training.

random_seed : int, optional

Seeds random operations such as column and row subsampling, so that results are reproducible.

metric : str or list[str], optional

Performance metric(s) tracked during training. When specified, the progress table displays the tracked metric(s) on the training and validation sets. Supported metrics are: {'rmse', 'max_error'}.
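
To make min_loss_reduction and min_child_weight more concrete, here is a minimal sketch of how a single candidate split might be accepted or rejected. It is not Turi Create's implementation. The helper names (`sse`, `split_is_allowed`) are hypothetical, the loss is assumed to be the sum of squared errors, and every row is assumed to carry weight 1 (the hessian of the squared-error loss 1/2 (y - prediction)^2), so a child's weight is simply its row count.

```python
def sse(values):
    """Sum of squared errors around the mean (the value a leaf would predict)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_is_allowed(left, right, min_loss_reduction=0.0, min_child_weight=0.1):
    """Return True if splitting a node into `left` and `right` passes both checks.

    Assumptions (not taken from the library): the loss is the sum of squared
    errors, and each row has weight 1, so a child's weight is len(child).
    """
    # Each child must carry at least min_child_weight total weight.
    if len(left) < min_child_weight or len(right) < min_child_weight:
        return False
    # The split must reduce the loss by more than the threshold.
    reduction = sse(left + right) - sse(left) - sse(right)
    return reduction > min_loss_reduction


# Hypothetical target values that would fall on each side of a candidate split.
targets_left = [1.0, 1.2, 0.9]
targets_right = [5.0, 5.1]

print(split_is_allowed(targets_left, targets_right))
print(split_is_allowed(targets_left, targets_right, min_loss_reduction=100))
print(split_is_allowed(targets_left, targets_right, min_child_weight=3))
```

Under these assumptions the first call accepts the split, while the other two reject it: one because the loss reduction falls below the raised threshold, the other because the right child holds only two rows.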

Returns:
out : DecisionTreeRegression

A trained decision tree model.

Examples

Set up the data:

>>> import turicreate
>>> url = 'https://static.turi.com/datasets/xgboost/mushroom.csv'
>>> data = turicreate.SFrame.read_csv(url)
>>> # Convert the string label into a numeric target ('p' becomes 1)
>>> data['label'] = data['label'] == 'p'

Split the data into training and test data:

>>> train, test = data.random_split(0.8)

Create the model:

>>> model = turicreate.decision_tree_regression.create(train, target='label')

Make predictions and evaluate the model:

>>> predictions = model.predict(test)
>>> results = model.evaluate(test)
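
The 'rmse' and 'max_error' metrics named under the metric parameter have standard definitions: root-mean-square error and largest absolute error. As a rough illustration, and not the library's own code, the plain-Python sketch below computes both from lists of actual and predicted values. The function names and the sample numbers are made up for this example.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def max_error(actual, predicted):
    """Largest absolute difference between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical 0/1 targets and model outputs, for illustration only.
actual = [1.0, 0.0, 1.0, 0.0]
predicted = [0.5, 0.0, 1.0, 1.0]

print(rmse(actual, predicted))       # square root of 0.3125
print(max_error(actual, predicted))  # 1.0
```

The two metrics answer different questions: rmse summarizes typical error across all rows, while max_error reports only the single worst prediction.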