turicreate.activity_classifier.create

turicreate.activity_classifier.create(dataset, session_id, target, features=None, prediction_window=100, validation_set='auto', max_iterations=10, batch_size=32, verbose=True, random_seed=None)

Create an ActivityClassifier model.

Parameters:

dataset : SFrame

Input data made up of sessions, where each session is a sequence of observations. The data must be in stacked format, grouped by session, and the rows within each session are assumed to be in temporal order. The columns named in features are used to train a model that predicts the labels in the target column.
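
The grouping requirement above means that all rows of a session sit next to each other. A minimal sketch of a pre-training check follows. It works on the session column as a plain Python list rather than an SFrame, and the helper name is_grouped_by_session is hypothetical, not part of Turi Create.

```python
def is_grouped_by_session(session_ids):
    """Return True if every session ID occupies one contiguous run of rows.

    `session_ids` is the session column as a plain list, in row order.
    Hypothetical helper for illustration; not a Turi Create function.
    """
    seen = set()
    previous = object()  # sentinel that compares unequal to any real ID
    for sid in session_ids:
        if sid != previous:
            # A new run starts here; the ID must not have had an earlier run.
            if sid in seen:
                return False
            seen.add(sid)
            previous = sid
    return True


grouped = [0, 0, 0, 1, 1]
interleaved = [0, 1, 0, 1, 1]
print(is_grouped_by_session(grouped))      # True
print(is_grouped_by_session(interleaved))  # False
```

The check covers grouping only. It cannot tell whether the rows inside a session are in temporal order, which would need a timestamp column that this function does not require.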

session_id : string

Name of the column that contains a unique ID for each session.

target : string

Name of the column containing the target variable. The values in this column must be of string or integer type. Use model.classes to retrieve the order in which the classes are mapped.

features : list[string], optional

Name of the columns containing the input features that will be used for classification. If set to None, all columns except session_id and target will be used.

prediction_window : int, optional

Number of time units between predictions. For example, if your input data is sampled at 100 Hz and prediction_window is set to 100, the model makes one prediction per second.
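
The arithmetic behind this example can be sketched as follows, reading "time units" as samples (rows), which is what the 100 Hz example implies. Both function names are hypothetical helpers, not part of Turi Create.

```python
def prediction_interval_seconds(prediction_window, sample_rate_hz):
    """Seconds of signal covered by each prediction."""
    return prediction_window / sample_rate_hz


def window_for_interval(interval_seconds, sample_rate_hz):
    """Samples per prediction for a desired interval, rounded to whole samples."""
    return max(1, round(interval_seconds * sample_rate_hz))


# 100 samples per prediction at 100 Hz: one prediction per second.
print(prediction_interval_seconds(100, 100))  # 1.0

# A prediction every 2.5 seconds at 50 Hz needs a window of 125 samples.
print(window_for_interval(2.5, 50))           # 125
```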

validation_set : SFrame, optional

A dataset for monitoring the model’s generalization performance to prevent the model from overfitting to the training data.

For each row of the progress table, accuracy is measured over the provided training dataset and the validation_set. The format of this SFrame must be the same as the training set.

When set to ‘auto’, a validation set is automatically sampled from the training data (only if the training data has more than 100 sessions). If validation_set is set to None, all of the data is used for training.
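
If you build the validation set yourself, it is usually safer to hold out whole sessions, so that rows from one session do not end up in both sets. The See also entry util.random_split_by_session is the library's helper for this. The sketch below only illustrates the idea in plain Python and is not Turi Create's implementation. The name split_sessions and its fraction and seed parameters are assumptions made for this example.

```python
import random


def split_sessions(session_ids, fraction=0.8, seed=None):
    """Assign whole sessions to a training or validation set.

    Returns (train_ids, validation_ids) as two sets of session IDs.
    Hypothetical helper for illustration; not Turi Create's implementation.
    """
    unique_ids = sorted(set(session_ids))  # sorted so the seed fully fixes the order
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    cut = int(len(unique_ids) * fraction)
    return set(unique_ids[:cut]), set(unique_ids[cut:])


# Session column in row order: five sessions of varying length.
session_column = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4, 4]
train_ids, validation_ids = split_sessions(session_column, fraction=0.8, seed=7)

# Row indices for each side; no session is split across the two lists.
train_rows = [i for i, sid in enumerate(session_column) if sid in train_ids]
validation_rows = [i for i, sid in enumerate(session_column) if sid in validation_ids]
print(len(train_ids), len(validation_ids))  # 4 1
```

The held-out rows, kept in the same column format as the training set, would then be passed as validation_set.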

max_iterations : int, optional

Maximum number of iterations (epochs) over the data during training.

batch_size : int, optional

Number of sequence chunks used per training step. Must be greater than the number of GPUs in use.

verbose : bool, optional

If True, print progress updates and model details.

random_seed : int, optional

Seed for reproducibility. Training runs given the same seed produce the same results.

Returns:

out : ActivityClassifier

A trained ActivityClassifier model.

See also

ActivityClassifier, util.random_split_by_session

Examples

>>> import turicreate as tc

# Training on dummy data
>>> data = tc.SFrame({
...    'accelerometer_x': [0.1, 0.2, 0.3, 0.4, 0.5] * 10,
...    'accelerometer_y': [0.5, 0.4, 0.3, 0.2, 0.1] * 10,
...    'accelerometer_z': [0.01, 0.01, 0.02, 0.02, 0.01] * 10,
...    'session_id': [0, 0, 0] * 10 + [1, 1] * 10,
...    'activity': ['walk', 'run', 'run'] * 10 + ['swim', 'swim'] * 10
... })

# Create an activity classifier
>>> model = tc.activity_classifier.create(data,
...     session_id='session_id', target='activity',
...     features=['accelerometer_x', 'accelerometer_y', 'accelerometer_z'])

# Make predictions (as class labels, or as probability vectors)
>>> predictions = model.predict(data)
>>> predictions = model.predict(data, output_type='probability_vector')

# Get both predicted classes and their probabilities together
>>> predictions = model.classify(data)

# Get top-k predictions (instead of only top-1) if your labels have more
# than 2 classes
>>> predictions = model.predict_topk(data, k=3)

# Evaluate the model
>>> results = model.evaluate(data)