turicreate.activity_classifier.util.random_split_by_session

turicreate.activity_classifier.util.random_split_by_session(dataset, session_id, fraction=0.9, seed=None)

Randomly split an SFrame into two SFrames by session_id, so that the first contains all data for a fraction of the sessions and the second contains all data for the remaining sessions.

Parameters:
dataset : SFrame

Dataset to split. It must contain a column of session ids.

session_id : string

The name of the column in dataset that holds the unique identifier for each session.

fraction : float, optional

Fraction of the sessions to fetch for the first returned SFrame. Must be between 0 and 1. Once the sessions are split, all data from a single session is in the same SFrame.

seed : int, optional

Seed for the random number generator used to split.
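
The guarantee described above (every row of a session ends up in the same split) can be illustrated with a small pure-Python sketch. This is not Turi Create's implementation: the helper name split_by_session_sketch, the list-of-dicts input, and the per-session random draw are all assumptions made for illustration only.

```python
import random


def split_by_session_sketch(rows, session_key, fraction=0.9, seed=None):
    """Hypothetical helper: partition rows so each session lands wholly on one side."""
    rng = random.Random(seed)
    # Collect unique session ids in first-seen order so a given seed is reproducible.
    sessions = list(dict.fromkeys(row[session_key] for row in rows))
    # Decide once per session (not per row) which split it belongs to.
    first_side = {s for s in sessions if rng.random() < fraction}
    first = [r for r in rows if r[session_key] in first_side]
    second = [r for r in rows if r[session_key] not in first_side]
    return first, second


# Toy data: 20 sessions with 5 rows each.
rows = [{"session_id": s, "x": i} for s in range(20) for i in range(5)]
train, valid = split_by_session_sketch(rows, "session_id", fraction=0.9, seed=0)
train_ids = {r["session_id"] for r in train}
valid_ids = {r["session_id"] for r in valid}
```

Because the decision is made per session in this sketch, the share of sessions in the first split is only approximately equal to fraction, and reusing the same seed reproduces the same split.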

Examples

# Split the data so that train has 90% of the sessions.
>>> train, valid = tc.activity_classifier.util.random_split_by_session(
...     dataset, session_id='session_id', fraction=0.9)

# For example: If dataset has 2055 sessions
>>> len(dataset['session_id'].unique())
2055

# The training set now has about 90% of the sessions
>>> len(train['session_id'].unique())
1850

# The validation set has the remaining sessions (about 10%)
>>> len(valid['session_id'].unique())
205
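
A quick sanity check after splitting is to confirm that no session id appears in both results. The sketch below uses a hypothetical helper, shared_sessions, that accepts any iterables of ids; the toy lists stand in for the session_id columns of train and valid.

```python
def shared_sessions(first_ids, second_ids):
    """Hypothetical helper: return the session ids present in both collections."""
    return set(first_ids) & set(second_ids)


# Toy stand-ins for the session_id columns of the two splits.
train_session_ids = [1, 1, 2, 2, 3]
valid_session_ids = [4, 4, 5]
overlap = shared_sessions(train_session_ids, valid_session_ids)
```

An empty result means the splits are disjoint by session. With real data, the same helper could presumably be given train['session_id'] and valid['session_id'].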