random_split_by_session(dataset, session_id, fraction=0.9, seed=None)
Randomly split an SFrame into two SFrames based on the session_id such that one split contains all data for a fraction of the sessions while the second split contains all data for the rest of the sessions.
- dataset : SFrame
Dataset to split. It must contain a column of session ids.
- session_id : string
The name of the column in dataset that holds the unique identifier for each session.
- fraction : float, optional
Fraction of the sessions to fetch for the first returned SFrame. Must be between 0 and 1. Once the sessions are split, all data from a single session is in the same SFrame.
- seed : int, optional
Seed for the random number generator used to split.
>>> import turicreate as tc

# Split the data so that train has 90% of the sessions.
>>> train, valid = tc.activity_classifier.util.random_split_by_session(
...     dataset, session_id='session_id', fraction=0.9)

# For example: If dataset has 2055 sessions
>>> len(dataset['session_id'].unique())
2055

# The training set now has 90% of the sessions
>>> len(train['session_id'].unique())
1850

# The validation set has the remaining 10% of the sessions
>>> len(valid['session_id'].unique())
205
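
The idea of a session-level split can be illustrated without Turi Create. The following is only a minimal sketch in plain Python, not the library's implementation: `split_rows_by_session` is a hypothetical helper, rows are assumed to be dictionaries, and the number of sessions in the first split is assumed to be the rounded fraction of the unique sessions. The library may select sessions differently, so its counts need not match this sketch.

```python
import random


def split_rows_by_session(rows, session_key, fraction=0.9, seed=None):
    """Hypothetical helper: split dict rows so that each session stays whole."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")

    rng = random.Random(seed)

    # Sort the unique ids first so the outcome depends only on the seed.
    session_ids = sorted({row[session_key] for row in rows})
    rng.shuffle(session_ids)

    # Assumption: take the rounded fraction of sessions for the first split.
    n_first = round(fraction * len(session_ids))
    first_ids = set(session_ids[:n_first])

    # Every row follows its session, so no session is divided between splits.
    first = [row for row in rows if row[session_key] in first_ids]
    second = [row for row in rows if row[session_key] not in first_ids]
    return first, second


# Toy data: 10 sessions with 3 rows each.
rows = [{"session_id": s, "value": i} for s in range(10) for i in range(3)]
train, valid = split_rows_by_session(rows, "session_id", fraction=0.8, seed=0)

train_sessions = {row["session_id"] for row in train}
valid_sessions = {row["session_id"] for row in valid}
```

Splitting on session ids rather than on individual rows keeps every session's data on one side, which prevents rows from the same session from appearing in both the training and validation sets.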