turicreate.SFrame.random_split

SFrame.random_split(fraction, seed=None, exact=False)

Randomly split the rows of an SFrame into two SFrames. The first SFrame contains M rows, sampled uniformly (without replacement) from the original SFrame, where M is approximately fraction times the original number of rows. The second SFrame contains the remaining rows of the original SFrame.

To obtain a partition with exactly the requested fraction, set exact=True.
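The difference between the approximate and the exact mode can be illustrated in plain Python. This is only a sketch and not Turi Create's implementation: the helper names approximate_split and exact_split are hypothetical, the per-row coin flip is an assumed model of why the default row count is only approximate, and rounding down with int() is likewise an assumption.

```python
import random


def approximate_split(rows, fraction, seed=None):
    """Send each row to the first part with probability `fraction` (assumed model)."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


def exact_split(rows, fraction, seed=None):
    """Put exactly int(fraction * len(rows)) rows in the first part."""
    rng = random.Random(seed)
    rows = list(rows)
    m = int(fraction * len(rows))  # rounding rule is an assumption
    chosen = set(rng.sample(range(len(rows)), m))  # uniform, without replacement
    first = [r for i, r in enumerate(rows) if i in chosen]
    second = [r for i, r in enumerate(rows) if i not in chosen]
    return first, second


rows = list(range(1024))
a_first, a_second = approximate_split(rows, 0.9, seed=5)
e_first, e_second = exact_split(rows, 0.9, seed=5)
print(len(a_first), len(a_second))  # sizes depend on the seed
print(len(e_first), len(e_second))  # 921 103 under the int() rounding assumed here
```

In the coin-flip version the first part's size varies from seed to seed, while the exact version must first pick a fixed-size set of row indices. That extra work is one intuition for why an exact split could cost more.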

Parameters:
fraction : float

Fraction of the rows to place in the first SFrame. Must be between 0 and 1. If exact is False (default), the number of rows in the first SFrame is approximately fraction times the total number of rows.

seed : int, optional

Seed for the random number generator used to split.

exact : bool, optional

Defaults to False. If exact=True, the first SFrame contains exactly the requested fraction of the rows, at a performance penalty.

Returns:
out : tuple [SFrame]

Two new SFrames: the first holds the sampled rows, the second holds the remaining rows.

Examples

Suppose we have an SFrame with 1,024 rows and we want to randomly split it into training and testing datasets with about a 90%/10% split.

>>> import turicreate
>>> sf = turicreate.SFrame({'id': range(1024)})
>>> sf_train, sf_test = sf.random_split(.9, seed=5)
>>> print(len(sf_train), len(sf_test))
922 102
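When a validation set is needed as well, one possible approach is to call random_split twice. The helper below is a hypothetical sketch, not part of Turi Create. It assumes only the random_split(fraction, seed=...) signature documented above, and it rescales the second fraction so that it refers to the original row count.

```python
def three_way_split(sf, train_fraction, valid_fraction, seed=None):
    """Split `sf` into train, validation, and test parts using two random_split calls.

    `sf` is any object exposing random_split(fraction, seed=...), such as an SFrame.
    """
    if not 0 < train_fraction < 1 or not 0 < valid_fraction < 1 - train_fraction:
        raise ValueError("fractions must be positive and sum to less than 1")
    # First call: carve off the training rows.
    train, rest = sf.random_split(train_fraction, seed=seed)
    # Second call: rescale so valid_fraction is relative to the original row count.
    valid, test = rest.random_split(valid_fraction / (1 - train_fraction), seed=seed)
    return train, valid, test


# Usage with the SFrame from the example above (requires turicreate):
# sf_train, sf_valid, sf_test = three_way_split(sf, 0.8, 0.1, seed=5)
```

Because both calls use the default approximate mode, the three sizes are themselves only approximately 80%, 10%, and 10% of the original row count.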