turicreate.text_analytics.random_split

turicreate.text_analytics.random_split(dataset, prob=0.5)

Utility for performing a random split of text data that is already in bag-of-words format. For each (word, count) pair in an element, the count is randomly partitioned between a training set and a test set.
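
Conceptually, each occurrence of each word is assigned to the test set with probability prob and to the training set otherwise. A minimal sketch of those semantics in plain Python (an illustration only, not Turi Create's actual implementation; the function name split_bag_of_words is made up for this example):

import random

def split_bag_of_words(doc, prob=0.5, seed=None):
    """Partition one bag-of-words dict into (train, test) dicts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for word, count in doc.items():
        # Each of the `count` occurrences lands in the test set with probability `prob`.
        n_test = sum(rng.random() < prob for _ in range(int(count)))
        n_train = count - n_test
        if n_train > 0:
            train[word] = n_train
        if n_test > 0:
            test[word] = n_test
    return train, test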

Parameters:
dataset : SArray of type dict, or SFrame with columns of type dict

A data set in bag-of-words format.

prob : float, optional

Probability that each occurrence of a word is placed in the test set.

Returns:
train, test : SArray

Two data sets in bag-of-words format, where the combined counts are equal to the counts in the original data set.

Examples

>>> docs = turicreate.SArray([{'are':5, 'you':3, 'not': 1, 'entertained':10}])
>>> train, test = turicreate.text_analytics.random_split(docs)
>>> print(train)
[{'not': 1.0, 'you': 3.0, 'are': 3.0, 'entertained': 7.0}]
>>> print(test)
[{'are': 2.0, 'entertained': 3.0}]