turicreate.text_analytics.random_split

turicreate.text_analytics.random_split(dataset, prob=0.5)

Utility for performing a random split of text data that is already in bag-of-words format. For each (word, count) pair in an element, the count is randomly partitioned between a training set and a test set.
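
Conceptually, each occurrence of each word is assigned to the test set with probability prob and to the training set otherwise. A minimal sketch of those semantics in plain Python (an illustration only, not Turi Create's actual implementation; the function name split_bag_of_words is made up for this example):

import random

def split_bag_of_words(doc, prob=0.5, seed=None):
    """Partition one bag-of-words dict into (train, test) dicts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for word, count in doc.items():
        # Each of the `count` occurrences lands in the test set with probability `prob`.
        n_test = sum(rng.random() < prob for _ in range(int(count)))
        n_train = count - n_test
        if n_train > 0:
            train[word] = n_train
        if n_test > 0:
            test[word] = n_test
    return train, test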

Parameters:
dataset : SArray of type dict, or SFrame with columns of type dict

A data set in bag-of-words format.

prob : float, optional

Probability that each occurrence of a word is placed in the test set.

Returns:
train, test : SArray

Two data sets in bag-of-words format, where the combined counts are equal to the counts in the original data set.

Examples

>>> docs = turicreate.SArray([{'are':5, 'you':3, 'not': 1, 'entertained':10}])
>>> train, test = turicreate.text_analytics.random_split(docs)
>>> print(train)
[{'not': 1.0, 'you': 3.0, 'are': 3.0, 'entertained': 7.0}]
>>> print(test)
[{'are': 2.0, 'entertained': 3.0}]