turicreate.text_analytics.random_split
turicreate.text_analytics.random_split(dataset, prob=0.5)
Utility for performing a random split of text data that is already in bag-of-words format. For each (word, count) pair in an element, the count is randomly partitioned between a training set and a test set.
Parameters: - dataset : SArray of type dict, or SFrame with columns of type dict
A data set in bag-of-words format.
- prob : float, optional
Probability for sampling a word to be placed in the test set.
Returns: - train, test : SArray
Two data sets in bag-of-words format, where the combined counts are equal to the counts in the original data set.
Examples
>>> docs = turicreate.SArray([{'are':5, 'you':3, 'not': 1, 'entertained':10}])
>>> train, test = turicreate.text_analytics.random_split(docs)
>>> print(train)
[{'not': 1.0, 'you': 3.0, 'are': 3.0, 'entertained': 7.0}]
>>> print(test)
[{'are': 2.0, 'entertained': 3.0}]
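Because the split is random, the printed counts will differ from run to run. The partitioning can be illustrated without Turi Create by a minimal pure-Python sketch. It assumes that each occurrence of a word is sent to the test set independently with probability prob, which is one reading of the description above and not necessarily how the library implements it. The helper name split_bag_of_words and its rng argument are hypothetical and are not part of the turicreate API.

```python
import random

def split_bag_of_words(doc, prob=0.5, rng=None):
    """Hypothetical helper: split one bag-of-words dict into (train, test).

    Assumption: every single occurrence of a word goes to the test set
    independently with probability `prob`, otherwise to the training set.
    """
    rng = rng or random.Random()
    train, test = {}, {}
    for word, count in doc.items():
        count = int(count)
        # Count how many of the `count` occurrences land in the test set.
        n_test = sum(1 for _ in range(count) if rng.random() < prob)
        n_train = count - n_test
        # Keep only words with a nonzero count, as in the example output above.
        if n_train:
            train[word] = n_train
        if n_test:
            test[word] = n_test
    return train, test

doc = {'are': 5, 'you': 3, 'not': 1, 'entertained': 10}
train, test = split_bag_of_words(doc, prob=0.5, rng=random.Random(0))

# The train and test counts always add back up to the original counts.
combined = {w: train.get(w, 0) + test.get(w, 0) for w in doc}
print(combined == doc)
```

Under this assumption, prob=0 leaves every count in the training set and prob=1 moves every count to the test set; values in between shift the expected share of each word's count accordingly.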