RankingFactorizationRecommender.evaluate(dataset, metric='auto', exclude_known_for_precision_recall=True, target=None, verbose=True, **kwargs)

Evaluate the model’s ability to make rating predictions or recommendations.

If the model is trained to predict a particular target, the default metric used for model comparison is root-mean-squared error (RMSE). Suppose \(y\) and \(\widehat{y}\) are vectors of length \(N\), where \(y\) contains the actual ratings and \(\widehat{y}\) the predicted ratings. Then the RMSE is defined as

\[RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^N (\widehat{y}_i - y_i)^2} .\]
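The RMSE formula above can be computed directly. The following is a minimal NumPy sketch (not part of the Turi Create API; the function name `rmse` is chosen here for illustration):

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-squared error between actual ratings y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Square the per-rating errors, average over the N ratings, take the root.
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```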

If the model was not trained on a target column, the default metrics for model comparison are precision and recall. Let \(p_k\) be a vector of the \(k\) highest ranked recommendations for a particular user, and let \(a\) be the set of items for that user in the ground truth dataset. The “precision at cutoff k” is defined as

\[P(k) = \frac{ | a \cap p_k | }{k}\]

while “recall at cutoff k” is defined as

\[R(k) = \frac{ | a \cap p_k | }{|a|}\]
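The two definitions above can be sketched in plain Python. This is an illustrative implementation, not the library's internal one; `recommended` is assumed to be a ranked list of item ids and `actual` the set of ground-truth items for one user:

```python
def precision_recall_at_k(recommended, actual, k):
    """Precision and recall at cutoff k for a single user.

    recommended: items ranked from most to least relevant.
    actual: ground-truth items for this user.
    """
    top_k = set(recommended[:k])        # the k highest ranked recommendations
    hits = len(top_k & set(actual))     # |a ∩ p_k|
    precision = hits / k
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```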
Parameters

dataset : SFrame

An SFrame that is in the same format as provided for training.

metric : str, {‘auto’, ‘rmse’, ‘precision_recall’}, optional

Metric to use for evaluation. The default automatically chooses ‘rmse’ for models trained with a target, and ‘precision_recall’ otherwise.

exclude_known_for_precision_recall : bool, optional

A useful option for evaluating precision-recall. Recommender models have the option to exclude items seen in the training data from the final recommendation list. Set this option to True when evaluating on test data, and False when evaluating precision-recall on training data.
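To illustrate why this option matters: when it is enabled, items a user already interacted with in the training data are dropped from that user's recommendation list before precision and recall are computed. A conceptual sketch (hypothetical helper, not a Turi Create function):

```python
def filter_seen(recommended, seen):
    """Drop items the user already saw in training, preserving rank order.

    Mirrors what exclude_known_for_precision_recall=True does conceptually:
    on test data, known training items should not count as recommendations.
    """
    seen = set(seen)
    return [item for item in recommended if item not in seen]
```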

target : str, optional

The name of the target column for evaluating rmse. If the model is trained with a target column, the default is to use the same column. If the model is trained without a target column and metric is set to ‘rmse’, this option must be provided by the user.

verbose : bool, optional

Enables verbose output. Default is True.

**kwargs : optional

When metric is set to ‘precision_recall’, these parameters are passed on to evaluate_precision_recall().

Returns

out : SFrame or dict

Results from the model evaluation procedure. If the model is trained on a target (i.e. RMSE is the evaluation criterion), a dictionary with three items is returned: rmse_by_user and rmse_by_item are SFrames containing per-user and per-item RMSE, while rmse_overall is the overall RMSE (a float). If the model is trained without a target (i.e. precision and recall are the evaluation criteria), an SFrame is returned with both metrics for each user at several cutoff values.

See also

evaluate_precision_recall, evaluate_rmse, precision_recall_by_user


Examples

>>> import turicreate as tc
>>> sf = tc.SFrame('https://static.turi.com/datasets/audioscrobbler')
>>> train, test = tc.recommender.util.random_split_by_user(sf)
>>> m = tc.recommender.create(train, target='target')
>>> eval = m.evaluate(test)