turicreate.recommender.util.compare_models

turicreate.recommender.util.compare_models(dataset, models, model_names=None, user_sample=1.0, metric='auto', target=None, exclude_known_for_precision_recall=True, make_plot=False, verbose=True, **kwargs)

Compare the prediction or recommendation performance of recommender models on a common test dataset.
Models that are trained to predict ratings are compared separately from models that are trained without target ratings. The ratings prediction models are compared on root-mean-squared error, and the rest are compared on precision-recall.
Parameters: - dataset : SFrame
The dataset to use for model evaluation.
- models : list[recommender models]
List of trained recommender models.
- model_names : list[str], optional
List of model name strings for display.
- user_sample : float, optional
Sampling proportion of unique users to use in estimating model performance. Defaults to 1.0, i.e. use all users in the dataset.
- metric : str, {‘auto’, ‘rmse’, ‘precision_recall’}, optional
Metric for the evaluation. The default automatically splits models into two groups with their default evaluation metric respectively: ‘rmse’ for models trained with a target, and ‘precision_recall’ otherwise.
- target : str, optional
The name of the target column for evaluating rmse. If the model is trained with a target column, the default is to use the same column. If the model is trained without a target column and metric=’rmse’, then this option must be provided by the user.
- exclude_known_for_precision_recall : bool, optional
A useful option when metric=’precision_recall’. Recommender models automatically exclude items seen in the training data from the final recommendation list. If the input evaluation dataset is the same as the data used for training the models, set this option to False.
- verbose : bool, optional
If True, print the progress.
Returns: - out : list[SFrame]
A list of results where each one is an SFrame of evaluation results for the respective model on the given dataset.
Examples
If you have created two ItemSimilarityRecommenders m1 and m2 and have an SFrame test_data, then you may compare the performance of the two models on test data using:

>>> import turicreate
>>> train_data = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                                 'item_id': ["a", "c", "e", "b", "f", "b", "c", "d"]})
>>> test_data = turicreate.SFrame({'user_id': ["0", "0", "1", "1", "1", "2", "2"],
...                                'item_id': ["b", "d", "a", "c", "e", "a", "e"]})
>>> m1 = turicreate.item_similarity_recommender.create(train_data)
>>> m2 = turicreate.item_similarity_recommender.create(train_data, only_top_k=1)
>>> turicreate.recommender.util.compare_models(test_data, [m1, m2], model_names=["m1", "m2"])
The evaluation metric is automatically set to ‘precision_recall’, and the evaluation will be based on recommendations that exclude items seen in the training data.
If you want to evaluate on the original training set:
>>> turicreate.recommender.util.compare_models(train_data, [m1, m2],
...                                            exclude_known_for_precision_recall=False)
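On a larger dataset, evaluating every user can be slow. The user_sample parameter estimates the same metrics on a random subset of unique users. As an illustrative sketch (the 0.5 fraction is arbitrary and only meaningful on datasets with many users):

>>> turicreate.recommender.util.compare_models(test_data, [m1, m2],
...                                            user_sample=0.5)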
Suppose you have four models, two trained with a target rating column and the other two trained without a target. By default, the models are split into two groups and evaluated with their respective default metrics: ‘rmse’ for the models trained with a target, and ‘precision_recall’ for the others.
>>> train_data2 = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                                  'item_id': ["a", "c", "e", "b", "f", "b", "c", "d"],
...                                  'rating': [1, 3, 4, 5, 3, 4, 2, 5]})
>>> test_data2 = turicreate.SFrame({'user_id': ["0", "0", "1", "1", "1", "2", "2"],
...                                 'item_id': ["b", "d", "a", "c", "e", "a", "e"],
...                                 'rating': [3, 5, 4, 4, 3, 5, 2]})
>>> m3 = turicreate.factorization_recommender.create(train_data2, target='rating')
>>> m4 = turicreate.factorization_recommender.create(train_data2, target='rating')
>>> turicreate.recommender.util.compare_models(test_data2, [m3, m4])
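Conversely, models trained without a target can be scored with ‘rmse’ by supplying the target column explicitly, as described for the target parameter above. This is an illustrative sketch; the prediction scores of m1 and m2 are similarity-based and not calibrated to the rating scale, so the resulting rmse values are mainly useful for relative comparison:

>>> turicreate.recommender.util.compare_models(test_data2, [m1, m2],
...                                            metric='rmse', target='rating')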
To compare all four models using the same ‘precision_recall’ metric, you can do:
>>> turicreate.recommender.util.compare_models(test_data2, [m1, m2, m3, m4],
...                                            metric='precision_recall')
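The call returns one entry of evaluation results per model, in the same order as the models argument. A minimal sketch for inspecting them (the exact contents and columns depend on the metric used):

>>> results = turicreate.recommender.util.compare_models(test_data2, [m1, m2, m3, m4],
...                                                      metric='precision_recall')
>>> for name, res in zip(["m1", "m2", "m3", "m4"], results):
...     print(name)
...     print(res)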