turicreate.recommender.ranking_factorization_recommender.create
turicreate.recommender.ranking_factorization_recommender.create(observation_data, user_id='user_id', item_id='item_id', target=None, user_data=None, item_data=None, num_factors=32, regularization=1e-09, linear_regularization=1e-09, side_data_factorization=True, ranking_regularization=0.25, unobserved_rating_value=None, num_sampled_negative_examples=4, max_iterations=25, sgd_step_size=0, random_seed=0, binary_target=False, solver='auto', verbose=True, **kwargs)
Create a RankingFactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions.
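A latent factor model scores a (user, item) pair from a learned vector of length num_factors for each user and each item. The sketch below is a minimal illustration, assuming the common form "user bias + item bias + dot product of factor vectors". Turi Create's exact parameterization, including how side features enter the score, is not spelled out here. `predict_score` and `recommend` are hypothetical helpers, not part of the turicreate API. The ranking step skips items the user has already seen, mirroring the stored (user, item) pairs described under observation_data.

```python
def predict_score(user_bias, item_bias, user_factors, item_factors, global_bias=0.0):
    """Score one (user, item) pair: biases plus a latent-factor dot product."""
    if len(user_factors) != len(item_factors):
        raise ValueError("user and item factor vectors must have the same length")
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_bias + user_bias + item_bias + interaction


def recommend(user_bias, user_factors, items, seen, k):
    """Return the k highest-scoring items, skipping items the user has already seen.

    `items` maps item_id -> (item_bias, item_factors).
    """
    scored = [
        (predict_score(user_bias, bias, user_factors, factors), item_id)
        for item_id, (bias, factors) in items.items()
        if item_id not in seen
    ]
    scored.sort(reverse=True)  # highest score first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical learned parameters with num_factors = 2.
items = {
    "a": (0.0, [2.0, 0.0]),
    "b": (0.0, [1.0, 0.0]),
    "c": (0.5, [0.0, 1.0]),
    "d": (0.0, [-1.0, 0.0]),
}
print(recommend(user_bias=0.0, user_factors=[1.0, 0.0], items=items, seen={"a"}, k=2))
# ['b', 'c']
```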
Parameters:
- observation_data : SFrame
The dataset to use for training the model. It must contain a column of user ids and a column of item ids. Each row represents an observed interaction between the user and the item. The (user, item) pairs are stored with the model so that they can later be excluded from recommendations if desired. It can optionally contain a target ratings column. All other columns are interpreted by the underlying model as side features for the observations.
The user id and item id columns must be of type ‘int’ or ‘str’. The target column must be of type ‘int’ or ‘float’.
- user_id : string, optional
The name of the column in observation_data that corresponds to the user id.
- item_id : string, optional
The name of the column in observation_data that corresponds to the item id.
- target : string, optional
The observation_data can optionally contain a column of scores representing ratings given by the users. If present, pass the name of this column as target.
- user_data : SFrame, optional
Side information for the users. This SFrame must have a column whose name matches the user_id parameter. user_data can provide any amount of additional user-specific information.
- item_data : SFrame, optional
Side information for the items. This SFrame must have a column whose name matches the item_id parameter. item_data can provide any amount of additional item-specific information.
- num_factors : int, optional
Number of latent factors.
- regularization : float, optional
L2 regularization for interaction terms. Default: 1e-9; a typical range for this parameter is between 1e-12 and 1. Setting this to 0 may cause numerical issues.
- linear_regularization : float, optional
L2 regularization for the linear terms. Default: 1e-9; a typical range for this parameter is between 1e-12 and 1. Setting this to 0 may cause numerical issues.
- side_data_factorization : boolean, optional
Use factorization for modeling any additional features beyond the user and item columns. If True, and side features or any additional columns are present, then a Factorization Machine model is trained. Otherwise, only the linear terms are fit to these features. See turicreate.recommender.ranking_factorization_recommender.RankingFactorizationRecommender for more information. Default: True.
- ranking_regularization : float, optional
Penalize the predicted value of user-item pairs not in the training set. Larger values increase this penalization. Suggested values: 0, 0.1, 0.5, 1. Default: 0.25. NOTE: if no target column is present, this parameter is ignored.
- unobserved_rating_value : float, optional
Penalize unobserved items whose predicted score is larger than this value. By default, mean - 1.96 * std_dev of the target values is used, which is the lower bound of a 95% interval under a normal approximation.
- num_sampled_negative_examples : integer, optional
For each (user, item) pair in the data, the ranking SGD solver evaluates this many randomly chosen unseen items for the negative example step. Increasing this can give better performance at the expense of speed, particularly when the number of items is large. Default: 4.
- binary_target : boolean, optional
Assume the target column is composed of 0s and 1s. If True, the model is fit with logistic loss. Default: False.
- max_iterations : int, optional
The training algorithm will make at most this many iterations through the observed data. Default: 25.
- sgd_step_size : float, optional
Step size for stochastic gradient descent. Smaller values generally lead to more accurate models that take more time to train. The default setting of 0 means that the step size is chosen by trying several options on a small subset of the data.
- random_seed : int, optional
The random seed used to choose the initial starting point for model training. Note that some randomness in the training is unavoidable, so models trained with the same random seed may still differ. Default: 0.
- solver : string, optional
Name of the solver to be used to solve the regression. See the references for more detail on each solver. The available solvers for this model are:
- auto (default): automatically chooses the best solver for the data and model parameters.
- ials: Implicit Alternating Least Squares [1].
- adagrad: Adaptive Gradient Stochastic Gradient Descent.
- sgd: Stochastic Gradient Descent.
- verbose : bool, optional
Enables verbose output.
- kwargs : optional
Optional advanced keyword arguments passed in to the model optimization procedure. These parameters do not typically need to be changed.
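The regularization, linear_regularization and binary_target parameters all shape the per-observation training objective. The sketch below shows one observation's contribution, assuming squared error for ordinary targets, logistic loss on a logit when binary_target is True (the logistic part is stated above; the squared error is an assumption), and plain L2 penalties. Turi Create's exact objective and scaling are not documented here. `pair_loss` and `l2_penalty` are hypothetical helpers.

```python
import math


def pair_loss(prediction, target, binary_target=False):
    """Loss for one observed (user, item, target) triple.

    Squared error by default; with binary_target=True, `prediction` is treated
    as a logit and `target` must be 0 or 1 (logistic loss).
    """
    if not binary_target:
        return (prediction - target) ** 2
    if target not in (0, 1):
        raise ValueError("binary targets must be 0 or 1")
    # Signed margin: positive when the prediction agrees with the label.
    z = prediction if target == 1 else -prediction
    # Numerically stable log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def l2_penalty(linear_terms, factor_vectors, regularization, linear_regularization):
    """L2 penalty: linear_regularization on linear terms, regularization on factors."""
    linear = linear_regularization * sum(w * w for w in linear_terms)
    interaction = regularization * sum(v * v for vec in factor_vectors for v in vec)
    return linear + interaction


# One hypothetical observation, using the default regularization values (1e-9).
loss = pair_loss(prediction=2.5, target=3.0) + l2_penalty(
    linear_terms=[0.2, -0.1],
    factor_vectors=[[0.3, 0.1], [0.2, -0.4]],
    regularization=1e-9,
    linear_regularization=1e-9,
)
print(loss)
```

With the tiny default penalties, the loss is dominated by the data term; raising either regularization value shrinks the corresponding parameters more strongly.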
References
- [1] Hu, Y.; Koren, Y.; Volinsky, C. Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining (ICDM 2008), IEEE (2008).
Examples
Basic usage
When given just user and item pairs, one can create a RankingFactorizationRecommender as follows.
>>> import turicreate
>>> from turicreate.recommender import ranking_factorization_recommender
>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                         'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"]})
>>> m1 = ranking_factorization_recommender.create(sf)
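When only (user, item) pairs are given, as in the example above, the ranking SGD solver contrasts each observed pair with num_sampled_negative_examples randomly chosen items the user has not interacted with. The sketch below illustrates that sampling step under a stated assumption: uniform sampling with replacement from the user's unseen items. The real sampler may differ, and `sample_negatives` is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_negatives(pairs, all_items, num_sampled_negative_examples=4, random_seed=0):
    """For each observed (user, item) pair, draw unseen items as negatives.

    Assumption: negatives are drawn uniformly, with replacement, from the
    items the user has not interacted with.
    """
    rng = random.Random(random_seed)
    seen = defaultdict(set)
    for user, item in pairs:
        seen[user].add(item)

    samples = []
    for user, item in pairs:
        candidates = sorted(set(all_items) - seen[user])
        if not candidates:
            continue  # the user has seen every item; nothing to contrast with
        negatives = [rng.choice(candidates) for _ in range(num_sampled_negative_examples)]
        samples.append((user, item, negatives))
    return samples


# Same pairs as the basic usage example.
user_ids = ["0", "0", "0", "1", "1", "2", "2", "2"]
item_ids = ["a", "b", "c", "a", "b", "b", "c", "d"]
pairs = list(zip(user_ids, item_ids))
for user, positive, negatives in sample_negatives(pairs, set(item_ids)):
    print(user, positive, negatives)
```

In this tiny dataset user "0" has only item "d" left unseen, so every negative drawn for that user is "d"; with a large catalog the negatives would vary.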
When a target column is present, include it so that the model favors recommending items that users rate highly.
>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                         'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
...                         'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
>>> m1 = ranking_factorization_recommender.create(sf, target='rating')
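With a target column, each observed rating also acts as a regression example. The sketch below shows one such update, assuming plain SGD on a squared-error loss with a bias + dot-product model. Note that solver='auto' may choose a different method, and Turi Create's exact update rule is not documented here. The step_size argument plays the role of sgd_step_size, and `sgd_step` and `predict` are hypothetical helpers.

```python
def predict(params):
    """Bias + dot-product prediction for one (user, item) pair."""
    dot = sum(a * b for a, b in zip(params["user_factors"], params["item_factors"]))
    return params["user_bias"] + params["item_bias"] + dot


def sgd_step(params, rating, step_size, regularization=1e-9, linear_regularization=1e-9):
    """Return updated parameters after one SGD step on a single observed rating.

    Loss: 0.5 * (prediction - rating)**2
          + 0.5 * linear_regularization * (user_bias**2 + item_bias**2)
          + 0.5 * regularization * (|user_factors|**2 + |item_factors|**2)
    """
    p, q = params["user_factors"], params["item_factors"]
    error = predict(params) - rating
    return {
        "user_bias": params["user_bias"]
        - step_size * (error + linear_regularization * params["user_bias"]),
        "item_bias": params["item_bias"]
        - step_size * (error + linear_regularization * params["item_bias"]),
        # Both factor vectors are updated from their pre-step values.
        "user_factors": [a - step_size * (error * b + regularization * a) for a, b in zip(p, q)],
        "item_factors": [b - step_size * (error * a + regularization * b) for a, b in zip(p, q)],
    }


# User "0" rated item "b" with 3 in the example; start from small hypothetical values.
params = {"user_bias": 0.0, "item_bias": 0.0,
          "user_factors": [0.1, 0.1], "item_factors": [0.1, 0.1]}
for _ in range(100):
    params = sgd_step(params, rating=3, step_size=0.05)
print(predict(params))  # approaches 3
```

A smaller step size moves the prediction toward the rating more slowly but more stably, which matches the trade-off described for sgd_step_size.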
Including side features
>>> user_info = turicreate.SFrame({'user_id': ["0", "1", "2"],
...                                'name': ["Alice", "Bob", "Charlie"],
...                                'numeric_feature': [0.1, 12, 22]})
>>> item_info = turicreate.SFrame({'item_id': ["a", "b", "c", "d"],
...                                'name': ["item1", "item2", "item3", "item4"],
...                                'dict_feature': [{'a': 23}, {'a': 13},
...                                                 {'b': 1},
...                                                 {'a': 23, 'b': 32}]})
>>> m2 = ranking_factorization_recommender.create(sf, target='rating',
...                                               user_data=user_info,
...                                               item_data=item_info)
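With side features and side_data_factorization=True, a Factorization Machine is trained, giving every feature its own latent vector and modeling pairwise feature interactions. The sketch below computes that second-order interaction term two ways: naively over all feature pairs, and with the standard O(n * k) factorization machine identity. How Turi Create encodes columns such as dict_feature internally is not described here, so the feature vector and latent vectors are hypothetical.

```python
from itertools import combinations


def pairwise_naive(x, V):
    """Second-order FM term: sum over i < j of <V[i], V[j]> * x[i] * x[j]."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        total += dot * x[i] * x[j]
    return total


def pairwise_fast(x, V):
    """Same quantity in O(n * k) time via the identity
    0.5 * sum_f [(sum_i V[i][f] * x[i])**2 - sum_i (V[i][f] * x[i])**2]."""
    num_factors = len(V[0])
    total = 0.0
    for f in range(num_factors):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


# Hypothetical encoding of one observation: a one-hot user indicator, a one-hot
# item indicator, and a numeric side feature (numeric_feature = 12 for user "1").
x = [1.0, 1.0, 12.0]
V = [[0.1, -0.2], [0.3, 0.05], [0.01, 0.02]]  # hypothetical latent vectors, k = 2
print(pairwise_naive(x, V), pairwise_fast(x, V))
```

When side_data_factorization is False, the documentation above says only linear terms are fit to the side features, which corresponds to dropping the side features' entries from this interaction term.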
Customizing ranking regularization
Create a model that pushes predicted ratings of unobserved user-item pairs toward 1 or below.
>>> m3 = ranking_factorization_recommender.create(sf, target='rating',
...                                               ranking_regularization=0.1,
...                                               unobserved_rating_value=1)
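The m3 example above penalizes unobserved pairs whose predictions rise above 1. The sketch below illustrates that idea and the documented default threshold (mean - 1.96 * std_dev of the targets). Two pieces are assumptions, not Turi Create's confirmed formula: the squared-hinge form of the penalty, and the use of the population standard deviation. `default_unobserved_value` and `ranking_penalty` are hypothetical helpers.

```python
import statistics


def default_unobserved_value(targets):
    """mean - 1.96 * std_dev of the observed target values.

    Assumption: population standard deviation; the docstring does not say which.
    """
    return statistics.mean(targets) - 1.96 * statistics.pstdev(targets)


def ranking_penalty(unobserved_predictions, ranking_regularization, unobserved_rating_value):
    """Assumed squared-hinge penalty on sampled unobserved pairs scored above the threshold."""
    total = 0.0
    for prediction in unobserved_predictions:
        excess = max(0.0, prediction - unobserved_rating_value)
        total += excess * excess
    return ranking_regularization * total


ratings = [1, 3, 2, 5, 4, 1, 4, 3]  # the 'rating' column from the examples
print(default_unobserved_value(ratings))
# Settings from the m3 example: ranking_regularization=0.1, unobserved_rating_value=1.
print(ranking_penalty([0.5, 1.0, 3.0], ranking_regularization=0.1, unobserved_rating_value=1))
```

Only the prediction of 3.0 exceeds the threshold here, so it alone contributes to the penalty; predictions at or below 1 are left alone.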
Using the implicit alternating least squares model
Ranking factorization also implements implicit alternating least squares [1] as an alternative solver. Enable it by passing solver='ials':
>>> m4 = ranking_factorization_recommender.create(sf, target='rating', solver='ials')
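Hu et al. [1] turn each observation r into a binary preference p (1 if r > 0, else 0) weighted by a confidence c = 1 + alpha * r, then fit the factors by alternating closed-form least-squares solves. The sketch below follows that formulation for a single user update. It is not Turi Create's implementation: alpha, the regularization weight lam, the item factors, and the helper names are all hypothetical.

```python
def preference_and_confidence(r, alpha):
    """Hu et al. [1]: preference p = 1 if r > 0 else 0, confidence c = 1 + alpha * r."""
    return (1.0 if r > 0 else 0.0), 1.0 + alpha * r


def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(M[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x


def update_user(ratings_u, item_factors, alpha, lam):
    """Closed-form ALS step for one user: x_u = (Y^T C Y + lam I)^-1 Y^T C p.

    Loops over every item, including ones the user never interacted with
    (p = 0, c = 1), so it costs O(num_items * k^2) per user.
    """
    k = len(next(iter(item_factors.values())))
    A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, y in item_factors.items():
        p, c = preference_and_confidence(ratings_u.get(item, 0.0), alpha)
        for i in range(k):
            b[i] += c * p * y[i]
            for j in range(k):
                A[i][j] += c * y[i] * y[j]
    return solve(A, b)


# Hypothetical item factors (k = 2) and the ratings user "0" gave in the examples.
item_factors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.5, -0.5]}
ratings_u = {"a": 1.0, "b": 3.0, "c": 2.0}
x_u = update_user(ratings_u, item_factors, alpha=1.0, lam=0.1)
print(x_u)
```

This naive version touches every item for every user; [1] avoids that cost by precomputing Y^T Y once per sweep and adding correction terms only for the items each user interacted with.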