turicreate.recommender.ranking_factorization_recommender.create

turicreate.recommender.ranking_factorization_recommender.create(observation_data, user_id='user_id', item_id='item_id', target=None, user_data=None, item_data=None, num_factors=32, regularization=1e-09, linear_regularization=1e-09, side_data_factorization=True, ranking_regularization=0.25, unobserved_rating_value=None, num_sampled_negative_examples=4, max_iterations=25, sgd_step_size=0, random_seed=0, binary_target=False, solver='auto', verbose=True, **kwargs)

Create a RankingFactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions.
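As an illustration of what "latent factors" means here, the following is a minimal sketch of how a factorization model might score one user-item pair. It is not the library's implementation: the function name `predict_score` and the bias terms are assumptions made for this example, and side features are left out.

```python
def predict_score(user_factors, item_factors, user_bias, item_bias, global_bias):
    """Score one (user, item) pair from latent factors (illustrative only)."""
    # Inner product of the user's and the item's latent factor vectors.
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    # Linear terms (hypothetical biases) plus the interaction term.
    return global_bias + user_bias + item_bias + interaction


# Two latent factors per user and item; num_factors would be 2 in this toy case.
score = predict_score(
    user_factors=[0.5, -1.0],
    item_factors=[2.0, 0.5],
    user_bias=0.1,
    item_bias=-0.2,
    global_bias=3.0,
)
print(score)
```

In this sketch, num_factors corresponds to the length of each factor vector.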

Parameters:
observation_data : SFrame

The dataset to use for training the model. It must contain a column of user ids and a column of item ids. Each row represents an observed interaction between the user and the item. The (user, item) pairs are stored with the model so that they can later be excluded from recommendations if desired. It can optionally contain a target ratings column. All other columns are interpreted by the underlying model as side features for the observations.

The user id and item id columns must be of type ‘int’ or ‘str’. The target column must be of type ‘int’ or ‘float’.

user_id : string, optional

The name of the column in observation_data that corresponds to the user id.

item_id : string, optional

The name of the column in observation_data that corresponds to the item id.

target : string, optional

The observation_data can optionally contain a column of scores representing ratings given by the users. If present, the name of this column may be specified via the target parameter.

user_data : SFrame, optional

Side information for the users. This SFrame must have a column with the same name as what is specified by the user_id input parameter. user_data can provide any amount of additional user-specific information.

item_data : SFrame, optional

Side information for the items. This SFrame must have a column with the same name as what is specified by the item_id input parameter. item_data can provide any amount of additional item-specific information.

num_factors : int, optional

Number of latent factors.

regularization : float, optional

L2 regularization for interaction terms. Default: 1e-9; a typical range for this parameter is between 1e-12 and 1. Setting this to 0 may cause numerical issues.

linear_regularization : float, optional

L2 regularization for the linear term. Default: 1e-9; a typical range for this parameter is between 1e-12 and 1. Setting this to 0 may cause numerical issues.

side_data_factorization : boolean, optional

Use factorization for modeling any additional features beyond the user and item columns. If True, and side features or any additional columns are present, then a Factorization Machine model is trained. Otherwise, only the linear terms are fit to these features. See turicreate.recommender.ranking_factorization_recommender.RankingFactorizationRecommender for more information. Default: True.

ranking_regularization : float, optional

Penalize the predicted value of user-item pairs not in the training set. Larger values increase this penalization. Suggested values: 0, 0.1, 0.5, 1. NOTE: if no target column is present, this parameter is ignored.

unobserved_rating_value : float, optional

Penalize unobserved items with a larger predicted score than this value. By default, the estimated 5% quantile is used (mean - 1.96*std_dev).
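The default threshold formula above can be worked through by hand. The sketch below applies mean - 1.96*std_dev to a small list of ratings; the helper name `default_unobserved_rating_value` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say how the library estimates std_dev.

```python
import statistics


def default_unobserved_rating_value(ratings):
    """Return mean - 1.96 * std_dev for the observed ratings (illustrative only)."""
    mean = statistics.mean(ratings)
    # Assumption: population standard deviation; the library may estimate it differently.
    std_dev = statistics.pstdev(ratings)
    return mean - 1.96 * std_dev


ratings = [1, 3, 2, 5, 4, 1, 4, 3]
threshold = default_unobserved_rating_value(ratings)
print(threshold)
```

Passing an explicit unobserved_rating_value replaces this estimate with a fixed threshold.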

num_sampled_negative_examples : integer, optional

For each (user, item) pair in the data, the ranking sgd solver evaluates this many randomly chosen unseen items for the negative example step. Increasing this can give better performance at the expense of speed, particularly when the number of items is large. Default is 4.
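To make the negative example step concrete, here is a minimal sketch of drawing unseen items for a user. It is not the solver's actual sampling code: `sample_negative_items` and the dictionary layout are assumptions for illustration.

```python
import random


def sample_negative_items(user, observed, all_items, num_samples, rng):
    """Pick up to num_samples items the user has not interacted with."""
    seen = observed.get(user, set())
    unseen = [item for item in all_items if item not in seen]
    # Never request more items than are available.
    return rng.sample(unseen, min(num_samples, len(unseen)))


# Observed (user, item) pairs grouped by user.
observed = {"0": {"a", "b", "c"}, "1": {"a", "b"}, "2": {"b", "c", "d"}}
all_items = ["a", "b", "c", "d"]
rng = random.Random(0)

negatives = {
    user: sample_negative_items(user, observed, all_items, 4, rng)
    for user in observed
}
print(negatives)
```

With only four items in the catalog, each user has at most two unseen items, so fewer than four negatives are returned; on a large catalog the full num_sampled_negative_examples would be drawn for each pair.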

binary_target : boolean, optional

Assume the target column is composed of 0’s and 1’s. If True, use logistic loss to fit the model.
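For reference, the logistic loss mentioned above can be sketched as follows for a single observation. This is the standard formula, written out for illustration; how the library applies it internally is not shown here, and the function name is hypothetical.

```python
import math


def logistic_loss(score, label):
    """Negative log-likelihood of a 0/1 label under sigmoid(score)."""
    probability = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(probability) + (1 - label) * math.log(1.0 - probability))


# A confident positive score is cheap for label 1 and expensive for label 0.
loss_if_positive = logistic_loss(2.0, 1)
loss_if_negative = logistic_loss(2.0, 0)
print(loss_if_positive, loss_if_negative)
```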

max_iterations : int, optional

The training algorithm will make at most this many iterations through the observed data. Default: 25.

sgd_step_size : float, optional

Step size for stochastic gradient descent. Smaller values generally lead to more accurate models that take more time to train. The default setting of 0 means that the step size is chosen by trying several options on a small subset of the data.
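The idea of choosing a step size by trying several options on a small subset can be sketched on a toy problem. Everything below is an assumption made for illustration (the one-parameter model, the candidate list, the subset size, and the function names); the library's actual selection procedure is not described in this page.

```python
import random


def loss_after_one_pass(rows, step_size):
    """Fit y ~ w * x with one SGD pass and return the mean squared error."""
    w = 0.0
    for x, y in rows:
        gradient = 2.0 * (w * x - y) * x
        w -= step_size * gradient
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)


def choose_step_size(data, candidates, subset_size=20, seed=0):
    """Try each candidate on a small random subset and keep the best one."""
    rng = random.Random(seed)
    subset = rng.sample(data, min(subset_size, len(data)))
    losses = {step: loss_after_one_pass(subset, step) for step in candidates}
    return min(losses, key=losses.get)


# Toy data following y = 3x.
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 101)]
candidates = [1.0, 0.1, 0.01, 0.001]
best = choose_step_size(data, candidates)
print(best)
```

Step sizes that are too large make the toy fit diverge, and ones that are too small converge slowly, which is the trade-off the automatic choice is meant to balance.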

random_seed : int, optional

The random seed used to choose the initial starting point for model training. Note that some randomness in the training is unavoidable, so models trained with the same random seed may still differ. Default: 0.

solver : string, optional

Name of the solver to be used to solve the regression. See the references for more detail on each solver. The available solvers for this model are:

  • auto (default): automatically chooses the best solver for the data
    and model parameters.
  • ials: Implicit Alternating Least Squares [1].
  • adagrad: Adaptive Gradient Stochastic Gradient Descent.
  • sgd: Stochastic Gradient Descent.

verbose : bool, optional

Enables verbose output.

kwargs : optional

Optional advanced keyword arguments passed in to the model optimization procedure. These parameters do not typically need to be changed.

References

[1] Hu, Y.; Koren, Y.; Volinsky, C. Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining (ICDM 2008), IEEE (2008).

Examples

Basic usage

When given just user and item pairs, one can create a RankingFactorizationRecommender as follows.

>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                       'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"]})
>>> from turicreate.recommender import ranking_factorization_recommender
>>> m1 = ranking_factorization_recommender.create(sf)

When a target column is present, one can include it so that the model tries to recommend items that are rated highly.

>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                       'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
...                       'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
>>> m1 = ranking_factorization_recommender.create(sf, target='rating')

Including side features

>>> user_info = turicreate.SFrame({'user_id': ["0", "1", "2"],
...                              'name': ["Alice", "Bob", "Charlie"],
...                              'numeric_feature': [0.1, 12, 22]})
>>> item_info = turicreate.SFrame({'item_id': ["a", "b", "c", "d"],
...                              'name': ["item1", "item2", "item3", "item4"],
...                              'dict_feature': [{'a' : 23}, {'a' : 13},
...                                               {'b' : 1},
...                                               {'a' : 23, 'b' : 32}]})
>>> m2 = ranking_factorization_recommender.create(sf, target='rating',
...                                               user_data=user_info,
...                                               item_data=item_info)

Customizing ranking regularization

Create a model that pushes predicted ratings of unobserved user-item pairs toward 1 or below.

>>> m3 = ranking_factorization_recommender.create(sf, target='rating',
...                                               ranking_regularization = 0.1,
...                                               unobserved_rating_value = 1)

Using the implicit alternating least squares model

Ranking factorization also implements implicit alternating least squares [1] as an alternative solver. This is enabled by setting solver = 'ials'.

>>> m3 = ranking_factorization_recommender.create(sf, target='rating',
...                                               solver = 'ials')