turicreate.recommender.factorization_recommender.create

turicreate.recommender.factorization_recommender.create(observation_data, user_id='user_id', item_id='item_id', target=None, user_data=None, item_data=None, num_factors=8, regularization=1e-08, linear_regularization=1e-10, side_data_factorization=True, nmf=False, binary_target=False, max_iterations=50, sgd_step_size=0, random_seed=0, solver='auto', verbose=True, **kwargs)

Create a FactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions. This covers both standard matrix factorization and factorization machine models (the latter when side data is available for users and/or items).
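As a rough illustration, standard matrix factorization commonly forms a rating prediction from four parts: a global intercept, a per-user linear term, a per-item linear term, and the dot product of the user's and item's latent factor vectors. The sketch below assumes that form. `predict_score` and its arguments are hypothetical names, and this is not Turi Create's internal code.

```python
def predict_score(mu, user_bias, item_bias, user_factors, item_factors, user, item):
    """Predicted rating: intercept + user term + item term + <user factors, item factors>."""
    # Dot product of the two latent vectors (length = num_factors).
    interaction = sum(u * v for u, v in zip(user_factors[user], item_factors[item]))
    return mu + user_bias[user] + item_bias[item] + interaction

# Hypothetical learned parameters with num_factors=2.
score = predict_score(
    mu=3.0,
    user_bias={"0": 0.5},
    item_bias={"a": -0.25},
    user_factors={"0": [1.0, 2.0]},
    item_factors={"a": [0.5, 0.25]},
    user="0",
    item="a",
)
print(score)  # 4.25
```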

Parameters:
observation_data : SFrame

The dataset to use for training the model. It must contain a column of user ids and a column of item ids. Each row represents an observed interaction between the user and the item. The (user, item) pairs are stored with the model so that they can later be excluded from recommendations if desired. It can optionally contain a target ratings column. All other columns are interpreted by the underlying model as side features for the observations.

The user id and item id columns must be of type ‘int’ or ‘str’. The target column must be of type ‘int’ or ‘float’.
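Because the training (user, item) pairs are stored, a recommender can skip items a user has already interacted with. The sketch below shows the idea, with plain Python containers standing in for an SFrame. `recommend_unseen` and the scores are hypothetical and are not the library's API. In Turi Create, the model's recommend() method exposes this behavior through its exclude_known option.

```python
def recommend_unseen(scores, observed_pairs, user, k):
    """Top-k items for `user` by score, excluding items already observed for that user."""
    seen = {item for (u, item) in observed_pairs if u == user}
    candidates = [(item, s) for item, s in scores.items() if item not in seen]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]

# (user, item) pairs from the training data used in the Examples section.
observed = [("0", "a"), ("0", "b"), ("0", "c"), ("1", "a"),
            ("1", "b"), ("2", "b"), ("2", "c"), ("2", "d")]
# Hypothetical predicted scores for user "1".
scores_user_1 = {"a": 4.0, "b": 3.5, "c": 2.0, "d": 3.0}
print(recommend_unseen(scores_user_1, observed, "1", k=2))  # ['d', 'c']
```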

user_id : string, optional

The name of the column in observation_data that corresponds to the user id.

item_id : string, optional

The name of the column in observation_data that corresponds to the item id.

target : string

The name of the column in observation_data that contains the scores (for example, ratings) given by users. If no such column is available, consider using the ranking version of the factorization model, turicreate.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.

user_data : SFrame, optional

Side information for the users. This SFrame must have a column whose name matches the value of the user_id parameter. user_data can contain any number of additional columns of user-specific information.

item_data : SFrame, optional

Side information for the items. This SFrame must have a column whose name matches the value of the item_id parameter. item_data can contain any number of additional columns of item-specific information.
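Conceptually, rows of user_data and item_data are matched to observations through the shared id column, much like a left join. This is why the column names must agree. The sketch below uses lists of dicts instead of SFrames. `attach_side_features` is a hypothetical helper and does not reflect how Turi Create stores side data internally.

```python
def attach_side_features(observations, side_rows, key):
    """Left-join side features onto observation rows by matching on column `key`."""
    lookup = {row[key]: row for row in side_rows}
    merged = []
    for obs in observations:
        # Users/items absent from the side data simply get no extra features.
        extra = {col: val for col, val in lookup.get(obs[key], {}).items() if col != key}
        merged.append({**obs, **extra})
    return merged

observations = [{"user_id": "0", "item_id": "a", "rating": 1},
                {"user_id": "3", "item_id": "b", "rating": 4}]
user_rows = [{"user_id": "0", "numeric_feature": 0.1}]
result = attach_side_features(observations, user_rows, key="user_id")
# The first row gains numeric_feature; the second (user "3") is unchanged.
print(result)
```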

num_factors : int, optional

Number of latent factors. Default: 8.

regularization : float, optional

Regularization for interaction terms. The type of regularization is L2. Default: 1e-8; a typical range for this parameter is between 1e-12 and 1.

linear_regularization : float, optional

Regularization for the linear terms. Default: 1e-10; a typical range for this parameter is between 1e-12 and 1.
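One common way to write the training objective that these two parameters weight has three parts. The first is the mean squared error over observations. The second is an L2 penalty on the latent factors, scaled by regularization. The third is a separate L2 penalty on the linear terms, scaled by linear_regularization. This sketch makes two assumptions: that the loss is averaged rather than summed, and that each term falls under the penalty shown. Turi Create's exact objective may differ.

```python
def regularized_objective(residuals, factor_values, linear_values,
                          regularization=1e-8, linear_regularization=1e-10):
    """Mean squared error + regularization * ||factors||^2 + linear_regularization * ||linear terms||^2."""
    mse = sum(r * r for r in residuals) / len(residuals)
    factor_penalty = sum(x * x for x in factor_values)   # all user and item factor entries
    linear_penalty = sum(w * w for w in linear_values)   # per-user and per-item linear terms
    return mse + regularization * factor_penalty + linear_regularization * linear_penalty

# Residuals (prediction - rating), flattened factor entries, and linear terms.
obj = regularized_objective([1.0, -1.0], [1.0, 2.0], [3.0],
                            regularization=0.1, linear_regularization=0.01)
print(round(obj, 6))  # 1.59
```

Larger values of either parameter pull the corresponding weights toward zero. This trades some training accuracy for less overfitting.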

side_data_factorization : boolean, optional

Use factorization for modeling any additional features beyond the user and item columns. If True, and side features or any additional columns are present, then a Factorization Machine model is trained. Otherwise, only the linear terms are fit to these features. See turicreate.recommender.factorization_recommender.FactorizationRecommender for more information. Default: True.
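In the standard factorization machine formulation, every pair of input features j, l contributes <v_j, v_l> * x_j * x_l, where v_j is a learned latent vector for feature j. The sketch below assumes that formulation, and Turi Create's exact parameterization may differ. It checks the naive pairwise sum against the usual O(k*n) rewrite. With only a one-hot user feature and a one-hot item feature, the term reduces to the user-item dot product of plain matrix factorization.

```python
def fm_pairwise_naive(x, V):
    """Sum over feature pairs j < l of <V[j], V[l]> * x[j] * x[l]."""
    total = 0.0
    for j in range(len(x)):
        for l in range(j + 1, len(x)):
            total += sum(a * b for a, b in zip(V[j], V[l])) * x[j] * x[l]
    return total

def fm_pairwise_fast(x, V):
    """Same quantity via 0.5 * sum_f [(sum_j V[j][f] x[j])^2 - sum_j (V[j][f] x[j])^2]."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total

# Features: one-hot user, one-hot item, one numeric side feature (value 0.5).
x = [1.0, 1.0, 0.5]
V = [[1.0, 0.0], [0.5, 2.0], [2.0, 1.0]]  # hypothetical latent vectors, num_factors=2
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))  # 3.0 3.0
```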

nmf : boolean, optional

Use nonnegative matrix factorization, which forces the factors to be nonnegative. Disables linear and intercept terms.
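One common way to keep factors nonnegative during gradient-based training is projection: take an ordinary gradient step, then clip any negative entries to zero. This sketch assumes that approach for illustration only. It is not necessarily how Turi Create enforces the constraint. Because nmf disables the linear and intercept terms, the score is just the dot product of the two factor vectors. The function names are hypothetical.

```python
def projected_step(vec, grad, step_size):
    """Gradient step followed by projection onto the nonnegative orthant."""
    return [max(0.0, x - step_size * g) for x, g in zip(vec, grad)]

def nmf_score(user_vec, item_vec):
    """With nmf=True there is no intercept or linear term: score = <user, item>."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

user_vec = projected_step([0.5, 0.1], grad=[1.0, 1.0], step_size=0.2)
print(user_vec)  # the second entry would be -0.1, so it is clipped to 0.0
print(nmf_score(user_vec, [2.0, 3.0]))
```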

binary_target : boolean, optional

Assume the target column is composed of 0’s and 1’s. If True, use logistic loss to fit the model.
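Under logistic loss, the model's real-valued score acts as the log-odds that the target is 1. The predicted probability is the sigmoid of the score, and the loss is the negative log-likelihood of the observed 0/1 label. Below is a minimal, numerically stable sketch of that loss. The function names are hypothetical and are not Turi Create internals.

```python
import math

def softplus(z):
    """log(1 + exp(z)), computed without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """Probability that the label is 1, given score z (stable for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_loss(score, label):
    """-log P(label | score) for label in {0, 1}."""
    # -log(sigmoid(s)) == softplus(-s);  -log(1 - sigmoid(s)) == softplus(s)
    return softplus(-score) if label == 1 else softplus(score)

# At score 0 the model is undecided: p = 0.5 and either label costs log(2).
print(sigmoid(0.0), logistic_loss(0.0, 1), logistic_loss(0.0, 0))
```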

max_iterations : int, optional

The training algorithm will make at most this many iterations through the observed data. Default: 50.

sgd_step_size : float, optional

Step size for stochastic gradient descent. Smaller values generally lead to more accurate models that take more time to train. The default setting of 0 means that the step size is chosen by trying several options on a small subset of the data.
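For a single observed (user, item, rating), one SGD update moves the parameters against the gradient of that observation's regularized squared error, scaled by the step size. The sketch makes several assumptions. It uses a per-observation loss of (prediction - rating)^2 plus the L2 penalties, holds the global intercept fixed for brevity, and uses hypothetical names. It illustrates the role of sgd_step_size rather than Turi Create's exact update rule.

```python
def predict(mu, b_u, b_i, p, q):
    """Intercept + user/item linear terms + dot product of latent vectors."""
    return mu + b_u + b_i + sum(a * b for a, b in zip(p, q))

def sgd_step(mu, b_u, b_i, p, q, rating, step_size, reg=1e-8, lin_reg=1e-10):
    """One SGD update on (pred - rating)^2 + reg*(|p|^2 + |q|^2) + lin_reg*(b_u^2 + b_i^2)."""
    err = predict(mu, b_u, b_i, p, q) - rating
    # Both factor updates use the old p and q (simultaneous update).
    new_p = [pu - step_size * (2 * err * qi + 2 * reg * pu) for pu, qi in zip(p, q)]
    new_q = [qi - step_size * (2 * err * pu + 2 * reg * qi) for pu, qi in zip(p, q)]
    new_b_u = b_u - step_size * (2 * err + 2 * lin_reg * b_u)
    new_b_i = b_i - step_size * (2 * err + 2 * lin_reg * b_i)
    return new_b_u, new_b_i, new_p, new_q

mu, b_u, b_i, p, q = 3.0, 0.0, 0.0, [0.1, 0.2], [0.3, 0.1]
b_u2, b_i2, p2, q2 = sgd_step(mu, b_u, b_i, p, q, rating=5.0, step_size=0.01)
# The prediction for this observation moves toward the observed rating of 5.0.
print(predict(mu, b_u, b_i, p, q), "->", predict(mu, b_u2, b_i2, p2, q2))
```

A step that is too large can overshoot and diverge. A smaller one behaves more reliably but needs more passes over the data.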

random_seed : int, optional

The random seed used to choose the initial starting point for model training. Note that some randomness in the training is unavoidable, so models trained with the same random seed may still differ slightly. Default: 0.

solver : string, optional

Name of the solver used to train the model. The available solvers for this model are:

  • auto (default): automatically chooses the best solver for the data
    and model parameters.
  • sgd: Stochastic Gradient Descent.
  • adagrad: Adaptive Gradient Stochastic Gradient Descent [1].
  • als: Alternating Least Squares.

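AdaGrad [1] differs from plain SGD by giving each parameter its own effective step size. The base step is divided by the square root of that parameter's accumulated squared gradients, so coordinates that receive large or frequent gradients take smaller steps over time. Below is a minimal sketch of the generic update rule. It is not Turi Create's implementation, and `eps` is an assumed small constant for numerical safety.

```python
import math

def adagrad_step(params, grads, accum, step_size, eps=1e-8):
    """AdaGrad: scale each coordinate's step by the root of its accumulated squared gradients."""
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_params = [x - step_size * g / (math.sqrt(a) + eps)
                  for x, g, a in zip(params, grads, new_accum)]
    return new_params, new_accum

params, accum = adagrad_step([1.0, 1.0], grads=[10.0, 0.1], accum=[0.0, 0.0], step_size=0.1)
# On the first step both coordinates move by about 0.1, despite a 100x gap in gradient size.
print(params)
```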
verbose : bool, optional

Enables verbose output.

kwargs : optional

Optional advanced keyword arguments passed to the model optimization procedure. These parameters typically do not need to be changed.

References

[1] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization.” The Journal of Machine Learning Research 12 (2011).

Examples

Basic usage

>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                       'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
...                       'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
>>> m1 = turicreate.factorization_recommender.create(sf, target='rating')

This creates a FactorizationRecommender trained to predict the values in the rating column.

Including side features

>>> user_info = turicreate.SFrame({'user_id': ["0", "1", "2"],
...                              'name': ["Alice", "Bob", "Charlie"],
...                              'numeric_feature': [0.1, 12, 22]})
>>> item_info = turicreate.SFrame({'item_id': ["a", "b", "c", "d"],
...                              'name': ["item1", "item2", "item3", "item4"],
...                              'dict_feature': [{'a' : 23}, {'a' : 13},
...                                               {'b' : 1},
...                                               {'a' : 23, 'b' : 32}]})
>>> m2 = turicreate.factorization_recommender.create(sf, target='rating',
...                                                user_data=user_info,
...                                                item_data=item_info)

Using the Alternating Least Squares (ALS) solver

The factorization model can also be trained with the alternating least squares (ALS) solver. This solver does not support side data columns or similar additional features.

>>> m3 = turicreate.factorization_recommender.create(sf, target='rating',
...                                                solver='als')
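ALS alternates between two least-squares problems. With the item factors held fixed, each user's factor vector has a closed-form ridge-regression solution, and vice versa. The sketch below shows the user half of one iteration for a model with no linear terms and no side features. The helper names are hypothetical, and the exact objective Turi Create solves may differ.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def als_user_update(item_vectors, ratings, reg):
    """Closed-form user factors with item factors fixed: (Q^T Q + reg*I) p = Q^T r."""
    k = len(item_vectors[0])
    A = [[sum(q[a] * q[b] for q in item_vectors) + (reg if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    rhs = [sum(q[a] * r for q, r in zip(item_vectors, ratings)) for a in range(k)]
    return solve_linear(A, rhs)

# The user rated two items whose (fixed) factors are the unit vectors.
p = als_user_update([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], reg=1.0)
print(p)  # [1.5, 2.0]
```

A full ALS iteration would then solve the analogous problem for every item with the user factors held fixed, and repeat until convergence.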