turicreate.recommender.factorization_recommender.create
turicreate.recommender.factorization_recommender.create(observation_data, user_id='user_id', item_id='item_id', target=None, user_data=None, item_data=None, num_factors=8, regularization=1e-08, linear_regularization=1e-10, side_data_factorization=True, nmf=False, binary_target=False, max_iterations=50, sgd_step_size=0, random_seed=0, solver='auto', verbose=True, **kwargs)
Create a FactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions. This covers both standard matrix factorization and factorization machine models (used when side data is available for users and/or items).
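As a rough guide to what "latent factors" means here, a model of this kind is commonly written as a sum of bias terms, linear side-feature terms, and an inner product of factor vectors. The symbols below are illustrative notation, not names taken from this page, and side features are shown entering only linearly (the case where they are not factorized):

```latex
\hat{r}_{ij} = \mu + w_i + w_j
             + \mathbf{a}^{\top}\mathbf{x}_i
             + \mathbf{b}^{\top}\mathbf{y}_j
             + \mathbf{u}_i^{\top}\mathbf{v}_j
```

Here $\mu$ is a global bias, $w_i$ and $w_j$ are the user and item bias terms, $\mathbf{x}_i$ and $\mathbf{y}_j$ are the user and item side-feature vectors with weights $\mathbf{a}$ and $\mathbf{b}$, and $\mathbf{u}_i$, $\mathbf{v}_j$ are latent factor vectors of length num_factors.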
Parameters:
- observation_data : SFrame
The dataset to use for training the model. It must contain a column of user ids and a column of item ids. Each row represents an observed interaction between the user and the item. The (user, item) pairs are stored with the model so that they can later be excluded from recommendations if desired. It can optionally contain a target ratings column. All other columns are interpreted by the underlying model as side features for the observations.
The user id and item id columns must be of type ‘int’ or ‘str’. The target column must be of type ‘int’ or ‘float’.
- user_id : string, optional
The name of the column in observation_data that corresponds to the user id.
- item_id : string, optional
The name of the column in observation_data that corresponds to the item id.
- target : string
The name of the column in observation_data that holds the scores representing ratings given by the users. If no such column is present, consider using the ranking version of the factorization model, RankingFactorizationRecommender (turicreate.recommender.ranking_factorization_recommender.RankingFactorizationRecommender).
- user_data : SFrame, optional
Side information for the users. This SFrame must have a column with the same name as what is specified by the user_id input parameter. user_data can provide any amount of additional user-specific information.
- item_data : SFrame, optional
Side information for the items. This SFrame must have a column with the same name as what is specified by the item_id input parameter. item_data can provide any amount of additional item-specific information.
- num_factors : int, optional
Number of latent factors.
- regularization : float, optional
Regularization for interaction terms. The type of regularization is L2. Default: 1e-8; a typical range for this parameter is between 1e-12 and 1.
- linear_regularization : float, optional
Regularization for linear term. Default: 1e-10; a typical range for this parameter is between 1e-12 and 1.
- side_data_factorization : boolean, optional
Use factorization for modeling any additional features beyond the user and item columns. If True, and side features or any additional columns are present, then a Factorization Machine model is trained. Otherwise, only the linear terms are fit to these features. See turicreate.recommender.factorization_recommender.FactorizationRecommender for more information. Default: True.
- nmf : boolean, optional
Use nonnegative matrix factorization, which forces the factors to be nonnegative. Disables linear and intercept terms.
- binary_target : boolean, optional
Assume the target column is composed of 0’s and 1’s. If True, use logistic loss to fit the model.
- max_iterations : int, optional
The training algorithm will make at most this many iterations through the observed data. Default: 50.
- sgd_step_size : float, optional
Step size for stochastic gradient descent. Smaller values generally lead to more accurate models that take more time to train. The default setting of 0 means that the step size is chosen by trying several options on a small subset of the data.
- random_seed : int, optional
The random seed used to choose the initial starting point for model training. Note that some randomness in the training is unavoidable, so models trained with the same random seed may still differ slightly. Default: 0.
- solver : string, optional
Name of the solver to be used to solve the regression. See the references for more detail on each solver. The available solvers for this model are:
- auto (default): automatically chooses the best solver for the data and model parameters.
- sgd: Stochastic Gradient Descent.
- adagrad: Adaptive Gradient Stochastic Gradient Descent [1].
- als: Alternating Least Squares.
- verbose : bool, optional
Enables verbose output.
- kwargs : optional
Optional advanced keyword arguments passed in to the model optimization procedure. These parameters do not typically need to be changed.
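To make the roles of num_factors, regularization, linear_regularization, sgd_step_size, max_iterations and random_seed more concrete, the following pure-Python sketch fits a bias-plus-factors model with plain stochastic gradient descent. It is an illustration only, not Turi Create's implementation: the function train_mf_sgd, the helper rmse, and the defaults for step size and iteration count are all hypothetical choices made for this small example.

```python
import math
import random


def train_mf_sgd(observations, num_factors=2, regularization=1e-8,
                 linear_regularization=1e-10, step_size=0.05,
                 max_iterations=200, random_seed=0):
    """Fit mu + w_user + w_item + <U[user], V[item]> by SGD on squared error.

    `observations` is a list of (user, item, rating) triples.
    Hypothetical helper for illustration; not part of Turi Create.
    """
    rng = random.Random(random_seed)          # seed fixes the starting point
    users = sorted({u for u, _, _ in observations})
    items = sorted({i for _, i, _ in observations})

    mu = sum(r for _, _, r in observations) / len(observations)  # global bias
    w_user = {u: 0.0 for u in users}          # linear (bias) terms
    w_item = {i: 0.0 for i in items}
    U = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    V = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}

    def predict(u, i):
        return mu + w_user[u] + w_item[i] + sum(a * b for a, b in zip(U[u], V[i]))

    data = list(observations)
    for _ in range(max_iterations):           # one pass over the data per iteration
        rng.shuffle(data)
        for u, i, r in data:
            err = predict(u, i) - r
            # Linear terms are shrunk by linear_regularization (L2).
            w_user[u] -= step_size * (err + linear_regularization * w_user[u])
            w_item[i] -= step_size * (err + linear_regularization * w_item[i])
            # Interaction (factor) terms are shrunk by regularization (L2).
            for k in range(num_factors):
                uk, vk = U[u][k], V[i][k]
                U[u][k] -= step_size * (err * vk + regularization * uk)
                V[i][k] -= step_size * (err * uk + regularization * vk)
    return predict, mu


def rmse(predict_fn, data):
    return math.sqrt(sum((predict_fn(u, i) - r) ** 2 for u, i, r in data) / len(data))


user_ids = ["0", "0", "0", "1", "1", "2", "2", "2"]
item_ids = ["a", "b", "c", "a", "b", "b", "c", "d"]
ratings = [1, 3, 2, 5, 4, 1, 4, 3]
observations = list(zip(user_ids, item_ids, ratings))

predict, mu = train_mf_sgd(observations)
baseline_rmse = rmse(lambda u, i: mu, observations)   # always predict the mean
model_rmse = rmse(predict, observations)
```

On this toy data the fitted model should reach a lower training RMSE than the constant-mean baseline. A fixed step size is used here for simplicity, whereas the sgd_step_size default of 0 described above selects one automatically.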
See also
RankingFactorizationRecommender
turicreate.recommender.ranking_factorization_recommender.RankingFactorizationRecommender
References
[1] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization.” The Journal of Machine Learning Research 12 (2011).
Examples
Basic usage
>>> import turicreate
>>> sf = turicreate.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                         'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
...                         'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
>>> m1 = turicreate.factorization_recommender.create(sf, target='rating')
When a target column is present, create() defaults to creating a FactorizationRecommender.
Including side features
>>> user_info = turicreate.SFrame({'user_id': ["0", "1", "2"],
...                                'name': ["Alice", "Bob", "Charlie"],
...                                'numeric_feature': [0.1, 12, 22]})
>>> item_info = turicreate.SFrame({'item_id': ["a", "b", "c", "d"],
...                                'name': ["item1", "item2", "item3", "item4"],
...                                'dict_feature': [{'a' : 23}, {'a' : 13},
...                                                 {'b' : 1},
...                                                 {'a' : 23, 'b' : 32}]})
>>> m2 = turicreate.factorization_recommender.create(sf, target='rating',
...                                                  user_data=user_info,
...                                                  item_data=item_info)
Using the Alternating Least Squares (ALS) solver
The factorization model can also be fit with the alternating least squares (ALS) solver. This solver does not support side columns or other similar features.
>>> m3 = turicreate.factorization_recommender.create(sf, target='rating', solver='als')
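For readers unfamiliar with ALS, the idea can be sketched in pure Python: hold the item factors fixed and solve a small regularized least-squares problem for each user, then swap roles, and repeat. The code below is a minimal illustration under simplifying assumptions (no bias terms, no side features, a regularization value of 0.1 chosen for numerical convenience); the names solve, als_half_step and train_mf_als are hypothetical and this is not Turi Create's solver.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def als_half_step(fixed, rated_by_key, k, reg):
    """Re-solve one side's factors exactly while the other side is held fixed."""
    out = {}
    for key, rated in rated_by_key.items():
        # Normal equations: (sum f f^T + reg * I) x = sum r * f
        A = [[reg if a == b else 0.0 for b in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for other, r in rated:
            f = fixed[other]
            for a in range(k):
                rhs[a] += r * f[a]
                for b in range(k):
                    A[a][b] += f[a] * f[b]
        out[key] = solve(A, rhs)
    return out


def objective(U, V, observations, reg):
    """Squared error plus L2 penalty on all factors."""
    sq = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
             for u, i, r in observations)
    pen = sum(x * x for vec in list(U.values()) + list(V.values()) for x in vec)
    return sq + reg * pen


def train_mf_als(observations, num_factors=2, regularization=0.1,
                 max_iterations=20, random_seed=0):
    rng = random.Random(random_seed)
    by_user, by_item = {}, {}
    for u, i, r in observations:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    V = {i: [rng.gauss(0, 1) for _ in range(num_factors)] for i in by_item}
    history = []
    for _ in range(max_iterations):
        U = als_half_step(V, by_user, num_factors, regularization)  # fix items
        V = als_half_step(U, by_item, num_factors, regularization)  # fix users
        history.append(objective(U, V, observations, regularization))
    return U, V, history


observations = list(zip(["0", "0", "0", "1", "1", "2", "2", "2"],
                        ["a", "b", "c", "a", "b", "b", "c", "d"],
                        [1, 3, 2, 5, 4, 1, 4, 3]))
U, V, history = train_mf_als(observations)
```

Because each half step minimizes the regularized objective exactly over one block of variables, the values recorded in history never increase from one iteration to the next (up to floating-point rounding).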