turicreate.recommender.factorization_recommender.FactorizationRecommender

class turicreate.recommender.factorization_recommender.FactorizationRecommender(model_proxy)

A FactorizationRecommender learns latent factors for each user and item and uses them to make rating predictions.

FactorizationRecommender [Koren_et_al] provides a number of options that can be tailored to a variety of datasets and evaluation metrics, making it one of the most powerful models in the Turi Create recommender toolkit.

Side information

Side features may be provided via the user_data and item_data options when the model is created.

Additionally, observation-specific information, such as the time of day when the user rated the item, can also be included. Any column in the observation_data SFrame that is not the user id, item id, or target is treated as an observation side feature. The same side feature columns must be present when calling predict().

Side features may be numeric or categorical. User ids and item ids are treated as categorical variables. For the additional side features, the type of the SFrame column determines how it’s handled: strings are treated as categorical variables and integers and floats are treated as numeric variables. Dictionaries and numeric arrays are also supported.
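
As a minimal sketch of all three kinds of side information (the column names time_of_day, age, and genre are purely illustrative):

    import turicreate as tc

    # 'time_of_day' is neither the user id, item id, nor target, so it is
    # treated as an observation side feature.
    observations = tc.SFrame({
        'user_id': ['alice', 'alice', 'bob', 'bob'],
        'item_id': ['m1', 'm2', 'm1', 'm3'],
        'rating': [5, 3, 4, 1],
        'time_of_day': ['morning', 'evening', 'morning', 'evening']})

    # String columns become categorical features; int/float columns numeric.
    user_side = tc.SFrame({'user_id': ['alice', 'bob'], 'age': [25, 32]})
    item_side = tc.SFrame({'item_id': ['m1', 'm2', 'm3'],
                           'genre': ['romance', 'action', 'action']})

    m = tc.recommender.factorization_recommender.create(
        observations, target='rating',
        user_data=user_side, item_data=item_side)

    # The observation side feature column must also be present at predict time.
    preds = m.predict(tc.SFrame({'user_id': ['alice'], 'item_id': ['m3'],
                                 'time_of_day': ['morning']}))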

Creating a FactorizationRecommender

This model cannot be constructed directly. Instead, use turicreate.recommender.factorization_recommender.create() to create an instance of this model. A detailed list of parameter options and code samples are available in the documentation for the create function.
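
For instance, a minimal sketch with made-up data and the default column names:

    import turicreate as tc

    ratings = tc.SFrame({'user_id': ['a', 'a', 'b', 'c'],
                         'item_id': ['x', 'y', 'x', 'z'],
                         'rating': [4, 5, 3, 2]})

    # create() returns a trained FactorizationRecommender.
    m = tc.recommender.factorization_recommender.create(ratings, target='rating')

    # Top-2 recommendations per user; items each user has already rated are
    # excluded by default.
    recs = m.recommend(k=2)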

Model parameters

Trained model parameters may be accessed using m.get('coefficients') or, equivalently, m['coefficients'].
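
For example, continuing from the model trained above (the exact keys and column layout below are indicative, not guaranteed across versions):

    coefs = m['coefficients']  # equivalent to m.get('coefficients')

    # Typically a dictionary containing the global intercept plus one SFrame
    # of linear terms and latent factors per feature (user_id, item_id, ...).
    print(coefs['intercept'])
    print(coefs['user_id'])
    print(coefs['item_id'])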

Notes

Model Definition

FactorizationRecommender trains a model capable of predicting a score for each possible combination of users and items. The internal coefficients of the model are learned from known scores of users and items. Recommendations are then based on these scores.

In the two factorization models, users and items are represented by weights and factors. These model coefficients are learned during training. Roughly speaking, the weights, or bias terms, account for a user's or item's bias toward higher or lower ratings. For example, an item that is consistently rated highly would have a higher weight coefficient associated with it. Similarly, an item that consistently receives below-average ratings would have a lower weight coefficient to account for this bias.

The factor terms model interactions between users and items. For example, if a user tends to love romance movies and hate action movies, the factor terms attempt to capture that, causing the model to predict lower scores for action movies and higher scores for romance movies. Learning good weights and factors is controlled by several options outlined below.

More formally, when side data is not present, the predicted score for user \(i\) on item \(j\) is given by

\[\operatorname{score}(i, j) = \mu + w_i + w_j + \mathbf{a}^T \mathbf{x}_i + \mathbf{b}^T \mathbf{y}_j + {\mathbf u}_i^T {\mathbf v}_j,\]

where \(\mu\) is a global bias term, \(w_i\) is the weight term for user \(i\), \(w_j\) is the weight term for item \(j\), \(\mathbf{x}_i\) and \(\mathbf{y}_j\) are respectively the user and item side feature vectors, and \(\mathbf{a}\) and \(\mathbf{b}\) are respectively the weight vectors for those side features. The latent factors, which are vectors of length num_factors, are given by \({\mathbf u}_i\) and \({\mathbf v}_j\).

When binary_target=True, the above score is passed through a logistic function:

\[\operatorname{score}(i, j) = \frac{1}{1 + \exp(-z)},\]

where \(z\) is the original linear score.
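
The scoring rule is easy to replay outside the model; below is a small NumPy sketch with made-up coefficients and dimensions:

    import numpy as np

    rng = np.random.default_rng(0)
    num_factors = 8

    mu = 3.5                            # global bias
    w_i, w_j = 0.2, -0.1                # user and item weight (bias) terms
    x_i = rng.normal(size=3)            # user side feature vector
    y_j = rng.normal(size=4)            # item side feature vector
    a = rng.normal(size=3)              # weights for user side features
    b = rng.normal(size=4)              # weights for item side features
    u_i = rng.normal(size=num_factors)  # user latent factors
    v_j = rng.normal(size=num_factors)  # item latent factors

    # Linear score: mu + w_i + w_j + a'x_i + b'y_j + u_i'v_j
    score = mu + w_i + w_j + a @ x_i + b @ y_j + u_i @ v_j

    # With binary_target=True, the score is passed through a logistic function.
    prob = 1.0 / (1.0 + np.exp(-score))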

Training the model

Formally, the objective function we are optimizing for is:

\[\min_{\mathbf{w}, \mathbf{a}, \mathbf{b}, \mathbf{U}, \mathbf{V}} \frac{1}{|\mathcal{D}|} \sum_{(i,j,r_{ij}) \in \mathcal{D}} \mathcal{L}(\operatorname{score}(i, j), r_{ij}) + \lambda_1 \left(\lVert {\mathbf w} \rVert^2_2 + \lVert {\mathbf a} \rVert^2_2 + \lVert {\mathbf b} \rVert^2_2 \right) + \lambda_2 \left(\lVert {\mathbf U} \rVert^2_2 + \lVert {\mathbf V} \rVert^2_2 \right),\]

where \(\mathcal{D}\) is the observation dataset, \(r_{ij}\) is the rating that user \(i\) gave to item \(j\), \({\mathbf U} = ({\mathbf u}_1, {\mathbf u}_2, \ldots)\) denotes the user latent factors, and \({\mathbf V} = ({\mathbf v}_1, {\mathbf v}_2, \ldots)\) denotes the item latent factors. The loss function \(\mathcal{L}(\hat{y}, y)\) is \((\hat{y} - y)^2\) by default. \(\lambda_1\) denotes the linear_regularization parameter and \(\lambda_2\) the regularization parameter.
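
Both regularization strengths are exposed as create() parameters; a sketch, reusing the ratings SFrame from the earlier examples (the values shown are our reading of the documented defaults):

    # lambda_2 penalizes the factor terms U and V; lambda_1 the linear terms.
    m = tc.recommender.factorization_recommender.create(
        ratings, target='rating',
        regularization=1e-8,           # lambda_2
        linear_regularization=1e-10)   # lambda_1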

The model is trained using one of the following solvers:

(a) Stochastic Gradient Descent [sgd] with additional tricks [Bottou] to improve convergence. The optimization is done in parallel over multiple threads. This procedure is inherently random, so different calls to create() may return slightly different models, even with the same random_seed.

(b) Alternating least squares (ALS), where the user latent factors are computed by fixing the item latent factors and vice versa.
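
The solver is selected through the solver argument of create(); a sketch, assuming 'sgd' and 'als' are among the accepted values (with 'auto' choosing based on the data):

    # Parallel stochastic gradient descent; results may vary across runs.
    m_sgd = tc.recommender.factorization_recommender.create(
        ratings, target='rating', solver='sgd', max_iterations=50)

    # Alternating least squares: alternately fix the item factors and solve
    # for the user factors, then vice versa.
    m_als = tc.recommender.factorization_recommender.create(
        ratings, target='rating', solver='als')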

The Factorization Machine recommender model approximates target rating values as a weighted combination of user and item latent factors, biases, side features, and their pairwise combinations.

The Factorization Machine [Rendle] is a generalization of Matrix Factorization. In particular, while Matrix Factorization learns latent factors for only the user and item interactions, the Factorization Machine learns latent factors for all variables, including side features, and also allows for interactions between all pairs of variables. Thus the Factorization Machine is capable of modeling complex relationships in the data. Typically, using linear_side_features=True performs better in terms of RMSE, but may require a longer training time.
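
In Turi Create's create(), the flag that appears to toggle this behavior is side_data_factorization; treat the exact name and semantics as an assumption to verify against the create() documentation. A sketch, reusing the side-data SFrames from the earlier example:

    # True (the default): learn latent factors for side features as well,
    # i.e. a full Factorization Machine.  False: side features enter only
    # through the linear terms a and b.
    m_fm = tc.recommender.factorization_recommender.create(
        observations, target='rating',
        user_data=user_side, item_data=item_side,
        side_data_factorization=True)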

References

[Koren_et_al] Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix Factorization Techniques for Recommender Systems." Computer 42, no. 8 (2009): 30-37.
[sgd] Wikipedia, "Stochastic gradient descent."
[Bottou] Léon Bottou. "Stochastic Gradient Tricks." In Neural Networks: Tricks of the Trade, Reloaded, 430-445. Lecture Notes in Computer Science (LNCS 7700). Springer, 2012.
[Rendle] Steffen Rendle. "Factorization Machines." In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM), 2010.

Methods

FactorizationRecommender.evaluate(dataset[, …]) Evaluate the model’s ability to make rating predictions or recommendations.
FactorizationRecommender.evaluate_precision_recall(dataset) Compute a model’s precision and recall scores for a particular dataset.
FactorizationRecommender.evaluate_rmse(…) Evaluate the prediction error for each user-item pair in the given data set.
FactorizationRecommender.export_coreml(filename) Export the model in Core ML format.
FactorizationRecommender.get_num_items_per_user() Get the number of items observed for each user.
FactorizationRecommender.get_num_users_per_item() Get the number of users observed for each item.
FactorizationRecommender.get_similar_items([…]) Get the k most similar items for each item in items.
FactorizationRecommender.get_similar_users([…]) Get the k most similar users for each entry in users.
FactorizationRecommender.predict(dataset[, …]) Return a score prediction for the user ids and item ids in the provided data set.
FactorizationRecommender.recommend([users, …]) Recommend the k highest scored items for each user.
FactorizationRecommender.recommend_from_interactions(…) Recommend the k highest scored items based on the given interactions.
FactorizationRecommender.save(location) Save the model.
FactorizationRecommender.summary([output]) Print a summary of the model.
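
Putting a few of these together, an illustrative end-to-end sketch (column names as in the earlier examples):

    # Hold out 20% of the observations for evaluation.
    train, test = ratings.random_split(0.8, seed=1)
    m = tc.recommender.factorization_recommender.create(train, target='rating')

    print(m.evaluate_rmse(test, target='rating'))  # prediction error
    recs = m.recommend(k=10)                       # top-10 items per user
    m.save('factorization_model')                  # persist for later loading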