Choosing a Model
In this section, we give some intuition for which modeling choices you may make depending on your data and your task. Each recommender model in Turi Create has certain strengths that fit well with certain types of data and different objectives.
The easiest way to choose a model is to let Turi Create choose your model for you. This is done by simply using the default recommender.create function, which chooses the model based on the data provided to it. As an example, the following code creates a basic item similarity model and then generates recommendations for each user in the dataset:
m = turicreate.recommender.create(data, user_id='user', item_id='movie')
recs = m.recommend()
Using the default create method provides an excellent way to quickly get a recommender model up and running, but in many cases it's desirable to have more control over the process.
Effectively choosing and tuning a recommender model is best done in two stages. The first stage is to match the type of data up with the correct model or models, and the second stage is to correctly evaluate and tune the model(s) and assess their accuracy. Sometimes one model works better and sometimes another, depending on the data set. In a later section, we'll look at evaluating a model so you can be confident you chose the best one.
Data Type: Explicit, Implicit, or Item Content Data?
With explicit data, there is an associated target column that gives a score for each interaction between a user and an item. An example of this type would be a dataset of users' ratings for movies or books. With this type of data, the objective is typically either to predict which new items a user would rate highly or to predict a user's rating on a given item.
Implicit data does not include any rating information. In this case, a dataset may have just two columns -- user ID and item ID. For this type of data, the recommendations are based on which items are similar to the items a user has interacted with.
The third type of data that Turi Create can use to build a recommender system is item content data. In this case, information associated with each individual item, instead of the user interaction patterns, is used to recommend items similar to a collection of items in a query set. For example, item content could be a text description of an item, a set of key words, an address, categories, or even a list of similar items taken from another model.
Working with Explicit Data
If your data is explicit, i.e., the observations include an actual rating given by the user, then the model you wish to use depends on whether you want to predict the rating a user would give a particular item, or whether you want the model to recommend items that it believes the user would rate highly.
If you have ratings data and care about accurately predicting the rating a user would give a specific item, then we typically recommend you use the factorization_recommender. In this model the observed ratings are modeled as a weighted combination of terms, where the weights (along with some of the terms, also known as factors) are learned from data. All of these models can easily incorporate user or item side features.
A linear model assumes that the rating is a linear combination of user features, item features, user bias, and item popularity bias. The factorization_recommender goes one step further and allows each rating to also depend on a term representing the inner product of two vectors, one representing the user's affinity to a set of latent preference modes, and one representing the item's affinity to these modes. These are commonly called latent factors and are automatically learned from observation data. When side data is available, the model allows for interaction terms between these learned latent factors and all the side features. As a rule of thumb, the presence of side data can make the model more finicky to learn (due to its power and flexibility).
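As a rough sketch of this structure (the notation here is ours; see the FactorizationRecommender API docs for the exact formulation), the predicted score for user u and item i takes the form

\[
\mathrm{score}(u, i) = \mu + w_u + w_i + a^\top x_u + b^\top y_i + f_u^\top g_i
\]

where μ is a global bias, w_u and w_i are the user and item bias terms, x_u and y_i are optional user and item side features with learned weights a and b, and f_u and g_i are the learned latent factor vectors whose inner product models the user-item interaction.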
If you care about ranking performance, instead of simply predicting the rating accurately, then choose the ItemSimilarityRecommender or the RankingFactorizationRecommender. With rating data, the item_similarity_recommender model scores items based on how likely it is that the user will rate them highly, but the absolute values of the predicted scores may not match up with the actual ratings a user would give the item.
The RankingFactorizationRecommender tries to recommend items that are both similar to the items in a user's dataset and, if rating information is provided, items that would be rated highly by the user. It tends to predict ratings with less accuracy than the non-ranking factorization_recommender, but it tends to do much better at choosing items that a user would rate highly. This is because it also penalizes the predicted rating of items that are significantly different from the items a user has interacted with. In other words, it only predicts a high score for a user-item pair when it is confident in that prediction. Furthermore, this model works particularly well when the target ratings are binary, i.e., if they come from thumbs up/thumbs down flags. In this case, use the input parameter binary_targets=True.
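For example, a minimal sketch, assuming the rating column holds 0/1 thumbs-down/thumbs-up flags:
m = turicreate.ranking_factorization_recommender.create(data,
                                                        user_id='user',
                                                        item_id='movie',
                                                        target='rating',
                                                        binary_targets=True)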
When a target column is provided, the model returned by the default recommender.create function is a matrix factorization model. The matrix factorization model can also be called directly with ranking_factorization_recommender.create. When using the model-specific create function, other arguments can be provided to better tune the model, such as num_factors or regularization. See the documentation on RankingFactorizationRecommender for more information.
m = turicreate.ranking_factorization_recommender.create(data,
user_id='user',
item_id='movie',
target='rating')
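For instance, a sketch with illustrative (not prescriptive) values for these tuning parameters:
m = turicreate.ranking_factorization_recommender.create(data,
                                                        user_id='user',
                                                        item_id='movie',
                                                        target='rating',
                                                        num_factors=32,
                                                        regularization=1e-9)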
Working with Implicit Data
The goal of a recommender system built with implicit data is to recommend items similar to the collection of items a user has interacted with. "Similar" in this case is determined by other users' interactions -- if most users with similar behavior to a given user also interacted with an item that the given user had not, that item would likely appear in the given user's recommendations.
In this case, the default recommender.create function in the example code above returns an ItemSimilarityRecommender, which computes the similarity between each pair of items and recommends items to each user that are closest to items they have already used or liked:
m = turicreate.item_similarity_recommender.create(data,
user_id='user',
item_id='movie')
The ranking_factorization_recommender is also great for implicit data, and can be called the same way:
m = turicreate.ranking_factorization_recommender.create(data,
user_id='user',
item_id='movie')
With implicit data, the ranking factorization model has two solvers: one uses a randomized SGD-based method to tune the results, and the other uses an implicit form of alternating least squares (iALS). The SGD-based method, which is the default, samples unobserved items along with the observed ones and treats them as negative examples. Implicit ALS is a version of the popular Alternating Least Squares (ALS) algorithm that attempts to find factors that distinguish between the given user-item pairs and all other negative examples. This algorithm can be faster than the SGD method, particularly if there are many items, but it does not currently support side features. This solver can be activated by passing solver="ials" to ranking_factorization_recommender.create. On some datasets, one of these solvers can yield better precision-recall scores than the item_similarity_recommender.
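For example:
m = turicreate.ranking_factorization_recommender.create(data,
                                                        user_id='user',
                                                        item_id='movie',
                                                        solver='ials')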
Item Content Data
The ItemContentRecommender builds a model similar to the item similarity model, but computes the similarity between items from their content rather than from user interaction patterns. In this model, the similarity score between two items is calculated by first computing the similarity between the item data for each column, then taking a weighted average of the per-column similarities to get the final similarity. Recommendations are generated according to the average similarity of a candidate item to all the items in a user's set of rated items. This model can be created without observation data about user-item interactions, in which case such information must be passed in at recommend time in order to make recommendations.
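For example, a minimal sketch with hypothetical content columns, using recommend_from_interactions to query when no user observation data was stored:
# hypothetical item content: one row per item
item_info = turicreate.SFrame({'movie': ['Movie A', 'Movie B', 'Movie C'],
                               'genre': ['sci-fi', 'drama', 'sci-fi'],
                               'year': [1999, 2004, 2010]})
m = turicreate.recommender.item_content_recommender.create(item_info, item_id='movie')
# no observation data was given, so pass the query items at recommend time
recs = m.recommend_from_interactions(['Movie A'])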
Note that in most situations, the similarity patterns of items can be inferred from patterns in the user interaction data, and the factorization_recommender and item_similarity_recommender do this effectively. However, leveraging information about item content can be very useful, particularly when the user-item interaction data is sparse or not known until recommend time.
Side information for users, items, and observations
In many cases, additional information about the users or items can improve the quality of the recommendations. For example, including information about the genre and year of a movie can be useful information in recommending movies. We call this type of information user side data or item side data depending on whether it goes with the user or the item.
Including side data is easy with the user_data or item_data parameters to the recommender.create() function. These arguments are SFrames and must have a user or item column that corresponds to the user_id and item_id columns in the observation data. Internally, the data is joined to the corresponding user or item when training the model; the side data is also saved with the model and used to make recommendations.
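For example, a minimal sketch with hypothetical side columns:
user_info = turicreate.SFrame({'user': ['Ann', 'Bob'], 'age': [25, 32]})
item_info = turicreate.SFrame({'movie': ['Movie A', 'Movie B'], 'genre': ['sci-fi', 'drama']})
m = turicreate.recommender.create(data, user_id='user', item_id='movie', target='rating',
                                  user_data=user_info, item_data=item_info)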
In particular, the FactorizationRecommender and the RankingFactorizationRecommender both incorporate the side data into the prediction through additional interaction terms between the user, the item, and the side feature. For the actual formula, see the API docs for the FactorizationRecommender. Both of these models also allow you to obtain the parameters that have been learned for each of the side features via m['coefficients'].
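For example, a minimal sketch (the exact keys in the returned dictionary depend on your data and side features):
# inspect the learned coefficients of a trained factorization model
coefs = m['coefficients']
print(coefs.keys())  # typically an intercept plus user, item, and side-feature terms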
Side data may also be provided for each observation. For example, it might be useful to have recommendations change based on the time at which the query is being made. To do so, you could create a model using an SFrame that contains a time column, in addition to a user and item column. For example, a "time" column could include a string indicating the hour; this will be treated as a categorical variable and the model will learn a latent factor for each unique hour.
# sf has columns: user_id, item_id, time
m = tc.ranking_factorization_recommender.create(sf)
In order to include this information when requesting recommendations, you may include the desired additional data as columns in an SFrame passed as the users argument to m.recommend(). In our example above, when querying for recommendations, you would include the time that you want to use for each set of recommendations:
users_query = tc.SFrame({'user_id': [1, 2, 3], 'time': ['10pm', '10pm', '11pm']})
m.recommend(users=users_query)
In this case, recommendations for users 1 and 2 would use the parameters learned from observations that occurred at 10pm, whereas the recommendations for user 3 would incorporate parameters corresponding to 11pm. For more details, check out recommend in the API docs.
You may check the number of columns used as side information by querying m['observation_column_names'], m['user_side_data_column_names'], and m['item_side_data_column_names']. By printing the model, you can also see this information. In the following model, we had four columns in the observation data (two of which were user_id and item_id) and four columns in the SFrame passed to item_data (one of which was item_id):
Class : RankingFactorizationRecommender
Schema
------
User ID : user_id
Item ID : item_id
Target : None
Additional observation features : 2
Number of user side features : 0
Number of item side features : 3
If new side data exists when recommendations are desired, it can be passed in via the new_observation_data, new_user_data, and new_item_data arguments to recommend(). Any data provided there will take precedence over the user and item side data stored with the model.
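For example, a minimal sketch assuming a hypothetical SFrame new_item_info that holds updated item side columns:
recs = m.recommend(users=[1, 2], new_item_data=new_item_info)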
Not all of the models make use of side data: the popularity_recommender and item_similarity_recommender create methods currently do not use it.
Suggested pre-processing techniques
Lastly, here are a couple of common data issues that can affect the performance of a recommender. First, if the observation data is very sparse, i.e., contains only one or two observations for a large number of users, then none of the models will perform much better than the simple baselines available via the popularity_recommender. In this case, it might help to prune out the rare users and rare items and try again. Also, re-examine the data collection and data cleaning process to see if mistakes were made. Try to get more observation data per user and per item, if you can.
Another issue often occurs when usage data is treated as ratings. Unlike explicit ratings that lie on a nice linear interval, say 0 to 5, usage data can be badly skewed. For instance, in the Million Song dataset, one user played a song more than 16,000 times. All the models would have a difficult time fitting to such a badly skewed target. The fix is to bucketize the usage data. For instance, any play count greater than 50 can be mapped to the maximum rating of 5. You can also clip the play counts to be binary, e.g., any number greater than 2 is mapped to 1, otherwise it's 0.
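For example, a minimal sketch, assuming the raw counts live in a hypothetical play_count column:
# bucketize raw play counts into a 1-5 rating scale (illustrative thresholds)
data['rating'] = data['play_count'].apply(lambda c: min(5, 1 + c // 10))
# or collapse to binary: more than two plays counts as a positive signal
data['rating'] = data['play_count'].apply(lambda c: 1 if c > 2 else 0)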
Evaluating Model Performance
When trying out different recommender models, it's critical to have a principled way of evaluating their performance. The standard approach is to split the observation data into two parts, a training set and a test set. The model is trained on the training set and then evaluated on the test set -- evaluating a model on the same data it was trained on gives an overly optimistic picture of how well it will perform in practice. Once the model type and associated parameters are chosen, the model can be trained on the full dataset.
With recommender systems, we can evaluate models using two different metrics: RMSE and precision-recall. RMSE measures how well the model predicts the rating a user would give an item, while precision-recall measures how well the recommend() function recommends items that the user actually chooses. For example, the best possible RMSE is achieved when the model exactly predicts the value of all the ratings in the test set. Similarly, the best possible precision-recall occurs when a user has 5 items in the test set and recommend() returns exactly those 5 items. While both can be important depending on the type of data and desired task, precision-recall is often more useful in evaluating how well a recommender system will perform in practice.
The Turi Create recommender toolkit includes a function, tc.recommender.random_split_by_user, to easily generate training and test sets from observation data. Unlike tc.SFrame.random_split, it only puts data for a subset of the users into the test set. This is typically sufficient for evaluating recommender systems.
tc.recommender.random_split_by_user generates a test set by first choosing a subset of the users at random, then choosing a random subset of each of those users' items. By default, it chooses 1000 users and, for each of these users, 20% of their items on average. Note that not all users may be represented in the test set, as some users may not have any of their items randomly selected for it.
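For example, a minimal sketch that spells out the default settings:
train_data, test_data = tc.recommender.random_split_by_user(
    data, user_id='user', item_id='movie',
    max_num_users=1000, item_test_proportion=0.2)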
Once training and test sets are generated, the tc.recommender.util.compare_models function allows easy evaluation of several models using either RMSE or precision-recall. These models may be the same type of model with different parameters, or completely different types of models.
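For example, a minimal sketch comparing two model types on the split created above:
m1 = tc.item_similarity_recommender.create(train_data, user_id='user', item_id='movie')
m2 = tc.ranking_factorization_recommender.create(train_data, user_id='user', item_id='movie')
tc.recommender.util.compare_models(test_data, [m1, m2], metric='precision_recall')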
The Turi Create recommender toolkit provides several ways of working with rating data while ensuring good precision-recall. To accurately evaluate the precision-recall of a model trained on explicit rating data, it's important to only include highly rated items in your test set, as these are the items a user would likely choose. Creating such a test set can be done with a handful of SFrame operations and tc.recommender.random_split_by_user:
high_rated_data = data[data["rating"] >= 4]
low_rated_data = data[data["rating"] < 4]
train_data_1, test_data = tc.recommender.random_split_by_user(
high_rated_data, user_id='user', item_id='movie')
train_data = train_data_1.append(low_rated_data)
Other examples of comparing models can be found in the API documentation for tc.recommender.util.compare_models.