sad.model package
Submodules
sad.model.base module
- class ModelBase(config: Dict, task: TrainingTask = None)[source]
Bases: abc.ABC
The abstract model base class. It is the class that all concrete model classes will inherit from.
- property config: Dict
Configuration information that is used to initialize the instance.
- property metrics: Dict
A dictionary that stores metrics of the model. Subject to change by callbacks during model training.
- property s3_key_path: str
An S3 key uniquely assigned to a model instance. It is set up during the model’s instantiation and populated into self.spec. It is the S3 key of the model’s remote store if the model is pushed to an S3 bucket.
- property spec: Dict
A reference to the "spec" field in self.config.
- property task: sad.task.training.TrainingTask
An instance of training task associated with current model. It is the task instance in which a model is initialized.
- property working_dir: str
Alias to self.task.output_dir.
- class ModelFactory[source]
Bases: object
A factory class that is responsible for creating model instances.
- logger = <Logger model.ModelFactory (INFO)>
Class attribute for logging.
- Type
logging.Logger
- classmethod produce(config: Dict, task: TrainingTask) sad.model.base.ModelBase [source]
A class method to create instances of sad.model.ModelBase.
- Parameters
config (Dict) – Configuration used to initialize the instance object. An example is given below:
    name: SADModel
    spec:
        n: 200
        m: 500
        k: 100
- classmethod register(wrapped_class: sad.model.base.ModelBase) sad.model.base.ModelBase [source]
A class decorator that decorates sad.model.ModelBase classes and registers them into ModelFactory.registry.
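The registration pattern described above can be sketched as follows. This is a minimal illustration, not the package's implementation: SketchModelBase, SketchFactory and SketchSADModel are hypothetical names, and the idea that the "name" field of the configuration selects the registered class is an assumption inferred from the example configuration.

```python
from typing import Dict, Optional, Type


class SketchModelBase:
    """Hypothetical stand-in for a model base class."""

    def __init__(self, config: Dict, task: Optional[object] = None):
        self.config = config
        self.task = task

    @property
    def spec(self) -> Dict:
        # Mirrors the documented behavior: a reference to the "spec" field.
        return self.config.get("spec", {})


class SketchFactory:
    """Hypothetical factory that keeps a name -> class registry."""

    registry: Dict[str, Type[SketchModelBase]] = {}

    @classmethod
    def register(cls, wrapped_class):
        # Store the class under its own name and return it unchanged,
        # so this method can be used as a class decorator.
        cls.registry[wrapped_class.__name__] = wrapped_class
        return wrapped_class

    @classmethod
    def produce(cls, config: Dict, task: Optional[object] = None):
        # Assumption: config["name"] selects the registered class.
        name = config["name"]
        if name not in cls.registry:
            raise KeyError(f"Model class {name!r} is not registered.")
        return cls.registry[name](config, task)


@SketchFactory.register
class SketchSADModel(SketchModelBase):
    pass


model = SketchFactory.produce(
    {"name": "SketchSADModel", "spec": {"n": 200, "m": 500, "k": 100}}
)
```

Returning the class unchanged from the decorator keeps the decorated class usable directly, while the registry lets a configuration file pick the class by name.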
sad.model.bpr module
- class BPRModel(config: dict, task: TrainingTask = None)[source]
Bases: sad.model.base.ModelBase
- calculate_preference_tensor()[source]
Calculate the preference tensor self.X using the user and item matrices.
- calculate_probability_tensor()[source]
Calculate the probability tensor by applying the logistic function to the preference tensor self.X.
- draw_observation_tensor() numpy.ndarray [source]
Draw a complete observation tensor from the generative model of BPR.
- Returns
Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
- Return type
np.ndarray
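The three tensor methods above can be illustrated with a short NumPy sketch. It rests on assumptions that this reference does not state: the user matrix is laid out as n x k, the item matrix as m x k, the preference score takes the usual BPR form (user vector dotted with the difference of two item vectors), and a drawn observation for the pair (j, i) is the negative of the one for (i, j).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 5, 3  # users, items, latent dimensions

# Assumed layout: one row per user and one row per item.
XI = rng.standard_normal((n, k))
H = rng.standard_normal((m, k))

# Item scores per user, shape (n, m).
scores = XI @ H.T

# Preference tensor: X[u, i, j] = scores[u, i] - scores[u, j], shape (n, m, m).
X = scores[:, :, None] - scores[:, None, :]

# Probability tensor: logistic function applied element-wise.
P = 1.0 / (1.0 + np.exp(-X))

# Draw observations for i < j, then mirror them so that O[u, j, i] = -O[u, i, j].
upper = np.triu(np.ones((m, m), dtype=bool), k=1)
draws = np.where(rng.random((n, m, m)) < P, 1, -1)
O = np.where(upper, draws, 0)
O = O - O.transpose(0, 2, 1)
```

Under these assumptions the diagonal of the observation tensor stays at 0, since an item is never compared with itself.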
- get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) float [source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.
- Return type
float
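As a hedged illustration of the quantity returned above, the sketch below assumes that an observation of 1 contributes log(sigmoid(x_uij)) to the log likelihood, an observation of -1 contributes log(sigmoid(-x_uij)), and a missing observation contributes 0. The function names are hypothetical and operate on a plain score instead of model indices.

```python
import math


def log_likelihood(x_uij: float, obs_uij: int) -> float:
    """Log-likelihood contribution of one observation; 0 when missing."""
    if obs_uij == 0:
        return 0.0
    # log(sigmoid(obs * x)), split by sign to avoid overflow in exp().
    z = obs_uij * x_uij
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def gradient_wrt_xuij(x_uij: float, obs_uij: int) -> float:
    """Derivative of the contribution above with respect to x_uij."""
    if obs_uij == 0:
        return 0.0
    # d/dx log(sigmoid(obs * x)) = obs * sigmoid(-obs * x)
    z = obs_uij * x_uij
    return obs_uij / (1.0 + math.exp(z))
```

A central finite difference of log_likelihood is a quick way to check the analytic gradient at a few points.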
- get_t_sparsity() float [source]
Count the elements of the item right vectors self.T that are close to 1 and return their proportion.
- get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) float [source]
Calculate preference score between two items for a particular user. Parameter values in current model will be used to calculate the preference score if no parameter arguments provided.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, the user vector will be taken from the provided XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vectors will be taken from the provided H instead of self.H.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
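The optional XI and H arguments described above follow a common fallback pattern, sketched here. SketchBPR is a hypothetical class; the n x k and m x k layouts and the score formula (user vector dotted with the difference of the two item vectors) are assumptions based on the usual BPR formulation, not taken from this reference.

```python
from typing import Optional

import numpy as np


class SketchBPR:
    """Hypothetical holder of a user matrix XI (n x k) and an item matrix H (m x k)."""

    def __init__(self, n: int, m: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.XI = rng.standard_normal((n, k))
        self.H = rng.standard_normal((m, k))

    def get_xuij(
        self,
        u_idx: int,
        i_idx: int,
        j_idx: int,
        XI: Optional[np.ndarray] = None,
        H: Optional[np.ndarray] = None,
    ) -> float:
        # Fall back to the model's own parameters when none are supplied.
        XI = self.XI if XI is None else XI
        H = self.H if H is None else H
        # Assumed BPR form: user vector dotted with the item difference.
        return float(XI[u_idx] @ (H[i_idx] - H[j_idx]))


model = SketchBPR(n=4, m=5, k=3)
own = model.get_xuij(0, 1, 2)
overridden = model.get_xuij(0, 1, 2, H=np.zeros((5, 3)))
```

Passing matrices explicitly makes it possible to score candidate parameters without mutating the model.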
- gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of L2 regularization.
w_l1 (float) – The weight of L1 regularization.
lr (float) – Learning rate.
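One way such an update could look is sketched below. It is not the package's update rule: it assumes x_uij = xi_u . (h_i - h_j), a gradient ascent step on the log likelihood, an L2 penalty with gradient w_l2 * v, and an L1 penalty handled through its subgradient w_l1 * sign(v). The free function and its matrix arguments are hypothetical.

```python
import numpy as np


def gradient_update(XI, H, u_idx, i_idx, j_idx, g, w_l2, w_l1, lr):
    """One in-place ascent step on the rows touched by (u_idx, i_idx, j_idx)."""
    # Copy the current rows so every update uses the pre-step values.
    xi_u, h_i, h_j = XI[u_idx].copy(), H[i_idx].copy(), H[j_idx].copy()
    # Chain rule through x_uij, minus the gradients of the two penalties.
    XI[u_idx] += lr * (g * (h_i - h_j) - w_l2 * xi_u - w_l1 * np.sign(xi_u))
    H[i_idx] += lr * (g * xi_u - w_l2 * h_i - w_l1 * np.sign(h_i))
    H[j_idx] += lr * (-g * xi_u - w_l2 * h_j - w_l1 * np.sign(h_j))


rng = np.random.default_rng(1)
XI = rng.standard_normal((3, 2))
H = rng.standard_normal((4, 2))

x_before = float(XI[0] @ (H[1] - H[2]))
g = 1.0 / (1.0 + np.exp(x_before))  # gradient for an observation of 1
gradient_update(XI, H, 0, 1, 2, g, w_l2=0.0, w_l1=0.0, lr=0.01)
x_after = float(XI[0] @ (H[1] - H[2]))
```

With both regularization weights set to zero and an observation of 1, a small step raises x_uij, which is the direction that increases the log likelihood.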
- initialize_params()[source]
Initialize the user matrix self.XI and the item matrix self.H by drawing entries from a standard normal distribution.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model parameters.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Load model checkpoints.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from the file named "model-params-{checkpoint_id:05d}.npz".
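The checkpoint filename pattern above zero-pads the id to five digits. The sketch below shows how the name and full path could be built; the helper names are hypothetical, and joining the name under working_dir and the S3 key path is an assumption carried over from the description of load().

```python
import os


def checkpoint_filename(checkpoint_id: int = 1) -> str:
    # Zero-pad the id to five digits, as in the documented pattern.
    return f"model-params-{checkpoint_id:05d}.npz"


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int = 1) -> str:
    # Assumed layout: <working_dir>/<s3_key_path>/<checkpoint filename>.
    return os.path.join(working_dir, s3_key_path, checkpoint_filename(checkpoint_id))


path = checkpoint_path("out", "models/abc", 7)
```

Zero-padding keeps checkpoint files in numeric order when a directory listing is sorted by name.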
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, the user vector will be taken from the provided XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vectors will be taken from the provided H instead of self.H.
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- parameters_for_monitor() dict [source]
Count the elements of the item right vectors self.T that are close to 1 and return their proportion.
sad.model.cornac module
- class CornacModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property cornac_model: cornac.models.recommender.Recommender
A model instance object from the Cornac package. This model will be initialized via sad.trainer.CornacTrainer when calling the method self.initialize_cornac_model() of this class. This is because some parameters needed to initialize a Cornac model are related to trainer specifications, so those parameters need to be passed from the trainer.
- get_xuij(u_idx: int, i_idx: int, j_idx: int, **kwargs) float [source]
Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item indices are needed.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
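The score described above, a difference of logits of predicted probabilities, can be written in a few lines. This sketch takes the two probabilities as plain inputs; how the wrapped model produces them is not shown here, and the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p) - math.log1p(-p)


def preference_score(p_ui: float, p_uj: float) -> float:
    # Preference strength of each item is the logit of its predicted
    # probability; the score is the difference between the two.
    return logit(p_ui) - logit(p_uj)


score = preference_score(0.8, 0.5)
```

Working on the logit scale makes the score antisymmetric: swapping the two items flips its sign.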
- initialize_cornac_model(trainer: CornacTrainer)[source]
Initialize a model object implemented in the Cornac package. Some training parameters in a trainer object are needed, therefore a sad.trainer.CornacTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.cornac_model property will contain the actual model object. The "cornac_model_name" field in self.spec contains the class name that will be used to initialize a Cornac model instance.
- Parameters
trainer (sad.trainer.CornacTrainer) – A trainer that will call this method to initialize a Cornac model.
- Raises
AttributeError – When the supplied "cornac_model_name" is not an existing Cornac model class in the models module of the Cornac package.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and some additional information are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained Cornac model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.cornac_model.save(). In addition, some fields defined by the ADDITIONAL_FIELD_NAMES constant in this module will be serialized to pickle files in the same folder.
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and some additional information will be saved.
sad.model.fm module
- class FMModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property fm_model: rankfm.rankfm.RankFM
The Factorization Machine (FM) model instance object. We are using the implementation of FM from the RankFM package. This model will be initialized via sad.trainer.FMTrainer when calling the method self.initialize_fm_model() of this class. This is because some parameters that are required to initialize a RankFM model are owned by the trainer, so those parameters need to be passed from the trainer.
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float [source]
Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (not indices) are needed as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_fm_model(trainer: FMTrainer)[source]
Initialize an FM model object implemented in the RankFM package. Some training parameters in a trainer object are needed, therefore a sad.trainer.FMTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.fm_model property will contain the actual model object.
- Parameters
trainer (sad.trainer.FMTrainer) – A trainer that will call this method to initialize an FM model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained FM model to a folder (self.s3_key_path) rooted at working_dir. The trained FM model (self.fm_model) will be saved as a pickle file named model.pickle under the folder.
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
sad.model.msft_ncf module
- class MSFTRecNCFModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float [source]
Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (instead of indices) are needed as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_msft_ncf_model(trainer: MSFTRecNCFTrainer)[source]
Initialize an NCF model object implemented in the Python package recommenders. Some training parameters in a trainer object are needed, therefore a sad.trainer.MSFTRecNCFTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.msft_ncf_model property will contain the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecNCFTrainer) – A trainer that will call this method to initialize an NCF model object.
- property k: int
The number of latent dimensions.
- property layer_sizes: List[int]
The layer sizes of the MLP part of the NCF model. Its value is read directly from the "layer_sizes" field in self.spec. Defaults to [128], a one-layer perceptron with 128 nodes.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property model_type: str
The type of NCF model supported by the "recommenders" package. Currently it can be one of "MLP", "GMF" or "NeuMF". Its value is read directly from the "model_type" field in self.spec. Defaults to "NeuMF".
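The layer_sizes and model_type properties both read a field from self.spec and fall back to a default. A minimal sketch of that pattern follows; the helper functions are hypothetical, and the validation against the three supported names is an addition for illustration that this reference does not describe.

```python
from typing import Dict, List

# The three model types named in the documentation.
SUPPORTED_MODEL_TYPES = ("MLP", "GMF", "NeuMF")


def read_layer_sizes(spec: Dict) -> List[int]:
    # Default: a one-layer perceptron with 128 nodes.
    return list(spec.get("layer_sizes", [128]))


def read_model_type(spec: Dict) -> str:
    model_type = spec.get("model_type", "NeuMF")
    # Hypothetical check, not documented behavior of the package.
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return model_type


spec = {"k": 16, "layer_sizes": [64, 32]}
layer_sizes = read_layer_sizes(spec)
model_type = read_model_type(spec)
```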
- property msft_ncf_model: recommenders.models.ncf.ncf_singlenode.NCF
The Neural Collaborative Filtering (NCF) model instance object. We are using the implementation of NCF from the recommenders package developed and maintained by Microsoft. This model will be initialized via sad.trainer.MSFTRecNCFTrainer when calling the method self.initialize_msft_ncf_model() of this class. This is because some parameters required to initialize an NCF model are specified in the trainer, so those parameters need to be passed from the trainer to this model.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained NCF model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation is delegated to self.msft_ncf_model.save(). In addition, some information about the model will be saved to additional_info.json. This additional information is used when loading a trained NCF model.
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
sad.model.msft_rbm module
- class MSFTRecRBMModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property bh: numpy.ndarray
The bias for hidden units. The size is 1 x k. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value will be loaded too.
- property bv: numpy.ndarray
The bias for visible units. The size is 1 x m. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value will be loaded too.
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float [source]
Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (instead of indices) are needed as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
The number of hidden units in the RBM model. Its value is read directly from the "k" field in self.spec.
- initialize_msft_rbm_model(trainer: MSFTRecRBMTrainer)[source]
Initialize an RBM model object implemented in the Python package recommenders. Some training parameters in a trainer object are needed, therefore a sad.trainer.MSFTRecRBMTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.msft_rbm_model property will contain the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecRBMTrainer) – A trainer that will call this method to initialize an RBM model object.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder. This method still needs tests to confirm that it works properly.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property msft_rbm_model: recommenders.models.rbm.rbm.RBM
The Restricted Boltzmann Machine (RBM) model instance object. We are using the implementation of RBM from the recommenders package developed and maintained by Microsoft. This model will be initialized via sad.trainer.MSFTRecRBMTrainer when calling the method self.initialize_msft_rbm_model() of this class. This is because some parameters that are required to initialize an RBM model are specified in its trainer.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained RBM model to a folder (self.s3_key_path) rooted at working_dir. The three parameters in the RBM are first converted to numpy arrays, and then saved to the file weights.npz in the folder os.path.join(working_dir, self.s3_key_path).
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
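Saving three arrays into a single weights.npz file can be sketched with NumPy as shown below. The array key names (w, bv, bh), the example S3 key path and the use of a temporary directory are assumptions made for illustration; only the file name weights.npz and the array shapes come from the descriptions in this section.

```python
import os
import tempfile

import numpy as np

m, k = 6, 3  # items (visible units) and hidden units

# Parameters with the documented shapes, initialized to zero.
w = np.zeros((m, k))
bv = np.zeros((1, m))
bh = np.zeros((1, k))

with tempfile.TemporaryDirectory() as working_dir:
    # Hypothetical S3 key path, used here as a sub-folder of working_dir.
    folder = os.path.join(working_dir, "models", "example-key")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "weights.npz")

    # Store the three arrays under named keys in one archive.
    np.savez(path, w=w, bv=bv, bh=bh)

    # Read them back while the temporary folder still exists.
    with np.load(path) as data:
        loaded = {name: data[name] for name in data.files}
```

Named keys make loading independent of the order in which the arrays were written.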
- save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- property w: numpy.ndarray
The weight matrix of the RBM model. The size is m x k. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value will be loaded too.
sad.model.msft_vae module
- class MSFTRecVAEModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- initialize_msft_vae_model(trainer: MSFTRecVAETrainer)[source]
Initialize a VAE model object implemented in the recommenders package. Some training parameters in a trainer object are needed, therefore a sad.trainer.MSFTRecVAETrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.msft_vae_model property will contain the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecVAETrainer) – A trainer that will call this method to initialize a VAE model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property msft_vae_model: recommenders.models.vae.standard_vae.StandardVAE
The Variational Auto Encoder (VAE) model instance object. We are using the implementation of VAE from the recommenders package developed and maintained by Microsoft. This model will be initialized via sad.trainer.MSFTRecVAETrainer when calling the method self.initialize_msft_vae_model() of this class. This is because some parameters that are required to initialize a VAE model are specified in its trainer.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained VAE model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation is delegated to self.msft_vae_model.model.save().
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
sad.model.sad module
- class SADModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property T_ceiling: float
The largest value of T that is allowed.
- calculate_preference_tensor()[source]
Calculate the preference tensor self.X using the user and item matrices.
- calculate_probability_tensor()[source]
Calculate the probability tensor by applying the logistic function to the preference tensor self.X.
- draw_observation_tensor() numpy.ndarray [source]
Draw a complete observation tensor from the generative model of SAD.
- Returns
Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
- Return type
np.ndarray
- get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) float [source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.
- Return type
float
- get_t_sparsity() float [source]
Count the elements of the item right vectors self.T that are close to 1 and return their proportion. When self.inner_flag is True, the exponentiation of self.T is used to calculate this number.
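The proportion described above can be sketched with NumPy. The tolerance that decides what counts as close to 1 is not stated in this reference, so the atol value below is an assumption, as are the function name and its free-standing form.

```python
import numpy as np


def t_sparsity(T: np.ndarray, inner_flag: bool = False, atol: float = 1e-2) -> float:
    """Proportion of entries of the right item matrix that are close to 1."""
    # With inner_flag set, T stores logarithms, so exponentiate first.
    values = np.exp(T) if inner_flag else T
    # Absolute tolerance only; the threshold is a hypothetical choice.
    close = np.isclose(values, 1.0, rtol=0.0, atol=atol)
    return float(np.mean(close))


T = np.array([[1.0, 0.5], [1.001, 2.0]])
proportion = t_sparsity(T)
```

When T holds logarithms, an entry of 0 corresponds to a value of exactly 1 after exponentiation.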
- get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) float [source]
Calculate preference score between two items for a particular user. Parameter values in current model will be used to calculate the preference score if no additional parameters are provided as arguments.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, the user vector will be taken from the provided XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, the left item vector will be taken from the provided H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, the right item vector will be taken from the provided T instead of self.T. Subject to exponentiation when self.inner_flag is True.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
- gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of L2 regularization.
w_l1 (float) – The weight of L1 regularization.
lr (float) – Learning rate.
- initialize_params()[source]
Initialize the user matrix self.XI, the left item matrix self.H and the right item matrix self.T by drawing entries from a standard normal distribution. When the right item matrix is assumed to be non-negative (self.inner_flag is True), self.T stores the logarithm of the true tau matrix.
- property inner_flag: bool
Whether the right item matrix is constrained to be non-negative.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model parameters.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Load model checkpoints.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from the file named "model-params-{checkpoint_id:05d}.npz".
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, the user vector will be taken from the provided XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, the left item vector will be taken from the provided H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, the right item vector will be taken from the provided T instead of self.T. Subject to exponentiation when self.inner_flag is set to True.
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- parameters_for_monitor() dict [source]
Count the elements of the item right vectors self.T that are close to 1 and return their proportion. When self.inner_flag is True, the exponentiation of self.T is used to calculate this number.
sad.model.svd module
- class SVDModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float [source]
Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (instead of indices) are needed as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_svd_model(trainer: SVDTrainer)[source]
Initialize an SVD model object implemented in the surprise package. Some training parameters in a trainer object are needed, therefore a sad.trainer.SVDTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After the call, the self.svd_model property will contain the actual model object.
- Parameters
trainer (sad.trainer.SVDTrainer) – A trainer that will call this method to initialize an SVD model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float [source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- property prediction_cache: Dict[Tuple[str, str], float]
A dictionary containing the prediction cache. Each key is a (user ID, item ID) pair, and each value is the model’s prediction for that pair.
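A cache keyed by (user ID, item ID) pairs can be sketched as a simple memoization wrapper. CachedPredictor and the predict_fn callable are hypothetical; how the actual model fills or invalidates its cache is not described in this reference.

```python
from typing import Callable, Dict, Tuple


class CachedPredictor:
    """Hypothetical wrapper that memoizes predictions per (user, item) pair."""

    def __init__(self, predict_fn: Callable[[str, str], float]):
        self._predict_fn = predict_fn
        self.prediction_cache: Dict[Tuple[str, str], float] = {}
        self.calls = 0  # counts how often the underlying predictor runs

    def predict(self, u_id: str, i_id: str) -> float:
        key = (u_id, i_id)
        if key not in self.prediction_cache:
            # Cache miss: compute once and remember the result.
            self.calls += 1
            self.prediction_cache[key] = self._predict_fn(u_id, i_id)
        return self.prediction_cache[key]


predictor = CachedPredictor(lambda u_id, i_id: float(len(u_id) + len(i_id)))
first = predictor.predict("user-1", "item-9")
second = predictor.predict("user-1", "item-9")
```

Caching pays off when preference scores for many item pairs reuse the same per-item predictions for a user.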
- save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]
Save the trained SVD model to a folder (self.s3_key_path) rooted at working_dir. The model object self.svd_model will be saved as a pickle file named model.pickle in the folder.
Model configuration (self.config) will be saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
- save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- property svd_model: surprise.prediction_algorithms.matrix_factorization.SVD
The Singular Value Decomposition (SVD) model instance object. We are using the implementation of SVD from the surprise package. This model will be initialized via sad.trainer.SVDTrainer when calling the method self.initialize_svd_model() of this class.