sad.model package
Submodules
sad.model.base module
- class ModelBase(config: Dict, task: TrainingTask = None)[source]
Bases: abc.ABC
The abstract model base class. All concrete model classes inherit from it.
- property config: Dict
Configuration information that is used to initialize the instance.
- property metrics: Dict
A dictionary that stores the model’s metrics. Callbacks may update it during training.
- property s3_key_path: str
An S3 key uniquely assigned to a model instance. It is set up during the model’s instantiation and written to self.spec. If the model is pushed to an S3 bucket, this is the key of its remote store.
- property spec: Dict
A reference to the "spec" field in self.config.
- property task: sad.task.training.TrainingTask
The training task instance associated with the current model, i.e. the task in which the model is initialized.
- property working_dir: str
Alias for self.task.output_dir.
- class ModelFactory[source]
Bases: object
A factory class responsible for creating model instances.
- logger = <Logger model.ModelFactory (INFO)>
Class attribute for logging.
- Type
logging.Logger
- classmethod produce(config: Dict, task: TrainingTask) sad.model.base.ModelBase[source]
A class method to create instances of sad.model.ModelBase.
- Parameters
config (Dict) – Configuration used to initialize the instance. An example is given below:
    name: SADModel
    spec:
      n: 200
      m: 500
      k: 100
- classmethod register(wrapped_class: sad.model.base.ModelBase) sad.model.base.ModelBase[source]
A class decorator that decorates sad.model.ModelBase classes and registers them in ModelFactory.registry.
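The register/produce pair described above follows a common registry pattern. The sketch below is not the package's actual code: it assumes that the registry is a dictionary keyed by class name and that the "name" field of the configuration selects the class. The stand-in ModelBase here only mimics the documented config and spec properties.

```python
from abc import ABC
from typing import Dict, Optional, Type


class ModelBase(ABC):
    """Stand-in base class; the real one lives in sad.model.base."""

    def __init__(self, config: Dict, task: Optional[object] = None):
        self.config = config
        self.task = task

    @property
    def spec(self) -> Dict:
        # Mirrors the documented property: a reference to config["spec"].
        return self.config["spec"]


class ModelFactory:
    # Assumed layout: class name -> class object.
    registry: Dict[str, Type[ModelBase]] = {}

    @classmethod
    def register(cls, wrapped_class: Type[ModelBase]) -> Type[ModelBase]:
        # Store the class under its own name and hand it back unchanged,
        # so the method can be used as a class decorator.
        cls.registry[wrapped_class.__name__] = wrapped_class
        return wrapped_class

    @classmethod
    def produce(cls, config: Dict, task: Optional[object] = None) -> ModelBase:
        # Assumption: the "name" field selects the registered class.
        return cls.registry[config["name"]](config, task)


@ModelFactory.register
class SADModel(ModelBase):
    pass


config = {"name": "SADModel", "spec": {"n": 200, "m": 500, "k": 100}}
model = ModelFactory.produce(config)
```

With this layout, adding a new model type only requires decorating the new class; the factory itself does not change.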
sad.model.bpr module
- class BPRModel(config: dict, task: TrainingTask = None)[source]
Bases: sad.model.base.ModelBase
- calculate_preference_tensor()[source]
Calculate preference tensor self.X using user and item matrices.
- calculate_probability_tensor()[source]
Calculate probability tensor by applying the logistic function to preference tensor self.X.
- draw_observation_tensor() numpy.ndarray[source]
Draw a complete observation tensor from the generative model of BPR.
- Returns
Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
- Return type
np.ndarray
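The three tensor methods above can be illustrated with NumPy. The reference does not spell out the formula, so this sketch assumes the standard BPR form, in which the score of item i for user u is a dot product of latent vectors and x_uij is the difference of two such scores. The matrix layout (one column per user or item) is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 5, 3                      # users, items, latent dimensions
XI = rng.standard_normal((k, n))       # assumed layout: one column per user
H = rng.standard_normal((k, m))        # assumed layout: one column per item

scores = XI.T @ H                      # (n, m): strength of item i for user u
X = scores[:, :, None] - scores[:, None, :]   # (n, m, m): x_uij = x_ui - x_uj
P = 1.0 / (1.0 + np.exp(-X))           # logistic function applied elementwise

# Draw 1 with probability P[u, i, j], otherwise -1. Entries are drawn
# independently here; a faithful generator would probably also enforce
# obs[u, i, j] == -obs[u, j, i].
obs = np.where(rng.random(X.shape) < P, 1, -1)
```

Under this assumed form the preference tensor is anti-symmetric in its last two axes, so P[u, i, j] + P[u, j, i] equals 1.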
- get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) float[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.
- Return type
float
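For a logistic observation model, both the log likelihood and its gradient with respect to x_uij have a closed form. The functions below are a sketch under that assumption; they take the score x_uij directly rather than the index triple, and the names are illustrative rather than the package's.

```python
import math


def log_likelihood(x_uij: float, obs_uij: int) -> float:
    """Log of sigmoid(obs * x); 0 for a missing observation."""
    if obs_uij == 0:
        return 0.0
    z = obs_uij * x_uij
    # Numerically stable log(sigmoid(z)).
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def gradient_wrt_xuij(x_uij: float, obs_uij: int) -> float:
    """Derivative of log_likelihood with respect to x_uij."""
    if obs_uij == 0:
        return 0.0
    z = obs_uij * x_uij
    # sigmoid(-z), computed without overflowing exp().
    if z >= 0:
        e = math.exp(-z)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(z))
    return obs_uij * s
```

A missing observation contributes neither likelihood nor gradient, which matches the documented behaviour of returning 0 for obs_uij equal to 0.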
- get_t_sparsity() float[source]
Compute the proportion of elements in the right item vectors self.T that are close to 1.
- get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) float[source]
Calculate the preference score between two items for a particular user. If no parameter arguments are provided, the parameter values of the current model are used.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, the user vector is taken from XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vectors are taken from H instead of self.H.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
- gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of L2 regularization.
w_l1 (float) – The weight of L1 regularization.
lr (float) – Learning rate.
- initialize_params()[source]
Initialize user matrix self.XI and item matrix self.H by drawing entries from a standard normal distribution.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model parameters.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Load model checkpoints.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from the file named "model-params-{checkpoint_id:05d}.npz".
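The path and file name conventions described for load and load_checkpoint can be sketched as follows. The helper name, the example key, and the array names stored inside the archive are assumptions made for illustration; only the file name pattern and the os.path.join layout come from the descriptions above.

```python
import os
import tempfile

import numpy as np


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int = 1) -> str:
    # Zero-padded to five digits, e.g. model-params-00007.npz.
    filename = f"model-params-{checkpoint_id:05d}.npz"
    return os.path.join(working_dir, s3_key_path, filename)


with tempfile.TemporaryDirectory() as working_dir:
    s3_key_path = "models/bpr-demo"          # hypothetical key
    path = checkpoint_path(working_dir, s3_key_path, checkpoint_id=7)
    os.makedirs(os.path.dirname(path), exist_ok=True)

    # Array names inside the archive are assumptions.
    np.savez(path, XI=np.zeros((3, 4)), H=np.ones((3, 5)))
    with np.load(path) as params:
        XI, H = params["XI"], params["H"]
```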
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. -1 means the opposite. 0 means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, the user vector is taken from XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vectors are taken from H instead of self.H.
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- parameters_for_monitor() dict[source]
Compute the proportion of elements in the right item vectors self.T that are close to 1.
sad.model.cornac module
- class CornacModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property cornac_model: cornac.models.recommender.Recommender
A model instance from the Cornac package. It is initialized by sad.trainer.CornacTrainer through a call to self.initialize_cornac_model() of this class, because some parameters needed to initialize a Cornac model belong to the trainer specification and must be passed in from the trainer.
- get_xuij(u_idx: int, i_idx: int, j_idx: int, **kwargs) float[source]
Calculate the preference score between two items for a particular user. In this model class, the preference strength of an item for a user is the logit of the model’s predicted probability. The preference score is the difference between the preference strengths of the two items for the given user. This class takes user and item indices as arguments.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
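The logit-difference definition above can be written out directly. This sketch takes the two predicted probabilities as inputs; how the wrapped model produces them is outside its scope, and the clipping constant eps is an assumption added to keep the logarithms finite.

```python
import math


def logit(p: float, eps: float = 1e-12) -> float:
    # Clip to (0, 1) so that log() stays finite at the boundaries.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log1p(-p)


def preference_score(p_ui: float, p_uj: float) -> float:
    """Difference of the two items' logits for the same user."""
    return logit(p_ui) - logit(p_uj)
```

Equal probabilities give a score of 0, and swapping the two items flips the sign of the score.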
- initialize_cornac_model(trainer: CornacTrainer)[source]
Initialize a model object implemented in the Cornac package. Some training parameters held by the trainer are needed, so a sad.trainer.CornacTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.cornac_model property contains the actual model object. The "cornac_model_name" field in self.spec contains the class name used to initialize the Cornac model instance.
- Parameters
trainer (sad.trainer.CornacTrainer) – The trainer that calls this method to initialize a Cornac model.
- Raises
AttributeError – When the supplied "cornac_model_name" is not an existing Cornac model class in the models module of the Cornac package.
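Looking up a class by name in a module and failing with AttributeError is what getattr does by default. The sketch below shows that mechanism with a stand-in namespace instead of the real Cornac models module; DemoRecommender, build_model, and the keyword argument k are all hypothetical.

```python
import types


class DemoRecommender:  # stand-in for a class that a models module would export
    def __init__(self, k: int = 10):
        self.k = k


# Stand-in for the real models module.
models = types.SimpleNamespace(DemoRecommender=DemoRecommender)


def build_model(spec: dict, **trainer_params):
    # getattr raises AttributeError when the name is missing.
    model_cls = getattr(models, spec["cornac_model_name"])
    return model_cls(**trainer_params)


model = build_model({"cornac_model_name": "DemoRecommender"}, k=100)

try:
    build_model({"cornac_model_name": "NoSuchModel"})
    raised = False
except AttributeError:
    raised = True
```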
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and some additional information are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
This functionality is not implemented yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
This functionality is not implemented yet.
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained Cornac model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.cornac_model.save(). In addition, the fields listed in ADDITIONAL_FIELD_NAMES, defined in this module, are serialized to pickle files in the same folder.
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and some additional information will be saved.
sad.model.fm module
- class FMModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property fm_model: rankfm.rankfm.RankFM
The Factorization Machine (FM) model instance. The FM implementation comes from the RankFM package. The model is initialized by sad.trainer.FMTrainer through a call to self.initialize_fm_model() of this class, because some parameters required to initialize a RankFM model are owned by the trainer and must be passed in from it.
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float[source]
Calculate the preference score between two items for a particular user. In this model class, the preference strength of an item for a user is the logit of the model’s predicted probability. The preference score is the difference between the preference strengths of the two items for the given user. This class takes user and item IDs (not indices) as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_fm_model(trainer: FMTrainer)[source]
Initialize an FM model object implemented in the RankFM package. Some training parameters held by the trainer are needed, so a sad.trainer.FMTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.fm_model property contains the actual model object.
- Parameters
trainer (sad.trainer.FMTrainer) – The trainer that calls this method to initialize an FM model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
This functionality is not implemented yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
This functionality is not implemented yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained FM model to a folder (self.s3_key_path) rooted at working_dir. The trained FM model (self.fm_model) is saved as a pickle file named model.pickle under that folder.
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
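The save layout described above (a folder named by the S3 key under working_dir, holding model.pickle) can be sketched with the standard library. The helper name save_model is hypothetical, and writing the configuration to config.json is an assumption: the reference says the configuration is saved but does not name the file or format.

```python
import json
import os
import pickle
import tempfile


def save_model(model, config: dict, working_dir: str, s3_key_path: str) -> str:
    folder = os.path.join(working_dir, s3_key_path)
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "model.pickle"), "wb") as f:
        pickle.dump(model, f)
    # The configuration file name and JSON format are assumptions.
    with open(os.path.join(folder, "config.json"), "w") as f:
        json.dump(config, f)
    return folder


with tempfile.TemporaryDirectory() as working_dir:
    fake_model = {"weights": [0.1, 0.2]}      # stand-in for a trained model
    folder = save_model(fake_model, {"name": "FMModel"}, working_dir, "models/fm-demo")
    with open(os.path.join(folder, "model.pickle"), "rb") as f:
        restored = pickle.load(f)
```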
sad.model.msft_ncf module
- class MSFTRecNCFModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float[source]
Calculate the preference score between two items for a particular user. In this model class, the preference strength of an item for a user is the logit of the model’s predicted probability. The preference score is the difference between the preference strengths of the two items for the given user. This class takes user and item IDs (not indices) as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_msft_ncf_model(trainer: MSFTRecNCFTrainer)[source]
Initialize an NCF model object implemented in the Python package recommenders. Some training parameters held by the trainer are needed, so a sad.trainer.MSFTRecNCFTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.msft_ncf_model property contains the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecNCFTrainer) – The trainer that calls this method to initialize an NCF model object.
- property k: int
The number of latent dimensions.
- property layer_sizes: List[int]
The layer sizes of the MLP part of the NCF model. The value is read directly from the "layer_sizes" field in self.spec. Defaults to [128], a one-layer perceptron with 128 nodes.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
This functionality is not implemented yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
This functionality is not implemented yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property model_type: str
The type of NCF model, as supported by the recommenders package. Currently one of "MLP", "GMF", or "NeuMF". Read directly from the "model_type" field in self.spec. Defaults to "NeuMF".
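The layer_sizes and model_type properties both read an optional field from the spec and fall back to a default. A minimal sketch of that pattern follows; SpecView is a hypothetical helper, and the validation of model_type is an assumption added for illustration, not documented behaviour.

```python
from typing import Dict, List


class SpecView:
    """Hypothetical helper that reads optional fields from a spec dict."""

    def __init__(self, spec: Dict):
        self.spec = spec

    @property
    def layer_sizes(self) -> List[int]:
        # Documented default: a single layer with 128 nodes.
        return self.spec.get("layer_sizes", [128])

    @property
    def model_type(self) -> str:
        # Documented default: "NeuMF".
        model_type = self.spec.get("model_type", "NeuMF")
        # Assumption: reject values outside the three documented types.
        if model_type not in ("MLP", "GMF", "NeuMF"):
            raise ValueError(f"unsupported model_type: {model_type}")
        return model_type


defaults = SpecView({"n": 200, "m": 500, "k": 100})
custom = SpecView({"layer_sizes": [64, 32], "model_type": "MLP"})
```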
- property msft_ncf_model: recommenders.models.ncf.ncf_singlenode.NCF
The Neural Collaborative Filtering (NCF) model instance. The NCF implementation comes from the recommenders package developed and maintained by Microsoft. The model is initialized by sad.trainer.MSFTRecNCFTrainer through a call to self.initialize_msft_ncf_model() of this class, because some parameters required to initialize an NCF model are specified in the trainer and must be passed from the trainer to this model.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained NCF model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation is delegated to self.msft_ncf_model.save(). In addition, some information about the model is saved to additional_info.json. This information is used when loading a trained NCF model.
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
sad.model.msft_rbm module
- class MSFTRecRBMModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property bh: numpy.ndarray
The bias for the hidden units. Its size is 1 x k. Its value is initialized to zero. When a pre-trained MSFTRecRBMModel is loaded, its value is loaded too.
- property bv: numpy.ndarray
The bias for the visible units. Its size is 1 x m. Its value is initialized to zero. When a pre-trained MSFTRecRBMModel is loaded, its value is loaded too.
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float[source]
Calculate the preference score between two items for a particular user. In this model class, the preference strength of an item for a user is the logit of the model’s predicted probability. The preference score is the difference between the preference strengths of the two items for the given user. This class takes user and item IDs (not indices) as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
The number of hidden units in the RBM model. Its value is read directly from the "k" field in self.spec.
- initialize_msft_rbm_model(trainer: MSFTRecRBMTrainer)[source]
Initialize an RBM model object implemented in the Python package recommenders. Some training parameters held by the trainer are needed, so a sad.trainer.MSFTRecRBMTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.msft_rbm_model property contains the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecRBMTrainer) – The trainer that calls this method to initialize an RBM model object.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder. This method still needs tests to confirm that it works properly.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
This functionality is not implemented yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
This functionality is not implemented yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property msft_rbm_model: recommenders.models.rbm.rbm.RBM
The Restricted Boltzmann Machine (RBM) model instance. The RBM implementation comes from the recommenders package developed and maintained by Microsoft. The model is initialized by sad.trainer.MSFTRecRBMTrainer through a call to self.initialize_msft_rbm_model() of this class, because some parameters required to initialize an RBM model are specified in its trainer.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained RBM model to a folder (self.s3_key_path) rooted at working_dir. The three parameters of the RBM are first converted to numpy arrays, and then saved to the file weights.npz in the folder os.path.join(self.s3_key_path, working_dir).
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
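Storing three arrays in a single weights.npz file maps naturally onto numpy.savez. The sketch below assumes the three parameters are the w, bv, and bh properties documented for this class and uses their documented shapes; the key names inside the archive are assumptions.

```python
import os
import tempfile

import numpy as np

m, k = 6, 3                      # items (visible units), hidden units
w = np.zeros((m, k))             # weights, m x k
bv = np.zeros((1, m))            # visible bias, 1 x m
bh = np.zeros((1, k))            # hidden bias, 1 x k

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "weights.npz")
    np.savez(path, w=w, bv=bv, bh=bh)    # key names are assumptions
    with np.load(path) as data:
        shapes = {name: data[name].shape for name in data.files}
```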
- save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- property w: numpy.ndarray
The weight matrix of the RBM model. Its size is m x k. Its value is initialized to zero. When a pre-trained MSFTRecRBMModel is loaded, its value is loaded too.
sad.model.msft_vae module
- class MSFTRecVAEModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- initialize_msft_vae_model(trainer: MSFTRecVAETrainer)[source]
Initialize a VAE model object implemented in the recommenders package. Some training parameters held by the trainer are needed, so a sad.trainer.MSFTRecVAETrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.msft_vae_model property contains the actual model object.
- Parameters
trainer (sad.trainer.MSFTRecVAETrainer) – The trainer that calls this method to initialize a VAE model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
Haven’t implemented this functionality yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property msft_vae_model: recommenders.models.vae.standard_vae.StandardVAE
The Variational Auto Encoder (VAE) model instance. The VAE implementation comes from the recommenders package developed and maintained by Microsoft. The model is initialized by sad.trainer.MSFTRecVAETrainer through a call to self.initialize_msft_vae_model() of this class, because some parameters required to initialize a VAE model are specified in its trainer.
- property n: int
The number of users.
- save(working_dir: Optional[str] = None)[source]
Save the trained VAE model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation is delegated to self.msft_vae_model.model.save().
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
sad.model.sad module
- class SADModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- property T_ceiling: float
The largest value of T that is allowed.
- calculate_preference_tensor()[source]
Calculate preference tensor self.X using user and item matrices.
- calculate_probability_tensor()[source]
Calculate probability tensor by applying the logistic function to preference tensor self.X.
- draw_observation_tensor() numpy.ndarray[source]
Draw a complete observation tensor from the generative model of SAD.
- Returns
Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
- Return type
np.ndarray
- get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) float[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.
- Return type
float
- get_t_sparsity() float[source]
Compute the proportion of elements in the right item vectors self.T that are close to 1. When self.inner_flag is True, the exponentiation of self.T is used for this calculation.
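The proportion described above can be computed with a tolerance-based comparison. This is a sketch, not the package's implementation: the function signature and the tolerance atol are assumptions, since the reference does not say how close counts as "close to 1".

```python
import numpy as np


def t_sparsity(T: np.ndarray, inner_flag: bool, atol: float = 1e-2) -> float:
    # When inner_flag is True, T holds logarithms, so exponentiate first.
    values = np.exp(T) if inner_flag else T
    # Fraction of entries within the tolerance of 1.
    return float(np.mean(np.isclose(values, 1.0, atol=atol)))


T = np.array([[1.0, 0.5], [1.001, 2.0]])
direct = t_sparsity(T, inner_flag=False)
from_logs = t_sparsity(np.log(T), inner_flag=True)
```

Both calls look at the same underlying values, so they report the same proportion.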
- get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) float[source]
Calculate the preference score between two items for a particular user. If no additional parameters are provided as arguments, the parameter values of the current model are used.
- Parameters
u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, the user vector is taken from XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, left item vectors are taken from H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, right item vectors are taken from T instead of self.T. Subject to exponentiation when self.inner_flag is True.
- Returns
Preference score between the i_idx-th item and the j_idx-th item for the u_idx-th user.
- Return type
float
- gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of L2 regularization.
w_l1 (float) – The weight of L1 regularization.
lr (float) – Learning rate.
- initialize_params()[source]
Initialize the user matrix self.XI, the left item matrix self.H, and the right item matrix self.T by drawing entries from a standard normal distribution. When the right item matrix is assumed to be non-negative (self.inner_flag is True), self.T stores the logarithm of the true tau matrix.
- property inner_flag: bool
Whether the right item matrix is constrained to be non-negative.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model parameters.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Load model checkpoints.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from the file named "model-params-{checkpoint_id:05d}.npz".
- log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_idx (int) – Index of the user in the user set. 0-based.
i_idx (int) – Index of the i-th item. It is the index of the left item in the preference tensor.
j_idx (int) – Index of the j-th item. It is the index of the right item in the preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. -1 means the opposite. 0 means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, the user vector is taken from XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, left item vectors are taken from H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, right item vectors are taken from T instead of self.T. Subject to exponentiation when self.inner_flag is True.
- Returns
The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- parameters_for_monitor() dict[source]
Compute the proportion of elements in the right item vectors self.T that are close to 1. When self.inner_flag is True, the exponentiation of self.T is used for this calculation.
sad.model.svd module
- class SVDModel(config: dict, task: TrainingTask)[source]
Bases: sad.model.base.ModelBase
- get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float[source]
Calculate the preference score between two items for a particular user. In this model class, the preference strength of an item for a user is the logit of the model’s predicted probability. The preference score is the difference between the preference strengths of the two items for the given user. This class takes user and item IDs (not indices) as arguments.
- Parameters
u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.
- Returns
Preference score between items i_id and j_id for user u_id.
- Return type
float
- initialize_svd_model(trainer: SVDTrainer)[source]
Initialize an SVD model object implemented in the surprise package. Some training parameters held by the trainer are needed, so a sad.trainer.SVDTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself. After the call, the self.svd_model property contains the actual model object.
- Parameters
trainer (sad.trainer.SVDTrainer) – The trainer that calls this method to initialize an SVD model.
- property k: int
The number of latent dimensions.
- load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]
Load model from a folder.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and configuration are stored.
filename (str) – Name of the file containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).
- load_best(working_dir: str, criterion: str = 'll')[source]
This functionality is not implemented yet.
- load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
This functionality is not implemented yet.
- log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]
Calculate log likelihood.
- Parameters
u_id (str) – A user ID.
i_id (str) – An item ID. The ID of the left item in the preference tensor.
j_id (str) – An item ID. The ID of the right item in the preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id. -1 means the opposite. 0 means the preference information is not available (missing data).
- Returns
The contribution to the log likelihood from the observation of (u_id, i_id, j_id). Returns 0 when the observation is missing.
- Return type
float
- property m: int
The number of items.
- property n: int
The number of users.
- property prediction_cache: Dict[Tuple[str, str], float]
A dictionary holding the prediction cache. Each key is a (user ID, item ID) pair, and each value is the model’s prediction for that pair.
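A cache keyed by (user ID, item ID) pairs avoids recomputing a prediction that was already requested. The wrapper below is a hypothetical illustration of that idea, not the package's code; slow_predict stands in for a call into the wrapped SVD model.

```python
from typing import Callable, Dict, List, Tuple


class CachedPredictor:
    """Hypothetical wrapper that memoizes predictions per (user, item) pair."""

    def __init__(self, predict: Callable[[str, str], float]):
        self._predict = predict
        self.prediction_cache: Dict[Tuple[str, str], float] = {}

    def __call__(self, u_id: str, i_id: str) -> float:
        key = (u_id, i_id)
        if key not in self.prediction_cache:
            # Only hit the underlying model on a cache miss.
            self.prediction_cache[key] = self._predict(u_id, i_id)
        return self.prediction_cache[key]


calls: List[Tuple[str, str]] = []


def slow_predict(u_id: str, i_id: str) -> float:
    calls.append((u_id, i_id))   # record how often the model is queried
    return 0.75


predictor = CachedPredictor(slow_predict)
first = predictor("u1", "i9")
second = predictor("u1", "i9")
```

Because get_xuij compares two items for the same user, the same (user, item) pair tends to recur across many item pairs, which is where such a cache would pay off.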
- save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]
Save the trained SVD model to a folder (self.s3_key_path) rooted at working_dir. The model object self.svd_model is saved as a pickle file named model.pickle in that folder.
Model configuration (self.config) is saved too.
- Parameters
working_dir (str) – The folder containing self.s3_key_path, where the model and its configuration will be saved.
- save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]
Haven’t implemented this functionality yet.
- property svd_model: surprise.prediction_algorithms.matrix_factorization.SVD
The Singular Value Decomposition (SVD) model instance. The SVD implementation comes from the surprise package. The model is initialized by sad.trainer.SVDTrainer through a call to self.initialize_svd_model() of this class.