sad.model package

Submodules

sad.model.base module

class ModelBase(config: Dict, task: TrainingTask = None)

Bases: abc.ABC

The abstract model base class from which all concrete model classes inherit.

property config: Dict

Configuration information that is used to initialize the instance.

abstract load(working_dir: str, filename: str)
abstract load_best(working_dir: str, criterion: str)
abstract load_checkpoint(working_dir: str, checkpoint_id: int)
property metrics: Dict

A dictionary that stores the metrics of the model. Callbacks may update it during model training.

abstract parameters_for_monitor() → Dict[str, float]
abstract predict(inputs: Any) → Any
abstract reset_parameters()
property s3_key_path: str

An S3 key uniquely assigned to a model instance. It is set up when the model is instantiated and populated into self.spec. If the model is pushed to an S3 bucket, this is the key of the model’s remote store.

abstract save(working_dir: str, filename: str)
abstract save_checkpoint(working_dir: str, checkpoint_id: int)
property spec: Dict

A reference to the "spec" field in self.config.

property task: sad.task.training.TrainingTask

The training task associated with the current model, i.e. the task instance in which the model is initialized.

property working_dir: str

Alias to self.task.output_dir.
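The contract above can be sketched with Python's abc module. The classes below (ToyTask, ToyModelBase, ConstantModel) are hypothetical, simplified stand-ins rather than the sad source: they keep the spec and working_dir aliases and three of the abstract methods, and show that a subclass can only be instantiated once every abstract method is implemented.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ToyTask:
    """Hypothetical stand-in for sad.task.training.TrainingTask."""

    def __init__(self, output_dir: str):
        self.output_dir = output_dir


class ToyModelBase(ABC):
    """Simplified sketch of the ModelBase contract (not the actual sad code)."""

    def __init__(self, config: Dict, task: Optional[ToyTask] = None):
        self._config = config
        self._task = task

    @property
    def config(self) -> Dict:
        return self._config

    @property
    def spec(self) -> Dict:
        # A reference (not a copy) to the "spec" field of the config.
        return self._config["spec"]

    @property
    def task(self) -> Optional[ToyTask]:
        return self._task

    @property
    def working_dir(self) -> str:
        # Alias to self.task.output_dir.
        return self._task.output_dir

    @abstractmethod
    def predict(self, inputs: Any) -> Any: ...

    @abstractmethod
    def reset_parameters(self): ...

    @abstractmethod
    def parameters_for_monitor(self) -> Dict[str, float]: ...


class ConstantModel(ToyModelBase):
    """Hypothetical concrete model that implements every abstract method."""

    def predict(self, inputs: Any) -> Any:
        return 0.0

    def reset_parameters(self):
        pass

    def parameters_for_monitor(self) -> Dict[str, float]:
        return {}


model = ConstantModel({"name": "ConstantModel", "spec": {"k": 4}}, ToyTask("/tmp/run"))
```

Because spec returns the same dictionary object as config["spec"], changes made through one are visible through the other. Instantiating ToyModelBase directly raises TypeError, since its abstract methods are not implemented.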

class ModelFactory

Bases: object

A factory class responsible for creating model instances.

logger = <Logger model.ModelFactory (INFO)>

Class attribute for logging.

Type

logging.Logger

classmethod produce(config: Dict, task: TrainingTask) → sad.model.base.ModelBase

A class method to create instances of sad.model.ModelBase.

Parameters

config (Dict) –

Configuration used to initialize the instance. An example is given below:

name: SADModel
spec:
  n: 200
  m: 500
  k: 100

classmethod register(wrapped_class: sad.model.base.ModelBase) → sad.model.base.ModelBase

A class decorator that registers sad.model.ModelBase subclasses in ModelFactory.registry.
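A decorator-based registry of this kind can be sketched as follows. ToyModelFactory and ToyModel are hypothetical; the sketch assumes that produce() looks up the registered class by the "name" field of the configuration (as in the example configuration for produce) and instantiates it with (config, task). That matches the documented signatures but is not confirmed by them.

```python
from typing import Any, Dict, Optional, Type


class ToyModelFactory:
    """Hypothetical sketch of a decorator-based model registry."""

    registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, wrapped_class: Type) -> Type:
        # Record the class under its own name and return it unchanged,
        # so decorating does not alter the class itself.
        cls.registry[wrapped_class.__name__] = wrapped_class
        return wrapped_class

    @classmethod
    def produce(cls, config: Dict, task: Optional[Any] = None):
        name = config["name"]
        if name not in cls.registry:
            raise KeyError(f"Unknown model name: {name}")
        return cls.registry[name](config, task)


@ToyModelFactory.register
class ToyModel:
    """Stand-in model class that only stores its arguments."""

    def __init__(self, config: Dict, task: Optional[Any] = None):
        self.config = config
        self.task = task


config = {"name": "ToyModel", "spec": {"n": 200, "m": 500, "k": 100}}
model = ToyModelFactory.produce(config, task=None)
```

Returning the class unchanged from register() keeps the decorator transparent: the decorated name still refers to the original class.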

sad.model.bpr module

class BPRModel(config: dict, task: TrainingTask = None)

Bases: sad.model.base.ModelBase

calculate_preference_tensor()

Calculate the preference tensor self.X from the user and item matrices.

calculate_probability_tensor()

Calculate the probability tensor by applying the logistic function to the preference tensor self.X.

draw_observation_tensor() → numpy.ndarray

Draw a complete observation tensor from the generative model of BPR.

Returns

Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.

Return type

np.ndarray
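The three tensor methods can be illustrated with NumPy. This sketch assumes the standard matrix-factorization form of BPR, x_uij = XI[u]·H[i] - XI[u]·H[j], and a generative model in which obs_uij = 1 with probability σ(x_uij) and -1 otherwise. The function names are hypothetical, and the actual sad implementation may differ (for example in how the diagonal i = j is treated).

```python
import numpy as np


def preference_tensor(XI: np.ndarray, H: np.ndarray) -> np.ndarray:
    """X[u, i, j] = XI[u] . H[i] - XI[u] . H[j] (assumed BPR score)."""
    S = XI @ H.T                           # n x m user-item scores
    return S[:, :, None] - S[:, None, :]   # n x m x m via broadcasting


def probability_tensor(X: np.ndarray) -> np.ndarray:
    """Apply the logistic function elementwise."""
    return 1.0 / (1.0 + np.exp(-X))


def draw_observation_tensor(P: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw 1 with probability P[u, i, j], otherwise -1."""
    return np.where(rng.random(P.shape) < P, 1, -1)


rng = np.random.default_rng(0)
n, m, k = 3, 4, 2
XI = rng.standard_normal((n, k))  # user matrix
H = rng.standard_normal((m, k))   # item matrix
X = preference_tensor(XI, H)
P = probability_tensor(X)
obs = draw_observation_tensor(P, rng)
```

Under this form X is skew-symmetric in its last two axes (X[u, i, j] = -X[u, j, i]). The full tensor has n·m·m entries, so with the example configuration (n = 200, m = 500) it holds 50 million floats, about 400 MB in float64.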

get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) → float
Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: "1" means the u_idx-th user prefers the i_idx-th item over the j_idx-th item, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

Return d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.

Return type

(float)

get_t_sparsity() → float

Compute the proportion of elements in the right item vectors self.T that are close to 1.

get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) → float

Calculate the preference score between two items for a given user. If no parameter matrices are passed as arguments, the current parameter values of the model are used.

Parameters
  • u_idx (int) – User index, from 0 to self.n-1.

  • i_idx (int) – Item index, from 0 to self.m-1.

  • j_idx (int) – Item index, from 0 to self.m-1.

  • XI (np.ndarray) – An optional user matrix. If provided, the user vector is taken from XI instead of self.XI.

  • H (np.ndarray) – An optional item matrix. If provided, the item vectors are taken from H instead of self.H.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float
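The optional XI and H arguments fall back to the model's own matrices when omitted. A minimal sketch of that fallback, using a hypothetical ToyBPR class and assuming the matrix-factorization score x_uij = XI[u]·(H[i] - H[j]):

```python
from typing import Optional

import numpy as np


class ToyBPR:
    """Hypothetical holder of BPR parameters, for illustration only."""

    def __init__(self, XI: np.ndarray, H: np.ndarray):
        self.XI = XI  # n x k user matrix
        self.H = H    # m x k item matrix

    def get_xuij(self, u_idx: int, i_idx: int, j_idx: int,
                 XI: Optional[np.ndarray] = None,
                 H: Optional[np.ndarray] = None, **kwargs) -> float:
        # Fall back to the model's own parameters when none are supplied.
        XI = self.XI if XI is None else XI
        H = self.H if H is None else H
        return float(XI[u_idx] @ (H[i_idx] - H[j_idx]))


model = ToyBPR(np.array([[1.0, 2.0]]), np.array([[1.0, 0.0], [0.0, 1.0]]))
model.get_xuij(0, 1, 0)                     # -> 1.0
model.get_xuij(0, 1, 0, H=np.eye(2) * 3.0)  # -> 3.0, using the supplied H
```

The explicit `is None` checks matter: writing `XI or self.XI` would raise for NumPy arrays, whose truth value is ambiguous.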

gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)
Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • g (float) – The gradient of the log likelihood with respect to x_uij.

  • w_l2 (float) – The weight of l2 regularization.

  • w_l1 (float) – The weight of l1 regularization.

  • lr (float) – Learning rate.
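One plausible form of this update, assuming gradient ascent on the log likelihood with an L2 penalty (w_l2·θ), an L1 subgradient (w_l1·sign(θ)), and the BPR score x_uij = XI[u]·(H[i] - H[j]), so that ∂x_uij/∂XI[u] = H[i] - H[j], ∂x_uij/∂H[i] = XI[u] and ∂x_uij/∂H[j] = -XI[u]. The function below is a hypothetical sketch, not sad's implementation.

```python
import numpy as np


def gradient_update(XI, H, u_idx, i_idx, j_idx, g, w_l2, w_l1, lr):
    """One in-place ascent step on the parameters touched by (u, i, j)."""
    # Copy pre-update values so every partial derivative uses the same point.
    xi_u = XI[u_idx].copy()
    h_i, h_j = H[i_idx].copy(), H[j_idx].copy()

    def step(theta, dx_dtheta):
        # Chain rule: d(ll)/d(theta) = g * dx/dtheta, minus regularization.
        return lr * (g * dx_dtheta - w_l2 * theta - w_l1 * np.sign(theta))

    XI[u_idx] += step(xi_u, h_i - h_j)
    H[i_idx] += step(h_i, xi_u)
    H[j_idx] += step(h_j, -xi_u)


XI = np.array([[1.0, 0.5]])
H = np.array([[0.2, -0.1], [0.4, 0.3]])
gradient_update(XI, H, 0, 0, 1, g=1.0, w_l2=0.0, w_l1=0.0, lr=0.01)
```

Copying the rows first avoids updating H[i] with an XI[u] that has already moved in the same step.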

initialize_params()

Initialize user matrix self.XI and item matrix self.H by drawing entries from a standard normal distribution.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load model parameters.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')
load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Load model checkpoints.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.

  • checkpoint_id (int) – Model parameters will be loaded from file with name "model-params-{checkpoint_id:05d}.npz".
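The checkpoint file name zero-pads the ID to five digits, and the full path follows the same os.path.join(working_dir, self.s3_key_path, filename) convention as load(). A sketch of building that path; checkpoint_path is a hypothetical helper and the directory names are made up.

```python
import os


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int) -> str:
    """Hypothetical helper: build the path of a checkpoint file."""
    filename = f"model-params-{checkpoint_id:05d}.npz"  # zero-padded to 5 digits
    return os.path.join(working_dir, s3_key_path, filename)


checkpoint_path("/tmp/run", "models/bpr", 12)
# -> '/tmp/run/models/bpr/model-params-00012.npz' on POSIX systems
```

Zero-padding keeps checkpoint files in numeric order when listed alphabetically, at least up to ID 99999.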

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: "1" means the u_idx-th user prefers the i_idx-th item over the j_idx-th item, "-1" means the opposite, and "0" means the preference is not available (missing data).

  • XI (np.ndarray) – An optional user matrix. If provided, the user vector is taken from XI instead of self.XI.

  • H (np.ndarray) – An optional item matrix. If provided, the item vectors are taken from H instead of self.H.

Returns

The contribution of the observation at (u_idx, i_idx, j_idx) to the log likelihood, or 0 if the observation is missing.

Return type

(float)
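Assuming the logistic likelihood P(obs_uij = 1) = σ(x_uij), the contribution is log σ(obs_uij·x_uij) = -log(1 + exp(-obs_uij·x_uij)). Computing it with np.logaddexp avoids overflow for large |x_uij|. log_likelihood_term is a hypothetical helper, not the sad code.

```python
import numpy as np


def log_likelihood_term(x_uij: float, obs_uij: int) -> float:
    """log sigma(obs * x), computed stably; 0 for a missing observation."""
    if obs_uij == 0:
        return 0.0
    # -log(1 + exp(-obs * x)) == -logaddexp(0, -obs * x)
    return float(-np.logaddexp(0.0, -obs_uij * x_uij))


log_likelihood_term(0.0, 1)      # log(0.5), about -0.6931
log_likelihood_term(1000.0, -1)  # -1000.0, without overflow
```

A naive `np.log(1 / (1 + np.exp(1000.0)))` would overflow in the exponential and return -inf with a warning.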

property m: int

The number of items.

property n: int

The number of users.

parameters_for_monitor() → dict

Compute the proportion of elements in the right item vectors self.T that are close to 1.

predict(inputs: Any) → Any
reset_parameters()
save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')

Save model parameters to a file (named "model-params.npz" by default) under os.path.join(working_dir, self.s3_key_path).

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Save model checkpoints to a file under os.path.join(working_dir, self.s3_key_path).

sad.model.cornac module

class CornacModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

property cornac_model: cornac.models.recommender.Recommender

A model instance from the Cornac package. It is initialized by sad.trainer.CornacTrainer through a call to self.initialize_cornac_model(), because some of the parameters needed to initialize a Cornac model belong to the trainer’s specification and must be passed in from the trainer.

get_xuij(u_idx: int, i_idx: int, j_idx: int, **kwargs) → float

Calculate the preference score between two items for a given user. For this model class, the preference strength of an item for a user is the logit of the model’s predicted probability, and the preference score is the difference between the preference strengths of the two items for that user. This class takes user and item indices as arguments.

Parameters
  • u_idx (int) – User index, from 0 to self.n-1.

  • i_idx (int) – Item index, from 0 to self.m-1.

  • j_idx (int) – Item index, from 0 to self.m-1.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float
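The scoring rule described above, x_uij = logit(p_ui) - logit(p_uj), can be written directly. In this sketch predict_proba is a hypothetical callable standing in for the Cornac model's prediction, and clipping probabilities away from 0 and 1 is an added assumption to keep the logits finite.

```python
import math
from typing import Callable


def logit(p: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # keep the logit finite at 0 and 1
    return math.log(p / (1.0 - p))


def xuij_from_probabilities(predict_proba: Callable[[int, int], float],
                            u_idx: int, i_idx: int, j_idx: int) -> float:
    """Difference of the logits of the two predicted probabilities."""
    return logit(predict_proba(u_idx, i_idx)) - logit(predict_proba(u_idx, j_idx))


probs = {(0, 0): 0.8, (0, 1): 0.5}
xuij_from_probabilities(lambda u, i: probs[(u, i)], 0, 0, 1)  # log(4), about 1.386
```

Swapping the two items flips the sign of the score, so the result is antisymmetric in (i, j) by construction.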

initialize_cornac_model(trainer: CornacTrainer)

Initialize a model object implemented in the Cornac package. Because some training parameters held by the trainer are needed, a sad.trainer.CornacTrainer object is supplied as an argument: the trainer is expected to call this method and pass itself in. Afterwards, the self.cornac_model property holds the actual model object. The "cornac_model_name" field in self.spec gives the name of the Cornac class to instantiate.

Parameters

trainer (sad.trainer.CornacTrainer) – A trainer that will call this method to initialize a Cornac model.

Raises

AttributeError – If the supplied "cornac_model_name" is not a model class in the models module of the Cornac package.
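Resolving a class from its name is typically done with getattr, which raises AttributeError for unknown names. The sketch below assumes that mechanism and uses a stand-in module (types.SimpleNamespace) instead of cornac.models, so it runs without Cornac installed; resolve_model_class and BPRStub are hypothetical.

```python
import types


def resolve_model_class(models_module, spec: dict):
    """Look up spec["cornac_model_name"] in a models module."""
    # getattr raises AttributeError if the module has no such attribute.
    return getattr(models_module, spec["cornac_model_name"])


class BPRStub:
    """Stand-in for a Cornac model class."""


fake_models = types.SimpleNamespace(BPR=BPRStub)
resolve_model_class(fake_models, {"cornac_model_name": "BPR"})  # -> BPRStub
```

With Cornac installed, cornac.models would be passed as models_module instead of the stand-in.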

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load model from a folder.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and some additional information are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: "1" means the u_idx-th user prefers the i_idx-th item over the j_idx-th item, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

The contribution of the observation at (u_idx, i_idx, j_idx) to the log likelihood, or 0 if the observation is missing.

Return type

(float)

property m: int

The number of items.

property n: int

The number of users.

parameters_for_monitor() → dict

Return nothing.

predict(inputs: Any) → Any
reset_parameters()

Does nothing.

save(working_dir: Optional[str] = None)

Save the trained Cornac model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.cornac_model.save(). In addition, the fields listed in the ADDITIONAL_FIELD_NAMES constant of this module are serialized to pickle files in the same folder.

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and some additional information will be saved.
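A sketch of the extra serialization step, assuming each listed field is an attribute of the model and is pickled to its own file. The helper names, the `<name>.pkl` file naming and the example fields are hypothetical; the actual file names used by sad.model.cornac may differ.

```python
import os
import pickle
import tempfile
import types
from typing import Any, Dict, Iterable


def save_additional_fields(obj: Any, field_names: Iterable[str], folder: str) -> None:
    """Pickle each named attribute of obj to folder/<name>.pkl."""
    os.makedirs(folder, exist_ok=True)
    for name in field_names:
        with open(os.path.join(folder, f"{name}.pkl"), "wb") as f:
            pickle.dump(getattr(obj, name), f)


def load_additional_fields(field_names: Iterable[str], folder: str) -> Dict[str, Any]:
    """Read back the pickled fields into a dictionary."""
    fields = {}
    for name in field_names:
        with open(os.path.join(folder, f"{name}.pkl"), "rb") as f:
            fields[name] = pickle.load(f)
    return fields


demo = types.SimpleNamespace(user_ids=["a", "b"], item_ids=["x"])  # hypothetical fields
with tempfile.TemporaryDirectory() as tmp:
    save_additional_fields(demo, ["user_ids", "item_ids"], tmp)
    restored = load_additional_fields(["user_ids", "item_ids"], tmp)
```

Since pickle can execute code when loading, such files should only be read from trusted locations.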

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

sad.model.fm module

class FMModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

property fm_model: rankfm.rankfm.RankFM

The Factorization Machine (FM) model instance. We use the FM implementation from the RankFM package. This model is initialized by sad.trainer.FMTrainer through a call to self.initialize_fm_model(), because some parameters required to initialize a RankFM model are owned by the trainer and must be passed in from it.

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float

Calculate the preference score between two items for a given user. For this model class, the preference strength of an item for a user is the logit of the model’s predicted probability, and the preference score is the difference between the preference strengths of the two items for that user. This class takes user and item IDs (not indices) as arguments.

Parameters
  • u_id (str) – User ID.

  • i_id (str) – Item ID.

  • j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

initialize_fm_model(trainer: FMTrainer)

Initialize an FM model object implemented in the RankFM package. Because some training parameters held by the trainer are needed, a sad.trainer.FMTrainer object is supplied as an argument: the trainer is expected to call this method and pass itself in. Afterwards, the self.fm_model property holds the actual model object.

Parameters

trainer (sad.trainer.FMTrainer) – A trainer that will call this method to initialize an FM model.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load model from a folder.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_id (str) – A user ID.

  • i_id (str) – An item ID. The ID of left item in preference tensor.

  • j_id (str) – An item ID. The ID of right item in preference tensor.

  • obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: "1" means user u_id prefers item i_id over item j_id, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

The contribution of the observation of (u_id, i_id, j_id) to the log likelihood, or 0 if the observation is missing.

Return type

(float)

property m: int

The number of items.

property n: int

The number of users.

parameters_for_monitor() → dict

Return nothing.

predict(inputs: Any) → Any
reset_parameters()

Does nothing.

save(working_dir: Optional[str] = None)

Save the trained FM model to a folder (self.s3_key_path) rooted at working_dir. The trained FM model (self.fm_model) is saved as a pickle file named model.pickle in that folder.

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

sad.model.msft_ncf module

class MSFTRecNCFModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float

Calculate the preference score between two items for a given user. For this model class, the preference strength of an item for a user is the logit of the model’s predicted probability, and the preference score is the difference between the preference strengths of the two items for that user. This class takes user and item IDs (not indices) as arguments.

Parameters
  • u_id (str) – User ID.

  • i_id (str) – Item ID.

  • j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

initialize_msft_ncf_model(trainer: MSFTRecNCFTrainer)

Initialize an NCF model object implemented in the Python package recommenders. Because some training parameters held by the trainer are needed, a sad.trainer.MSFTRecNCFTrainer object is supplied as an argument: the trainer is expected to call this method and pass itself in. Afterwards, the self.msft_ncf_model property holds the actual model object.

Parameters

trainer (sad.trainer.MSFTRecNCFTrainer) – A trainer that will call this method to initialize an NCF model object.

property k: int

The number of latent dimensions.

property layer_sizes: List[int]

The layer sizes of the MLP part of the NCF model, read directly from the "layer_sizes" field in self.spec. Defaults to [128], i.e. a one-layer perceptron with 128 nodes.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load model from a folder.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_id (str) – A user ID.

  • i_id (str) – An item ID. The ID of left item in preference tensor.

  • j_id (str) – An item ID. The ID of right item in preference tensor.

  • obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: "1" means user u_id prefers item i_id over item j_id, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

The contribution of the observation of (u_id, i_id, j_id) to the log likelihood, or 0 if the observation is missing.

Return type

(float)

property m: int

The number of items.

property model_type: str

The type of NCF model supported by the "recommenders" package. Currently one of "MLP", "GMF" or "NeuMF". Read directly from the "model_type" field in self.spec; defaults to "NeuMF".
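Both model_type and layer_sizes read optional fields from self.spec and fall back to the documented defaults. A minimal sketch of that lookup; the membership check against MLP, GMF and NeuMF is an added assumption (the documentation does not say whether an invalid value is rejected), and ncf_settings is a hypothetical helper.

```python
from typing import Dict, List, Tuple

SUPPORTED_NCF_TYPES = ("MLP", "GMF", "NeuMF")


def ncf_settings(spec: Dict) -> Tuple[str, List[int]]:
    """Return (model_type, layer_sizes) with the documented defaults."""
    model_type = spec.get("model_type", "NeuMF")
    layer_sizes = spec.get("layer_sizes", [128])
    if model_type not in SUPPORTED_NCF_TYPES:  # assumed validation step
        raise ValueError(f"Unsupported NCF model_type: {model_type}")
    return model_type, list(layer_sizes)


ncf_settings({"k": 8})                                                # ('NeuMF', [128])
ncf_settings({"k": 8, "model_type": "GMF", "layer_sizes": [64, 32]})  # ('GMF', [64, 32])
```

Validating early turns a typo in the configuration into an immediate, readable error instead of a failure deep inside model construction.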

property msft_ncf_model: recommenders.models.ncf.ncf_singlenode.NCF

The Neural Collaborative Filtering (NCF) model instance. We use the NCF implementation from the recommenders package developed and maintained by Microsoft. This model is initialized by sad.trainer.MSFTRecNCFTrainer through a call to self.initialize_msft_ncf_model(), because some parameters required to initialize an NCF model are specified in the trainer and must be passed from the trainer to this model.

property n: int

The number of users.

parameters_for_monitor() → dict

Return nothing.

predict(inputs: Any) → Any
reset_parameters()

Does nothing.

save(working_dir: Optional[str] = None)

Save the trained NCF model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.msft_ncf_model.save(). In addition, extra information about the model is saved to additional_info.json; this information is needed when loading a trained NCF model.

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

sad.model.msft_rbm module

class MSFTRecRBMModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

property bh: numpy.ndarray

The biases of the hidden units, of size 1 x k. Its value is initialized to zero and is restored when a pre-trained MSFTRecRBMModel is loaded.

property bv: numpy.ndarray

The biases of the visible units, of size 1 x m. Its value is initialized to zero and is restored when a pre-trained MSFTRecRBMModel is loaded.

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float

Calculate the preference score between two items for a given user. For this model class, the preference strength of an item for a user is the logit of the model’s predicted probability, and the preference score is the difference between the preference strengths of the two items for that user. This class takes user and item IDs (not indices) as arguments.

Parameters
  • u_id (str) – User ID.

  • i_id (str) – Item ID.

  • j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

property hidden_units: int

The number of hidden units in the RBM model. Its value is read directly from the "k" field in self.spec.

initialize_msft_rbm_model(trainer: MSFTRecRBMTrainer)

Initialize an RBM model object implemented in the Python package recommenders. Because some training parameters held by the trainer are needed, a sad.trainer.MSFTRecRBMTrainer object is supplied as an argument: the trainer is expected to call this method and pass itself in. Afterwards, the self.msft_rbm_model property holds the actual model object.

Parameters

trainer (sad.trainer.MSFTRecRBMTrainer) – A trainer that will call this method to initialize an RBM model object.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load the model from a folder. Tests are still needed to confirm that this method works properly.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_id (str) – A user ID.

  • i_id (str) – An item ID. The ID of left item in preference tensor.

  • j_id (str) – An item ID. The ID of right item in preference tensor.

  • obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: "1" means user u_id prefers item i_id over item j_id, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

The contribution of the observation of (u_id, i_id, j_id) to the log likelihood, or 0 if the observation is missing.

Return type

(float)

property m: int

The number of items.

property msft_rbm_model: recommenders.models.rbm.rbm.RBM

The Restricted Boltzmann Machine (RBM) model instance. We use the RBM implementation from the recommenders package developed and maintained by Microsoft. This model is initialized by sad.trainer.MSFTRecRBMTrainer through a call to self.initialize_msft_rbm_model(), because some parameters required to initialize an RBM model are specified in its trainer.

property n: int

The number of users.

parameters_for_monitor() → dict

Return nothing.

predict(inputs: Any) → Any
reset_parameters()

Does nothing.

save(working_dir: Optional[str] = None)

Save the trained RBM model to a folder (self.s3_key_path) rooted at working_dir. The three RBM parameters (w, bv and bh) are first converted to NumPy arrays and then saved to the file weights.npz in the folder os.path.join(working_dir, self.s3_key_path).

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.
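A sketch of the weights.npz layout, using the documented shapes (w: m x k, bv: 1 x m, bh: 1 x k) and zero initialization, with the folder rooted at working_dir. The array key names inside the file and the models/rbm stand-in for self.s3_key_path are assumptions.

```python
import os
import tempfile

import numpy as np

m, k = 5, 3
params = {
    "w": np.zeros((m, k)),   # weight matrix, m x k
    "bv": np.zeros((1, m)),  # visible-unit biases, 1 x m
    "bh": np.zeros((1, k)),  # hidden-unit biases, 1 x k
}

with tempfile.TemporaryDirectory() as working_dir:
    folder = os.path.join(working_dir, "models/rbm")  # hypothetical s3_key_path
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "weights.npz")
    np.savez(path, **params)  # one named array per parameter
    with np.load(path) as data:
        restored = {name: data[name] for name in data.files}
```

Indexing the NpzFile inside the with block reads each array into memory, so restored stays valid after the file is closed.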

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

property w: numpy.ndarray

The weight matrix of the RBM model, of size m x k. Its value is initialized to zero and is restored when a pre-trained MSFTRecRBMModel is loaded.

sad.model.msft_vae module

class MSFTRecVAEModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float

Haven’t implemented this functionality yet.

initialize_msft_vae_model(trainer: MSFTRecVAETrainer)

Initialize a VAE model object implemented in the recommenders package. Because some training parameters held by the trainer are needed, a sad.trainer.MSFTRecVAETrainer object is supplied as an argument: the trainer is expected to call this method and pass itself in. Afterwards, the self.msft_vae_model property holds the actual model object.

Parameters

trainer (sad.trainer.MSFTRecVAETrainer) – A trainer that will call this method to initialize a VAE model.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)

Load model from a folder.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float

Calculate log likelihood.

Parameters
  • u_id (str) – A user ID.

  • i_id (str) – An item ID. The ID of left item in preference tensor.

  • j_id (str) – An item ID. The ID of right item in preference tensor.

  • obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: "1" means user u_id prefers item i_id over item j_id, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

The contribution of the observation of (u_id, i_id, j_id) to the log likelihood, or 0 if the observation is missing.

Return type

(float)

property m: int

The number of items.

property msft_vae_model: recommenders.models.vae.standard_vae.StandardVAE

The Variational Autoencoder (VAE) model instance. We use the VAE implementation from the recommenders package developed and maintained by Microsoft. This model is initialized by sad.trainer.MSFTRecVAETrainer through a call to self.initialize_msft_vae_model(), because some parameters required to initialize a VAE model are specified in its trainer.

property n: int

The number of users.

parameters_for_monitor() → dict

Return nothing.

predict(inputs: Any) → Any
reset_parameters()

Does nothing.

save(working_dir: Optional[str] = None)

Save trained VAE model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation will be delegated to self.msft_vae_model.model.save().

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)

Haven’t implemented this functionality yet.

sad.model.sad module

class SADModel(config: dict, task: TrainingTask)

Bases: sad.model.base.ModelBase

property T_ceiling: float

The largest value of T that is allowed.

calculate_preference_tensor()

Calculate the preference tensor self.X from the user and item matrices.

calculate_probability_tensor()

Calculate the probability tensor by applying the logistic function to the preference tensor self.X.

draw_observation_tensor() → numpy.ndarray

Draw a complete observation tensor from the generative model of SAD.

Returns

Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.

Return type

np.ndarray

get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) → float
Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: "1" means the u_idx-th user prefers the i_idx-th item over the j_idx-th item, "-1" means the opposite, and "0" means the preference is not available (missing data).

Returns

Return d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element of the preference tensor.

Return type

(float)

get_t_sparsity() → float

Compute the proportion of elements in the right item vectors self.T that are close to 1. When self.inner_flag is True, the exponentiated values of self.T are used instead.
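A sketch of this statistic. The tolerance used to decide "close to 1" is not documented, so atol below is a hypothetical parameter; t_sparsity is also a hypothetical name.

```python
import numpy as np


def t_sparsity(T: np.ndarray, inner_flag: bool, atol: float = 1e-3) -> float:
    """Fraction of entries of T (or exp(T) when inner_flag) within atol of 1."""
    tau = np.exp(T) if inner_flag else T
    return float(np.isclose(tau, 1.0, rtol=0.0, atol=atol).mean())


T = np.array([[0.0, 2.0], [1.0, -3.0]])
t_sparsity(T, inner_flag=False)  # only T[1, 0] equals 1 -> 0.25
t_sparsity(T, inner_flag=True)   # only exp(T[0, 0]) equals 1 -> 0.25
```

With inner_flag set, an entry of T equal to 0 is what counts as "close to 1", because exp(0) = 1.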

get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) → float

Calculate the preference score between two items for a given user. If no parameter matrices are passed as arguments, the current parameter values of the model are used.

Parameters
  • u_idx (int) – User index, from 0 to self.n-1.

  • i_idx (int) – Item index, from 0 to self.m-1.

  • j_idx (int) – Item index, from 0 to self.m-1.

  • XI (np.ndarray) – An optional user matrix. If provided, the user vector is taken from XI instead of self.XI.

  • H (np.ndarray) – An optional left item matrix. If provided, the left item vectors are taken from H instead of self.H.

  • T (np.ndarray) – An optional right item matrix. If provided, the right item vectors are taken from T instead of self.T. Exponentiated when self.inner_flag is True.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float

gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)
Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of the i-th item, i.e. the left item in the preference tensor.

  • j_idx (int) – Index of the j-th item, i.e. the right item in the preference tensor.

  • g (float) – The gradient of the log likelihood with respect to x_uij.

  • w_l2 (float) – The weight of l2 regularization.

  • w_l1 (float) – The weight of l1 regularization.

  • lr (float) – Learning rate.

initialize_params()

Initialize the user matrix self.XI, the left item matrix self.H and the right item matrix self.T by drawing entries from a standard normal distribution. When the right item matrix is assumed to be non-negative (self.inner_flag is True), self.T stores the logarithm of the true tau matrix.

property inner_flag: bool

Whether the right item matrix is constrained to be non-negative.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model parameters.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]
load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Load model checkpoints.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.

  • checkpoint_id (int) – Model parameters will be loaded from file with name "model-params-{checkpoint_id:05d}.npz".
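The checkpoint file location follows directly from the two parameters above. A small sketch of the path construction (the helper name checkpoint_path is hypothetical):

```python
import os


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int = 1) -> str:
    """Build the path of a checkpoint file from its id (hypothetical helper)."""
    # Zero-pad the id to five digits, e.g. 42 -> "model-params-00042.npz".
    filename = f"model-params-{checkpoint_id:05d}.npz"
    return os.path.join(working_dir, s3_key_path, filename)
```

On a POSIX system, checkpoint_path("/tmp/run", "models/sad", 42) gives "/tmp/run/models/sad/model-params-00042.npz".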

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) float[source]

Calculate log likelihood.

Parameters
  • u_idx (int) – Index of user in user set. 0-based.

  • i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.

  • j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.

  • obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1 or 0. "1" means the u_idx-th user prefers the i_idx-th item over the j_idx-th item, "-1" means the opposite, and "0" means the preference information is not available (missing data).

  • XI (np.ndarray) – An optional user matrix. When provided, the user vector is taken from XI instead of self.XI.

  • H (np.ndarray) – An optional left item matrix. When provided, the left item vector is taken from H instead of self.H.

  • T (np.ndarray) – An optional right item matrix. When provided, the right item vector is taken from T instead of self.T. It is exponentiated element-wise when self.inner_flag is True.

Returns

The contribution to the log likelihood from the observation at (u_idx, i_idx, j_idx), or 0 when the observation is missing.

Return type

float
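One common way to turn a score x_uij and a 1/-1/0 observation into a log likelihood contribution is a logistic (Bernoulli) model, where P(obs_uij = 1) = sigmoid(x_uij). The sketch below assumes that model; the function name is hypothetical and the package's actual likelihood may differ. log(sigmoid(z)) is computed in a numerically stable way for large |z|.

```python
import math


def log_likelihood_contribution(x_uij: float, obs_uij: int) -> float:
    """Hypothetical logistic log likelihood of one observation (0 means missing)."""
    if obs_uij == 0:
        # Missing observation: contributes nothing to the log likelihood.
        return 0.0
    if obs_uij not in (1, -1):
        raise ValueError("obs_uij must be 1, -1 or 0")
    # P(obs) = sigmoid(obs * x_uij), so "-1" uses sigmoid(-x_uij) = 1 - sigmoid(x_uij).
    z = obs_uij * x_uij
    # Stable log(sigmoid(z)) on both sides of zero.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))
```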

property m: int

The number of items.

property n: int

The number of users.

parameters_for_monitor() dict[source]

Compute the proportion of elements in the right item vectors (self.T) that are close to 1. When self.inner_flag is True, the element-wise exponential of self.T is used for this calculation.
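A sketch of the monitored quantity. The tolerance atol and the function name are assumptions, and the real method returns a dictionary whose key names are not documented here, so this sketch returns the bare proportion.

```python
import numpy as np


def proportion_close_to_one(T: np.ndarray, inner_flag: bool, atol: float = 1e-2) -> float:
    """Fraction of entries of tau within atol of 1 (tolerance is hypothetical)."""
    # When inner_flag is True, T stores log(tau), so compare exp(T) with 1.
    values = np.exp(T) if inner_flag else T
    return float(np.mean(np.isclose(values, 1.0, rtol=0.0, atol=atol)))
```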

predict(inputs: Any) Any[source]
reset_parameters()[source]
save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]

Save model parameters to a file named filename (default "model-params.npz") under os.path.join(working_dir, self.s3_key_path).

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Save model checkpoints to a file under os.path.join(working_dir, self.s3_key_path).
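Saving and loading the three matrices as a single .npz archive can be sketched with numpy.savez and numpy.load. The archive keys "XI", "H" and "T" and the helper names are hypothetical; the package's actual file layout is not documented here.

```python
import os
import tempfile

import numpy as np


def save_params(working_dir, s3_key_path, XI, H, T, filename="model-params.npz"):
    """Write XI, H and T into one .npz file under working_dir/s3_key_path."""
    folder = os.path.join(working_dir, s3_key_path)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    np.savez(path, XI=XI, H=H, T=T)  # hypothetical key names
    return path


def load_params(working_dir, s3_key_path, filename="model-params.npz"):
    """Read XI, H and T back from the .npz file written by save_params."""
    path = os.path.join(working_dir, s3_key_path, filename)
    with np.load(path) as data:
        # Indexing an NpzFile reads the array into memory, so it outlives the file.
        return data["XI"], data["H"], data["T"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_params(tmp, "models/sad", np.ones((2, 3)), np.zeros((4, 3)), np.ones((4, 3)))
        print([a.shape for a in load_params(tmp, "models/sad")])
```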

sad.model.svd module

class SVDModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) float[source]

Calculate the preference score between two items for a particular user. For this model class, the preference strength of an item for a user is the logit of the model’s predicted probability, and the preference score is the difference between the preference strengths of the two items for the given user. For this class, user and item IDs (rather than indices) are needed as arguments.

Parameters
  • u_id (str) – User ID.

  • i_id (str) – Item ID.

  • j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float
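The score described above is logit(p_ui) - logit(p_uj). A small sketch, assuming the predictions are probabilities strictly between 0 and 1 and are looked up in a dictionary keyed by (user ID, item ID) pairs; the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def svd_preference_score(pred: dict, u_id: str, i_id: str, j_id: str) -> float:
    """Difference of preference strengths (logits) of items i_id and j_id for u_id."""
    return logit(pred[(u_id, i_id)]) - logit(pred[(u_id, j_id)])
```

For example, predictions of 0.8 and 0.2 give log(4) - log(1/4) = 2 log(4), about 2.77, and swapping the two items flips the sign.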

initialize_svd_model(trainer: SVDTrainer)[source]

Initialize an SVD model object implemented in the surprise package. Some training parameters from a trainer object are needed, so a sad.trainer.SVDTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself as the argument. After the call, the self.svd_model property holds the actual model object.

Parameters

trainer (sad.trainer.SVDTrainer) – A trainer that will call this method to initialize a SVD model.

property k: int

The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters
  • working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.

  • filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]

Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) float[source]

Calculate log likelihood.

Parameters
  • u_id (str) – A user ID.

  • i_id (str) – An item ID. The ID of left item in preference tensor.

  • j_id (str) – An item ID. The ID of right item in preference tensor.

  • obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1 or 0. "1" means user u_id prefers item i_id over item j_id, "-1" means the opposite, and "0" means the preference information is not available (missing data).

Returns

The contribution to the log likelihood from the observation of (u_id, i_id, j_id), or 0 when the observation is missing.

Return type

float

property m: int

The number of items.

property n: int

The number of users.

parameters_for_monitor() dict[source]

Return nothing.

predict(inputs: Any) Any[source]
property prediction_cache: Dict[Tuple[str, str], float]

A dictionary containing the prediction cache. Each key is a (user ID, item ID) pair, and each value is the model’s prediction for that pair.
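A cache of this shape is typically filled lazily: compute a prediction the first time a (user ID, item ID) pair is requested, then reuse it. The sketch below wraps an arbitrary prediction function; the class name is hypothetical. With surprise, such a function could plausibly be lambda u, i: svd.predict(u, i).est, since surprise's predict returns a Prediction whose est field holds the estimate, but how this package fills its cache is not documented here.

```python
from typing import Callable, Dict, Tuple


class CachedPredictor:
    """Hypothetical memoizing wrapper keyed by (user ID, item ID) pairs."""

    def __init__(self, predict_fn: Callable[[str, str], float]) -> None:
        self._predict_fn = predict_fn
        self.prediction_cache: Dict[Tuple[str, str], float] = {}

    def predict(self, u_id: str, i_id: str) -> float:
        key = (u_id, i_id)
        # Only call the underlying model on a cache miss.
        if key not in self.prediction_cache:
            self.prediction_cache[key] = self._predict_fn(u_id, i_id)
        return self.prediction_cache[key]
```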

reset_parameters()[source]

Does nothing.

save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]

Save the trained SVD model to a folder (self.s3_key_path) rooted at working_dir. The model object self.svd_model is saved as a pickle file named model.pickle in that folder.

Model configuration (self.config) will be saved too.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Haven’t implemented this functionality yet.

property svd_model: surprise.prediction_algorithms.matrix_factorization.SVD

Singular Value Decomposition (SVD) model instance. The SVD implementation comes from the surprise package. This model is initialized via sad.trainer.SVDTrainer when it calls this class’s self.initialize_svd_model() method.

Module contents