sad.model package

Submodules

sad.model.base module

class ModelBase(config: Dict, task: TrainingTask = None)[source]

Bases: abc.ABC

The abstract model base class. It is the class that all concrete model classes will inherit from.

property config: Dict: Configuration information that is used to initialize the instance.

abstract load(working_dir: str, filename: str)[source]

abstract load_best(working_dir: str, criterion: str)[source]

abstract load_checkpoint(working_dir: str, checkpoint_id: int)[source]

property metrics: Dict: A dictionary that stores the model’s metrics. Callbacks may change it during model training.

abstract parameters_for_monitor() → Dict[str, float][source]

abstract predict(inputs: Any) → Any[source]

abstract reset_parameters()[source]

property s3_key_path: str: An S3 key uniquely assigned to a model instance. It is set up during the model’s instantiation and written to self.spec. If the model is pushed to an S3 bucket, this is the S3 key of the model’s remote store.

abstract save(working_dir: str, filename: str)[source]

abstract save_checkpoint(working_dir: str, checkpoint_id: int)[source]

property spec: Dict: A reference to the "spec" field in self.config.

property task: sad.task.training.TrainingTask: The training task instance associated with the current model. It is the task instance in which the model is initialized.

property working_dir: str: Alias to self.task.output_dir.

class ModelFactory[source]

Bases: object

A factory class responsible for creating model instances.

logger = <Logger model.ModelFactory (INFO)>

Class attribute for logging.

Type: logging.Logger

classmethod produce(config: Dict, task: TrainingTask) → sad.model.base.ModelBase[source]

A class method to create instances of sad.model.ModelBase.

Parameters

config (Dict) –

Configuration used to initialize instance object. An example is given below:

name: SADModel
spec:
  n: 200
  m: 500
  k: 100

classmethod register(wrapped_class: sad.model.base.ModelBase) → sad.model.base.ModelBase[source]: A class decorator that registers sad.model.ModelBase classes into ModelFactory.registry.
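The register/produce flow described above can be sketched with a small registry-based factory. This is only an illustration, not the package's implementation: the names SketchFactory and ToyModel, and the choice to key the registry by class name, are assumptions. Only the "name"/"spec" layout of the config comes from the example above.

```python
from typing import Any, Dict, Optional, Type


class SketchFactory:
    """Hypothetical stand-in for a registry-based model factory."""

    registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, wrapped_class: Type) -> Type:
        # Store the class under its own name and return it unchanged,
        # so the method can be used as a class decorator.
        cls.registry[wrapped_class.__name__] = wrapped_class
        return wrapped_class

    @classmethod
    def produce(cls, config: Dict[str, Any], task: Optional[Any] = None) -> Any:
        # Assumption: the "name" field of the config selects the class.
        name = config["name"]
        if name not in cls.registry:
            raise KeyError(f"Model class {name!r} is not registered.")
        return cls.registry[name](config, task)


@SketchFactory.register
class ToyModel:
    """Hypothetical model class used only to exercise the factory."""

    def __init__(self, config: Dict[str, Any], task: Optional[Any] = None):
        self.config = config
        self.spec = config.get("spec", {})
        self.task = task


config = {"name": "ToyModel", "spec": {"n": 200, "m": 500, "k": 100}}
model = SketchFactory.produce(config)
```

Keeping the lookup in one place means new model classes only need the decorator; no factory code changes when a class is added.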

sad.model.bpr module

class BPRModel(config: dict, task: TrainingTask = None)[source]

Bases: sad.model.base.ModelBase

calculate_preference_tensor()[source]: Calculate preference tensor self.X using user and item matrices.

calculate_probability_tensor()[source]: Calculate probability tensor by applying logistic function to preference tensor self.X.
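A minimal NumPy sketch of these two steps follows. It assumes the usual BPR form, in which the preference x_uij is the difference between a user's scores for items i and j, and it assumes XI and H hold one row per user and per item. Neither the matrix layout nor the exact formula is confirmed by this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 5, 3                    # users, items, latent dimensions
XI = rng.standard_normal((n, k))     # assumed layout: one row per user
H = rng.standard_normal((m, k))      # assumed layout: one row per item

# Per-user item scores, shape (n, m).
scores = XI @ H.T

# X[u, i, j] = score(u, i) - score(u, j), shape (n, m, m).
X = scores[:, :, None] - scores[:, None, :]

# The logistic function, applied element-wise, maps each preference
# to the probability that user u prefers item i over item j.
P = 1.0 / (1.0 + np.exp(-X))
```

Under these assumptions the preference tensor is anti-symmetric in its last two axes, so P[u, i, j] + P[u, j, i] equals 1.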

draw_observation_tensor() → numpy.ndarray[source]

Draw a complete observation tensor from the generative model of BPR.

Returns: Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
Return type: np.ndarray

get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) → float[source]

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element in the preference tensor.

Return type

(float)
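As a sketch of what such a gradient can look like, assume the log likelihood of an observed pair is log(sigmoid(obs_uij * x_uij)) and that missing observations contribute 0. That likelihood form is an assumption and the function names are hypothetical; the package's own expression may differ.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_likelihood(x_uij: float, obs_uij: int) -> float:
    # Missing observations (obs_uij == 0) contribute nothing.
    if obs_uij == 0:
        return 0.0
    return math.log(sigmoid(obs_uij * x_uij))


def gradient_wrt_xuij(x_uij: float, obs_uij: int) -> float:
    # d/dx log(sigmoid(obs * x)) = obs * sigmoid(-obs * x)
    if obs_uij == 0:
        return 0.0
    return obs_uij * sigmoid(-obs_uij * x_uij)
```

A central finite difference of log_likelihood around any x_uij is a quick way to check the analytic gradient.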

get_t_sparsity() → float[source]: Extract the number of elements that are close to 1 in item right vectors self.T and return proportion.

get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) → float[source]

Calculate the preference score between two items for a particular user. If no parameter arguments are provided, the parameter values in the current model are used.

Parameters

u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, user vector will be taken from provided XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vector will be taken from provided H instead of self.H.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float
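The optional-override behaviour of XI and H can be sketched as below. ToyBPR is a hypothetical class, and the score formula (user vector dotted with the difference of two item vectors) is the standard BPR form assumed for illustration, not taken from the package source.

```python
from typing import Optional

import numpy as np


class ToyBPR:
    """Hypothetical class showing the optional-parameter fallback."""

    def __init__(self, XI: np.ndarray, H: np.ndarray):
        self.XI = XI  # assumed shape (n, k)
        self.H = H    # assumed shape (m, k)

    def get_xuij(self, u_idx: int, i_idx: int, j_idx: int,
                 XI: Optional[np.ndarray] = None,
                 H: Optional[np.ndarray] = None, **kwargs) -> float:
        # Fall back to the model's own parameters when none are supplied.
        XI = self.XI if XI is None else XI
        H = self.H if H is None else H
        return float(XI[u_idx] @ (H[i_idx] - H[j_idx]))


XI = np.array([[1.0, 0.0], [0.0, 1.0]])
H = np.array([[2.0, 1.0], [0.5, 3.0], [1.0, 1.0]])
toy = ToyBPR(XI, H)

score_own = toy.get_xuij(0, 0, 1)                            # uses toy.XI, toy.H
score_override = toy.get_xuij(0, 0, 1, XI=np.zeros((2, 2)))  # uses supplied XI
```

Passing explicit matrices is useful when evaluating candidate parameters without mutating the model.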

gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of l2 regularization.
w_l1 (float) – The weight of l1 regularization.
lr (float) – Learning rate.

initialize_params()[source]: Initialize user matrix self.XI and item matrix self.H by drawing entries from a standard normal distribution.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model parameters.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Load model checkpoints.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from file with name "model-params-{checkpoint_id:05d}.npz".
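The checkpoint filename pattern above can be turned into a full path as sketched here. The helper name checkpoint_path and the example directory names are hypothetical; the sketch also assumes checkpoints live under os.path.join(working_dir, s3_key_path), as described for load().

```python
import os


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int = 1) -> str:
    # Zero-pad the checkpoint id to five digits, as in the documented pattern.
    filename = f"model-params-{checkpoint_id:05d}.npz"
    return os.path.join(working_dir, s3_key_path, filename)


path = checkpoint_path("output", "models/bpr-demo", checkpoint_id=7)
```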

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, user vector will be taken from provided XI instead of self.XI.
H (np.ndarray) – An optional item matrix. When provided, item vector will be taken from provided H instead of self.H.

Returns

Return the contribution to the log likelihood from observation at (u_idx, i_idx, j_idx). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Extract the number of elements that are close to 1 in item right vectors self.T and return proportion.

predict(inputs: Any) → Any[source]

reset_parameters()[source]

save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]: Save model parameters to a file named "model-params.npz" under os.path.join(working_dir, self.s3_key_path).
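A save/load round trip of this kind can be sketched with NumPy's .npz format. The helper names and the array keys "XI" and "H" are assumptions; only the default filename and the folder layout come from the descriptions above.

```python
import os
import tempfile

import numpy as np


def save_params(working_dir, s3_key_path, params, filename="model-params.npz"):
    # Parameters go to <working_dir>/<s3_key_path>/<filename>.
    folder = os.path.join(working_dir, s3_key_path)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    np.savez(path, **params)
    return path


def load_params(working_dir, s3_key_path, filename="model-params.npz"):
    path = os.path.join(working_dir, s3_key_path, filename)
    # Read every array into memory before the file is closed.
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


rng = np.random.default_rng(0)
params = {"XI": rng.standard_normal((4, 3)), "H": rng.standard_normal((5, 3))}

with tempfile.TemporaryDirectory() as working_dir:
    saved_path = save_params(working_dir, "models/bpr-demo", params)
    restored = load_params(working_dir, "models/bpr-demo")
```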

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Save model checkpoints to a file under os.path.join(working_dir, self.s3_key_path).

sad.model.cornac module

class CornacModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

property cornac_model: cornac.models.recommender.Recommender: A model instance object from the Cornac package. sad.trainer.CornacTrainer initializes it by calling self.initialize_cornac_model() of this class. Some parameters needed to initialize a Cornac model belong to the trainer specification, so the trainer must pass them in.

get_xuij(u_idx: int, i_idx: int, j_idx: int, **kwargs) → float[source]

Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item indices are needed.

Parameters

u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float
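The logit-difference rule described above can be written out directly. The function names are hypothetical, and the inputs stand for the model's predicted probabilities for the two items, which must lie strictly between 0 and 1.

```python
import math


def logit(p: float) -> float:
    # Inverse of the logistic function; p must be in the open interval (0, 1).
    return math.log(p) - math.log1p(-p)


def preference_score(p_ui: float, p_uj: float) -> float:
    # Preference strength of each item is the logit of its predicted
    # probability; the score is the difference of the two strengths.
    return logit(p_ui) - logit(p_uj)
```

Equal probabilities give a score of 0, and swapping the two items flips the sign.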

initialize_cornac_model(trainer: CornacTrainer)[source]

Initialize a model object implemented in Cornac package. Some training parameters in a trainer object will be needed, therefore a sad.trainer.CornacTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as an argument. After calling, self.cornac_model property will contain the actual model object. "cornac_model_name" field in self.spec contains the class name that will be used to initialize a Cornac model instance.

Parameters: trainer (sad.trainer.CornacTrainer) – A trainer that will call this method to initialize a Cornac model.
Raises: AttributeError – When supplied "cornac_model_name" is not an existing Cornac model class in models module from Cornac package.
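Looking up a class by name and letting a missing name surface as AttributeError can be sketched with getattr. To keep the example self-contained it uses a hypothetical stand-in namespace (toy_models) and class (ToyRecommender) instead of the real Cornac models module; how trainer parameters are actually passed is also an assumption.

```python
import types


class ToyRecommender:
    """Hypothetical model class standing in for a Cornac recommender."""

    def __init__(self, k: int = 10):
        self.k = k


# Hypothetical stand-in for a "models" module that exposes model classes.
toy_models = types.SimpleNamespace(ToyRecommender=ToyRecommender)


def build_model(models_module, spec: dict, **trainer_params):
    # getattr raises AttributeError when the named class does not exist.
    model_class = getattr(models_module, spec["cornac_model_name"])
    # Assumption: trainer-owned settings are forwarded as keyword arguments.
    return model_class(**trainer_params)


model = build_model(toy_models, {"cornac_model_name": "ToyRecommender"}, k=20)
```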

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and some additional information are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Not implemented yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Not implemented yet.

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation at (u_idx, i_idx, j_idx). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

reset_parameters()[source]: Does nothing.

save(working_dir: Optional[str] = None)[source]

Save the trained Cornac model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.cornac_model.save(). In addition, the fields listed in ADDITIONAL_FIELD_NAMES, defined in this module, are serialized to pickle files in the same folder.

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and some additional information will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

sad.model.fm module

class FMModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

property fm_model: rankfm.rankfm.RankFM: The Factorization Machine (FM) model instance object, using the FM implementation from the RankFM package. sad.trainer.FMTrainer initializes it by calling self.initialize_fm_model() of this class. Some parameters required to initialize a RankFM model are owned by the trainer, so the trainer must pass them in.

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float[source]

Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (not indices) are needed as arguments.

Parameters

u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

initialize_fm_model(trainer: FMTrainer)[source]

Initialize a FM model object implemented in package RankFM. Some training parameters in a trainer object will be needed, therefore a sad.trainer.FMTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as an argument. After calling, self.fm_model property will contain the actual model object.

Parameters: trainer (sad.trainer.FMTrainer) – A trainer that will call this method to initialize a FM model.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Not implemented yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Not implemented yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_id (str) – A user ID.
i_id (str) – An item ID. The ID of left item in preference tensor.
j_id (str) – An item ID. The ID of right item in preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation of (u_id, i_id, j_id). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

reset_parameters()[source]: Does nothing.

save(working_dir: Optional[str] = None)[source]

Save trained FM model to a folder (self.s3_key_path) rooted at working_dir. The trained FM model (self.fm_model) will be saved as a pickle file named model.pickle under the folder.

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

sad.model.msft_ncf module

class MSFTRecNCFModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float[source]

Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (instead of indices) are needed as arguments.

Parameters

u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

initialize_msft_ncf_model(trainer: MSFTRecNCFTrainer)[source]

Initialize an NCF model object implemented in the Python package recommenders. Some training parameters in a trainer object are needed, so a sad.trainer.MSFTRecNCFTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as an argument. After the call, the self.msft_ncf_model property contains the actual model object.

Parameters: trainer (sad.trainer.MSFTRecNCFTrainer) – A trainer that will call this method to initialize a NCF model object.

property k: int: The number of latent dimensions.

property layer_sizes: List[int]: The layer sizes of the MLP part of the NCF model. Its value will be read directly from "layer_sizes" field in self.spec. Default to [128], a one layer perceptron with 128 nodes.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Not implemented yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Not implemented yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_id (str) – A user ID.
i_id (str) – An item ID. The ID of left item in preference tensor.
j_id (str) – An item ID. The ID of right item in preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation of (u_id, i_id, j_id). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property model_type: str: The type of NCF model that is supported by "recommenders" package. Currently could take "MLP|GMF|NeuMF". Read directly from "model_type" field in self.spec. Default to "NeuMF".
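Reading the "layer_sizes" and "model_type" fields with their documented defaults can be sketched as follows. The helper name read_ncf_spec is hypothetical, and rejecting unsupported model types with ValueError is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List

SUPPORTED_MODEL_TYPES = ("MLP", "GMF", "NeuMF")


def read_ncf_spec(spec: Dict[str, Any]) -> Dict[str, Any]:
    # Documented defaults: a single 128-node layer and the "NeuMF" type.
    layer_sizes: List[int] = list(spec.get("layer_sizes", [128]))
    model_type: str = spec.get("model_type", "NeuMF")
    # Assumption: unsupported types are rejected early.
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return {"layer_sizes": layer_sizes, "model_type": model_type}


defaults = read_ncf_spec({})
custom = read_ncf_spec({"layer_sizes": [64, 32], "model_type": "MLP"})
```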

property msft_ncf_model: recommenders.models.ncf.ncf_singlenode.NCF: The Neural Collaborative Filtering (NCF) model instance object, using the NCF implementation from the recommenders package developed and maintained by Microsoft. sad.trainer.MSFTRecNCFTrainer initializes it by calling self.initialize_msft_ncf_model() of this class. Some parameters required to initialize an NCF model are specified in the trainer, so the trainer must pass them to this model.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

reset_parameters()[source]: Does nothing.

save(working_dir: Optional[str] = None)[source]

Save the trained NCF model to a folder (self.s3_key_path) rooted at working_dir. The actual save operation is delegated to self.msft_ncf_model.save(). In addition, some extra information about the model is saved to additional_info.json. This information is used when loading a trained NCF model.

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

sad.model.msft_rbm module

class MSFTRecRBMModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

property bh: numpy.ndarray: The bias for hidden units. The size is 1 x k. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value is loaded too.

property bv: numpy.ndarray: The bias for visible units. The size is 1 x m. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value is loaded too.

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float[source]

Calculate preference score between two items for a particular user. The preference strength of an item for a user of this model class is the logit of model’s prediction probability. The difference between preference strengths of the two items from the provided user is how the preference score is calculated. For this class, user and item ids (instead of indices) are needed as arguments.

Parameters

u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float

property hidden_units: int: The number of hidden units in the RBM model. Its value is read directly from the "k" field in self.spec.

initialize_msft_rbm_model(trainer: MSFTRecRBMTrainer)[source]

Initialize an RBM model object implemented in the Python package recommenders. Some training parameters in a trainer object are needed, so a sad.trainer.MSFTRecRBMTrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as an argument. After the call, the self.msft_rbm_model property contains the actual model object.

Parameters: trainer (sad.trainer.MSFTRecRBMTrainer) – A trainer that will call this method to initialize a RBM model object.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder. This method still needs tests to confirm that it works properly.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Not implemented yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Not implemented yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_id (str) – A user ID.
i_id (str) – An item ID. The ID of left item in preference tensor.
j_id (str) – An item ID. The ID of right item in preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation of (u_id, i_id, j_id). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property msft_rbm_model: recommenders.models.rbm.rbm.RBM: The Restricted Boltzmann Machine (RBM) model instance object, using the RBM implementation from the recommenders package developed and maintained by Microsoft. sad.trainer.MSFTRecRBMTrainer initializes it by calling self.initialize_msft_rbm_model() of this class, because some parameters required to initialize an RBM model are specified in its trainer.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

reset_parameters()[source]: Does nothing.

save(working_dir: Optional[str] = None)[source]

Save the trained RBM model to a folder (self.s3_key_path) rooted at working_dir. The three parameters in the RBM are first converted to numpy arrays and then saved to the file weights.npz in the folder os.path.join(working_dir, self.s3_key_path).

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

property w: numpy.ndarray: The weight matrix of the RBM model. The size is m x k. Its value is initialized to zero. When loading a pre-trained MSFTRecRBMModel, its value is loaded too.
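The documented shapes of w, bv and bh fit together as in the sketch below. The item count and the spec values are made up for illustration, and the hidden-unit input v @ w + bh is the textbook RBM expression, used here only to show that the shapes are compatible; it is not taken from the package.

```python
import numpy as np

spec = {"k": 8}          # hypothetical spec; "k" gives the hidden units
m = 6                    # hypothetical number of items (visible units)
hidden_units = spec["k"]

# Zero-initialized parameters with the documented shapes.
w = np.zeros((m, hidden_units))     # weights, m x k
bv = np.zeros((1, m))               # visible bias, 1 x m
bh = np.zeros((1, hidden_units))    # hidden bias, 1 x k

# One user's visible vector and the resulting input to the hidden units.
v = np.ones((1, m))
hidden_input = v @ w + bh           # shape 1 x k
```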

sad.model.msft_vae module

class MSFTRecVAEModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float[source]: Haven’t implemented yet.

initialize_msft_vae_model(trainer: MSFTRecVAETrainer)[source]

Initialize a VAE model object implemented in package recommenders. Some training parameters in a trainer object will be needed, therefore a sad.trainer.MSFTRecVAETrainer object is supplied as an argument. The trainer is supposed to call this method and supply itself as the argument. After calling, self.msft_vae_model property will contain the actual model object.

Parameters: trainer (sad.trainer.MSFTRecVAETrainer) – A trainer that will call this method to initialize a VAE model.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Haven’t implemented this functionality yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_id (str) – A user ID.
i_id (str) – An item ID. The ID of left item in preference tensor.
j_id (str) – An item ID. The ID of right item in preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1, or 0. 1 means item i_id is preferred over item j_id by user u_id; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation of (u_id, i_id, j_id). Return 0 when the observation is missing.

Return type

(float)

property m: int: The number of items.

property msft_vae_model: recommenders.models.vae.standard_vae.StandardVAE: The Variational Auto Encoder (VAE) model instance object, using the VAE implementation from the recommenders package developed and maintained by Microsoft. sad.trainer.MSFTRecVAETrainer initializes it by calling self.initialize_msft_vae_model() of this class, because some parameters required to initialize a VAE model are specified in its trainer.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

reset_parameters()[source]: Does nothing.

save(working_dir: Optional[str] = None)[source]

Save trained VAE model to a folder (self.s3_key_path) rooted at working_dir. The actual saving operation will be delegated to self.msft_vae_model.model.save().

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

sad.model.sad module

class SADModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

property T_ceiling: float: The largest value of T that is allowed.

calculate_preference_tensor()[source]: Calculate preference tensor self.X using user and item matrices.

calculate_probability_tensor()[source]: Calculate probability tensor by applying logistic function to preference tensor self.X.

draw_observation_tensor() → numpy.ndarray[source]

Draw a complete observation tensor from the generative model of SAD.

Returns: Three-way tensor with dimension n x m x m representing personalized preferences between item pairs.
Return type: np.ndarray

get_gradient_wrt_xuij(u_idx: int, i_idx: int, j_idx: int, obs_uij: int) → float[source]

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1, or 0. 1 means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user; -1 means the opposite; 0 means the preference information is not available (missing data).

Returns

Return d(p)/d(x_uij), the gradient of the log likelihood with respect to x_uij, the (u_idx, i_idx, j_idx) element in the preference tensor.

Return type

(float)

get_t_sparsity() → float[source]: Count the elements that are close to 1 in the item right vectors self.T and return their proportion. When self.inner_flag is True, the exponentiation of self.T is used to calculate this number.
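One way to compute such a proportion is sketched here. The function name t_sparsity and the tolerance atol are assumptions; this page does not say how "close to 1" is measured.

```python
import numpy as np


def t_sparsity(T, inner_flag: bool = False, atol: float = 1e-2) -> float:
    T = np.asarray(T, dtype=float)
    # With inner_flag, T holds logarithms, so exponentiate first.
    values = np.exp(T) if inner_flag else T
    # Fraction of entries within atol of 1 (atol is a hypothetical choice).
    return float(np.mean(np.isclose(values, 1.0, rtol=0.0, atol=atol)))


plain = t_sparsity([[1.0, 0.5], [1.001, 2.0]])
logged = t_sparsity(np.zeros((2, 2)), inner_flag=True)
```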

get_xuij(u_idx: int, i_idx: int, j_idx: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) → float[source]

Calculate preference score between two items for a particular user. Parameter values in current model will be used to calculate the preference score if no additional parameters are provided as arguments.

Parameters

u_idx (int) – User index, from 0 to self.n-1.
i_idx (int) – Item index, from 0 to self.m-1.
j_idx (int) – Item index, from 0 to self.m-1.
XI (np.ndarray) – An optional user matrix. When provided, user vector will be taken from provided XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, left item vector will be taken from provided H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, right item vector will be taken from provided T instead of self.T. Subject to exponentiation when self.inner_flag is True.

Returns

Preference score between i_idx-th item and j_idx-th item for u_idx-th user.

Return type

float

gradient_update(u_idx: int, i_idx: int, j_idx: int, g: float, w_l2: float, w_l1: float, lr: float)[source]

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
g (float) – The gradient of the log likelihood with respect to x_uij.
w_l2 (float) – The weight of l2 regularization.
w_l1 (float) – The weight of l1 regularization.
lr (float) – Learning rate.

initialize_params()[source]: Initialize user matrix self.XI, left item matrix self.H and right item matrix self.T by drawing entries from a standard normal distribution. When the right item matrix is assumed to be non-negative (self.inner_flag is True), self.T stores the logarithm of the true tau matrix.
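The initialization and the log-parameterization of the right item matrix can be sketched as follows. The function names and the row-per-user, row-per-item layout are assumptions made for illustration.

```python
import numpy as np


def initialize_params(n: int, m: int, k: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    XI = rng.standard_normal((n, k))   # user matrix (assumed n x k)
    H = rng.standard_normal((m, k))    # left item matrix (assumed m x k)
    T = rng.standard_normal((m, k))    # right item matrix, or its logarithm
    return XI, H, T


def effective_tau(T: np.ndarray, inner_flag: bool) -> np.ndarray:
    # With inner_flag, T stores log(tau); exponentiating recovers a
    # strictly positive tau matrix.
    return np.exp(T) if inner_flag else T


XI, H, T = initialize_params(n=4, m=5, k=3)
tau = effective_tau(T, inner_flag=True)
```

Storing the logarithm lets an optimizer update T freely while the recovered tau stays positive.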

property inner_flag: bool: Whether right matrix will be non-negative.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model parameters.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]

Load model checkpoints.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model parameters are stored.
checkpoint_id (int) – Model parameters will be loaded from file with name "model-params-{checkpoint_id:05d}.npz".
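The checkpoint file location implied by these parameters can be sketched as below. The helper name checkpoint_path and the example directory values are hypothetical; only the filename pattern and the join order come from this page.

```python
import os


def checkpoint_path(working_dir: str, s3_key_path: str, checkpoint_id: int = 1) -> str:
    """Build the path of a checkpoint file (hypothetical helper)."""
    # The id is zero-padded to five digits, e.g. 12 -> "00012".
    filename = f"model-params-{checkpoint_id:05d}.npz"
    return os.path.join(working_dir, s3_key_path, filename)


path = checkpoint_path("/tmp/work", "models/sad-0001", checkpoint_id=12)
```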

log_likelihood(u_idx: int, i_idx: int, j_idx: int, obs_uij: int, XI: Optional[numpy.ndarray] = None, H: Optional[numpy.ndarray] = None, T: Optional[numpy.ndarray] = None, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_idx (int) – Index of user in user set. 0-based.
i_idx (int) – Index of i-th item. It is the idx of left item in preference tensor.
j_idx (int) – Index of j-th item. It is the idx of right item in preference tensor.
obs_uij (int) – The observation at (u_idx, i_idx, j_idx). Takes one of three values: 1, -1 or 0. "1" means the i_idx-th item is preferred over the j_idx-th item by the u_idx-th user. "-1" means the opposite. "0" means the preference information is not available (missing data).
XI (np.ndarray) – An optional user matrix. When provided, user vector will be taken from provided XI instead of self.XI.
H (np.ndarray) – An optional left item matrix. When provided, left item vector will be taken from provided H instead of self.H.
T (np.ndarray) – An optional right item matrix. When provided, right item vector will be taken from provided T instead of self.T. Subject to exponentiation when self.inner_flag is set to True.

Returns

Return the contribution to the log likelihood from observation at (u_idx, i_idx, j_idx). Return 0 when the observation is missing.

Return type

float
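The handling of the three observation values can be illustrated as follows. The logistic form of the likelihood is an assumption, since this page does not give the formula. The function name is hypothetical, and the preference score x_uij is passed in directly rather than computed from the model.

```python
import numpy as np


def log_likelihood_contribution(x_uij: float, obs_uij: int) -> float:
    """Log likelihood of one observation under an assumed logistic model."""
    if obs_uij == 0:
        # Missing observations contribute nothing.
        return 0.0
    # log(sigmoid(z)) = -log(1 + exp(-z)), computed in a stable way.
    return float(-np.logaddexp(0.0, -obs_uij * x_uij))


ll_pos = log_likelihood_contribution(2.0, 1)
ll_neg = log_likelihood_contribution(2.0, -1)
ll_missing = log_likelihood_contribution(2.0, 0)
```

With a positive score, the observation "1" is more likely than "-1", and the two probabilities sum to one under this assumed form.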

property m: int: The number of items.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return the proportion of elements in the item right vectors self.T that are close to 1. When self.inner_flag is True, the exponentiation of self.T is used for this calculation.

predict(inputs: Any) → Any[source]

reset_parameters()[source]

save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]: Save model parameters to a file named "model-params.npz" under os.path.join(working_dir, self.s3_key_path).

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Save model checkpoints to a file under os.path.join(working_dir, self.s3_key_path).
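A save and load round trip in the documented directory layout could look like this sketch. The helper names and the array keys "XI", "H" and "T" are assumptions. The use of numpy.savez is inferred from the .npz extension rather than confirmed by this page.

```python
import os
import tempfile

import numpy as np


def save_params(working_dir, s3_key_path, params, filename="model-params.npz"):
    """Write arrays to os.path.join(working_dir, s3_key_path, filename)."""
    folder = os.path.join(working_dir, s3_key_path)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    np.savez(path, **params)
    return path


def load_params(working_dir, s3_key_path, filename="model-params.npz"):
    """Read the arrays back into a plain dictionary."""
    path = os.path.join(working_dir, s3_key_path, filename)
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


with tempfile.TemporaryDirectory() as working_dir:
    original = {
        "XI": np.ones((2, 3)),
        "H": np.zeros((4, 3)),
        "T": np.full((4, 3), 0.5),
    }
    saved_path = save_params(working_dir, "models/demo", original)
    restored = load_params(working_dir, "models/demo")
```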

sad.model.svd module

class SVDModel(config: dict, task: TrainingTask)[source]

Bases: sad.model.base.ModelBase

get_xuij(u_id: str, i_id: str, j_id: str, **kwargs) → float[source]

Calculate preference score between two items for a particular user. For this model class, the preference strength of an item for a user is the logit of the model’s prediction probability. The preference score is the difference between the preference strengths of the two items for the given user. User and item IDs (instead of indices) are needed as arguments.

Parameters

u_id (str) – User ID.
i_id (str) – Item ID.
j_id (str) – Item ID.

Returns

Preference score between item i_id and j_id for user u_id.

Return type

float
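The score described above, a difference of logits, can be sketched as below. The function names are hypothetical, and the prediction probabilities p_ui and p_uj are taken as given inputs, because how they are derived from the underlying SVD model is not stated here.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def preference_score(p_ui: float, p_uj: float) -> float:
    """Difference between the preference strengths of items i and j."""
    return logit(p_ui) - logit(p_uj)


score = preference_score(0.8, 0.5)
reverse = preference_score(0.5, 0.8)
```

A positive score means item i has the larger preference strength, and swapping the two items flips the sign.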

initialize_svd_model(trainer: SVDTrainer)[source]

Initialize an SVD model object implemented in the surprise package. Some training parameters from a trainer object are needed, so a sad.trainer.SVDTrainer object is supplied as an argument. The trainer is expected to call this method and pass itself as the argument. After the call, the self.svd_model property contains the actual model object.

Parameters: trainer (sad.trainer.SVDTrainer) – A trainer that will call this method to initialize an SVD model.

property k: int: The number of latent dimensions.

load(working_dir: Optional[str] = None, filename: Optional[str] = None)[source]

Load model from a folder.

Parameters

working_dir (str) – The containing folder of self.s3_key_path where model and configuration are stored.
filename (str) – Filename containing model parameters. The full path of the file will be os.path.join(working_dir, self.s3_key_path, filename).

load_best(working_dir: str, criterion: str = 'll')[source]: Not implemented yet.

load_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Not implemented yet.

log_likelihood(u_id: str, i_id: str, j_id: str, obs_uij: int, **kwargs) → float[source]

Calculate log likelihood.

Parameters

u_id (str) – A user ID.
i_id (str) – An item ID. The ID of left item in preference tensor.
j_id (str) – An item ID. The ID of right item in preference tensor.
obs_uij (int) – The observation of (u_id, i_id, j_id) from the dataset. Takes one of three values: 1, -1 or 0. "1" means item i_id is preferred over item j_id by user u_id. "-1" means the opposite. "0" means the preference information is not available (missing data).

Returns

Return the contribution to the log likelihood from observation of (u_id, i_id, j_id). Return 0 when the observation is missing.

Return type

float

property m: int: The number of items.

property n: int: The number of users.

parameters_for_monitor() → dict[source]: Return nothing.

predict(inputs: Any) → Any[source]

property prediction_cache: Dict[Tuple[str, str], float]: A dictionary containing the prediction cache. Each key is a (user ID, item ID) pair, and the value is the model’s prediction for that pair.

reset_parameters()[source]: Does nothing.
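A cache keyed by (user ID, item ID) pairs might be used as in the sketch below. The helper cached_prediction and the stand-in fake_predict are hypothetical and only illustrate the lookup pattern, not how the class populates its cache.

```python
from typing import Callable, Dict, Tuple


def cached_prediction(
    cache: Dict[Tuple[str, str], float],
    predict_fn: Callable[[str, str], float],
    u_id: str,
    i_id: str,
) -> float:
    """Return a cached prediction, computing it only on the first request."""
    key = (u_id, i_id)
    if key not in cache:
        cache[key] = predict_fn(u_id, i_id)
    return cache[key]


calls = []


def fake_predict(u_id: str, i_id: str) -> float:
    # Stand-in for a real model prediction; records each invocation.
    calls.append((u_id, i_id))
    return 0.7


cache: Dict[Tuple[str, str], float] = {}
first = cached_prediction(cache, fake_predict, "u1", "i9")
second = cached_prediction(cache, fake_predict, "u1", "i9")
```

The second lookup is served from the dictionary, so the prediction function runs only once.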

save(working_dir: Optional[str] = None, filename: str = 'model-params.npz')[source]

Save trained SVD model to a folder (self.s3_key_path) rooted at working_dir. The model object self.svd_model will be saved as a pickle file named model.pickle in the folder.

Model configuration (self.config) will be saved too.

Parameters: working_dir (str) – The containing folder of self.s3_key_path where model and its configuration will be saved.

save_checkpoint(working_dir: str, checkpoint_id: int = 1)[source]: Haven’t implemented this functionality yet.

property svd_model: surprise.prediction_algorithms.matrix_factorization.SVD: Singular Value Decomposition (SVD) model instance object. We are using the implementation of SVD from surprise package. This model will be initialized via sad.trainer.SVDTrainer when calling method self.initialize_svd_model() of this class.
