Turi Create 4.0
turi::recsys::lm_data_generator Class Reference

#include <toolkits/util/data_generators.hpp>

Public Member Functions

sframe generate (size_t n_observations, const std::string &target_column_name, size_t random_seed, double noise_sd) const

std::pair< sframe, sframe > generate_for_ranking (size_t n_train_samples_per_user, size_t n_test_samples_per_user, size_t random_seed, double noise_sd) const

Detailed Description

A simple class for generating synthetic linear-model data for testing purposes. It uses a factorization machine model to generate the data.
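For reference, the standard second-order factorization machine scores a feature vector x as shown below, with k = n_factors latent dimensions per feature. How each term maps to the options listed below (w_0 drawn with scale w0_sd, w_i with w_sd, v_i with V_sd, noise with noise_sd) is an assumption based on the option names, not a transcription of data_generators.cpp.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
  \qquad \mathbf{v}_i \in \mathbb{R}^{k}

% squared_error (assumed):  y = \hat{y}(\mathbf{x}) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \mathrm{noise\_sd}^2)
% logistic (assumed):       P(y = 1 \mid \mathbf{x}) = 1 / \left(1 + e^{-\hat{y}(\mathbf{x})}\right)
```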

The generator accepts the following options. Not every function uses all of them:

  • random_seed: Random seed for sampling the data.
  • n_factors: Number of latent factors to use in the generation.
  • noise_sd: Standard deviation of the noise associated with each response.
  • w0_sd: The standard deviation used in generating the intercept term.
  • w_sd: The standard deviation used in generating the linear terms.
  • V_sd: The standard deviation used in generating the latent factors.
  • y_mode: The sampling model. Can be "squared_error" or "logistic".

The defaults for these are given in data_generators.cpp.
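The following is a minimal sketch of how these options could drive a factorization machine sampler. Everything in it is hypothetical: FMOptions, FMParams, sample_params, fm_score and sample_response are illustrative names, not Turi Create APIs; the numeric values in FMOptions are placeholders rather than the real defaults from data_generators.cpp; and adding Gaussian noise to the score before the logistic link is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical mirror of the generator options. Values are placeholders only;
// the real defaults live in data_generators.cpp.
struct FMOptions {
  std::size_t n_factors = 4;
  double noise_sd = 0.1;
  double w0_sd = 1.0;
  double w_sd = 1.0;
  double V_sd = 1.0;
  std::string y_mode = "squared_error";  // or "logistic"
};

// Parameters of a second-order factorization machine over n_features features.
struct FMParams {
  double w0 = 0.0;
  std::vector<double> w;               // one linear weight per feature
  std::vector<std::vector<double>> V;  // n_features x n_factors latent factors
};

// Draw the intercept, linear terms and latent factors from zero-mean normals.
FMParams sample_params(std::size_t n_features, const FMOptions& opt, std::mt19937& rng) {
  std::normal_distribution<double> w0_dist(0.0, opt.w0_sd);
  std::normal_distribution<double> w_dist(0.0, opt.w_sd);
  std::normal_distribution<double> v_dist(0.0, opt.V_sd);
  FMParams p;
  p.w0 = w0_dist(rng);
  p.w.resize(n_features);
  p.V.assign(n_features, std::vector<double>(opt.n_factors));
  for (std::size_t i = 0; i < n_features; ++i) {
    p.w[i] = w_dist(rng);
    for (double& v : p.V[i]) v = v_dist(rng);
  }
  return p;
}

// Noise-free FM score for a binary input whose active (value 1) features are
// listed in `active`: w0 + sum_i w_i + sum_{i<j} <V_i, V_j>.
double fm_score(const FMParams& p, const std::vector<std::size_t>& active) {
  double y = p.w0;
  for (std::size_t i : active) y += p.w[i];
  for (std::size_t a = 0; a < active.size(); ++a)
    for (std::size_t b = a + 1; b < active.size(); ++b)
      for (std::size_t f = 0; f < p.V[active[a]].size(); ++f)
        y += p.V[active[a]][f] * p.V[active[b]][f];
  return y;
}

// Draw one response: the noisy score for "squared_error", or a 0/1 label drawn
// from Bernoulli(sigmoid(noisy score)) for "logistic".
double sample_response(const FMParams& p, const std::vector<std::size_t>& active,
                       const FMOptions& opt, std::mt19937& rng) {
  assert(opt.y_mode == "squared_error" || opt.y_mode == "logistic");
  std::normal_distribution<double> noise(0.0, opt.noise_sd);
  const double score = fm_score(p, active) + noise(rng);
  if (opt.y_mode == "squared_error") return score;
  std::bernoulli_distribution coin(1.0 / (1.0 + std::exp(-score)));
  return coin(rng) ? 1.0 : 0.0;
}
```

Seeding a single std::mt19937 from random_seed and threading it through every draw keeps the whole dataset reproducible for a given seed.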

Definition at line 43 of file data_generators.hpp.

Member Function Documentation

◆ generate()

sframe turi::recsys::lm_data_generator::generate(size_t n_observations,
                                                 const std::string& target_column_name,
                                                 size_t random_seed,
                                                 double noise_sd) const

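Generates n_observations observations of the linear model, together with their responses, and returns them as an sframe. The response is stored in the column named target_column_name; random_seed seeds the sampling and noise_sd sets the standard deviation of the noise added to each response.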
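The sketch below illustrates the idea behind generate() for the common recommender case where each observation is a one-hot (user, item) pair, so the factorization machine reduces to w0 + w_user[u] + w_item[i] + <V_user[u], V_item[i]>. It is not the library's implementation: Table is a hypothetical stand-in for sframe (ids are stored as doubles for simplicity), the column names user_id and item_id, the extra parameters n_users, n_items and n_factors, the uniform choice of pairs, and the unit scale for all model parameters are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for an sframe: named columns of equal length.
struct Table {
  std::vector<std::string> column_names;
  std::vector<std::vector<double>> columns;
};

// Hypothetical sketch of generate(): sample (user, item) pairs uniformly and
// attach a squared-error FM response stored under target_column_name.
Table generate_sketch(std::size_t n_observations, const std::string& target_column_name,
                      std::size_t random_seed, double noise_sd,
                      std::size_t n_users = 20, std::size_t n_items = 50,
                      std::size_t n_factors = 4) {
  assert(n_users > 0 && n_items > 0 && noise_sd > 0.0);
  std::mt19937 rng(static_cast<std::mt19937::result_type>(random_seed));
  std::normal_distribution<double> unit(0.0, 1.0);  // placeholder scale for all parameters

  // Sample the model: intercept, per-user and per-item biases, latent factors.
  const double w0 = unit(rng);
  std::vector<double> w_user(n_users), w_item(n_items);
  for (double& w : w_user) w = unit(rng);
  for (double& w : w_item) w = unit(rng);
  std::vector<std::vector<double>> V_user(n_users, std::vector<double>(n_factors));
  std::vector<std::vector<double>> V_item(n_items, std::vector<double>(n_factors));
  for (auto& row : V_user) for (double& v : row) v = unit(rng);
  for (auto& row : V_item) for (double& v : row) v = unit(rng);

  std::uniform_int_distribution<std::size_t> pick_user(0, n_users - 1);
  std::uniform_int_distribution<std::size_t> pick_item(0, n_items - 1);
  std::normal_distribution<double> noise(0.0, noise_sd);

  Table out;
  out.column_names = {"user_id", "item_id", target_column_name};
  out.columns.resize(3);
  for (std::size_t n = 0; n < n_observations; ++n) {
    const std::size_t u = pick_user(rng);
    const std::size_t i = pick_item(rng);
    double y = w0 + w_user[u] + w_item[i];
    for (std::size_t f = 0; f < n_factors; ++f) y += V_user[u][f] * V_item[i][f];
    out.columns[0].push_back(static_cast<double>(u));
    out.columns[1].push_back(static_cast<double>(i));
    out.columns[2].push_back(y + noise(rng));  // noisy response
  }
  return out;
}
```

Calling generate_sketch twice with the same random_seed yields identical tables, which is the property that makes such generators useful in tests.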

◆ generate_for_ranking()

std::pair<sframe, sframe> turi::recsys::lm_data_generator::generate_for_ranking(size_t n_train_samples_per_user,
                                                                               size_t n_test_samples_per_user,
                                                                               size_t random_seed,
                                                                               double noise_sd) const

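Generates two datasets, returned as a pair of sframes: one for training a ranking model and one for testing it. It builds a linear model and assumes that, for each user, the observations with the highest responses are the ones present in the data set. A portion of these is split off into the test set; n_train_samples_per_user and n_test_samples_per_user set how many observations each user contributes to the training and test sets.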
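A minimal sketch of that procedure follows, under stated assumptions. Interaction and generate_for_ranking_sketch are hypothetical names; plain vectors replace the sframe pair; n_users, n_items and n_factors are extra illustrative parameters; and choosing the test portion uniformly at random from each user's top-ranked items is an assumption about how "a portion of these are split off". The intercept and per-user bias are omitted because they are constant within a user and cannot change that user's ranking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct Interaction {
  std::size_t user;
  std::size_t item;
};

// Hypothetical sketch: for every user, score all items, keep the top
// n_train + n_test as "observed", then split n_test of them off at random.
std::pair<std::vector<Interaction>, std::vector<Interaction>>
generate_for_ranking_sketch(std::size_t n_train_samples_per_user,
                            std::size_t n_test_samples_per_user,
                            std::size_t random_seed, double noise_sd,
                            std::size_t n_users = 10, std::size_t n_items = 40,
                            std::size_t n_factors = 3) {
  const std::size_t k = n_train_samples_per_user + n_test_samples_per_user;
  assert(k <= n_items && noise_sd > 0.0);
  std::mt19937 rng(static_cast<std::mt19937::result_type>(random_seed));
  std::normal_distribution<double> unit(0.0, 1.0), noise(0.0, noise_sd);

  // Item biases and latent factors (intercept and user bias omitted, see above).
  std::vector<double> w_item(n_items);
  for (double& w : w_item) w = unit(rng);
  std::vector<std::vector<double>> V_user(n_users, std::vector<double>(n_factors));
  std::vector<std::vector<double>> V_item(n_items, std::vector<double>(n_factors));
  for (auto& r : V_user) for (double& v : r) v = unit(rng);
  for (auto& r : V_item) for (double& v : r) v = unit(rng);

  std::vector<Interaction> train, test;
  std::vector<double> score(n_items);
  std::vector<std::size_t> order(n_items);
  for (std::size_t u = 0; u < n_users; ++u) {
    // Noisy linear-model response of user u for every item.
    for (std::size_t i = 0; i < n_items; ++i) {
      score[i] = w_item[i] + noise(rng);
      for (std::size_t f = 0; f < n_factors; ++f) score[i] += V_user[u][f] * V_item[i][f];
    }
    // Rank items by descending response and keep the top k as observed.
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&](std::size_t a, std::size_t b) { return score[a] > score[b]; });
    // Shuffle the kept items, then put the first n_test of them in the test set.
    std::shuffle(order.begin(), order.begin() + k, rng);
    for (std::size_t r = 0; r < k; ++r)
      (r < n_test_samples_per_user ? test : train).push_back({u, order[r]});
  }
  return {train, test};
}
```

Because both sets are drawn from the same per-user top-k list, a model that recovers the latent factors should rank the held-out test items highly, which is what makes the split usable for evaluating ranking.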

