turicreate.image_classifier.create
turicreate.image_classifier.create(dataset, target, feature=None, model='resnet-50', l2_penalty=0.01, l1_penalty=0.0, solver='auto', feature_rescaling=True, convergence_threshold=0.01, step_size=1.0, lbfgs_memory_level=11, max_iterations=10, class_weights=None, validation_set='auto', verbose=True, seed=None, batch_size=64)
Create an ImageClassifier model.
Parameters:
- dataset : SFrame
Input data. The column named by the ‘feature’ parameter will be extracted for modeling.
- target : string, or int
Name of the column containing the target variable. The values in this column must be of string or integer type. String target variables are automatically mapped to integers in the order in which they are provided. For example, a target variable with ‘cat’ and ‘dog’ as possible values is mapped to 0 and 1 respectively with 0 being the base class and 1 being the reference class. Use model.classes to retrieve the order in which the classes are mapped.
- feature : string, optional
Name of the column containing the input images or features. The column must be of Image type or of array type (extracted features). ‘None’ (the default) indicates that the only feature column or the only image column in dataset should be used as the feature.
- l2_penalty : float, optional
Weight on l2 regularization of the model. The larger this weight, the more the model coefficients shrink toward 0. This introduces bias into the model but decreases variance, potentially leading to better predictions. The default value is 0.01; setting this parameter to 0 corresponds to unregularized logistic regression. See the ridge regression reference for more detail.
- l1_penalty : float, optional
Weight on l1 regularization of the model. Like the l2 penalty, the higher the l1 penalty, the more the estimated coefficients shrink toward 0. The l1 penalty, however, completely zeros out sufficiently small coefficients, automatically indicating features that are not useful for the model. The default weight of 0 prevents any features from being discarded. See the LASSO regression reference for more detail.
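The difference between the two penalties can be illustrated with a small sketch. This is not Turi Create's internal code: `shrink_l2` and `soft_threshold_l1` are hypothetical helper names, and the update rules are the standard closed-form proximal steps for each penalty, applied here to a plain list of coefficients.

```python
def shrink_l2(coefficients, strength):
    """Proximal step for an l2 (ridge) penalty: scale every coefficient toward 0."""
    return [w / (1.0 + strength) for w in coefficients]


def soft_threshold_l1(coefficients, strength):
    """Proximal step for an l1 (lasso) penalty: shift toward 0 and clip at 0."""
    result = []
    for w in coefficients:
        magnitude = max(abs(w) - strength, 0.0)
        result.append(magnitude if w >= 0 else -magnitude)
    return result


coefficients = [0.125, -0.5, 2.0]
l2_result = shrink_l2(coefficients, strength=1.0)
l1_result = soft_threshold_l1(coefficients, strength=0.25)
print(l2_result)  # every coefficient is smaller, none is exactly 0
print(l1_result)  # the smallest coefficient becomes exactly 0
```

The l2 step only rescales, so no feature is ever dropped; the l1 step subtracts a fixed amount, which is why sufficiently small coefficients end up at exactly 0.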
- solver : string, optional
Name of the solver to be used to solve the regression. See the references for more detail on each solver. Available solvers are:
- auto (default): automatically chooses the best solver for the data and model parameters.
- newton: Newton-Raphson
- lbfgs: limited memory BFGS
- fista: accelerated gradient descent
For this model, the Newton-Raphson method is equivalent to the iteratively re-weighted least squares algorithm. If the l1_penalty is greater than 0, use the ‘fista’ solver.
The model is trained using a carefully engineered collection of methods that are automatically picked based on the input data. The newton method works best for datasets with plenty of examples and few features (long datasets). Limited memory BFGS (lbfgs) is a robust solver for wide datasets (i.e., datasets with many coefficients). fista is the default solver for l1-regularized linear regression. The solvers are all automatically tuned and the default options should function well. See the solver options guide for setting additional parameters for each of the solvers. See the user guide for additional details on how the solver is chosen.
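The guidance above (use ‘fista’ whenever l1_penalty is greater than 0, otherwise let the library choose) can be captured in a small helper. `pick_solver` is a hypothetical name, not part of Turi Create, and the column name "label" in the commented call is likewise an assumption.

```python
def pick_solver(l1_penalty):
    """Return a solver name consistent with the guidance in this section.

    Hypothetical helper: 'fista' when an l1 penalty is in use, otherwise
    'auto' so that the library chooses the solver itself.
    """
    return "fista" if l1_penalty > 0 else "auto"


solver_kwargs = {"l1_penalty": 0.05, "solver": pick_solver(0.05)}
print(solver_kwargs)

# With Turi Create installed, the keyword arguments would be passed as:
# model = turicreate.image_classifier.create(data, target="label", **solver_kwargs)
```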
- feature_rescaling : boolean, optional
Feature rescaling is an important pre-processing step that ensures that all features are on the same scale. An l2-norm rescaling is performed to make sure that all features are of the same norm. Categorical features are also rescaled by rescaling the dummy variables that are used to represent them. The coefficients are returned in original scale of the problem. This process is particularly useful when features vary widely in their ranges.
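As a rough illustration of l2-norm rescaling, the sketch below scales each feature column to unit norm. It assumes column-wise normalization over a small list of rows; the exact procedure inside Turi Create may differ, and `rescale_columns_l2` is a hypothetical helper name.

```python
import math


def rescale_columns_l2(rows):
    """Scale each feature column so that its l2 norm equals 1.

    Returns the rescaled rows and the per-column norms; dividing a fitted
    coefficient by its column norm maps it back to the original scale.
    """
    num_features = len(rows[0])
    norms = []
    for j in range(num_features):
        norm = math.sqrt(sum(row[j] ** 2 for row in rows))
        norms.append(norm if norm > 0 else 1.0)  # leave all-zero columns alone
    rescaled = [[row[j] / norms[j] for j in range(num_features)] for row in rows]
    return rescaled, norms


# Two features on very different scales.
rows = [[3.0, 3000.0], [4.0, 4000.0]]
rescaled, norms = rescale_columns_l2(rows)
print(norms)     # per-column l2 norms
print(rescaled)  # both columns now have unit norm
```

After rescaling, the two columns are identical even though the raw values differ by a factor of 1000, so a single penalty weight treats both features evenly.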
- convergence_threshold : float, optional
Convergence is tested using variation in the training objective, calculated as the difference in objective values between two consecutive steps. Consider reducing this below the default value (0.01) for a more accurately trained model. Beware of overfitting (i.e., a model that works well only on the training data) if this parameter is set to a very low value.
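The stopping rule can be sketched on a toy problem. The example below runs plain gradient descent on f(w) = (w - 3)^2 and stops once the objective changes by less than the threshold between two steps. Using the absolute difference is an assumption (the library may use a relative measure), and `train` is a hypothetical function unrelated to the actual solvers.

```python
def train(convergence_threshold, max_iterations=1000, step_size=0.1):
    """Gradient descent on f(w) = (w - 3)^2 with an objective-change stopping rule."""
    w = 0.0
    previous = (w - 3.0) ** 2
    for iteration in range(1, max_iterations + 1):
        w -= step_size * 2.0 * (w - 3.0)  # gradient of f is 2 * (w - 3)
        current = (w - 3.0) ** 2
        if abs(previous - current) < convergence_threshold:
            return w, iteration  # objective barely changed: stop early
        previous = current
    return w, max_iterations  # ran out of iterations before converging


loose_w, loose_iterations = train(convergence_threshold=0.01)
tight_w, tight_iterations = train(convergence_threshold=1e-6)
print(loose_iterations, tight_iterations)
print(abs(loose_w - 3.0), abs(tight_w - 3.0))
```

A smaller threshold buys a solution closer to the optimum at the cost of more iterations, which is the trade-off described above.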
- lbfgs_memory_level : float, optional
The L-BFGS algorithm keeps track of gradient information from the previous lbfgs_memory_level iterations. The storage requirement for each of these gradients is the num_coefficients in the problem. Increasing the lbfgs_memory_level can help improve the quality of the model trained. Setting this to more than max_iterations has the same effect as setting it to max_iterations.
- model : string, optional
Uses a pretrained model to bootstrap an image classifier:
- “resnet-50” : Uses a pretrained resnet model.
- Exported Core ML model will be ~90 MB.
- “squeezenet_v1.1” : Uses a pretrained squeezenet model.
- Exported Core ML model will be ~4.7 MB.
- “VisionFeaturePrint_Scene”: Uses an OS internal feature extractor.
- Only available on iOS 12.0+, macOS 10.14+ and tvOS 12.0+. Exported Core ML model will be ~41 KB.
Models are downloaded from the internet if not available locally. Once downloaded, the models are cached for future use.
- step_size : float, optional
The starting step size to use for the fista solver. The default is 1.0, which is an aggressive setting. If the first iteration takes a considerable amount of time, reducing this parameter may speed up model training.
- class_weights : {dict, auto}, optional
Weights the examples in the training data according to the given class weights. If set to None, all classes are assumed to have weight one. The auto mode sets each class weight to be inversely proportional to the number of examples in the training data with the given class.
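A minimal sketch of inverse-frequency weighting follows. The normalization used here (total examples divided by the number of classes times the class count) is one common convention and an assumption; the description only states that the weights are inversely proportional to the class counts. `auto_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def auto_class_weights(labels):
    """Weight each class inversely to its number of training examples."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    # Assumed normalization: a perfectly balanced dataset gives weight 1.0 everywhere.
    return {cls: total / (num_classes * count) for cls, count in counts.items()}


labels = ["cat"] * 8 + ["dog"] * 2
weights = auto_class_weights(labels)
print(weights)  # the rarer class receives the larger weight
```

A dictionary of this shape, mapping each class to a weight, is the form that an explicit class_weights value takes.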
- validation_set : SFrame, optional
A dataset for monitoring the model’s generalization performance. The format of this SFrame must be the same as the training set. By default this argument is set to ‘auto’, and a validation set is automatically sampled and used for progress printing. If validation_set is set to None, then no additional metrics are computed.
- max_iterations : int, optional
The maximum number of allowed passes through the data. More passes over the data can result in a more accurately trained model. Consider increasing this (the default value is 10) if the training accuracy is low and the Grad-Norm in the display is large.
- verbose : bool, optional
If True, prints progress updates and model details.
- seed : int, optional
Seed for random number generation. Set this value to ensure that the same model is created every time.
- batch_size : int, optional
If you are getting memory errors, try decreasing this value. If you have a powerful computer, increasing this value may improve performance.
Returns:
- out : ImageClassifier
A trained ImageClassifier model.
Examples
>>> model = turicreate.image_classifier.create(data, target='is_expensive')

# Make predictions (in various forms)
>>> predictions = model.predict(data)       # predictions
>>> predictions = model.classify(data)      # predictions with confidence
>>> predictions = model.predict_topk(data)  # Top-5 predictions (multiclass)

# Evaluate the model with ground truth data
>>> results = model.evaluate(data)
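For a slightly fuller picture, the sketch below wraps the call in an end-to-end workflow: load images, derive labels, hold out a test split, train, and evaluate. Several things here are assumptions rather than part of this page: the helper names `build_create_kwargs` and `train_and_evaluate`, the directory layout (one sub-directory per class), the column name "label", and the 80/20 split. Turi Create is imported lazily so that the argument helper can be used without it installed.

```python
SUPPORTED_MODELS = ("resnet-50", "squeezenet_v1.1", "VisionFeaturePrint_Scene")


def build_create_kwargs(model="resnet-50", max_iterations=10, seed=None):
    """Collect keyword arguments for image_classifier.create (hypothetical helper)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError("model must be one of %s" % (SUPPORTED_MODELS,))
    return {"model": model, "max_iterations": max_iterations, "seed": seed}


def train_and_evaluate(image_dir, **create_kwargs):
    """Load images, train a classifier, and evaluate it on a held-out split.

    Assumes each image sits in a sub-directory named after its class,
    e.g. <image_dir>/cat/001.jpg.
    """
    import os
    import turicreate as tc  # imported lazily; requires Turi Create

    data = tc.image_analysis.load_images(image_dir, with_path=True)
    # Derive the label from the parent directory name (an assumption).
    data["label"] = data["path"].apply(
        lambda p: os.path.basename(os.path.dirname(p))
    )
    train, test = data.random_split(0.8, seed=1)

    model = tc.image_classifier.create(train, target="label", **create_kwargs)
    metrics = model.evaluate(test)
    return model, metrics


kwargs = build_create_kwargs(model="squeezenet_v1.1", max_iterations=25, seed=42)
print(kwargs)

# With Turi Create installed and images on disk:
# model, metrics = train_and_evaluate("path/to/images", **kwargs)
# print(metrics["accuracy"])
```

Evaluating on a split that was not used for training gives a more honest estimate than calling evaluate on the training data itself.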