coremltools.converters.sklearn.convert

coremltools.converters.sklearn.convert(sk_obj, input_features=None, output_feature_names=None)

Convert scikit-learn pipeline, classifier, or regressor to Core ML format.

Parameters:
sk_obj: model | [model] of scikit-learn format.

scikit-learn model(s) to convert to Core ML format.

The input model may be a single scikit-learn model, a scikit-learn pipeline, or a list of scikit-learn models.

Currently supported scikit-learn models are:

  • Linear and Logistic Regression
  • LinearSVC and LinearSVR
  • SVC and SVR
  • NuSVC and NuSVR
  • Gradient Boosting Classifier and Regressor
  • Decision Tree Classifier and Regressor
  • Random Forest Classifier and Regressor
  • Normalizer
  • Imputer
  • Standard Scaler
  • DictVectorizer
  • One Hot Encoder
  • KNeighborsClassifier

The input model, or the last model in a pipeline or list of models, determines whether the converted model is exposed as a Transformer, Regressor, or Classifier.

Note that there may not be a one-to-one correspondence between scikit-learn models and the Core ML models used to represent them. For example, many scikit-learn models are embedded in a pipeline to handle processing of input features.

input_features: str | dict | list

Optional name(s) that can be given to the inputs of the scikit-learn model. Defaults to ‘input’.

Input features can be specified in a number of forms.

  • Single string: In this case, the input is assumed to be a single array, with the number of dimensions set using num_dimensions.

  • List of strings: In this case, the overall input dimension of the scikit-learn model is assumed to be the length of the list. Identical neighboring names are grouped into a single input array whose length is the number of repeats. For example:

    ["a", "b", "c"]

    resolves to

    [("a", Double), ("b", Double), ("c", Double)].

    And:

    ["a", "a", "b"]

    resolves to

    [("a", Array(2)), ("b", Double)].

  • Dictionary: The keys are the feature names and the values are the indices or ranges of feature indices.

    In this case, the input is presented as a mapping from names to indices or ranges of contiguous indices. For example,

    {"a" : 0, "b" : [2,3], "c" : 1}

    resolves to

    [("a", Double), ("c", Double), ("b", Array(2))].

    Note that the ordering is determined by the indices.

  • List of tuples of the form (name, datatype). Here, name is the name of the exposed feature, and datatype is an instance of String, Double, Int64, Array, or Dictionary.
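
The list-of-strings and dictionary forms above can be illustrated with a small pure-Python sketch. This is not the coremltools implementation: the helper names resolve_name_list and resolve_index_dict are hypothetical, and the type labels are plain strings standing in for the Core ML datatypes. It only reproduces the resolution rules described in this section.

```python
from itertools import groupby


def resolve_name_list(names):
    """Group runs of identical neighboring names into array features."""
    resolved = []
    for name, run in groupby(names):
        length = len(list(run))
        # A name that appears once is a scalar; a run of n becomes Array(n).
        resolved.append((name, "Double" if length == 1 else f"Array({length})"))
    return resolved


def resolve_index_dict(mapping):
    """Order features by index; a list of indices becomes an array feature."""

    def indices(value):
        return [value] if isinstance(value, int) else list(value)

    # The ordering of the resolved features follows the indices, not the keys.
    ordered = sorted(mapping.items(), key=lambda item: min(indices(item[1])))
    return [
        (name, "Double" if isinstance(value, int) else f"Array({len(indices(value))})")
        for name, value in ordered
    ]


print(resolve_name_list(["a", "a", "b"]))
# [('a', 'Array(2)'), ('b', 'Double')]
print(resolve_index_dict({"a": 0, "b": [2, 3], "c": 1}))
# [('a', 'Double'), ('c', 'Double'), ('b', 'Array(2)')]
```

Both calls reproduce the resolutions shown in the examples for the corresponding input forms.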

output_feature_names: string or list of strings

Optional name(s) that can be given to the outputs of the scikit-learn model.

output_feature_names is interpreted according to the model type:

  • If the scikit-learn model is a transformer, it is the name of the array feature output by the final sequence of the transformer (defaults to “output”).
  • If it is a classifier, it should be a 2-tuple of names giving the top class prediction and the array of scores for each class (defaults to “classLabel” and “classScores”).
  • If it is a regressor, it should give the name of the prediction value (defaults to “prediction”).
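
The per-type rules above can be summarized in a short pure-Python sketch. The function normalize_output_names and the kind strings are hypothetical and are not part of coremltools; only the default names are taken from the list above. The validation shown is an assumption about sensible behavior, not a description of what the converter checks.

```python
def normalize_output_names(kind, output_feature_names=None):
    """Return output names for a model kind, falling back to the documented defaults."""
    defaults = {
        "transformer": "output",
        "classifier": ("classLabel", "classScores"),
        "regressor": "prediction",
    }
    if kind not in defaults:
        raise ValueError(f"unknown model kind: {kind!r}")
    if output_feature_names is None:
        return defaults[kind]
    if kind == "classifier":
        # A classifier exposes two outputs: the top class and the per-class scores.
        if isinstance(output_feature_names, str) or len(output_feature_names) != 2:
            raise ValueError("a classifier needs two names: (label, scores)")
        return tuple(output_feature_names)
    # Transformers and regressors expose a single named output.
    if not isinstance(output_feature_names, str):
        raise ValueError(f"a {kind} needs a single output name")
    return output_feature_names


print(normalize_output_names("regressor", "price"))  # price
print(normalize_output_names("classifier"))          # ('classLabel', 'classScores')
```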

Returns:
model: MLModel

Returns an MLModel instance representing a Core ML model.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd

# Load data
>>> data = pd.read_csv('houses.csv')

# Train a model
>>> model = LinearRegression()
>>> model.fit(data[["bedroom", "bath", "size"]], data["price"])

# Convert and save the scikit-learn model
>>> import coremltools
>>> coreml_model = coremltools.converters.sklearn.convert(
...     model, ["bedroom", "bath", "size"], "price")
>>> coreml_model.save('HousePricer.mlmodel')