Pipeline Classifier

This example creates a model which can be used to train a simple drawing or sketch classifier based on user examples. The model is a pipeline composed of a drawing-embedding model and a nearest-neighbor classifier.

The model is updatable and starts off empty, meaning that the nearest-neighbor classifier has no examples or labels. Before updating with training examples, the model predicts “unknown” for all input.
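
To make the "starts off empty" behavior concrete, here is a minimal pure-Python sketch of a nearest-neighbor classifier that returns a default label until it has been given examples. The class `TinyKNN` and its `update` and `predict` methods are hypothetical names used only for illustration; they are not part of coremltools or Core ML, and the 2-dimensional vectors stand in for the 128-dimensional embeddings used later.

```python
import math


class TinyKNN:
    """Toy 1-nearest-neighbor classifier that starts with no examples (illustrative only)."""

    def __init__(self, default_label="unknown"):
        self.default_label = default_label
        self.examples = []  # list of (vector, label) pairs

    def update(self, vector, label):
        # "Updating" a nearest-neighbor model just means storing another example.
        self.examples.append((list(vector), label))

    def predict(self, vector):
        # With no stored examples there is nothing to compare against.
        if not self.examples:
            return self.default_label
        nearest = min(self.examples, key=lambda ex: math.dist(ex[0], vector))
        return nearest[1]


knn = TinyKNN()
before = knn.predict([0.0, 1.0])   # no examples stored yet
knn.update([0.0, 1.0], "star")
knn.update([1.0, 0.0], "heart")
after = knn.predict([0.1, 0.9])    # closest stored example is the "star" vector
print(before, after)
```

The point of the sketch is only that an update adds labeled examples rather than retraining weights, which is why the model can start out empty.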

The input to the model is a 28 x 28 grayscale drawing. The background is expected to be black (0), while the strokes of the drawing should be rendered as white (255). Right-click these 28 x 28 images for the following example:

Drawing of a star:
Drawing of a heart:
Drawing of 5:
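
As a rough illustration of the expected input format, the following pure-Python sketch builds a 28 x 28 grayscale canvas with a black background and a white stroke, and inverts a drawing that appears to use a light background instead. The helper names (`blank_canvas`, `draw_horizontal_stroke`, `invert_if_light_background`) and the mean-brightness threshold of 127 are assumptions made for this example; converting the pixel grid into an image object for prediction is left out.

```python
WIDTH = HEIGHT = 28


def blank_canvas():
    """Return a 28 x 28 grid of black pixels (0)."""
    return [[0] * WIDTH for _ in range(HEIGHT)]


def draw_horizontal_stroke(canvas, row, start, end):
    """Render a white (255) stroke on one row, from column start up to (not including) end."""
    for col in range(start, end):
        canvas[row][col] = 255


def invert_if_light_background(canvas):
    """Heuristic: if most pixels are bright, assume dark-on-light and invert the drawing."""
    mean = sum(sum(row) for row in canvas) / (WIDTH * HEIGHT)
    if mean > 127:
        return [[255 - pixel for pixel in row] for row in canvas]
    return canvas


canvas = blank_canvas()
draw_horizontal_stroke(canvas, row=14, start=4, end=24)

# The same drawing as dark strokes on a white background.
on_paper = [[255 - pixel for pixel in row] for row in canvas]
normalized = invert_if_light_background(on_paper)
```

Here `normalized` ends up identical to `canvas`, while a drawing that already has a black background passes through unchanged.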

Get the Embedding Model

The drawing-embedding model is used as a feature extractor. Start by loading the embedding model, which is the first part of the pipeline, and getting its spec:

import coremltools
from coremltools.models import MLModel

embedding_path = './models/TinyDrawingEmbedding.mlmodel'
embedding_model = MLModel(embedding_path)

embedding_spec = embedding_model.get_spec()
print(embedding_spec.description)

In the following output, the shortDescription indicates that the embedding model takes in a 28 x 28 grayscale image and outputs a 128-dimensional float vector:

input {
  name: "drawing"
  shortDescription: "Input sketch image with black background and white strokes"
  type {
    imageType {
      width: 28
      height: 28
      colorSpace: GRAYSCALE
    }
  }
}
output {
  name: "embedding"
  shortDescription: "Vector embedding of sketch in 128 dimensional space"
  type {
    multiArrayType {
      shape: 128
      dataType: FLOAT32
    }
  }
}
metadata {
  shortDescription: "Embeds a 28 x 28 grayscale image of a sketch into 128 dimensional space. The model was created by removing the last layer of a simple convolution based neural network classifier trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)."
  author: "Core ML Tools Example"
  license: "MIT"
}

Create the Nearest Neighbor Classifier

Now that the feature extractor is in place, create the second model in the pipeline: a nearest-neighbor classifier that operates on the embedding:

from coremltools.models.nearest_neighbors import KNearestNeighborsClassifierBuilder
import coremltools.models.datatypes as datatypes

knn_builder = KNearestNeighborsClassifierBuilder(input_name='embedding',
                                                 output_name='label',
                                                 number_of_dimensions=128,
                                                 default_class_label='unknown',
                                                 k=3,
                                                 weighting_scheme='inverse_distance',
                                                 index_type='linear')

knn_builder.author = 'Core ML Tools Example'
knn_builder.license = 'MIT'
knn_builder.description = 'Classifies 128 dimension vector based on 3 nearest neighbors'

knn_spec = knn_builder.spec
knn_spec.description.input[0].shortDescription = 'Input vector to classify'
knn_spec.description.output[0].shortDescription = "Predicted label. Defaults to 'unknown'"
knn_spec.description.output[1].shortDescription = 'Probabilities / score for each possible label.'

# print(knn_spec.description)
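
To give a feel for what `k=3` with an inverse-distance weighting scheme means, here is a minimal pure-Python sketch of weighted nearest-neighbor voting. It is an illustration under stated assumptions, not Core ML's implementation: the function `knn_predict`, the small `eps` guard against division by zero, and the normalization of scores into probabilities are all choices made for this example, and the 2-dimensional vectors stand in for 128-dimensional embeddings.

```python
import math
from collections import defaultdict


def knn_predict(examples, query, k=3, default_label="unknown", eps=1e-9):
    """Vote among the k nearest examples, weighting each vote by 1 / distance."""
    if not examples:
        return default_label, {}

    # Keep the k stored examples closest to the query (Euclidean distance).
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]

    # Closer neighbors contribute larger votes.
    scores = defaultdict(float)
    for vector, label in nearest:
        scores[label] += 1.0 / (math.dist(vector, query) + eps)

    # Normalize the scores so they sum to 1.
    total = sum(scores.values())
    probabilities = {label: score / total for label, score in scores.items()}
    best_label = max(probabilities, key=probabilities.get)
    return best_label, probabilities


examples = [
    ([0.0, 0.0], "star"),
    ([4.0, 0.0], "heart"),
    ([4.0, 1.0], "heart"),
]
label, probabilities = knn_predict(examples, [1.0, 0.0])
print(label, probabilities)
```

In this example the single nearby "star" example outweighs the two more distant "heart" examples, whereas an unweighted majority vote over the same three neighbors would pick "heart".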

Create an Updatable Pipeline Model

The last step is to create the pipeline model and insert the feature extractor and the nearest-neighbor classifier. The model will be set to be updatable. Follow these steps:

Create the spec, set it to be updatable, and set the specification version:

pipeline_spec = coremltools.proto.Model_pb2.Model()
pipeline_spec.specificationVersion = coremltools._MINIMUM_UPDATABLE_SPEC_VERSION
pipeline_spec.isUpdatable = True

Set the inputs to the inputs from the embedding model:

# Inputs are the inputs from the embedding model
pipeline_spec.description.input.extend(embedding_spec.description.input[:])

Set the outputs to the outputs from the classification model:

# Outputs are the outputs from the classification model
pipeline_spec.description.output.extend(knn_spec.description.output[:])
pipeline_spec.description.predictedFeatureName = knn_spec.description.predictedFeatureName
pipeline_spec.description.predictedProbabilitiesName = knn_spec.description.predictedProbabilitiesName

Set the training inputs:

# Training inputs
pipeline_spec.description.trainingInput.extend([embedding_spec.description.input[0]])
pipeline_spec.description.trainingInput[0].shortDescription = 'Example sketch'
pipeline_spec.description.trainingInput.extend([knn_spec.description.output[0]])
pipeline_spec.description.trainingInput[1].shortDescription = 'Associated true label of example sketch'

Provide the metadata:

# Provide metadata
pipeline_spec.description.metadata.author = 'Core ML Tools'
pipeline_spec.description.metadata.license = 'MIT'
pipeline_spec.description.metadata.shortDescription = (
    'An updatable model which can be used to train a tiny 28 x 28 drawing classifier based on user examples.'
    ' It uses a drawing embedding trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)')

Construct the pipeline by adding the embedding and the nearest-neighbor classifier:

# Construct pipeline by adding the embedding and then the nearest neighbor classifier
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(embedding_spec)
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(knn_spec)
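
The order in which models are added matters because each stage consumes features produced earlier in the pipeline: here the embedding model's `embedding` output feeds the classifier's `embedding` input. The following pure-Python sketch checks that kind of name-based wiring on plain tuples. The function `check_pipeline_wiring` is hypothetical and is not a coremltools API, and `label_probabilities` is a placeholder for the classifier's probability output name.

```python
def check_pipeline_wiring(pipeline_inputs, stages):
    """Return all feature names available after the last stage.

    Each stage is a (name, input_names, output_names) tuple. Raises ValueError
    if a stage needs a feature that no earlier stage or pipeline input provides.
    """
    available = set(pipeline_inputs)
    for name, inputs, outputs in stages:
        missing = set(inputs) - available
        if missing:
            raise ValueError(f"{name} is missing inputs: {sorted(missing)}")
        available |= set(outputs)
    return available


stages = [
    ("embedding model", ["drawing"], ["embedding"]),
    ("knn classifier", ["embedding"], ["label", "label_probabilities"]),
]
available = check_pipeline_wiring(["drawing"], stages)
print(sorted(available))
```

Reversing the two stages in this sketch raises a `ValueError`, because the classifier would ask for `embedding` before any stage has produced it.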

Save the updated spec:

# Wrap the pipeline spec in an MLModel and save it to disk.
from coremltools.models import MLModel
mlmodel = MLModel(pipeline_spec)

output_path = './TinyDrawingClassifier.mlmodel'
mlmodel.save(output_path)
