Pipeline Classifier
This example creates a model that can be used to train a simple drawing (sketch) classifier from user-provided examples. The model is a pipeline composed of a drawing-embedding model followed by a nearest-neighbor classifier.
The model is updatable and starts off empty, meaning that the nearest-neighbor classifier has no examples or labels. Before updating with training examples, the model predicts “unknown” for all input.
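The "starts off empty" behavior can be illustrated with a minimal pure-Python sketch of an updatable nearest-neighbor classifier. This is a toy stand-in, not Core ML's implementation: the class `UpdatableNearestNeighbor` and its `update`/`predict` methods are hypothetical, and it uses plain 1-nearest-neighbor on 2-D vectors for brevity rather than the weighted 3-neighbor vote over 128-dimensional embeddings configured later.

```python
import math


class UpdatableNearestNeighbor:
    """Toy updatable 1-nearest-neighbor classifier that starts with no examples."""

    def __init__(self, default_label="unknown"):
        self.default_label = default_label
        self.examples = []  # stored (vector, label) pairs; empty at first

    def update(self, vector, label):
        # Updating a nearest-neighbor model just stores the labeled example.
        self.examples.append((list(vector), label))

    def predict(self, vector):
        # With nothing stored there is nothing to compare against,
        # so every input gets the default label.
        if not self.examples:
            return self.default_label
        _, label = min(self.examples, key=lambda ex: math.dist(ex[0], vector))
        return label


model = UpdatableNearestNeighbor()
print(model.predict([0.1, 0.2]))  # unknown
model.update([0.0, 0.0], "star")
model.update([1.0, 1.0], "heart")
print(model.predict([0.9, 0.8]))  # heart
```

No training step is involved beyond storing examples, which is what makes a nearest-neighbor classifier a natural fit for on-device updates.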
The input to the model is a 28 x 28 grayscale drawing. The background is expected to be black (0), while the strokes of the drawing should be rendered as white (255). Right-click the following 28 x 28 example images to save them:
Drawing of a star
Drawing of a heart
Drawing of the digit 5
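Drawings captured from other sources often use the opposite convention (dark ink on a light page). The following minimal pure-Python sketch converts such a drawing to the polarity the model expects. It assumes the drawing is already resized to 28 x 28 and held as a list of lists of 8-bit values; `to_model_polarity` is a hypothetical helper, not part of coremltools, and turning the grid into an image object that Core ML accepts is not shown.

```python
SIZE = 28  # the model expects 28 x 28 grayscale input


def to_model_polarity(pixels, strokes_are_dark=True):
    """Return a 28 x 28 grid with a black (0) background and white (255) strokes.

    `pixels` is a 28 x 28 list of lists of ints in [0, 255].
    Set `strokes_are_dark` when the source has dark ink on a light page.
    """
    if len(pixels) != SIZE or any(len(row) != SIZE for row in pixels):
        raise ValueError("expected a 28 x 28 grid")
    for row in pixels:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"pixel value out of range: {value}")
    if strokes_are_dark:
        # Invert so the background becomes 0 and the strokes become 255.
        return [[255 - value for value in row] for row in pixels]
    # Already white-on-black: return a copy unchanged.
    return [list(row) for row in pixels]


# Example: a white page (255) with a single dark horizontal stroke (0).
page = [[255] * SIZE for _ in range(SIZE)]
page[14] = [0] * SIZE
ready = to_model_polarity(page)
print(ready[0][0], ready[14][0])  # 0 255
```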
Get the Embedding Model
The drawing-embedding model is used as a feature extractor. Start by loading the embedding model, the first part of the pipeline, and getting its spec:
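import coremltools
from coremltools.models import MLModel

# Load the pre-trained drawing-embedding model and get its protobuf spec
embedding_path = './models/TinyDrawingEmbedding.mlmodel'
embedding_model = MLModel(embedding_path)
embedding_spec = embedding_model.get_spec()

# Inspect the model's inputs, outputs, and metadata
print(embedding_spec.description)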
In the following output, the shortDescription fields indicate that the embedding model takes in a 28 x 28 grayscale image and outputs a 128-dimensional float vector:
input {
  name: "drawing"
  shortDescription: "Input sketch image with black background and white strokes"
  type {
    imageType {
      width: 28
      height: 28
      colorSpace: GRAYSCALE
    }
  }
}
output {
  name: "embedding"
  shortDescription: "Vector embedding of sketch in 128 dimensional space"
  type {
    multiArrayType {
      shape: 128
      dataType: FLOAT32
    }
  }
}
metadata {
  shortDescription: "Embeds a 28 x 28 grayscale image of a sketch into 128 dimensional space. The model was created by removing the last layer of a simple convolution based neural network classifier trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)."
  author: "Core ML Tools Example"
  license: "MIT"
}
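The output feature is named `embedding`, and in a Core ML pipeline each model reads its inputs by feature name, so the nearest-neighbor classifier that consumes this output must use the same name as its input. The plain-Python sketch below mimics that name-based wiring between stages. `run_pipeline` and the lambda stages are hypothetical stand-ins (the "embedding" is just total ink, not a 128-dimensional vector), not coremltools APIs, and Core ML's own validation may differ in detail.

```python
def run_pipeline(stages, features):
    """Run stages in order, passing values between them by feature name.

    Each stage is an (input_names, output_names, fn) triple, where fn takes the
    named inputs positionally and returns a tuple with one value per output.
    """
    available = dict(features)  # name -> value, grows as stages run
    for input_names, output_names, fn in stages:
        missing = [name for name in input_names if name not in available]
        if missing:
            raise KeyError(f"no upstream stage produces {missing}")
        outputs = fn(*(available[name] for name in input_names))
        available.update(zip(output_names, outputs))
    return available


# Hypothetical stand-ins: the "embedding" is the total ink in the drawing, and
# the "classifier" returns 'unknown' for an empty drawing.
stages = [
    (["drawing"], ["embedding"], lambda drawing: ([float(sum(map(sum, drawing)))],)),
    (["embedding"], ["label"], lambda embedding: ("drawn" if embedding[0] > 0 else "unknown",)),
]
result = run_pipeline(stages, {"drawing": [[0, 255], [0, 0]]})
print(result["label"])  # drawn
```

If the second stage asked for a feature name that no earlier stage produces, the sketch raises `KeyError`, which is the kind of mismatch that matching names avoids.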
Create the Nearest Neighbor Classifier
With the feature extractor in place, create the second model in the pipeline: a nearest-neighbor classifier that operates on the 128-dimensional embedding:
from coremltools.models.nearest_neighbors import KNearestNeighborsClassifierBuilder

# Nearest-neighbor classifier that reads the 128-dimensional 'embedding' feature
# and predicts 'label'. It starts with no examples, so it returns 'unknown'.
knn_builder = KNearestNeighborsClassifierBuilder(input_name='embedding',
                                                 output_name='label',
                                                 number_of_dimensions=128,
                                                 default_class_label='unknown',
                                                 k=3,
                                                 weighting_scheme='inverse_distance',
                                                 index_type='linear')

# Model-level metadata
knn_builder.author = 'Core ML Tools Example'
knn_builder.license = 'MIT'
knn_builder.description = 'Classifies 128 dimension vector based on 3 nearest neighbors'

# Per-feature descriptions are set directly on the underlying spec
knn_spec = knn_builder.spec
knn_spec.description.input[0].shortDescription = 'Input vector to classify'
knn_spec.description.output[0].shortDescription = "Predicted label. Defaults to 'unknown'"
knn_spec.description.output[1].shortDescription = 'Probabilities / score for each possible label.'

# print(knn_spec.description)
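Core ML performs the neighbor search internally; the sketch below only illustrates what the builder parameters `k=3`, `weighting_scheme='inverse_distance'`, `index_type='linear'`, and `default_class_label='unknown'` mean. It is a plain-Python approximation, not Core ML's actual algorithm: `knn_classify` is hypothetical, the handling of zero distances is an assumption, and 2-D vectors stand in for 128-dimensional embeddings.

```python
import math
from collections import defaultdict


def knn_classify(query, examples, k=3, default_label="unknown"):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbors.

    `examples` is a list of (vector, label) pairs. The search is a linear scan over
    every stored example, analogous to index_type='linear'.
    Returns (label, scores), where the scores sum to 1 over the voting labels.
    """
    if not examples:
        return default_label, {}
    # Linear scan: distance to every stored example, then keep the k closest.
    nearest = sorted(
        ((math.dist(query, vector), label) for vector, label in examples),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    # Assumption: an exact match (distance 0) would get infinite weight,
    # so exact matches alone decide the vote.
    exact = [label for dist, label in nearest if dist == 0.0]
    if exact:
        for label in exact:
            scores[label] += 1.0
    else:
        for dist, label in nearest:
            scores[label] += 1.0 / dist  # closer neighbors count more
    total = sum(scores.values())
    probabilities = {label: score / total for label, score in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


examples = [
    ([0.0, 0.0], "heart"),
    ([2.0, 0.0], "star"),
    ([0.0, 2.0], "star"),
]
# Two of the three neighbors are stars, but the single very close heart outweighs them.
label, probabilities = knn_classify([0.1, 0.0], examples)
print(label)  # heart
```

With uniform weighting the same query would be labeled "star" by majority vote; inverse-distance weighting lets one close example dominate farther ones.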
Create an Updatable Pipeline Model
The last step is to create the pipeline model, mark it as updatable, and add the feature extractor and the nearest-neighbor classifier to it. Follow these steps:
Create the spec, set it to be updatable, and set the specification version:
pipeline_spec = coremltools.proto.Model_pb2.Model()
pipeline_spec.specificationVersion = coremltools._MINIMUM_UPDATABLE_SPEC_VERSION
pipeline_spec.isUpdatable = True
Set the inputs to the inputs from the embedding model:
# Inputs are the inputs from the embedding model
pipeline_spec.description.input.extend(embedding_spec.description.input[:])
Set the outputs to the outputs from the classification model:
# Outputs are the outputs from the classification model
pipeline_spec.description.output.extend(knn_spec.description.output[:])
pipeline_spec.description.predictedFeatureName = knn_spec.description.predictedFeatureName
pipeline_spec.description.predictedProbabilitiesName = knn_spec.description.predictedProbabilitiesName
Set the training inputs:
# Training inputs
pipeline_spec.description.trainingInput.extend([embedding_spec.description.input[0]])
pipeline_spec.description.trainingInput[0].shortDescription = 'Example sketch'
pipeline_spec.description.trainingInput.extend([knn_spec.description.output[0]])
pipeline_spec.description.trainingInput[1].shortDescription = 'Associated true label of example sketch'
Provide the metadata:
# Provide metadata
pipeline_spec.description.metadata.author = 'Core ML Tools'
pipeline_spec.description.metadata.license = 'MIT'
pipeline_spec.description.metadata.shortDescription = ('An updatable model which can be used to train a tiny 28 x 28 drawing classifier based on user examples.'
                                                       ' It uses a drawing embedding trained on the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset)')
Construct the pipeline by adding the embedding and the nearest-neighbor classifier:
# Construct pipeline by adding the embedding and then the nearest neighbor classifier
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(embedding_spec)
pipeline_spec.pipelineClassifier.pipeline.models.add().CopyFrom(knn_spec)
Create an MLModel from the pipeline spec and save it:
# Save the pipeline model
from coremltools.models import MLModel

mlmodel = MLModel(pipeline_spec)
output_path = './TinyDrawingClassifier.mlmodel'
mlmodel.save(output_path)
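The two trainingInput entries declared in the pipeline mean that each update example is a sketch paired with its true label. The following minimal pure-Python sketch shows what such an update does to this kind of pipeline, assuming the embedding stays fixed and only the nearest-neighbor stage stores new examples. `toy_embed` and `ToyUpdatablePipeline` are hypothetical; `toy_embed` is a crude two-feature stand-in for the 128-dimensional network, and running an actual on-device update of the saved model is not shown.

```python
import math

SIZE = 28


def toy_embed(drawing):
    """Hypothetical stand-in for the embedding model (the real one outputs 128 floats).

    Returns the fraction of white pixels in the top and bottom halves of the grid.
    """
    half = SIZE // 2
    pixels_per_half = half * SIZE * 255
    top = sum(v for row in drawing[:half] for v in row) / pixels_per_half
    bottom = sum(v for row in drawing[half:] for v in row) / pixels_per_half
    return [top, bottom]


class ToyUpdatablePipeline:
    """Drawing -> embedding -> nearest-neighbor label, updated with (drawing, label) pairs."""

    def __init__(self):
        self.examples = []  # stored (embedding, label) pairs

    def update(self, drawing, label):
        # Training input 0 is the sketch, training input 1 is its true label.
        # In this sketch only the nearest-neighbor stage changes; the embedding is fixed.
        self.examples.append((toy_embed(drawing), label))

    def predict(self, drawing):
        if not self.examples:
            return "unknown"
        query = toy_embed(drawing)
        return min(self.examples, key=lambda ex: math.dist(ex[0], query))[1]


def blank():
    return [[0] * SIZE for _ in range(SIZE)]


top_stroke, bottom_stroke = blank(), blank()
top_stroke[3] = [255] * SIZE      # white line near the top
bottom_stroke[24] = [255] * SIZE  # white line near the bottom

pipeline = ToyUpdatablePipeline()
print(pipeline.predict(top_stroke))  # unknown
pipeline.update(top_stroke, "top line")
pipeline.update(bottom_stroke, "bottom line")
probe = blank()
probe[22] = [255] * SIZE
print(pipeline.predict(probe))  # bottom line
```

Keeping the feature extractor frozen and storing embeddings means each update is cheap: no gradients are computed, and new labels can be added at any time.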