Core ML Tools FAQs#
This page offers frequently asked questions (FAQs):
Core ML Tools Versions#
For an overview, see New Features. This release includes more APIs for optimizing the models to use less storage space, reduce power consumption, and reduce latency during inference. Key optimization techniques include pruning, quantization, and palettization. For details, see Optimizing Models.
For details about the release, see Release Notes.
The following are highlights of previous releases:
Added model compression utilities to compress the weights of a Core ML model, thereby reducing the space occupied by the model.
Enabled Float 16 input/output types including grayscale images and float 16 Multiarrays.
For details, see Release Notes for coremltools 6.3.
Core ML models now also use a directory format, called .mlpackage, rather than just a protobuf file.
For details, see the following:
Major upgrade. Highlights:
Introduced the Unified Conversion API with the convert() method for converting models from TensorFlow 1, TensorFlow 2 (tf.keras), and PyTorch.
Introduced the Model Intermediate Language (MIL) as an internal intermediate representation (IR) for unifying the conversion pipeline, and added graph passes to this common IR. Passes that improve performance continue to be added, so we recommend that you always use the latest version of coremltools to convert your models.
For details, see the following:
As of the Core ML Tools 4 release, the coremltools.keras.convert converter is no longer maintained, and it is officially deprecated in Core ML Tools 5. The Unified Conversion API supports conversion of tf.keras models, using a TensorFlow 2 (TF2) backend.
If you have an older Keras.io model that uses TensorFlow 1 (TF1), we recommend exporting it as a TF1 frozen graph def (.pb) file. You can then convert this file using the Unified Conversion API. For an example of how to export the old Keras model to .pb, see the method _save_h5_as_frozen_pb in the Troubleshooting section of the coremltools 3 Neural Network Guide.
Fixing High Numerical Error#
For a neural network, set the compute unit to CPU as described in Set the Compute Units. For example:
# neural networks
model = ct.convert(source_model, compute_units=ct.ComputeUnit.CPU_ONLY)

# or when loading the model
model = ct.models.MLModel("model.mlmodel", compute_units=ct.ComputeUnit.CPU_ONLY)

# now when prediction is called on this model, it will use the
# higher precision Float32 CPU path for execution.

# to check the compute unit of an already loaded model,
# simply check the property
model.compute_unit
For an ML program, set compute_precision to Float 32 as described in Set the ML Program Precision. For example:
# ml programs
# provide a higher compute precision during conversion
model = ct.convert(source_model, compute_precision=ct.precision.FLOAT32)
For more information, see Typed Execution.
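To get a feel for why the lower-precision path can show visible numerical error, the following sketch rounds a range of values to IEEE half precision (Float 16) and single precision (Float 32) and compares the worst-case error. It uses only the Python standard library and does not call coremltools; the helper names are illustrative assumptions, not part of any Core ML API.

```python
import struct

def round_to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def max_abs_error(reference, candidate):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))

# Sample values from 0.1 to 10.0, stored as ordinary Python (double) floats.
values = [0.1 * i for i in range(1, 101)]

fp16 = [round_to_float16(v) for v in values]
fp32 = [round_to_float32(v) for v in values]

err16 = max_abs_error(values, fp16)
err32 = max_abs_error(values, fp32)

print(f"max abs error after Float 16 rounding: {err16:.3e}")
print(f"max abs error after Float 32 rounding: {err32:.3e}")
```

The same max_abs_error helper can be applied to the outputs of a source model and its converted counterpart to judge whether switching to the Float 32 path is worthwhile.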
Image Preprocessing for Converting torchvision#
Preprocessing parameters differ between torchvision and Core ML Tools but can be easily translated, as described in Add Image Preprocessing Options. For example, you can set the scale and bias for an ImageType, which corresponds to the torchvision parameters:
# torchvision normalizes with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225];
# scale is a single scalar, so 0.226 (the average of the three std values) is used
scale = 1/(0.226*255.0)
bias = [-0.485/0.229, -0.456/0.224, -0.406/0.225]

image_input = ct.ImageType(shape=example_input.shape, scale=scale, bias=bias)
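The translation can be checked numerically. The sketch below assumes torchvision-style normalization of the form (pixel/255 - mean) / std per channel, and assumes Core ML applies scale * pixel + bias with one scalar scale; it then measures how far the two results differ over every 8-bit pixel value. The function names are illustrative only, and coremltools is not required to run it.

```python
# Standard per-channel values used in the example above (R, G, B).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def torchvision_normalize(pixel: float, channel: int) -> float:
    """Map a 0..255 pixel to 0..1, then apply (x - mean) / std."""
    return (pixel / 255.0 - MEAN[channel]) / STD[channel]

# With a single scalar scale, the three std values are replaced by their average.
avg_std = sum(STD) / len(STD)
scale = 1.0 / (avg_std * 255.0)
bias = [-m / s for m, s in zip(MEAN, STD)]

def scale_bias_normalize(pixel: float, channel: int) -> float:
    """Assumed Core ML form: scale * pixel + bias[channel]."""
    return scale * pixel + bias[channel]

# Worst-case disagreement across all pixel values and channels.
worst = max(
    abs(torchvision_normalize(p, c) - scale_bias_normalize(p, c))
    for p in range(256)
    for c in range(3)
)

print(f"average std = {avg_std:.3f}, scale = {scale:.6f}")
print(f"bias = {[round(b, 4) for b in bias]}")
print(f"worst-case difference = {worst:.4f}")
```

The difference is nonzero because a single averaged std stands in for three per-channel values; whether that approximation is acceptable depends on the model.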
Error in Declaring Network or Computing NN Outputs#
File an issue at the coremltools GitHub repository by following the instructions in Issues and Queries. As a workaround, try using CPU_ONLY compute units during conversion, as described in Set the Compute Units. For example:
import coremltools as ct

model = ct.convert(tf_model, compute_units=ct.ComputeUnit.CPU_ONLY)

# or if loading a pre-converted model
model = ct.models.MLModel("model.mlmodel", compute_units=ct.ComputeUnit.CPU_ONLY)
Starting a Deep Learning Core ML Model#
You can define a Core ML model directly by building it with the MIL builder API. This API is similar to the torch.nn or the tf.keras API for model construction. For an example, see Create a MIL Program.
Handling an Unsupported Op#
As a workaround, you may want to write a translation function from the missing op to the existing MIL ops. For examples, see Composite Operators.
Choosing Custom Names for Input and Outputs#
When you convert with ct.convert(), the input names and output names are automatically picked up by the converter from the source model. After conversion you can see these names by doing one of the following:
Getting the model spec object and printing the names, as shown in the following example:
model = ct.models.MLModel('MyModel.mlmodel')
spec = model.get_spec()

# get input names
input_names = [inp.name for inp in spec.description.input]

# get output names
output_names = [out.name for out in spec.description.output]
You can update these names by using the
Neural Engine With Flexible Input Shapes#
When converting a fixed-shape model that already runs on the Neural Engine (NE) to use flexible inputs, you should specify a flexible input shape with a set of predetermined shapes using EnumeratedShapes. The converted model will run on the NE, unless the conversion introduces dynamic layers not supported on the NE, such as converting a static reshape to a fully dynamic reshape.
With EnumeratedShapes, the model can be optimized for the finite set of input shapes on the device during compilation. You can provide up to 128 different shapes. If you need more flexibility for inputs, consider setting the range for each dimension.
For details and examples of using flexible input shapes, see Flexible Input Shapes.
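As a minimal sketch of the approach described above, the helper below checks the 128-shape limit and then builds an input description from ct.EnumeratedShapes and ct.TensorType. The helper name build_enumerated_input, the input name "input", and the choice of the first shape as the default are assumptions made for illustration. coremltools is imported inside the function so that the shape check can be exercised without it installed.

```python
MAX_ENUMERATED_SHAPES = 128  # limit stated in the text above

def build_enumerated_input(shapes, name="input"):
    """Return a ct.TensorType restricted to a finite set of input shapes.

    `shapes` is a list of shape lists, e.g. [[1, 3, 224, 224], [1, 3, 448, 448]].
    The first shape is used as the default (an arbitrary choice for this sketch).
    """
    if not shapes:
        raise ValueError("provide at least one shape")
    if len(shapes) > MAX_ENUMERATED_SHAPES:
        raise ValueError(
            f"{len(shapes)} shapes given; at most {MAX_ENUMERATED_SHAPES} are allowed"
        )

    # Imported lazily: only needed once the shape list has been validated.
    import coremltools as ct

    enumerated = ct.EnumeratedShapes(shapes=shapes, default=shapes[0])
    return ct.TensorType(name=name, shape=enumerated)
```

The returned object could then be passed to the converter, for example ct.convert(traced_model, inputs=[build_enumerated_input([[1, 3, 224, 224], [1, 3, 448, 448]])]), where traced_model is a hypothetical traced PyTorch model.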
optimize.torch is better than PyTorch’s default quantization#
You can use PyTorch’s quantization APIs directly, and then convert the model to Core ML. However, the converted model performance may not be optimal. The PyTorch API default settings (symmetric or asymmetric quantization modes, and which ops are quantized) are not optimal for the Core ML stack and Apple hardware. If you use the Core ML Tools coremltools.optimize.torch APIs, as described in Training-Time Quantization, the correct default settings are applied automatically.
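For readers unfamiliar with the two quantization modes mentioned above, the following pure-Python sketch shows what "symmetric" and "asymmetric" mean for 8-bit quantization. It is a generic illustration under simple assumptions (per-tensor scale, round-to-nearest) and does not reproduce the implementation or the defaults of either PyTorch or Core ML Tools; all function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric mode: zero-point fixed at 0, range is [-max|x|, +max|x|]."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0       # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [qi * scale for qi in q]                    # dequantized values

def quantize_asymmetric(values, num_bits=8):
    """Asymmetric (affine) mode: scale and zero-point derived from [min, max]."""
    qmin, qmax = 0, 2 ** num_bits - 1                  # 0..255 for 8 bits
    lo = min(min(values), 0.0)                         # keep 0.0 representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0           # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return [(qi - zero_point) * scale for qi in q]     # dequantized values

def max_abs_error(reference, candidate):
    return max(abs(r - c) for r, c in zip(reference, candidate))

# Skewed, non-negative data (for example, activations after a ReLU-like op).
data = [0.05 * i for i in range(121)]                  # 0.0 .. 6.0

err_sym = max_abs_error(data, quantize_symmetric(data))
err_asym = max_abs_error(data, quantize_asymmetric(data))

print(f"symmetric  max error: {err_sym:.5f}")
print(f"asymmetric max error: {err_asym:.5f}")
```

On one-sided data like this, the symmetric mode spends half of its integer range on negative values that never occur, so its step size is coarser. Which mode performs best on a given hardware stack is a separate question, which is why the choice of defaults matters.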
Use a compiled model for faster initialization#
If your model initialization in Python takes a long time, use a compiled Core ML model (CompiledMLModel) rather than MLModel for making predictions. For large models, using a compiled model can save considerable time in initializing the model. For details, see Using Compiled Python Models for Prediction.
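One way to check whether a compiled model helps in your case is to time both initialization paths. The sketch below is a rough measurement aid, not a benchmark: best_time is a hypothetical helper, and the paths passed to compare_initialization are placeholders for your own .mlpackage and compiled .mlmodelc locations. coremltools is imported inside the function, so the timing helper itself can be used on its own.

```python
import time

def best_time(factory, repeats=3):
    """Return the shortest wall-clock time (seconds) that factory() took."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        factory()
        best = min(best, time.perf_counter() - start)
    return best

def compare_initialization(package_path, compiled_path, repeats=3):
    """Time MLModel versus CompiledMLModel initialization.

    package_path:  path to a model package (placeholder, e.g. "model.mlpackage")
    compiled_path: path to a compiled model (placeholder, e.g. "model.mlmodelc")
    """
    import coremltools as ct  # imported lazily; requires coremltools to be installed

    return {
        "MLModel": best_time(
            lambda: ct.models.MLModel(package_path), repeats
        ),
        "CompiledMLModel": best_time(
            lambda: ct.models.CompiledMLModel(compiled_path), repeats
        ),
    }
```

Repeated loads may be affected by caching on the system, so treat the numbers as indicative only.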