
Utilities to compress neural network models. Only available in coremltools 2.0b1 and later.


class coremltools.models.neural_network.quantization_utils.AdvancedQuantizedLayerSelector(skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

Quantized layer selector that lets the user specify layer types to skip during quantization, as well as the minimum size parameters a convolution layer must meet to be quantized.


from coremltools.models.neural_network.quantization_utils import (
    AdvancedQuantizedLayerSelector, quantize_weights)
# model is an existing full precision MLModel
selector = AdvancedQuantizedLayerSelector(
        skip_layer_types=['batchnorm', 'bias', 'depthwiseConv'],
        minimum_conv_kernel_channels=4,
        minimum_conv_weight_count=4096)
quantized_model = quantize_weights(model, 8, selector=selector)

__init__(self, skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

do_quantize(self, layer, weight_param=None)

weight_param - should be the name of the WeightParam field.
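The constructor parameters above could combine along the following lines. This is only a sketch of one plausible selection rule, not the library's implementation: the helper name should_quantize, its plain-value arguments, and the strict less-than comparisons are all assumptions made for illustration.

```python
def should_quantize(layer_type, kernel_channels=None, weight_count=None,
                    skip_layer_types=(), minimum_conv_kernel_channels=4,
                    minimum_conv_weight_count=4096):
    """Hypothetical helper mirroring the selector's documented parameters."""
    # Skip every layer whose type the user asked to leave untouched.
    if layer_type in skip_layer_types:
        return False
    # Assumption: convolutions below either threshold stay in full precision.
    if layer_type == 'convolution':
        if kernel_channels < minimum_conv_kernel_channels:
            return False
        if weight_count < minimum_conv_weight_count:
            return False
    return True


# A 3-channel convolution with 3 * 3 * 3 * 32 = 864 weights.
small_conv = should_quantize('convolution', kernel_channels=3, weight_count=864)
# A 64-channel convolution with 64 * 3 * 3 * 64 = 36864 weights.
large_conv = should_quantize('convolution', kernel_channels=64, weight_count=36864)
# A layer type listed in skip_layer_types.
skipped = should_quantize('batchnorm', skip_layer_types=('batchnorm', 'bias'))
print(small_conv, large_conv, skipped)
```

Under these assumptions, only the larger convolution would be selected for quantization.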

class coremltools.models.neural_network.quantization_utils.ModelMetrics(spec)

A utility class to hold evaluation metrics.

__init__(self, spec)

class coremltools.models.neural_network.quantization_utils.OutputMetric(name, type)

Utility class to calculate and hold metrics between two model outputs.

__init__(self, name, type)

class coremltools.models.neural_network.quantization_utils.QuantizedLayerSelector

This is the base class for implementing custom selectors that skip certain layers during quantization. To implement a custom selector, create a class that inherits from this class and overrides the do_quantize() method.


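from coremltools.models.neural_network.quantization_utils import (
    QuantizedLayerSelector, quantize_weights)

class MyLayerSelector(QuantizedLayerSelector):
    def __init__(self):
        super(MyLayerSelector, self).__init__()

    def do_quantize(self, layer, **kwargs):
        # Respect the base class decision, then also skip the layer named 'dense_2'.
        ret = super(MyLayerSelector, self).do_quantize(layer)
        if not ret or layer.name == 'dense_2':
            return False
        return True

selector = MyLayerSelector()
quantized_model = quantize_weights(mlmodel, 8, quantization_mode='linear', selector=selector)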

coremltools.models.neural_network.quantization_utils.compare_models(full_precision_model, quantized_model, sample_data)

Utility function to compare the performance of a full precision model against its quantized version.

full_precision_model: MLModel
The full precision model with float32 weights
quantized_model: MLModel
Quantized version of the model with quantized weights
sample_data: str | [dict]
Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
Returns: None. Performance metrics are printed out.
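As a rough illustration of the list form of sample_data, the sketch below builds a list of sample input dictionaries, one per sample, each keyed by input name. The input name "input_1", the vector length, and the helper names are hypothetical. Plain Python lists keep the sketch dependency-free, whereas a real model would most likely need arrays shaped to match its declared inputs.

```python
import random


def make_sample_data(num_samples, input_name, length, seed=0):
    """Build a list of sample input dictionaries (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        {input_name: [rng.uniform(-1.0, 1.0) for _ in range(length)]}
        for _ in range(num_samples)
    ]


# "input_1" is an assumed input name; use the names your model declares.
sample_data = make_sample_data(num_samples=3, input_name="input_1", length=4)


def run_comparison(full_precision_model, quantized_model):
    # Imported lazily so the sketch runs even where coremltools is absent.
    from coremltools.models.neural_network import quantization_utils
    quantization_utils.compare_models(
        full_precision_model, quantized_model, sample_data)
```

For a model with a single image input, the absolute path to a directory of images can be passed in place of the list.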
coremltools.models.neural_network.quantization_utils.quantize_weights(full_precision_model, nbits, quantization_mode='linear', sample_data=None, **kwargs)

Utility function to convert a full precision (float) MLModel to an nbits quantized MLModel (float16 when nbits is 16).

full_precision_model: MLModel
Model which will be quantized. Currently only neural network models are supported. If a pipeline model is passed in, all neural network models embedded within it will be converted.
nbits: int
Number of bits per quantized weight. Only 16-bit floating point and 1 to 8 bits are supported.
quantization_mode: str

One of the following:

"linear": Linear quantization with scale and bias assuming the range of weight values is [A, B], where A = min(weight), B = max(weight).
"linear_lut": Simple linear quantization represented as a lookup table.
"kmeans_lut": LUT based quantization, where the LUT is generated by K-Means clustering.
"custom_lut": LUT quantization where the LUT and quantized weight params are calculated using a custom function. If this mode is selected, a custom function must be passed in kwargs with the key lut_function. The function must take input params (nbits, wp), where nbits is the number of quantization bits and wp is the list of weights for a given layer. It should return two values (lut, qw), where lut is an array of length 2^nbits containing the LUT values and qw is the list of quantized weight parameters. See _get_linear_lookup_table_and_weight for a sample implementation.
"linear_symmetric": Linear quantization with scale and bias assuming the range of weight values is [-A, A], where A = max(abs(weight)).
sample_data: str | [dict]
Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
kwargs: keyword arguments
lut_function : (callable function)
A callable function provided when quantization mode is set to _QUANTIZATION_MODE_CUSTOM_LOOKUP_TABLE. See quantization_mode for more details.
selector: QuantizedLayerSelector
A QuantizedLayerSelector object. The class can be subclassed to provide custom quantization selection.
Returns:
model: MLModel
The quantized MLModel instance if running on macOS 10.14 or later; otherwise the quantized model specification is returned.
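The two scale-and-bias quantization modes described above differ only in the weight range they assume. The sketch below shows the arithmetic on a plain list of weights. It is an illustration under assumptions, not the library's code: the function names, the rounding rule, and the handling of a constant weight list are all hypothetical.

```python
def linear_quantize(weights, nbits, symmetric=False):
    """Map floats to integers in [0, 2**nbits - 1] using a scale and a bias."""
    max_level = 2 ** nbits - 1
    if symmetric:
        # Range [-A, A] with A = max(abs(weight)).
        a = max(abs(w) for w in weights)
        lo, hi = -a, a
    else:
        # Range [A, B] with A = min(weight), B = max(weight).
        lo, hi = min(weights), max(weights)
    # Assumption: fall back to a unit scale when all weights are equal.
    scale = (hi - lo) / max_level if hi > lo else 1.0
    bias = lo
    quantized = [int(round((w - bias) / scale)) for w in weights]
    return quantized, scale, bias


def dequantize(quantized, scale, bias):
    """Recover approximate float weights from the quantized integers."""
    return [q * scale + bias for q in quantized]


weights = [-1.0, 0.0, 0.5, 3.0]
q, scale, bias = linear_quantize(weights, nbits=8)
restored = dequantize(q, scale, bias)
q_sym, scale_sym, bias_sym = linear_quantize(weights, nbits=8, symmetric=True)
```

With this rounding, each restored weight differs from the original by at most half of scale.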


>>> import coremltools
>>> from coremltools.models.neural_network import quantization_utils
>>> model = coremltools.models.MLModel('my_model.mlmodel')
>>> quantized_model = quantization_utils.quantize_weights(model, 8, "linear")
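For the custom lookup table mode, the description of lut_function above suggests a callable shaped like the following sketch. It places 2^nbits evenly spaced values between the smallest and largest weight and maps each weight to the index of its nearest entry. The function name is hypothetical, and plain lists are used to keep the example dependency-free; the library most likely passes and expects arrays, so treat the types here as an assumption.

```python
def evenly_spaced_lut(nbits, wp):
    """Hypothetical lut_function: returns (lut, qw) for one layer's weights."""
    levels = 2 ** nbits
    lo, hi = min(wp), max(wp)
    # Assumption: use a unit step when all weights are equal or nbits is 0.
    step = (hi - lo) / (levels - 1) if hi > lo and levels > 1 else 1.0
    # Lookup table of length 2**nbits.
    lut = [lo + i * step for i in range(levels)]
    # Index of the nearest table entry for each weight, clamped to the table.
    qw = [min(levels - 1, max(0, int(round((w - lo) / step)))) for w in wp]
    return lut, qw


lut, qw = evenly_spaced_lut(2, [0.0, 0.4, 2.0, 3.0])
print(lut)
print(qw)
```

Such a function would be passed as lut_function=evenly_spaced_lut in kwargs, together with the custom lookup table quantization mode.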