coremltools.models.neural_network.quantization_utils

Utilities to compress neural network models. Available in coremltools 2.0b1 and later.

Functions

compare_models(full_precision_model, …) Utility function to compare the performance of a full-precision model against its quantized counterpart
quantize_spec_weights(spec, nbits, …)
quantize_weights(full_precision_model, nbits) Utility function to convert a full-precision (float) MLModel to an nbits-quantized MLModel.
unpack_to_bytes(byte_arr, num_weights, nbits)

Classes

AdvancedQuantizedLayerSelector([…]) Quantized layer selector that lets the user specify layer types to skip during quantization, as well as minimum size thresholds for quantizing convolution layers.
ModelMetrics(spec) A utility class to hold evaluation metrics
NoiseMetrics()
OutputMetric(name, type) Utility class to calculate and hold metrics between two model outputs
QuantizedLayerSelector() This is the base class to implement custom selectors to skip certain layers during quantization.
TopKMetrics(topk)
class coremltools.models.neural_network.quantization_utils.AdvancedQuantizedLayerSelector(skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

Quantized layer selector that lets the user specify layer types to skip during quantization, as well as minimum size thresholds for quantizing convolution layers.

Examples

from coremltools.models.neural_network.quantization_utils import AdvancedQuantizedLayerSelector, quantize_weights
selector = AdvancedQuantizedLayerSelector(
        skip_layer_types=['batchnorm', 'bias', 'depthwiseConv'],
        minimum_conv_kernel_channels=4,
        minimum_conv_weight_count=4096)
quantized_model = quantize_weights(model, 8, selector=selector)
__init__(self, skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

x.__init__(…) initializes x; see help(type(x)) for signature

do_quantize(self, layer, weight_param=None)

weight_param - the name of the WeightParam field to be quantized

class coremltools.models.neural_network.quantization_utils.ModelMetrics(spec)

A utility class to hold evaluation metrics

__init__(self, spec)
class coremltools.models.neural_network.quantization_utils.OutputMetric(name, type)

Utility class to calculate and hold metrics between two model outputs

__init__(self, name, type)
class coremltools.models.neural_network.quantization_utils.QuantizedLayerSelector

This is the base class to implement custom selectors to skip certain layers during quantization. To implement a custom selector, create a class that inherits this class and override do_quantize() method.

Examples

from coremltools.models.neural_network.quantization_utils import QuantizedLayerSelector, quantize_weights

class MyLayerSelector(QuantizedLayerSelector):
    def __init__(self):
        super(MyLayerSelector, self).__init__()

    def do_quantize(self, layer, **kwargs):
        ret = super(MyLayerSelector, self).do_quantize(layer)
        if not ret or layer.name == 'dense_2':
            return False
        return True

selector = MyLayerSelector()
quantized_model = quantize_weights(mlmodel, 8, quantization_mode='linear', selector=selector)
__init__(self)

x.__init__(…) initializes x; see help(type(x)) for signature

coremltools.models.neural_network.quantization_utils.compare_models(full_precision_model, quantized_model, sample_data)

Utility function to compare the performance of a full-precision model against its quantized counterpart

full_precision_model: MLModel
The full precision model with float32 weights
quantized_model: MLModel
Quantized version of the model with quantized weights
sample_data: str | [dict]
Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
Returns: None. Performance metrics are printed to standard output.
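For outputs that are not class labels, the comparison compare_models prints is based on a signal-to-noise measurement between the two models' outputs. As a rough, standalone illustration of that idea (this is not the library's implementation, and the function name here is invented for the example):

```python
import math

def signal_to_noise_db(reference, test):
    # SNR in dB between a full-precision model's output (reference) and a
    # quantized model's output (test); higher means less quantization noise.
    # Illustrative sketch only, not coremltools' internal metric code.
    noise = [r - t for r, t in zip(reference, test)]
    signal_power = sum(r * r for r in reference) / len(reference)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        return float("inf")  # outputs are identical
    return 10.0 * math.log10(signal_power / noise_power)

full = [0.9, 0.05, 0.05]    # hypothetical full-precision output
quant = [0.88, 0.06, 0.06]  # hypothetical quantized output
print(round(signal_to_noise_db(full, quant), 1))  # → 31.3
```

A high SNR (tens of dB) suggests the quantized model tracks the original closely on that output.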
coremltools.models.neural_network.quantization_utils.quantize_weights(full_precision_model, nbits, quantization_mode='linear', sample_data=None, **kwargs)

Utility function to convert a full-precision (float) MLModel to an nbits-quantized MLModel.

full_precision_model: MLModel
Model to be quantized. Currently only neural network models are supported. If a pipeline model is passed in, all neural network models embedded within it will be converted.
nbits: int
Number of bits per quantized weight. Only 16-bit floating point and 1-8 bit quantization are supported.
quantization_mode: str

One of the following:

“linear”:
Linear quantization with scale and bias assuming the range of weight values is [A, B], where A = min(weight), B = max(weight)
“linear_lut”:
Simple linear quantization represented as a lookup table
“kmeans_lut”:
LUT based quantization, where LUT is generated by K-Means clustering
“custom_lut”:
LUT quantization where the LUT and quantized weight parameters are calculated using a custom function. If this mode is selected, a custom function must be passed in kwargs under the key lut_function. The function must take input parameters (nbits, wp), where nbits is the number of quantization bits and wp is the list of weights for a given layer. It should return two values (lut, qw), where lut is an array of length 2^nbits containing the LUT values and qw is the list of quantized weight parameters. See _get_linear_lookup_table_and_weight for a sample implementation.
“linear_symmetric”:
Linear quantization with scale and bias assuming the range of weight values is [-A, A], where A = max(abs(weight)).
sample_data: str | [dict]
Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. Path to a directory containing images is only valid for models with one image input. For all other models a list of sample inputs must be provided.
kwargs: keyword arguments
lut_function : (callable function)
A callable function provided when quantization_mode is set to “custom_lut”. See quantization_mode for more details.
selector: QuantizedLayerSelector
A QuantizedLayerSelector object that can be subclassed to provide custom quantization selection.
Returns:
model: MLModel

The quantized MLModel instance if running on macOS 10.14 or later, otherwise the quantized model specification is returned
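For quantization_mode=“custom_lut”, the lut_function contract described above can be sketched in plain Python. The builder below is illustrative only; it is not _get_linear_lookup_table_and_weight, and it assumes for simplicity that wp is a flat list of floats:

```python
def linear_lut_function(nbits, wp):
    # Sketch of a lut_function for quantization_mode='custom_lut'.
    # Given nbits and a flat list of float weights wp, returns (lut, qw):
    # lut is a list of 2**nbits representative values spanning the weight
    # range, and qw maps each weight to the index of its nearest LUT entry.
    lo, hi = min(wp), max(wp)
    n = 2 ** nbits
    step = (hi - lo) / (n - 1) if n > 1 and hi > lo else 1.0
    lut = [lo + i * step for i in range(n)]
    # Quantize each weight to the index of the closest LUT value.
    qw = [min(int(round((w - lo) / step)), n - 1) for w in wp]
    return lut, qw

lut, qw = linear_lut_function(2, [0.0, 0.3, 0.7, 1.0])
# lut spans [0.0, 1.0] in 4 entries; qw holds the per-weight LUT indices
```

A real lut_function could instead pick LUT entries by clustering or any other scheme, as long as it honors the (nbits, wp) → (lut, qw) contract.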

Examples

>>> import coremltools
>>> from coremltools.models.neural_network import quantization_utils
>>> model = coremltools.models.MLModel('my_model.mlmodel')
>>> quantized_model = quantization_utils.quantize_weights(model, 8, "linear")
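To make the “linear” and “linear_symmetric” mode descriptions concrete, here is a plain-Python sketch of the underlying scale/bias arithmetic (not the coremltools implementation; it quantizes a flat list of weights and dequantizes them back for inspection):

```python
def linear_quantize(weights, nbits, symmetric=False):
    # Sketch of the arithmetic behind the 'linear' and 'linear_symmetric'
    # modes: map the weight range onto 2**nbits evenly spaced levels.
    if symmetric:
        a = max(abs(w) for w in weights)
        lo, hi = -a, a                      # range [-A, A], A = max(abs(weight))
    else:
        lo, hi = min(weights), max(weights) # range [A, B]
    levels = 2 ** nbits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Quantized codes in [0, 2**nbits - 1], then the values they decode to.
    q = [int(round((w - lo) / scale)) for w in weights]
    dequant = [lo + qi * scale for qi in q]
    return q, dequant

q, dq = linear_quantize([-1.0, -0.25, 0.5, 1.0], 8)
# q → [0, 96, 191, 255]; dq recovers each weight to within one scale step
```

The symmetric variant trades a possibly wider range for a representation centered on zero, which can matter when weights are roughly zero-symmetric.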