coremltools.models.neural_network.quantization_utils

Utilities to compress Neural Network models. Available in coremltools 2.0b1 and later.
Functions

compare_models(full_precision_model, …)
    Utility function to compare the performance of a full precision vs. quantized model.
quantize_spec_weights(*args, **kwargs)
quantize_weights(full_precision_model, nbits, …)
    Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel.
unpack_to_bytes(byte_arr, num_weights, nbits)

Classes

AdvancedQuantizedLayerSelector([…])
    Quantized layer selector allowing the user to specify types of layers to skip during the quantization process and minimum size parameters for quantized convolution layers.
ModelMetrics(spec)
    A utility class to hold evaluation metrics.
NoiseMetrics()
OutputMetric(name, type)
    Utility class to calculate and hold metrics between two model outputs.
QuantizedLayerSelector()
    Base class for implementing custom selectors that skip certain layers during quantization.
TopKMetrics(topk)

class coremltools.models.neural_network.quantization_utils.AdvancedQuantizedLayerSelector(skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

    Quantized layer selector allowing the user to specify types of layers to skip during the quantization process and minimum size parameters for quantized convolution layers.

Examples
    from coremltools.models.neural_network.quantization_utils import (
        AdvancedQuantizedLayerSelector, quantize_weights)

    selector = AdvancedQuantizedLayerSelector(
        skip_layer_types=['batchnorm', 'bias', 'depthwiseConv'],
        minimum_conv_kernel_channels=4,
        minimum_conv_weight_count=4096)

    # 'model' is an MLModel loaded earlier.
    quantized_model = quantize_weights(model, 8, selector=selector)
    __init__(self, skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

        x.__init__(…) initializes x; see help(type(x)) for signature.

    do_quantize(self, layer, weight_param=None)

        weight_param should be the name of the WeightParam field.
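
    A hedged sketch of how the weight_param argument can be used in a custom selector. Keeping bias fields in full precision is only an illustration, and the 'bias' field name and model file name below are assumptions; they must match the actual WeightParam field and model you are working with.

    import coremltools
    from coremltools.models.neural_network.quantization_utils import (
        QuantizedLayerSelector, quantize_weights)

    class SkipBiasSelector(QuantizedLayerSelector):
        """Illustrative selector: leave 'bias' weight fields unquantized."""
        def do_quantize(self, layer, weight_param=None, **kwargs):
            if not super(SkipBiasSelector, self).do_quantize(layer):
                return False
            # weight_param is the name of the WeightParam field being considered;
            # 'bias' is an assumed field name used here for illustration.
            return weight_param != 'bias'

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantize_weights(model, 8, selector=SkipBiasSelector())
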
class coremltools.models.neural_network.quantization_utils.ModelMetrics(spec)

    A utility class to hold evaluation metrics.

    __init__(self, spec)

class coremltools.models.neural_network.quantization_utils.OutputMetric(name, type)

    Utility class to calculate and hold metrics between two model outputs.

    __init__(self, name, type)

class coremltools.models.neural_network.quantization_utils.QuantizedLayerSelector

    This is the base class to implement custom selectors to skip certain layers during quantization. To implement a custom selector, create a class that inherits this class and override the do_quantize() method.

Examples
    from coremltools.models.neural_network.quantization_utils import (
        QuantizedLayerSelector, quantize_weights)

    class MyLayerSelector(QuantizedLayerSelector):
        def __init__(self):
            super(MyLayerSelector, self).__init__()

        def do_quantize(self, layer, **kwargs):
            ret = super(MyLayerSelector, self).do_quantize(layer)
            if not ret or layer.name == 'dense_2':
                return False
            return True

    # 'mlmodel' is an MLModel loaded earlier.
    selector = MyLayerSelector()
    quantized_model = quantize_weights(
        mlmodel, 8, quantization_mode='linear', selector=selector)
    __init__(self)

        x.__init__(…) initializes x; see help(type(x)) for signature.

coremltools.models.neural_network.quantization_utils.compare_models(full_precision_model, quantized_model, sample_data)

    Utility function to compare the performance of a full precision vs. quantized model.

    full_precision_model: MLModel
        The full precision model with float32 weights.
    quantized_model: MLModel
        Quantized version of the model with quantized weights.
    sample_data: str | [dict]
        Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. A path to a directory containing images is only valid for models with one image input; for all other models a list of sample inputs must be provided.

    Returns: None. Performance metrics are printed out.
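
    A minimal usage sketch, assuming a hypothetical model file name and a hypothetical input feature named 'input'; real models need their own input dictionaries (or an image directory for models with a single image input).

    import coremltools
    from coremltools.models.neural_network import quantization_utils

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantization_utils.quantize_weights(model, 8, 'linear')

    # A list of sample input dictionaries; 'input' is an assumed feature name.
    sample_data = [{'input': 0.5}, {'input': -1.25}]
    quantization_utils.compare_models(model, quantized_model, sample_data)
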
coremltools.models.neural_network.quantization_utils.quantize_weights(full_precision_model, nbits, quantization_mode='linear', sample_data=None, **kwargs)

    Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel.

    full_precision_model: MLModel
        The model to be quantized. Currently only neural network models are supported. If a pipeline model is passed in, all neural network models embedded within it will be converted.
    nbits: int
        Number of bits per quantized weight. Only 16-bit floating point and 1 to 8-bit quantization are supported.
    quantization_mode: str
        One of the following:

        "linear":
            Linear quantization with scale and bias, assuming the range of weight values is [A, B], where A = min(weight) and B = max(weight).
        "linear_lut":
            Simple linear quantization represented as a lookup table.
        "kmeans_lut":
            LUT-based quantization, where the LUT is generated by K-Means clustering.
        "custom_lut":
            LUT quantization where the LUT and quantized weight params are calculated using a custom function. If this mode is selected then a custom function must be passed in kwargs with key lut_function. The function must have input params (nbits, wp) where nbits is the number of quantization bits and wp is the list of weights for a given layer. The function should return two parameters (lut, qw) where lut is an array of length 2^nbits containing LUT values and qw is the list of quantized weight parameters. See _get_linear_lookup_table_and_weight for a sample implementation, and the sketch after the examples below.
        "linear_symmetric":
            Linear quantization with scale and bias, assuming the range of weight values is [-A, A], where A = max(abs(weight)). A numeric sketch of the linear and linear_symmetric arithmetic appears before the Examples section below.
    sample_data: str | [dict]
        Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. A path to a directory containing images is only valid for models with one image input; for all other models a list of sample inputs must be provided.
    kwargs: keyword arguments
        lut_function: callable
            A callable function provided when quantization_mode is set to _QUANTIZATION_MODE_CUSTOM_LOOKUP_TABLE. See quantization_mode for more details.
        selector: QuantizedLayerSelector
            A QuantizedLayerSelector object that can be derived to provide custom quantization selection.
    Returns:

    model: MLModel
        The quantized MLModel instance if running on macOS 10.14 or later; otherwise the quantized model specification is returned.
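
    The following is a generic numeric sketch of the scale-and-bias arithmetic implied by the "linear" and "linear_symmetric" descriptions above. The rounding and storage details are assumptions for illustration and may differ from coremltools' internal implementation.

    import numpy as np

    def linear_quantize(weights, nbits, symmetric=False):
        # Illustrative scale/bias arithmetic; not coremltools' internal code.
        w = np.asarray(weights, dtype=np.float64)
        if symmetric:
            # "linear_symmetric": range is [-A, A] with A = max(abs(weight)).
            a, b = -np.max(np.abs(w)), np.max(np.abs(w))
        else:
            # "linear": range is [A, B] with A = min(weight), B = max(weight).
            a, b = w.min(), w.max()
        scale = (b - a) / (2 ** nbits - 1) or 1.0  # guard against constant weights
        q = np.round((w - a) / scale).astype(int)  # quantized integer codes
        dequantized = q * scale + a                # approximate reconstruction
        return q, scale, a, dequantized

    q, scale, bias, approx = linear_quantize([-1.0, -0.25, 0.5, 1.5], nbits=8)
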
Examples
    >>> import coremltools
    >>> from coremltools.models.neural_network import quantization_utils
    >>> model = coremltools.models.MLModel('my_model.mlmodel')
    >>> quantized_model = quantization_utils.quantize_weights(model, 8, "linear")
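
    For quantization_mode='custom_lut', a minimal sketch of a conforming lut_function is shown below. It follows the (nbits, wp) -> (lut, qw) contract described under quantization_mode; the evenly spaced table, the NumPy implementation, and the model file name are assumptions for illustration, not the library's own algorithm.

    import numpy as np
    import coremltools
    from coremltools.models.neural_network import quantization_utils

    def my_lut_function(nbits, wp):
        # Build a lookup table of 2**nbits evenly spaced values over the
        # weight range (illustrative choice; any LUT-construction scheme works).
        w = np.asarray(wp, dtype=np.float64).flatten()
        lut = np.linspace(w.min(), w.max(), num=2 ** nbits)
        # Map each weight to the index of its nearest LUT entry
        # (assumed to be the expected form of the quantized weight params).
        qw = np.argmin(np.abs(w[:, None] - lut[None, :]), axis=1)
        return lut, qw

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantization_utils.quantize_weights(
        model, 8, quantization_mode='custom_lut', lut_function=my_lut_function)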