coremltools.models.neural_network.quantization_utils

Utilities to compress Neural Network models. Available in coremltools 2.0b1 and later.
Functions

compare_models(full_precision_model, …)
    Utility function to compare the performance of a full precision vs. quantized model.
quantize_spec_weights(*args, **kwargs)
quantize_weights(full_precision_model, nbits, …)
    Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel.
unpack_to_bytes(byte_arr, num_weights, nbits)

Classes

AdvancedQuantizedLayerSelector([…])
    Quantized layer selector allowing the user to specify types of layers to skip during the quantization process and minimum size parameters for quantized convolution layers.
ModelMetrics(spec)
    A utility class to hold evaluation metrics.
NoiseMetrics()
OutputMetric(name, type)
    Utility class to calculate and hold metrics between two model outputs.
QuantizedLayerSelector()
    Base class for implementing custom selectors that skip certain layers during quantization.
TopKMetrics(topk)

class coremltools.models.neural_network.quantization_utils.AdvancedQuantizedLayerSelector(skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

    Quantized layer selector allowing the user to specify types of layers to skip during the quantization process and minimum size parameters for quantized convolution layers.

Examples
    from coremltools.models.neural_network.quantization_utils import (
        AdvancedQuantizedLayerSelector, quantize_weights)

    selector = AdvancedQuantizedLayerSelector(
        skip_layer_types=['batchnorm', 'bias', 'depthwiseConv'],
        minimum_conv_kernel_channels=4,
        minimum_conv_weight_count=4096)

    # 'model' is an MLModel loaded earlier.
    quantized_model = quantize_weights(model, 8, selector=selector)
    __init__(self, skip_layer_types=[], minimum_conv_kernel_channels=4, minimum_conv_weight_count=4096)

        x.__init__(…) initializes x; see help(type(x)) for signature.

    do_quantize(self, layer, weight_param=None)

        weight_param should be the name of the WeightParam field.
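
    A hedged sketch of how the weight_param argument can be used in a custom selector. Keeping bias fields in full precision is only an illustration, and the 'bias' field name and model file name below are assumptions; they must match the actual WeightParam field and model you are working with.

    import coremltools
    from coremltools.models.neural_network.quantization_utils import (
        QuantizedLayerSelector, quantize_weights)

    class SkipBiasSelector(QuantizedLayerSelector):
        """Illustrative selector: leave 'bias' weight fields unquantized."""
        def do_quantize(self, layer, weight_param=None, **kwargs):
            if not super(SkipBiasSelector, self).do_quantize(layer):
                return False
            # weight_param is the name of the WeightParam field being considered;
            # 'bias' is an assumed field name used here for illustration.
            return weight_param != 'bias'

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantize_weights(model, 8, selector=SkipBiasSelector())
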
class coremltools.models.neural_network.quantization_utils.ModelMetrics(spec)

    A utility class to hold evaluation metrics.

    __init__(self, spec)

class coremltools.models.neural_network.quantization_utils.OutputMetric(name, type)

    Utility class to calculate and hold metrics between two model outputs.

    __init__(self, name, type)

class coremltools.models.neural_network.quantization_utils.QuantizedLayerSelector

    This is the base class to implement custom selectors to skip certain layers during quantization. To implement a custom selector, create a class that inherits this class and override the do_quantize() method.

Examples
    from coremltools.models.neural_network.quantization_utils import (
        QuantizedLayerSelector, quantize_weights)

    class MyLayerSelector(QuantizedLayerSelector):
        def __init__(self):
            super(MyLayerSelector, self).__init__()

        def do_quantize(self, layer, **kwargs):
            ret = super(MyLayerSelector, self).do_quantize(layer)
            if not ret or layer.name == 'dense_2':
                return False
            return True

    # 'mlmodel' is an MLModel loaded earlier.
    selector = MyLayerSelector()
    quantized_model = quantize_weights(
        mlmodel, 8, quantization_mode='linear', selector=selector)
    __init__(self)

        x.__init__(…) initializes x; see help(type(x)) for signature.

coremltools.models.neural_network.quantization_utils.compare_models(full_precision_model, quantized_model, sample_data)

    Utility function to compare the performance of a full precision vs. quantized model.

    full_precision_model: MLModel
        The full precision model with float32 weights.
    quantized_model: MLModel
        Quantized version of the model with quantized weights.
    sample_data: str | [dict]
        Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. A path to a directory containing images is only valid for models with one image input; for all other models a list of sample inputs must be provided.

    Returns: None. Performance metrics are printed out.
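
    A minimal usage sketch, assuming a hypothetical model file name and a hypothetical input feature named 'input'; real models need their own input dictionaries (or an image directory for models with a single image input).

    import coremltools
    from coremltools.models.neural_network import quantization_utils

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantization_utils.quantize_weights(model, 8, 'linear')

    # A list of sample input dictionaries; 'input' is an assumed feature name.
    sample_data = [{'input': 0.5}, {'input': -1.25}]
    quantization_utils.compare_models(model, quantized_model, sample_data)
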
coremltools.models.neural_network.quantization_utils.quantize_weights(full_precision_model, nbits, quantization_mode='linear', sample_data=None, **kwargs)

    Utility function to convert a full precision (float) MLModel to an nbit quantized MLModel.

    full_precision_model: MLModel
        The model to be quantized. Currently only neural network models are supported. If a pipeline model is passed in, all neural network models embedded within it will be converted.
    nbits: int
        Number of bits per quantized weight. Only 16-bit floating point and 1 to 8-bit quantization are supported.
    quantization_mode: str
        One of the following:

        "linear":
            Linear quantization with scale and bias, assuming the range of weight values is [A, B], where A = min(weight) and B = max(weight).
        "linear_lut":
            Simple linear quantization represented as a lookup table.
        "kmeans_lut":
            LUT-based quantization, where the LUT is generated by K-Means clustering.
        "custom_lut":
            LUT quantization where the LUT and quantized weight params are calculated using a custom function. If this mode is selected then a custom function must be passed in kwargs with key lut_function. The function must have input params (nbits, wp) where nbits is the number of quantization bits and wp is the list of weights for a given layer. The function should return two parameters (lut, qw) where lut is an array of length 2^nbits containing LUT values and qw is the list of quantized weight parameters. See _get_linear_lookup_table_and_weight for a sample implementation, and the sketch after the examples below.
        "linear_symmetric":
            Linear quantization with scale and bias, assuming the range of weight values is [-A, A], where A = max(abs(weight)). A numeric sketch of the linear and linear_symmetric arithmetic appears before the Examples section below.
    sample_data: str | [dict]
        Data used to characterize performance of the quantized model in comparison to the full precision model. Either a list of sample input dictionaries or an absolute path to a directory containing images. A path to a directory containing images is only valid for models with one image input; for all other models a list of sample inputs must be provided.
    kwargs: keyword arguments
        lut_function: callable
            A callable function provided when quantization_mode is set to _QUANTIZATION_MODE_CUSTOM_LOOKUP_TABLE. See quantization_mode for more details.
        selector: QuantizedLayerSelector
            A QuantizedLayerSelector object that can be derived to provide custom quantization selection.
    Returns:

    model: MLModel
        The quantized MLModel instance if running on macOS 10.14 or later; otherwise the quantized model specification is returned.
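
    The following is a generic numeric sketch of the scale-and-bias arithmetic implied by the "linear" and "linear_symmetric" descriptions above. The rounding and storage details are assumptions for illustration and may differ from coremltools' internal implementation.

    import numpy as np

    def linear_quantize(weights, nbits, symmetric=False):
        # Illustrative scale/bias arithmetic; not coremltools' internal code.
        w = np.asarray(weights, dtype=np.float64)
        if symmetric:
            # "linear_symmetric": range is [-A, A] with A = max(abs(weight)).
            a, b = -np.max(np.abs(w)), np.max(np.abs(w))
        else:
            # "linear": range is [A, B] with A = min(weight), B = max(weight).
            a, b = w.min(), w.max()
        scale = (b - a) / (2 ** nbits - 1) or 1.0  # guard against constant weights
        q = np.round((w - a) / scale).astype(int)  # quantized integer codes
        dequantized = q * scale + a                # approximate reconstruction
        return q, scale, a, dequantized

    q, scale, bias, approx = linear_quantize([-1.0, -0.25, 0.5, 1.5], nbits=8)
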
Examples
    >>> import coremltools
    >>> from coremltools.models.neural_network import quantization_utils
    >>> model = coremltools.models.MLModel('my_model.mlmodel')
    >>> quantized_model = quantization_utils.quantize_weights(model, 8, "linear")
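
    For quantization_mode='custom_lut', a minimal sketch of a conforming lut_function is shown below. It follows the (nbits, wp) -> (lut, qw) contract described under quantization_mode; the evenly spaced table, the NumPy implementation, and the model file name are assumptions for illustration, not the library's own algorithm.

    import numpy as np
    import coremltools
    from coremltools.models.neural_network import quantization_utils

    def my_lut_function(nbits, wp):
        # Build a lookup table of 2**nbits evenly spaced values over the
        # weight range (illustrative choice; any LUT-construction scheme works).
        w = np.asarray(wp, dtype=np.float64).flatten()
        lut = np.linspace(w.min(), w.max(), num=2 ** nbits)
        # Map each weight to the index of its nearest LUT entry
        # (assumed to be the expected form of the quantized weight params).
        qw = np.argmin(np.abs(w[:, None] - lut[None, :]), axis=1)
        return lut, qw

    model = coremltools.models.MLModel('my_model.mlmodel')  # hypothetical file name
    quantized_model = quantization_utils.quantize_weights(
        model, 8, quantization_mode='custom_lut', lut_function=my_lut_function)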