# What’s New
## Software Availability of Optimizations
| OS version | Compression modes or optimizations added |
|---|---|
| | Palettization: |
| | Quantization: |
| | Palettization: |
**Optimizations for iOS15 / macOS12 and lower:** compression optimizations can only be applied to the `neuralnetwork` model type. This can be done via the `ct.models.neural_network.quantization_utils.*` APIs. For later OS versions, all optimizations are applicable to the `mlprogram` model type only, and can be accessed via the APIs available under the `coremltools.optimize.*` namespace.
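To make the effect of these APIs concrete, here is a minimal NumPy sketch of 8-bit affine weight quantization, the transformation that data-free weight quantization performs. The function names are illustrative and are not part of coremltools:

```python
import numpy as np

def quantize_weights_sketch(w: np.ndarray):
    """Affine-quantize a weight tensor to 8-bit unsigned integers
    (per-tensor), so that w is approximated by scale * (q - zero_point).
    Illustrative sketch only, not the coremltools implementation."""
    w_min, w_max = float(w.min()), float(w.max())
    span = w_max - w_min
    scale = span / 255.0 if span > 0 else 1.0
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return scale * (q.astype(np.float32) - zero_point)
```

Storing `q` plus one scale and zero point is what shrinks the model: 1 byte per weight instead of 4, at the cost of a small reconstruction error bounded by the scale.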
## Core ML Tools Optimization APIs
The following sections list the APIs available in coremltools to transform models using the different compression modes (mentioned in the table above) and workflows. Note that `coremltools.optimize` is abbreviated as `cto` below.
### Core ML Tools 8
All previous (coremltools 7) APIs have been updated to support the new compression modes available in iOS18 / macOS15 (e.g. grouped-channel palettization). The following APIs have also been added, available from `coremltools==8.0b1`:
| Compression Type | Input Model format | API (method or class) | Optimization workflow |
|---|---|---|---|
| Palettization | PyTorch model | `cto.torch.palettization.PostTrainingPalettizer` | palettize in a data-free manner |
| Palettization | PyTorch model | `cto.torch.palettization.SKMPalettizer` | palettize with a calibration dataset using the sensitive k-means algorithm |
| Quantization | PyTorch model | `cto.torch.layerwise_compression.LayerwiseCompressor` (GPTQ) | quantize with a calibration dataset using the GPTQ algorithm |
| Quantization | PyTorch model | `cto.torch.quantization.PostTrainingQuantizer` | quantize weights in a data-free manner |
| Pruning | PyTorch model | `cto.torch.layerwise_compression.LayerwiseCompressor` (SparseGPT) | prune with a calibration dataset using the SparseGPT algorithm |
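For intuition about the palettization rows above, the following is a small NumPy sketch of data-free palettization: cluster the weight values with plain k-means into a lookup table (LUT) of 2^n entries and store a per-weight LUT index. Names are illustrative; the coremltools implementation differs:

```python
import numpy as np

def palettize_sketch(w: np.ndarray, n_bits: int = 2, n_iter: int = 20):
    """Cluster weight values into a 2**n_bits lookup table with k-means.
    Returns per-weight LUT indices (n_bits wide) and the LUT itself.
    Conceptual sketch only, not the coremltools algorithm."""
    flat = w.reshape(-1)
    # Initialize centroids spread evenly over the weight range.
    lut = np.linspace(flat.min(), flat.max(), 2 ** n_bits)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid ...
        idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
        # ... then move each centroid to the mean of its cluster.
        for k in range(lut.size):
            if np.any(idx == k):
                lut[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
    return idx.reshape(w.shape), lut
```

A 2-bit palettized layer stores only 2-bit indices plus a 4-entry LUT, which is where the compression comes from; the calibration-based variants differ in how they weight the clustering, not in this basic structure.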
Another method, `cto.coreml.experimental.linear_quantize_activations`, takes an mlpackage and calibration data and produces a model with activations quantized to 8 bits. This can then be passed to the `cto.coreml.linear_quantize_weights` method to get a W8A8 model. The API and its implementation may change as it is moved out of the experimental namespace in future non-beta releases of Core ML Tools.
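The role of the calibration data can be sketched as follows: choose an activation scale from the range observed on representative inputs, then quantize activations to int8 with that scale. This is a conceptual NumPy sketch with illustrative names, not the coremltools algorithm:

```python
import numpy as np

def calibrate_activation_scale(calibration_batches) -> float:
    """Pick a symmetric int8 scale from the peak activation magnitude
    seen over the calibration batches. Unlike weights, activations are
    not known ahead of time, which is why calibration data is required."""
    peak = max(float(np.abs(batch).max()) for batch in calibration_batches)
    return peak / 127.0

def quantize_activation(x: np.ndarray, scale: float) -> np.ndarray:
    # Map floats to the symmetric int8 range [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)
```

Combining 8-bit activations with 8-bit weights is what the W8A8 designation refers to: both operands of a matmul/conv arrive as int8, enabling integer compute paths.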
### Core ML Tools 7
| Compression Type | Input Model format | API (method or class) | Optimization workflow |
|---|---|---|---|
| Palettization | Core ML (mlpackage) | `cto.coreml.palettize_weights` | palettize in a data-free manner |
| Palettization | PyTorch model | `cto.torch.palettization.DKMPalettizer` | palettize via fine-tuning using differentiable k-means |
| Quantization | Core ML (mlpackage) | `cto.coreml.linear_quantize_weights` | quantize weights to 8 bits in a data-free manner |
| Quantization | PyTorch model | `cto.torch.quantization.LinearQuantizer` | quantize weights and/or activations either with fine-tuning or with a calibration dataset |
| Pruning | Core ML (mlpackage) | `cto.coreml.prune_weights` | transform a dense model to one with sparse weights |
| Pruning | PyTorch model | `cto.torch.pruning.MagnitudePruner` | sparsify via fine-tuning using a magnitude-based pruning algorithm |
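As an illustration of the magnitude-based pruning workflow in the last row, here is a one-shot NumPy sketch that zeros out the smallest-magnitude fraction of a weight tensor; in the fine-tuning workflow the sparsity mask would be re-applied after every optimizer step. The function name is illustrative, not a coremltools API:

```python
import numpy as np

def magnitude_prune_sketch(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude `sparsity` fraction of weights.
    One-shot conceptual sketch of magnitude-based pruning (ties at the
    threshold may prune slightly more than the requested fraction)."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute weight value.
    threshold = np.sort(np.abs(w).reshape(-1))[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)
```

The resulting tensor is mostly zeros, which a sparse weight representation can then store compactly; accuracy is typically recovered by continuing training with the mask held fixed.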