The palettize_weights function discretizes the values of all weights in the ML program and constructs the LUT according to the algorithm you specify as mode in the OpPalettizerConfig. The float values are then converted to nbit values, and the LUT is saved alongside each weight. The const ops that were storing weight values are replaced by ops that reconstruct each weight from its LUT and nbit values.
The following example shows how to palettize the weights of a Core ML model:
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)
op_config = OpPalettizerConfig(mode="kmeans", nbits=6, weight_threshold=512)
config = OptimizationConfig(global_config=op_config)
compressed_6_bit_model = palettize_weights(model, config=config)
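To make the LUT representation described above concrete, here is a minimal NumPy sketch of what palettizing a single tensor amounts to: each weight is replaced by the index of its nearest LUT entry, and the weight is recovered by indexing the LUT. The helpers palettize_with_lut and depalettize, and the hand-picked LUT, are hypothetical names and values used only for illustration; this is not how coremltools implements the transformation internally.

```python
import numpy as np

def palettize_with_lut(weight, lut):
    """Replace each weight by the index of its nearest LUT entry (illustrative helper)."""
    flat = weight.reshape(-1, 1)
    # Distance from every weight to every LUT entry, then pick the closest entry.
    indices = np.abs(flat - lut.reshape(1, -1)).argmin(axis=1)
    return indices.astype(np.uint8).reshape(weight.shape)

def depalettize(indices, lut):
    """Recover an approximate weight tensor by looking up each index in the LUT."""
    return lut[indices]

nbits = 2
weight = np.array([[0.11, -0.52, 0.49],
                   [-0.48, 0.09, 0.51]], dtype=np.float32)
# A hand-picked LUT with 2**nbits entries; a real LUT would be computed from the weights.
lut = np.array([-0.5, 0.1, 0.5, 0.9], dtype=np.float32)

indices = palettize_with_lut(weight, lut)
restored = depalettize(indices, lut)
print(indices)
print(restored)
```

Only the small integer indices and the LUT need to be stored, which is where the size reduction comes from.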
Specify how the LUT is constructed by choosing one of the following as the mode:
"kmeans" (default): The LUT is generated by k-means clustering, with the number of clusters set to 2^nbits. nbits can be one of 1, 2, 4, 6, 8.
"uniform": The LUT is generated by computing uniformly spaced intervals between the minimum and maximum values in the weight tensor.
"unique": In this mode, np.unique is applied to the weight values, and if 256 or fewer unique values are found, they are converted into lookup table form. Nothing is done if there are more than 256 unique values.
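The three modes can be sketched in plain NumPy as follows. This is a rough illustration under stated assumptions, not the coremltools implementation: the function names uniform_lut, unique_lut, and kmeans_lut are hypothetical, the uniform grid is assumed to include both the minimum and the maximum, and the k-means variant is a basic Lloyd's iteration started from that uniform grid.

```python
import numpy as np

def uniform_lut(weight, nbits):
    """2**nbits evenly spaced values from the tensor's minimum to its maximum."""
    return np.linspace(weight.min(), weight.max(), 2 ** nbits)

def unique_lut(weight, max_entries=256):
    """The distinct values of the tensor, or None if there are more than max_entries."""
    values = np.unique(weight)
    return values if values.size <= max_entries else None

def kmeans_lut(weight, nbits, iterations=20):
    """Basic 1-D Lloyd's k-means with 2**nbits centroids (illustrative only)."""
    flat = weight.ravel()
    centroids = uniform_lut(flat, nbits)  # assumed initialization
    for _ in range(iterations):
        # Assign every weight to its nearest centroid.
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for k in range(centroids.size):
            members = flat[assign == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids

w = np.array([0.0, 0.1, 0.1, 0.9, 1.0, 1.0])
print(uniform_lut(w, nbits=1))
print(unique_lut(w))
print(kmeans_lut(w, nbits=1))
```

On this toy tensor the uniform LUT sits at the extremes, while the k-means LUT moves toward where the values actually cluster, which is why k-means typically gives a lower reconstruction error for the same nbits.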
The weight_threshold parameter specifies the minimum number of elements that a weight tensor must have for palettization to take place. In the previous code sample, since weight_threshold=512 was specified, all weight tensors that have fewer than 512 elements are left untouched, while tensors with more than 512 elements are palettized.
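The effect of the threshold can be sketched as a simple size filter. The helper should_palettize and the tensor names below are hypothetical, and treating a tensor with exactly weight_threshold elements as left untouched is an assumption made for this sketch.

```python
import numpy as np

def should_palettize(weight, weight_threshold):
    """True if the tensor has more elements than the threshold (boundary is an assumption)."""
    return weight.size > weight_threshold

# Hypothetical weights of a small network.
weights = {
    "conv1.weight": np.zeros((16, 3, 3, 3)),  # 432 elements
    "fc.weight": np.zeros((64, 128)),         # 8192 elements
    "fc.bias": np.zeros(64),                  # 64 elements
}

selected = [name for name, w in weights.items() if should_palettize(w, 512)]
print(selected)
```

Small tensors such as biases contribute little to model size, so skipping them costs almost nothing in compression.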
For options on how to set different palettization configs for different weights in the same network, see Customizing Ops to Compress.
For more details on the parameters available in the config, see the API Reference.
Post-Training Palettization Works Well for nbits = 6, 8
Results are model and task dependent, but in most cases, palettizing with optimize.coreml.palettize_weights preserves the accuracy to a good degree for 6-bit or 8-bit settings. With lower settings, you will likely see a sharp drop in accuracy, in which case consider using Training-Time Palettization with nbits = 2, 4.
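As a rough intuition for why lower settings are harder, the sketch below measures the weight reconstruction error of a uniform LUT on synthetic Gaussian weights for each supported nbits. Everything here is an assumption for illustration: the helper lut_mse is hypothetical, the data is random rather than taken from a real model, and reconstruction error is only a proxy that does not translate directly into task accuracy.

```python
import numpy as np

def lut_mse(weight, nbits):
    """Mean squared error after snapping weights to a uniform LUT with 2**nbits entries."""
    lut = np.linspace(weight.min(), weight.max(), 2 ** nbits)
    indices = np.abs(weight[:, None] - lut[None, :]).argmin(axis=1)
    return float(np.mean((weight - lut[indices]) ** 2))

rng = np.random.default_rng(0)
weight = rng.normal(size=4096)  # synthetic stand-in for a weight tensor

errors = {nbits: lut_mse(weight, nbits) for nbits in (1, 2, 4, 6, 8)}
for nbits, mse in errors.items():
    print(f"nbits={nbits}: mse={mse:.6f}")
```

The error shrinks as nbits grows, so the actual accuracy of the compressed model should always be checked on your own evaluation data.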