The `palettize_weights` function discretizes the values of all weights in the ML program and constructs the LUT according to the algorithm you specify as `mode` in the `OpPalettizerConfig`. The float values are then converted to nbit values, and the LUT is saved alongside each weight. The `const` ops that were storing the weight values are replaced by `constexpr_lut_to_dense` ops.
The following example shows how to palettize the weights of a Core ML model:
```python
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

op_config = OpPalettizerConfig(mode="kmeans", nbits=6, weight_threshold=512)
config = OptimizationConfig(global_config=op_config)
compressed_6_bit_model = palettize_weights(model, config=config)
```
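To make the LUT idea concrete, here is a minimal numpy sketch of the transformation. It is an illustration only, not the coremltools implementation, and it mimics the `"uniform"` construction described below:

```python
import numpy as np

# Toy weight tensor and a 2-bit palette (2**2 = 4 LUT entries).
weights = np.random.randn(512).astype(np.float32)
nbits = 2

# "uniform" idea: LUT entries uniformly spaced over the weight range.
# ("kmeans" mode would place the entries by clustering instead.)
lut = np.linspace(weights.min(), weights.max(), 2**nbits, dtype=np.float32)

# Each float weight becomes the index of its nearest LUT entry.
indices = np.abs(weights[:, None] - lut[None, :]).argmin(axis=1).astype(np.uint8)

# At load time, the dense weight is reconstructed by a simple lookup.
reconstructed = lut[indices]
print("max abs error:", np.abs(weights - reconstructed).max())
```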
Specify how the LUT is constructed by choosing one of the following as the `mode`:
"kmeans"(default) : The LUT is generated by k-means clustering, with number of clusters set to
nbitscan be one of
1, 2, 4, 6, 8.
"uniform": The LUT is generated by computing uniformly spaced intervals between the minimum and maximum values in the weight tensor.
"unique": In this mode,
np.uniqueis applied to the weight values, and if 256 or less number of unique values are found, they are converted into lookup table form. Nothing is done if there are more than 256 uniques values.
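As an illustration of the `"unique"` mode's idea, the following toy numpy sketch builds a lossless LUT when a tensor already has few distinct values. This is a sketch of the concept, not coremltools internals:

```python
import numpy as np

# A tensor with only three distinct values, e.g. from a previously
# clustered or heavily quantized model.
w = np.array([0.0, 0.5, 0.5, -1.0, 0.0, -1.0], dtype=np.float32)

lut = np.unique(w)  # sorted distinct values: [-1.0, 0.0, 0.5]
if lut.size <= 256:
    # Map each element to the position of its value in the LUT.
    indices = np.searchsorted(lut, w).astype(np.uint8)
    assert np.array_equal(lut[indices], w)  # reconstruction is lossless
else:
    pass  # more than 256 distinct values: leave the weight untouched
```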
The `weight_threshold` parameter specifies the minimum number of elements that a weight tensor must have for palettization to take place. In the previous code sample, since `weight_threshold=512` was specified, all the weight tensors that have fewer than 512 elements are left untouched, while tensors with more than 512 elements are palettized.
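For instance, to leave small tensors such as biases untouched and palettize only the large layers, you can raise the threshold. A sketch, where 2048 is an arbitrary example value and `model` is the mlprogram MLModel from the example above:

```python
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

# Only weight tensors with more than 2048 elements are palettized;
# everything smaller keeps its original float representation.
op_config = OpPalettizerConfig(mode="kmeans", nbits=8, weight_threshold=2048)
compressed_model = palettize_weights(
    model, config=OptimizationConfig(global_config=op_config)
)
```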
For options on how to set different palettization configs for different weights in the same network, see Customizing Ops to Compress; a sketch of the general pattern follows.
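The sketch below assumes `OptimizationConfig`'s `op_type_configs` and `op_name_configs` arguments as described on that page; `"op_to_skip"` is a placeholder for a real op name from your model:

```python
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

config = OptimizationConfig(
    # Default applied to all eligible weights.
    global_config=OpPalettizerConfig(mode="kmeans", nbits=6),
    # Give conv weights more precision than the global setting.
    op_type_configs={"conv": OpPalettizerConfig(mode="kmeans", nbits=8)},
    # Setting a config to None excludes that op from compression.
    op_name_configs={"op_to_skip": None},
)
compressed_model = palettize_weights(model, config=config)
```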
For more details on the parameters available in the config, see `OpPalettizerConfig`, `OptimizationConfig`, and `palettize_weights` in the API Reference.
Post-Training Palettization Works Well for nbits = 6, 8
Results are model- and task-dependent, but in most cases, palettizing with `optimize.coreml.palettize_weights` preserves accuracy to a good degree at the 6-bit or 8-bit settings. At lower settings, you will likely see a sharp drop in accuracy; in that case, consider using Training-Time Palettization with `nbits = 2, 4`.
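A practical way to choose is to generate one model per bit width and evaluate each on your own data. A sketch, assuming `model` is the loaded mlprogram MLModel from the earlier example and the output filenames are placeholders:

```python
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

# Produce one palettized variant per supported bit width, then compare
# on-disk size and task accuracy with your own evaluation harness.
for nbits in (8, 6, 4, 2):
    cfg = OptimizationConfig(
        global_config=OpPalettizerConfig(mode="kmeans", nbits=nbits)
    )
    palettize_weights(model, config=cfg).save(f"palettized_{nbits}bit.mlpackage")
```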