coreai_opt.coreai_utils.palettize_weights¶
- coreai_opt.coreai_utils.palettize_weights(coreai_program, lut_dtype, n_bits=4, granularity=CompressionGranularity.PER_TENSOR, group_size=32, cluster_dim=1, enable_per_channel_scale=False, weight_num_threshold=1024, num_kmeans_workers=4, enable_fast_kmeans_mode=True, rounding_precision=4, in_place=False)[source]¶
Palettize weights in a Core AI AIProgram (MLIR<CoreAI> IR) by using Core AI ops.
Walks through the IR and palettizes each coreai.constant op that needs to be compressed. Only constants consumed by ops in
_OPS_WEIGHT_NEED_COMPRESSIONare candidates; ops that fail to be palettized are skipped with a warning.- Parameters:
coreai_program (AIProgram) – The model to be palettized.
lut_dtype (DType | None) – The datatype for values in the look-up table. Can be
None(no LUT quantization),DType.INT8,DType.UINT8,DType.FP8_E4M3FN, orDType.FP8_E5M2. Symmetric quantization is used by default. Defaults toNone.n_bits (int) – Number of bits for palettizing the weights. Defaults to
4. A LUT will have2**n_bitsentries; n_bits must be in{1, 2, 3, 4, 6, 8}.granularity (CompressionGranularity) – Quantization granularity. Supports
CompressionGranularity.PER_TENSOR,CompressionGranularity.PER_CHANNEL, andCompressionGranularity.PER_GROUPED_CHANNEL. Defaults toCompressionGranularity.PER_TENSOR.group_size (int) – Number of channels in a group. Only effective when granularity is
CompressionGranularity.PER_GROUPED_CHANNEL.cluster_dim (int) – Dimension of centroids for each lookup table. When
cluster_dim > 1, it indicates 2-D clustering. Defaults to 1 (scalar palettization).enable_per_channel_scale (bool) – When
True, weights are normalized along output channels using per-channel scales before palettization. Not supported withcluster_dim > 1.weight_num_threshold (int) – Threshold of weight element count to determine whether to compress a weight.
num_kmeans_workers (int) – Number of worker processes for k-means. Defaults to 4.
enable_fast_kmeans_mode (bool) – Whether to use weight rounding to speed up k-means. Weight rounding reduces precision but speeds up k-means clustering. Defaults to
True.rounding_precision (int) – Number of decimal places to round weights to during fast K-means clustering. Only effective when
enable_fast_kmeans_modeisTrue. Defaults to4.in_place (bool) – Whether to palettize the model in-place. Defaults to
False.
- Returns:
A palettized Core AI program.
- Return type:
AIProgram