coreai_opt.coreai_utils.sparsify_weights
- coreai_opt.coreai_utils.sparsify_weights(coreai_program, target_sparsity=0.5, block_size=None, n_m_ratio=None, quantize_dtype=None, palettize_nbits=None, weight_num_threshold=1024, in_place=False)
Sparsify weights in a Core AI AIProgram (MLIR<CoreAI> IR) by using Core AI ops.
Walks through the IR and sparsifies each coreai.constant op that needs to be compressed. Only constants consumed by ops in _OPS_WEIGHT_NEED_COMPRESSION are candidates; ops that fail to be sparsified are skipped with a warning.
- Parameters:
coreai_program (AIProgram) – The model to be sparsified.
target_sparsity (float | None) – Fraction of weights to set to zero, in [0, 1]. The n weights with the lowest absolute value are set to zero, where n = floor(size * target_sparsity). Mutually exclusive with n_m_ratio. Defaults to 0.5.
block_size (int | None) – Block size for block sparsity along the output channel dimension. Only applied to linear and conv layers. If set, must be greater than 1. Defaults to None.
n_m_ratio (tuple[int, int] | None) – (n, m) ratio for n:m structured pruning along the input channel axis. Out of every m elements, the n with the lowest magnitude are set to zero. Only applied to linear and conv layers. Mutually exclusive with target_sparsity. Defaults to None.
quantize_dtype (DType | None) – Data type for storing non-zero values (joint compression). Must be None, DType.INT8, DType.UINT8, DType.FP8_E4M3FN, or DType.FP8_E5M2. When set, non-zero values are quantized and a coreai.blockwise_shift_scale op dequantizes them back. Cannot be used with palettize_nbits. Defaults to None.
palettize_nbits (int | None) – Number of bits for palettizing non-zero values. When set, non-zero values are palettized using k-means with 2**palettize_nbits clusters. Valid values: {1, 2, 3, 4, 6, 8}. Cannot be used with quantize_dtype. Defaults to None.
weight_num_threshold (int) – Minimum number of elements a weight must have to be compressed. Defaults to 1024.
in_place (bool) – Whether to sparsify the model in-place. Defaults to False.
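The two pruning rules described for target_sparsity and n_m_ratio can be illustrated with a small pure-Python sketch. This is not the library's implementation: magnitude_prune and n_m_prune are hypothetical helper names, they operate on a flat list rather than along a channel axis of a tensor, and the tie-breaking order and the handling of a trailing partial group are assumptions.

```python
import math


def magnitude_prune(weights, target_sparsity):
    """Zero the n smallest-magnitude values, with n = floor(size * target_sparsity)."""
    n = math.floor(len(weights) * target_sparsity)
    # Indices ordered by absolute value, smallest first.
    # Assumption: ties are broken by position (sorted() is stable).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n]:
        pruned[i] = 0.0
    return pruned


def n_m_prune(weights, n, m):
    """Within each consecutive group of m values, zero the n of lowest magnitude."""
    pruned = list(weights)
    for start in range(0, len(weights), m):
        # Assumption: a trailing group shorter than m is pruned the same way.
        group = range(start, min(start + m, len(weights)))
        order = sorted(group, key=lambda i: abs(weights[i]))
        for i in order[:n]:
            pruned[i] = 0.0
    return pruned


w = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.8, 0.2]

# Unstructured: the 4 smallest magnitudes anywhere in the list are zeroed.
unstructured = magnitude_prune(w, 0.5)
# -> [0.9, 0.0, 0.4, -0.7, 0.0, 0.0, -0.8, 0.0]

# 2:4 structured: 2 zeros in every group of 4, regardless of global ranking.
structured = n_m_prune(w, 2, 4)
# -> [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.8, 0.0]
```

Both results have the same overall sparsity, but the n:m variant keeps 0.3 and drops 0.4 because each group of four is ranked on its own.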
- Returns:
A sparsified Core AI program.
- Return type:
AIProgram
- Raises:
ValueError – If both target_sparsity and n_m_ratio are set.
ValueError – If neither target_sparsity nor n_m_ratio is set.
ValueError – If both quantize_dtype and palettize_nbits are set.
ValueError – If quantize_dtype is not None, DType.INT8, DType.UINT8, DType.FP8_E4M3FN, or DType.FP8_E5M2.
ValueError – If palettize_nbits is not in {1, 2, 3, 4, 6, 8}.
ValueError – If block_size is set and not greater than 1.
ValueError – If n_m_ratio does not have length 2, contains non-integers, has m <= 0, or has n outside [0, m].
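The argument rules listed above can be restated as a standalone pre-check. The sketch below is a hypothetical helper (check_sparsify_args is not part of coreai_opt) that mirrors the documented conditions; it omits the quantize_dtype membership check because that needs the library's DType enum, and its error messages are invented for illustration. One consequence worth noting, if the documented default and checks are taken literally: because target_sparsity defaults to 0.5, selecting n:m pruning appears to require passing target_sparsity=None explicitly.

```python
VALID_PALETTIZE_NBITS = {1, 2, 3, 4, 6, 8}


def check_sparsify_args(target_sparsity=0.5, block_size=None, n_m_ratio=None,
                        quantize_dtype=None, palettize_nbits=None):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    if target_sparsity is not None and n_m_ratio is not None:
        raise ValueError("target_sparsity and n_m_ratio are mutually exclusive")
    if target_sparsity is None and n_m_ratio is None:
        raise ValueError("one of target_sparsity or n_m_ratio must be set")
    if quantize_dtype is not None and palettize_nbits is not None:
        raise ValueError("quantize_dtype and palettize_nbits cannot be combined")
    if palettize_nbits is not None and palettize_nbits not in VALID_PALETTIZE_NBITS:
        raise ValueError("palettize_nbits must be one of {1, 2, 3, 4, 6, 8}")
    if block_size is not None and block_size <= 1:
        raise ValueError("block_size must be greater than 1")
    if n_m_ratio is not None:
        if len(n_m_ratio) != 2 or not all(isinstance(v, int) for v in n_m_ratio):
            raise ValueError("n_m_ratio must be a pair of integers")
        n, m = n_m_ratio
        if m <= 0 or not 0 <= n <= m:
            raise ValueError("n_m_ratio requires m > 0 and 0 <= n <= m")


# 2:4 pruning with the default target_sparsity explicitly cleared: accepted.
check_sparsify_args(target_sparsity=None, n_m_ratio=(2, 4))

# Leaving target_sparsity at its default of 0.5 trips the first check.
try:
    check_sparsify_args(n_m_ratio=(2, 4))
    message = ""
except ValueError as err:
    message = str(err)
```

Running a check like this before calling sparsify_weights surfaces argument conflicts without touching the model.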