w4_per_block

w4_per_block(*, block_size=32, axis=None)

int4 weight-only quantization, per-block symmetric, block_size defaults to 32.

Parameters:
  • block_size (int) – Block size along the input channel dimension (default 32).

  • axis (int | None) – Axis to apply blocks along. When None (default), the axis is auto-resolved based on the module type during quantization.

Returns:

int4 per-block weight-only module configuration.

Return type:

ModuleQuantizerConfig