w4_per_blockΒΆ

w4_per_block(*, block_size=32, axis=None, execution_mode=<ExecutionMode.GRAPH: 'graph'>)

int4 weight-only quantization, per-block symmetric, block_size defaults to 32.

Parameters:
  • block_size (int) – Block size along the input channel dimension (default 32).

  • axis (int | None) – Axis to apply blocks along. When None (default), the axis is auto-resolved based on the module type during quantization.

  • execution_mode (ExecutionMode) – Quantization execution mode. Defaults to ExecutionMode.GRAPH.

Returns:

int4 per-block weight-only configuration.

Return type:

QuantizerConfig