w4_per_block
- w4_per_block(*, block_size=32, axis=None, execution_mode=<ExecutionMode.GRAPH: 'graph'>)
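Int4 weight-only quantization with symmetric, per-block scaling: weights are split into blocks of block_size values (32 by default), and each block is quantized with its own scale.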
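The per-block symmetric scheme can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation. It assumes the signed int4 range [-8, 7] and a per-block scale of max|w| / 7 with the zero point fixed at 0. It also assumes that blocks run along a single weight row (standing in for the input channel dimension), whereas the real function can resolve the axis from the module type. The helper names quantize_row_per_block and dequantize_row_per_block are hypothetical.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range (assumption)


def quantize_row_per_block(
    row: List[float], block_size: int = 32
) -> Tuple[List[int], List[float]]:
    """Hypothetical sketch: symmetric per-block int4 quantization of one row.

    The row is cut into contiguous blocks of `block_size` values; each block
    gets its own scale, and the zero point is always 0 (symmetric).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block to INT4_MAX (assumed scale rule);
        # an all-zero block gets scale 1.0 to avoid division by zero.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for v in block:
            # Python's round() is round-half-to-even; clamp to the int4 range.
            q.append(max(INT4_MIN, min(INT4_MAX, round(v / scale))))
    return q, scales


def dequantize_row_per_block(
    q: List[int], scales: List[float], block_size: int = 32
) -> List[float]:
    """Reconstruct approximate weights: each value times its block's scale."""
    return [q[i] * scales[i // block_size] for i in range(len(q))]


if __name__ == "__main__":
    row = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.0, 0.0]
    q, scales = quantize_row_per_block(row, block_size=4)
    print(q, scales)
    print(dequantize_row_per_block(q, scales, block_size=4))
```

A smaller block_size gives each scale fewer values to cover, which usually lowers reconstruction error at the cost of storing more scales. With max-magnitude scaling, no value in a block is clipped, so each reconstructed weight is within half a scale step of the original.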
- Parameters:
block_size (int) – Block size along the input channel dimension (default 32).
axis (int | None) – Axis to apply blocks along. When None (default), the axis is auto-resolved based on the module type during quantization.
execution_mode (ExecutionMode) – Quantization execution mode. Defaults to ExecutionMode.GRAPH.
- Returns:
int4 per-block weight-only configuration.
- Return type: