coreai_opt.quantization.spec.fake_quantize.FakeQuantizeImplBase

class coreai_opt.quantization.spec.fake_quantize.FakeQuantizeImplBase(dtype, qscheme, qformulation, granularity, target_dtype, quant_min, quant_max, qparams_calculator, quantization_target, n_bits=None, **kwargs)

Bases: CompressionSimulatorBase, FakeQuantizeBase

Base class for implementing fake quantization.

Parameters:

dtype (torch.dtype)
qscheme (QuantizationScheme)
qformulation (QuantizationFormulation)
granularity (QuantizationGranularity)
target_dtype (torch.dtype)
quant_min (int | float)
quant_max (int | float)
qparams_calculator (QParamsCalculatorBase)
quantization_target (CompressionTargetTensor)
n_bits (int | None)

__init__(dtype, qscheme, qformulation, granularity, target_dtype, quant_min, quant_max, qparams_calculator, quantization_target, n_bits=None, **kwargs)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

dtype (dtype)
qscheme (QuantizationScheme)
qformulation (QuantizationFormulation)
granularity (QuantizationGranularity)
target_dtype (dtype)
quant_min (int | float)
quant_max (int | float)
qparams_calculator (QParamsCalculatorBase)
quantization_target (CompressionTargetTensor)
n_bits (int | None)

Methods

`calculate_qparams`()	Returns the computed (scale, zero_point, minval).
`convert`(model, observer_node)	No-op: keep fake quant nodes intact during convert_pt2e.
`dequantize`(tensor, scale, zero_point, minval)	Dequantize a quantized tensor using the scale and zero point that were used to quantize it, following the configuration in the `QuantizationSpec`, and return the result as a tensor of dtype `output_dtype`.
`disable_observer`()	Disable the observer, unless the qparams calculator is stateless.
`enable_observer`([enabled])	Inverse of `disable_observer`: ignore `enabled=False` when the qparams calculator is stateless.
`extra_repr`()	Return the extra representation of the module.
`forward`(tensor)	Performs fake quantization of the given tensor using the qparams (scale, zero point, minval) computed by the QParamsCalculator.
`get_class`(key)
`is_disabled`()	Return True if fake quantization has been disabled.
`list_registry_keys`()
`list_registry_values`()
`quantize`(tensor, scale, zero_point, minval)	Quantize a tensor with the given scale and zero point, following the configuration in the `QuantizationSpec`.
`register`(key)	Register a virtual subclass of an ABC.
`resolve`(data)	Resolve a string key or class type against this registry.
`set_export_mode`([enabled])	Set or unset export mode.
`with_args`(**kwargs)

calculate_qparams()

Returns the computed (scale, zero_point, minval). zero_point and minval are None for floating-point dtypes.

Return type: tuple[Tensor, Tensor | None, Tensor | None]
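
As a rough illustration of what a (scale, zero_point, minval) triple can look like for an integer target, the sketch below derives one from an observed value range using the common asymmetric affine formula. The helper name `affine_qparams` and the choice to report the widened lower bound as minval are assumptions made for this example; the values that calculate_qparams() actually returns depend on the configured qscheme, qformulation, and qparams calculator.

```python
def affine_qparams(observed_min, observed_max, quant_min, quant_max):
    """Hypothetical helper: derive (scale, zero_point, minval) from an observed range."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(observed_min, 0.0)
    hi = max(observed_max, 0.0)
    # One quantization step in float units; guard against a degenerate range.
    scale = max((hi - lo) / (quant_max - quant_min), 1e-12)
    # Integer code that represents float 0.0, clamped to the valid code range.
    zero_point = int(round(quant_min - lo / scale))
    zero_point = max(quant_min, min(quant_max, zero_point))
    # Assumption: minval is taken to be the lower bound of the (widened) range.
    return scale, zero_point, lo


scale, zero_point, minval = affine_qparams(-1.0, 3.0, quant_min=0, quant_max=255)
```

For a floating-point target dtype the documented contract is that zero_point and minval are None, so only the scale would carry information in that case.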

convert(model, observer_node)

No-op: keep fake quant nodes intact during convert_pt2e.

Without this method, torchao’s convert would try to replace fake quant nodes with its standard quantize/dequantize ops and fail in the process.

Parameters:

model (GraphModule)
observer_node (Node)

Return type:

None

abstract dequantize(tensor, scale, zero_point, minval, output_dtype=torch.float32)

Dequantize a quantized tensor using the scale and zero point that were used to quantize it, following the configuration in the QuantizationSpec, and return the result as a tensor of dtype output_dtype.

Parameters:

tensor (Tensor) – The tensor to dequantize
scale (Tensor) – The scale to use for dequantization
zero_point (Tensor | None) – The zero point computed by the qparams calculator (None for floating-point dtypes).
minval (Tensor | None) – The minimum representable float value of the observed range, computed by the qparams calculator (None for floating-point dtypes).
output_dtype (dtype) – The dtype to use for the dequantized tensor

Return type:

Tensor
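
Because dequantize is abstract, each subclass supplies its own mapping back to floating point. The following is only a sketch of the standard affine case, written over plain Python lists rather than tensors. The function name `dequantize_affine` is hypothetical, and minval and output_dtype are left out for brevity.

```python
def dequantize_affine(codes, scale, zero_point=None):
    """Hypothetical affine dequantization: x_hat = (q - zero_point) * scale."""
    # zero_point is None for floating-point targets, where no offset applies.
    offset = 0 if zero_point is None else zero_point
    return [(q - offset) * scale for q in codes]


values = dequantize_affine([0, 64, 255], scale=0.5, zero_point=64)
no_offset = dequantize_affine([2, -4], scale=0.25)
```

The code equal to zero_point maps back to exactly 0.0, which is why affine schemes usually keep zero representable.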

disable_observer()

Disable the observer, unless the qparams calculator is stateless.

This holds for every caller: direct calls, apply(disable_observer), convert_pt2e, and QAT scheduling. A stateless calculator recomputes qparams on each forward pass, so observer_enabled must stay at 1 permanently. forward uses that flag to choose between live recomputation and the cached result of get_qparams(), and a stateless calculator has no such cache.

Return type: None

enable_observer(enabled=True)

Inverse of disable_observer: a call with enabled=False is ignored when the qparams calculator is stateless. This covers callers that invoke enable_observer(False) directly (e.g. the QAT scheduler at quantizer.py:_maybe_apply_qat_schedule); disable_observer() itself is handled by its own override.

Parameters: enabled (bool)
Return type: None
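
The guard described for disable_observer and enable_observer can be pictured with a small stand-in class. Everything here is hypothetical: the class name, the `stateless` attribute, and the choice to route disable_observer through enable_observer are simplifications for illustration, not the library's implementation.

```python
class ObserverToggleSketch:
    """Hypothetical stand-in that mimics only the observer on/off guard."""

    def __init__(self, stateless):
        self.stateless = stateless  # assumed flag; the real attribute name is unknown
        self.observer_enabled = 1

    def enable_observer(self, enabled=True):
        # A stateless calculator has no cached qparams to fall back on,
        # so a request to turn the observer off is ignored.
        if not enabled and self.stateless:
            return
        self.observer_enabled = 1 if enabled else 0

    def disable_observer(self):
        # Simplification: share the guard by delegating to enable_observer.
        self.enable_observer(False)


stateful = ObserverToggleSketch(stateless=False)
stateful.disable_observer()

stateless = ObserverToggleSketch(stateless=True)
stateless.disable_observer()
stateless.enable_observer(False)
```

After these calls the stateful instance has its observer switched off, while the stateless one still reports observer_enabled as 1.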

extra_repr()

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type: str

forward(tensor)

Performs fake quantization of the given tensor using the qparams (scale, zero point, minval) computed by the QParamsCalculator.

Parameters: tensor (Tensor)
Return type: Tensor
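
Fake quantization is commonly understood as a quantize-then-dequantize round trip whose output stays in floating point but only takes values that the target dtype can represent. The sketch below shows that round trip for an affine integer scheme on plain Python floats. The function `fake_quantize` and its argument list are assumptions for illustration; the real forward also handles observer state, granularity, and gradients.

```python
def fake_quantize(values, scale, zero_point, quant_min, quant_max):
    """Hypothetical quantize-then-dequantize round trip on a list of floats."""
    out = []
    for x in values:
        # Snap to the nearest integer code and clamp to the representable range.
        q = round(x / scale) + zero_point
        q = max(quant_min, min(quant_max, q))
        # Map the code straight back to float; the result stays a float.
        out.append((q - zero_point) * scale)
    return out


result = fake_quantize([0.26, -0.9, 100.0], scale=0.5, zero_point=0,
                       quant_min=-128, quant_max=127)
```

The first two inputs land on the nearest multiple of the scale, and the third saturates at quant_max, which is the kind of error that fake quantization is meant to expose during training or calibration.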

is_disabled()

Return True if fake quantization has been disabled.

Return type: bool

abstract quantize(tensor, scale, zero_point, minval, cast_to_target_dtype=True)

Quantize a tensor with the given scale and zero point, following the configuration in the QuantizationSpec.

Parameters:

tensor (Tensor) – The tensor to quantize
scale (Tensor) – The scale to use for quantization
zero_point (Tensor | None) – The zero point computed by the qparams calculator (None for floating-point dtypes).
minval (Tensor | None) – The minimum representable float value of the observed range, computed by the qparams calculator (None for floating-point dtypes).
cast_to_target_dtype (bool) – If True, the quantized tensor is cast to the target_dtype. Otherwise, the values of the tensor are quantized to appropriate bins but the dtype used to represent the quantized tensor remains the same as the original tensor. This allows fake quantization to capture the quantization error while allowing gradients to backpropagate.

Return type:

Tensor
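
The effect of cast_to_target_dtype can be sketched with plain Python numbers, using int as a stand-in for the target dtype and float as a stand-in for the input dtype. The function `quantize_affine` is hypothetical and covers only the affine integer case. Concrete subclasses define the real behavior.

```python
def quantize_affine(values, scale, zero_point, quant_min, quant_max,
                    cast_to_target_dtype=True):
    """Hypothetical affine quantization of a list of floats."""
    codes = []
    for x in values:
        q = round(x / scale) + zero_point
        q = max(quant_min, min(quant_max, q))
        # With the cast, codes are stored as integers; without it, the same
        # bin values are kept in the input's float representation.
        codes.append(int(q) if cast_to_target_dtype else float(q))
    return codes


as_int = quantize_affine([0.3, -2.0], 0.1, 10, 0, 255)
as_float = quantize_affine([0.3, -2.0], 0.1, 10, 0, 255,
                           cast_to_target_dtype=False)
```

Both calls select the same bins. Only the representation of the result differs, which matches the description of the flag above.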

set_export_mode(enabled=True)

Set or unset export mode.

Parameters: enabled (bool)
Return type: None

classmethod with_args(**kwargs)

Parameters: kwargs (dict)
Return type: PartialConstructor[FakeQuantizeImplBase]
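
The return type suggests that with_args binds constructor arguments ahead of time and yields a factory that can build instances later. A minimal sketch of that pattern follows, with functools.partial standing in for PartialConstructor. The class `FakeQuantSketch` and its reduced argument list are hypothetical, and the real PartialConstructor may offer more than a plain partial does.

```python
import functools


class FakeQuantSketch:
    """Hypothetical class illustrating a with_args-style partial constructor."""

    def __init__(self, quant_min, quant_max, n_bits=None):
        self.quant_min = quant_min
        self.quant_max = quant_max
        self.n_bits = n_bits

    @classmethod
    def with_args(cls, **kwargs):
        # Bind some constructor arguments now; the rest are supplied later.
        return functools.partial(cls, **kwargs)


make_int8 = FakeQuantSketch.with_args(quant_min=-128, quant_max=127)
a = make_int8(n_bits=8)
b = make_int8(n_bits=8)
```

Each call to the factory produces a fresh instance, so one bound configuration can be reused for many tensors without sharing state between them.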

property granularity: QuantizationGranularity

Getter for granularity.