coreai_opt.quantization.spec.fake_quantize.FakeQuantizeImplBase¶
- class coreai_opt.quantization.spec.fake_quantize.FakeQuantizeImplBase(dtype, qscheme, qformulation, granularity, target_dtype, quant_min, quant_max, qparams_calculator, quantization_target, n_bits=None, **kwargs)[source]¶
Bases:
CompressionSimulatorBase, FakeQuantizeBase
Base class for implementing fake quantization.
- Parameters:
dtype (torch.dtype)
qscheme (QuantizationScheme)
qformulation (QuantizationFormulation)
granularity (QuantizationGranularity)
target_dtype (torch.dtype)
quant_min (int | float)
quant_max (int | float)
qparams_calculator (QParamsCalculatorBase)
quantization_target (CompressionTargetTensor)
n_bits (int | None)
- __init__(dtype, qscheme, qformulation, granularity, target_dtype, quant_min, quant_max, qparams_calculator, quantization_target, n_bits=None, **kwargs)[source]¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
dtype (dtype)
qscheme (QuantizationScheme)
qformulation (QuantizationFormulation)
granularity (QuantizationGranularity)
target_dtype (dtype)
quant_min (int | float)
quant_max (int | float)
qparams_calculator (QParamsCalculatorBase)
quantization_target (CompressionTargetTensor)
n_bits (int | None)
Methods
calculate_qparams() – Returns the computed (scale, zero_point, minval).
convert(model, observer_node) – No-op: keep fake quant nodes intact during convert_pt2e.
dequantize(tensor, scale, zero_point, minval) – Given a quantized tensor, the scale and zero point used to perform quantization, perform de-quantization of the tensor based on the configuration in the QuantizationSpec and return it as a tensor with dtype as output_dtype.
extra_repr() – Return the extra representation of the module.
forward(tensor) – Performs fake quantization of the given tensor using the qparams (scale, zero point, minval) computed by the QParamsCalculator.
get_class(key)
Return True if fake quantization has been disabled.
list_registry_keys()
list_registry_values()
quantize(tensor, scale, zero_point, minval) – Given a tensor, scale and zero point, perform quantization of the tensor based on the configuration in the QuantizationSpec.
register(key) – Register a virtual subclass of an ABC.
set_export_mode([enabled]) – Set or unset export mode.
with_args(**kwargs)
- calculate_qparams()[source]¶
Returns the computed (scale, zero_point, minval).
zero_point and minval are None for floating-point dtypes.
- Return type:
tuple[Tensor, Tensor | None, Tensor | None]
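As a rough illustration of what a scale and zero point represent, the sketch below derives them from an observed float range using the common asymmetric affine mapping. This is an assumption for illustration only: the function name affine_qparams is hypothetical, the formula actually used depends on the configured qscheme, qformulation, and QParamsCalculator, and minval is left out because its exact derivation is not documented here.

```python
def affine_qparams(min_val: float, max_val: float, quant_min: int, quant_max: int):
    """Return (scale, zero_point) for a textbook asymmetric affine mapping.

    Hypothetical helper; not part of coreai_opt.
    """
    # Widen the range so that 0.0 is always exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    span = max_val - min_val
    if span == 0.0:
        # Degenerate case: every observed value was zero.
        return 1.0, min(max(0, quant_min), quant_max)
    # One integer step covers `scale` units of the float range.
    scale = span / (quant_max - quant_min)
    # Integer code that the float value 0.0 maps to.
    zero_point = quant_min - round(min_val / scale)
    zero_point = min(max(zero_point, quant_min), quant_max)
    return scale, zero_point


scale, zero_point = affine_qparams(-1.0, 3.0, quant_min=0, quant_max=255)
print(scale, zero_point)
```

For floating-point target dtypes the entry above states that zero_point and minval are None, so a sketch like this would only apply to integer targets.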
- convert(model, observer_node)[source]¶
No-op: keep fake quant nodes intact during convert_pt2e.
If this method is not present, torchao’s convert method tries to replace fake quant nodes with its standard quantize/dequantize ops and fails in the process.
- Parameters:
model (GraphModule)
observer_node (Node)
- Return type:
None
- abstract dequantize(tensor, scale, zero_point, minval, output_dtype=torch.float32)[source]¶
Given a quantized tensor and the scale and zero point used to quantize it, de-quantize the tensor according to the configuration in the QuantizationSpec and return it as a tensor of dtype output_dtype.
- Parameters:
tensor (Tensor) – The tensor to dequantize
scale (Tensor) – The scale to use for dequantization
zero_point (Tensor | None) – The zero point computed by the qparams calculator (None for floating-point dtypes).
minval (Tensor | None) – The minimum representable float value of the observed range, computed by the qparams calculator (None for floating-point dtypes).
output_dtype (dtype) – The dtype to use for the dequantized tensor
- Return type:
Tensor
- extra_repr()[source]¶
Return the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type:
str
- forward(tensor)[source]¶
Performs fake quantization of the given tensor using the qparams (scale, zero point, minval) computed by the QParamsCalculator.
- Parameters:
tensor (Tensor)
- Return type:
Tensor
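Fake quantization is conventionally a quantize step followed immediately by a dequantize step, so the output stays in floating point but only takes values on the quantization grid. The scalar sketch below illustrates that round trip with a plain affine formula. The three functions are hypothetical stand-ins written for this example; they are not the library's implementation, which operates on tensors and also takes minval and the configured qformulation into account.

```python
def quantize(x: float, scale: float, zero_point: int, quant_min: int, quant_max: int) -> int:
    # Map the float onto the integer grid, then clamp to the allowed range.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    q = round(x / scale) + zero_point
    return max(quant_min, min(quant_max, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    # Map the integer code back to the float value it represents.
    return (q - zero_point) * scale


def fake_quantize(x: float, scale: float, zero_point: int, quant_min: int, quant_max: int) -> float:
    # Round trip: the result is a float that carries the quantization error.
    return dequantize(quantize(x, scale, zero_point, quant_min, quant_max), scale, zero_point)


# Hand-picked parameters for a signed 4-bit grid (assumed values for this example).
scale, zero_point, quant_min, quant_max = 0.5, 0, -8, 7
values = [-1.3, 0.2, 0.74, 100.0]
fq = [fake_quantize(v, scale, zero_point, quant_min, quant_max) for v in values]
print(fq)  # [-1.5, 0.0, 0.5, 3.5]
```

The last input shows saturation: 100.0 is clamped to quant_max before being mapped back, so it comes out as 3.5.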
- abstract quantize(tensor, scale, zero_point, minval, cast_to_target_dtype=True)[source]¶
Given a tensor, scale and zero point, perform quantization of the tensor based on the configuration in the QuantizationSpec.
- Parameters:
tensor (Tensor) – The tensor to quantize
scale (Tensor) – The scale to use for quantization
zero_point (Tensor | None) – The zero point computed by the qparams calculator (None for floating-point dtypes).
minval (Tensor | None) – The minimum representable float value of the observed range, computed by the qparams calculator (None for floating-point dtypes).
cast_to_target_dtype (bool) – If True, the quantized tensor is cast to the target_dtype. Otherwise, the values of the tensor are quantized to appropriate bins but the dtype used to represent the quantized tensor remains the same as the original tensor. This allows fake quantization to capture the quantization error while allowing gradients to backpropagate.
- Return type:
Tensor
- set_export_mode(enabled=True)[source]¶
Set or unset export mode.
- Parameters:
enabled (bool)
- Return type:
None
- classmethod with_args(**kwargs)[source]¶
- Parameters:
kwargs (dict)
- Return type:
PartialConstructor[FakeQuantizeImplBase]
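This entry carries no description. Judging only from the PartialConstructor return type, with_args appears to follow the deferred-construction pattern, where constructor arguments are bound now and the instance is created later. The sketch below shows that pattern with functools.partial from the standard library. ToyFakeQuantize and its parameters are hypothetical, and PartialConstructor itself may behave differently.

```python
from functools import partial


class ToyFakeQuantize:
    """Hypothetical stand-in class; not part of coreai_opt."""

    def __init__(self, n_bits: int, quant_min: int, quant_max: int):
        self.n_bits = n_bits
        self.quant_min = quant_min
        self.quant_max = quant_max

    @classmethod
    def with_args(cls, **kwargs):
        # Bind constructor arguments now; the object is built on each later call.
        return partial(cls, **kwargs)


# A reusable factory: each call yields a fresh, independently configured instance.
make_int4 = ToyFakeQuantize.with_args(n_bits=4, quant_min=-8, quant_max=7)
first = make_int4()
second = make_int4()
# Keyword arguments supplied at call time override the bound ones.
narrower = make_int4(quant_max=5)
print(first.n_bits, first is second, narrower.quant_max)  # 4 False 5
```

A factory of this kind lets a configuration describe how to build a fake quantizer once, while each layer that uses it still receives its own instance and state.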
- property granularity: QuantizationGranularity¶
Getter for granularity.