coreai_opt.quantization.QuantizerConfig

class coreai_opt.quantization.QuantizerConfig

Bases: CompressionConfig[ModuleQuantizerConfig]

Top-level configuration class for quantization.

This class manages the complete quantization configuration for a neural network model, organizing module-level configurations in a hierarchical structure. It inherits from CompressionConfig and specializes it for quantization using ModuleQuantizerConfig.

The configuration lookup follows a hierarchical precedence (most to least specific):

module_name_configs - Applies to module instances matching a name pattern (supports regex)
module_type_configs - Applies to all modules of a specific type (e.g., torch.nn.modules.linear.Linear)
global_config - Default configuration applied to all modules not otherwise configured
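
The precedence above can be illustrated with a small standalone sketch. It does not use coreai_opt: resolve_module_config is a hypothetical helper, plain strings stand in for ModuleQuantizerConfig objects, and the use of re.fullmatch (and first-match order among several patterns) is an assumption about how name patterns are matched, not a statement about the library's implementation.

```python
import re
from typing import Any

# Sentinel that separates "no entry for this type" from "entry whose value is None".
_NO_MATCH = object()


def resolve_module_config(
    module_name: str,
    module_type: str,
    module_name_configs: dict[str, Any],
    module_type_configs: dict[str, Any],
    global_config: Any,
) -> Any:
    """Hypothetical lookup: return the config for one module (None = disabled)."""
    # 1. Name patterns are the most specific scope.
    #    Assumption: the pattern must match the whole module name.
    for pattern, cfg in module_name_configs.items():
        if re.fullmatch(pattern, module_name):
            return cfg  # may be None, which disables quantization here
    # 2. Fully-qualified module type name.
    cfg = module_type_configs.get(module_type, _NO_MATCH)
    if cfg is not _NO_MATCH:
        return cfg
    # 3. Fall back to the global default (which may itself be None).
    return global_config


name_cfgs = {r"decoder\.layers\.0": None}  # explicitly disabled by name
type_cfgs = {"torch.nn.modules.linear.Linear": "linear-cfg"}
linear = "torch.nn.modules.linear.Linear"

by_name = resolve_module_config("decoder.layers.0", linear, name_cfgs, type_cfgs, "global-cfg")
by_type = resolve_module_config("encoder.fc", linear, name_cfgs, type_cfgs, "global-cfg")
by_global = resolve_module_config(
    "encoder.norm", "torch.nn.modules.normalization.LayerNorm", name_cfgs, type_cfgs, "global-cfg"
)
```

In this sketch the name entry wins for decoder.layers.0 even though its value is None, so an explicit None at a more specific scope disables quantization rather than falling through to the type or global configuration.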

global_config

Default module-level quantization configuration, applied to every module that has no more specific configuration. When QuantizerConfig is initialized with no arguments, a default global_config with standard int8 quantization is created automatically. Setting global_config to None disables quantization for all modules that lack a name- or type-specific configuration. Default: auto-created with int8 quantization specs when no arguments are provided.

Type: ModuleQuantizerConfig | None

module_type_configs

Module type-specific configurations. Keys are fully-qualified module type names (e.g., “torch.nn.modules.linear.Linear”, “torch.nn.modules.conv.Conv2d”). Values are ModuleQuantizerConfig objects or None to disable quantization for that module type. Default: {} (empty dict, no type-specific configs)

Type: dict[str, ModuleQuantizerConfig | None] | None

module_name_configs

Module name-specific configurations. Keys are module name patterns (supports regex matching, e.g., “model.layer1.*”, “decoder.layers.0”). Values are ModuleQuantizerConfig objects or None to disable quantization for matching modules. Default: {} (empty dict, no name-specific configs)

Type: dict[str, ModuleQuantizerConfig | None] | None

preserved_attributes

Names of model attributes that should be preserved on the prepared and finalized models, even if the model’s forward pass does not use them.

Type: list[str] | None

execution_mode

Specifies which quantization execution mode to use. Options are:

ExecutionMode.GRAPH / “graph”:
Graph-based quantization using torch.export and FX graphs, built on torchao’s PT2E implementation. Requires the model to be exportable.
ExecutionMode.EAGER / “eager”:
Works directly on nn.Module without converting to a graph representation. Supports dynamic control flow (if/else, loops) and doesn’t require torch.export.

Default: ExecutionMode.GRAPH

Type: ExecutionMode | str

kv_cache_quant_configs

Optional mapping from a short op-type name (as returned by get_node_type) to the cache-update op’s KVCacheQuantConfig. Each entry enables storing the corresponding KV-cache buffer in a quantized dtype: it carries the op’s OpQuantizerConfig inline and triggers a finalize-side rewrite that relocates the dequantize from the op’s input to its output. Supported in graph mode only; eager mode rejects it in _validate_kv_cache_quant_configs(). See KVCacheQuantConfig for details. Default: None (no KV-cache buffer quantization)

Type: dict[str, KVCacheQuantConfig] | None

Example

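>>> # Assumption: ModuleQuantizerConfig is importable from the same package
>>> from coreai_opt.quantization import ModuleQuantizerConfig, QuantizerConfig
>>>
>>> # Create default quantizer config (auto-creates an int8 global config)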
>>> config = QuantizerConfig()
>>> # config.global_config is automatically created with default int8 specs
>>>
>>> # Disable quantization globally
>>> config = QuantizerConfig(
...     global_config=None
... )
>>>
>>> # Create custom quantizer config with type-specific settings for Linear
>>> # modules.
>>> config = QuantizerConfig(
...     # Omitted global_config section defaults to int8/int8 weight/activation
...     # quantization for all operations
...     module_type_configs={
...         "torch.nn.modules.linear.Linear": ModuleQuantizerConfig(
...             op_input_spec={
...                 0: ...
...             },
...             op_output_spec={
...                 0: ...
...             },
...             op_state_spec={
...                 'weight': ...
...             }
...         )
...     },
... )
>>>
>>> # Load quantizer config from YAML file
>>> config = QuantizerConfig.from_yaml("config.yaml")

Notes

When initialized with no arguments, a default configuration is created with int8 symmetric quantization for activations and weights.
The from_yaml class method provides an alternative way to create configurations from YAML files.
Setting a config to None explicitly disables quantization for that scope.
More specific configurations (name > type > global) always override less specific ones.

set_execution_mode(mode)

Set the quantization execution mode.

Parameters: mode (ExecutionMode | str) – Execution mode to use. Accepts an ExecutionMode member (e.g. ExecutionMode.EAGER) or its string value (e.g. "graph", "eager").
Returns: This config, for method chaining.
Return type: Self
Raises: ValueError – If mode is a string that is not a valid ExecutionMode value.

Example

>>> config = QuantizerConfig.presets.w4()
>>> config.set_execution_mode(ExecutionMode.EAGER)
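
The documented contract (accept a member or its string value, raise ValueError on an unknown string, return the config for chaining) can be sketched without the library. Both classes below are hypothetical stand-ins, not the coreai_opt implementation: ExecutionMode here is a plain enum with the two documented values, and ConfigSketch holds only the execution_mode field.

```python
from enum import Enum


class ExecutionMode(Enum):
    """Stand-in enum with the two documented string values."""

    GRAPH = "graph"
    EAGER = "eager"


class ConfigSketch:
    """Hypothetical stand-in for QuantizerConfig; only execution_mode is modeled."""

    def __init__(self) -> None:
        self.execution_mode = ExecutionMode.GRAPH  # documented default

    def set_execution_mode(self, mode):
        # Enum lookup by value accepts either a member or its string value
        # and raises ValueError for anything else.
        self.execution_mode = ExecutionMode(mode)
        return self  # returning self allows method chaining


cfg = ConfigSketch()
chained = cfg.set_execution_mode("eager")
mode_after_string = cfg.execution_mode
mode_after_member = cfg.set_execution_mode(ExecutionMode.GRAPH).execution_mode

try:
    cfg.set_execution_mode("bogus")
    rejected = False
except ValueError:
    rejected = True
```

Normalizing to the enum at assignment time means later code can compare against ExecutionMode members only, regardless of whether the caller passed a string or a member.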