coreai_opt.quantization.QuantizerConfig
- class coreai_opt.quantization.QuantizerConfig
Bases: CompressionConfig[ModuleQuantizerConfig]
Top-level configuration class for quantization.
This class manages the complete quantization configuration for a neural network model, organizing module-level configurations in a hierarchical structure. It inherits from CompressionConfig and specializes it for quantization using ModuleQuantizerConfig.
The configuration lookup follows a hierarchical precedence (most to least specific):
module_name_configs - Applies to module instances matching a name pattern (supports regex)
module_type_configs - Applies to all modules of a specific type (e.g., torch.nn.modules.linear.Linear)
global_config - Default configuration applied to all modules not otherwise configured
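The precedence above can be sketched as a small lookup function. This is an illustrative sketch, not the library's implementation: resolve_config is a hypothetical helper, plain strings stand in for ModuleQuantizerConfig objects, and the use of re.fullmatch with first-match-wins ordering is an assumption, since the source only says that name patterns support regex. A matching entry whose value is None is returned as None (quantization disabled for that scope) rather than falling through to a less specific level.

```python
import re
from typing import Any, Optional


def resolve_config(
    module_name: str,
    module_type: str,
    global_config: Optional[Any],
    module_type_configs: dict[str, Optional[Any]],
    module_name_configs: dict[str, Optional[Any]],
) -> Optional[Any]:
    """Hypothetical helper: pick the config for one module, or None if disabled."""
    # 1. Name patterns are the most specific level.
    #    Assumption: the first pattern (in dict order) that fully matches wins.
    for pattern, cfg in module_name_configs.items():
        if re.fullmatch(pattern, module_name):
            return cfg  # may be None, which disables quantization
    # 2. Next, look up the fully-qualified module type name.
    if module_type in module_type_configs:
        return module_type_configs[module_type]  # may also be None
    # 3. Otherwise fall back to the global default.
    return global_config


LINEAR = "torch.nn.modules.linear.Linear"
CONV = "torch.nn.modules.conv.Conv2d"

# Placeholder strings stand in for ModuleQuantizerConfig instances.
name_configs = {r"decoder\.layers\.0\..*": None}
type_configs = {LINEAR: "linear-config"}

a = resolve_config("decoder.layers.0.fc", LINEAR, "global-config", type_configs, name_configs)
b = resolve_config("encoder.fc", LINEAR, "global-config", type_configs, name_configs)
c = resolve_config("encoder.conv", CONV, "global-config", type_configs, name_configs)
print(a, b, c)
```

In this sketch the first module is matched by name and left unquantized even though a type-level config exists for Linear, the second falls through to the type-level entry, and the third reaches the global default.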
- global_config
Default module-level quantization configuration applied to all modules that don’t have a more specific configuration. When QuantizerConfig is initialized with no arguments, a default global_config is automatically created with standard int8 quantization. Setting global_config to None disables quantization for every module that has no more specific configuration. Default: auto-created with int8 quantization specs when no arguments are provided.
- Type:
ModuleQuantizerConfig | None
- module_type_configs
Module type-specific configurations. Keys are fully-qualified module type names (e.g., “torch.nn.modules.linear.Linear”, “torch.nn.modules.conv.Conv2d”). Values are ModuleQuantizerConfig objects or None to disable quantization for that module type. Default: {} (empty dict, no type-specific configs)
- Type:
dict[str, ModuleQuantizerConfig | None] | None
- module_name_configs
Module name-specific configurations. Keys are module name patterns (supports regex matching, e.g., “model.layer1.*”, “decoder.layers.0”). Values are ModuleQuantizerConfig objects or None to disable quantization for matching modules. Default: {} (empty dict, no name-specific configs)
- Type:
dict[str, ModuleQuantizerConfig | None] | None
- preserved_attributes
Names of attributes of the model which should be preserved on the prepared and finalized models, even if they are not used in the model’s forward pass.
- Type:
list[str] | None
- execution_mode
Specifies which quantization execution mode to use. Options are:
- ExecutionMode.GRAPH / “graph”:
Graph-based quantization using torch.export and FX graphs, built on torchao’s PT2E implementation. Requires the model to be exportable.
- ExecutionMode.EAGER / “eager”:
Works directly on nn.Module without converting to a graph representation. Supports dynamic control flow (if/else, loops) and doesn’t require torch.export.
Default: ExecutionMode.GRAPH
- Type:
ExecutionMode | str
Example
>>> # Create default quantizer config (auto-creates int8 global
>>> # config)
>>> config = QuantizerConfig()
>>> # config.global_config is automatically created with default int8 specs
>>>
>>> # Disable quantization globally
>>> config = QuantizerConfig(
...     global_config=None
... )
>>>
>>> # Create custom quantizer config with type-specific settings for Linear
>>> # modules.
>>> config = QuantizerConfig(
...     # Omitted global_config section defaults to int8/int8 weight/activation
...     # quantization for all operations
...     module_type_configs={
...         "torch.nn.modules.linear.Linear": ModuleQuantizerConfig(
...             op_input_spec={
...                 0: ...
...             },
...             op_output_spec={
...                 0: ...
...             },
...             op_state_spec={
...                 'weight': ...
...             }
...         )
...     },
... )
>>>
>>> # Load quantizer config from YAML file
>>> config = QuantizerConfig.from_yaml("config.yaml")
Notes
When initialized with no arguments, a default configuration is created with int8 symmetric quantization for both activations and weights.
The from_yaml class method provides an alternative way to create configurations from YAML files
Setting a config to None explicitly disables quantization for that scope
More specific configurations (name > type > global) always override less specific ones
- set_execution_mode(mode)
Set the quantization execution mode.
- Parameters:
mode (ExecutionMode | str) – Execution mode to use. Accepts an ExecutionMode member (e.g. ExecutionMode.EAGER) or its string value (e.g. "graph", "eager").
- Returns:
This config, for method chaining.
- Return type:
Self
- Raises:
ValueError – If mode is a string that is not a valid ExecutionMode value.
Example
>>> config = QuantizerConfig.presets.w4()
>>> config.set_execution_mode(ExecutionMode.EAGER)
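The documented contract of set_execution_mode (accept an enum member or its string value, return the config for chaining, raise ValueError on an unknown string) can be illustrated with a standalone sketch. The ExecutionMode enum and SketchConfig class below are stand-ins written for this example, not the coreai_opt classes; only the member names and string values "graph" and "eager" come from the documentation above, and how the real class stores the mode internally is not stated in the source.

```python
from enum import Enum


class ExecutionMode(Enum):
    """Stand-in enum mirroring the documented members and string values."""
    GRAPH = "graph"
    EAGER = "eager"


class SketchConfig:
    """Hypothetical stand-in for a config object with an execution mode."""

    def __init__(self) -> None:
        # Documented default is ExecutionMode.GRAPH.
        self.execution_mode = ExecutionMode.GRAPH

    def set_execution_mode(self, mode) -> "SketchConfig":
        if not isinstance(mode, ExecutionMode):
            # Enum lookup by value raises ValueError for unknown strings.
            mode = ExecutionMode(mode)
        self.execution_mode = mode
        return self  # returning self allows method chaining


cfg = SketchConfig().set_execution_mode("eager")

try:
    cfg.set_execution_mode("fast")  # not a valid mode string
    rejected = False
except ValueError:
    rejected = True
```

Relying on the enum's own value lookup keeps the validation in one place: any string that is not a member value is rejected without a separate list of allowed names.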