API Reference
coreai_opt
coreai_opt - A library for PyTorch model compression and optimizations.
Raised when a model cannot be exported to the CoreML backend.
Enum representing supported model export backends.
coreai_opt.casting
Casting-related utilities, including FP32→FP16 and INT32→INT16 passes.
Convert a torch exported program from FP32 to FP16 where applicable.
Convert INT32/INT64 tensors to INT16 in a torch exported program.
Convert a torch exported program to 16-bit precision: FP32→FP16 and INT32/64→INT16.
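To give a feel for what a 16-bit cast does to individual values, the sketch below round-trips Python floats through IEEE 754 half precision and checks whether integers fit in the INT16 range. It uses only the standard library and does not call coreai_opt; the helper names `to_fp16` and `fits_int16` are hypothetical and are not part of the library.

```python
import struct

# Representable range of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration only. The "e" struct format code
    packs a binary16 float; values beyond the FP16 range raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_int16(value: int) -> bool:
    """Return True if the integer survives a cast to INT16 unchanged."""
    return INT16_MIN <= value <= INT16_MAX


if __name__ == "__main__":
    for x in (1.0, 0.1, 1000.5):
        print(x, "->", to_fp16(x))
    for n in (123, 40000):
        print(n, "fits in INT16:", fits_int16(n))
```

Values such as 1.0 pass through unchanged, while values such as 0.1 pick up a small rounding error, which is one reason a cast pass may need to leave some tensors in higher precision.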
coreai_opt.config
Configuration and specification modules for coreai_opt.
Top-level configuration class for model compression.
Base class for compression specifications.
Enum representing compression techniques applied to the model.
Abstract base configuration class for module-level compression settings.
Abstract base configuration class for op-level compression settings.
Mixin that adds weight-only validation to ModuleCompressionConfig subclasses.
Mixin that adds weight-only validation to OpCompressionConfig subclasses.
coreai_opt.config.spec
Base abstractions for compression specs, simulators, and component factories.
Abstract base class for compression component factories.
Abstract base class for compression simulators.
Enum to specify the target tensor for compression.
coreai_opt.coreai_utils
Core AI MLIR-level compression transforms.
Enum representing the granularity of quantization for Core AI weight compression.
Enum representing data types for Core AI weight compression.
Palettize weights in a Core AI AIProgram (MLIR<CoreAI> IR) by using Core AI ops.
Quantize weights in a Core AI AIProgram (MLIR<CoreAI> IR) by using Core AI ops.
Sparsify weights in a Core AI AIProgram (MLIR<CoreAI> IR) by using Core AI ops.
coreai_opt.coreai_utils.common
Common enums and constants for coreai_opt.coreai_utils.
Enum representing the quantization scheme.
coreai_opt.inspection
Utilities for inspecting model operations and compression configuration.
Inspect operations in a PyTorch model for compression configuration.
Complete listing of operations discovered in a model.
One level of the
A node in the
Information about a single operation discovered in a model.
A single frame in the source call stack leading to an operation.
coreai_opt.palettization
Palettization specification and utilities for weight compression via lookup tables.
K-means palettizer with integrated supported operations strategy.
Top-level configuration class for kmeans palettization.
Configuration for palettizing a specific module using K-means clustering.
Specification for palettization compression of neural network weights.
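As a rough illustration of weight compression via lookup tables, the sketch below clusters a flat list of weights with one-dimensional k-means and stores each weight as an index into the resulting table. It is a minimal pure-Python sketch, not the library's palettizer: the function names, the evenly spaced centroid initialisation, and the fixed iteration count are all assumptions made for the example.

```python
def kmeans_palettize(weights, n_bits=2, iters=10):
    """Cluster scalar weights into 2**n_bits centroids (the lookup table).

    Hypothetical helper for illustration; returns (lut, indices).
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Assumed initialisation: centroids evenly spaced over the weight range.
    lut = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def nearest(w):
        # Index of the centroid closest to w.
        return min(range(k), key=lambda j: abs(w - lut[j]))

    for _ in range(iters):
        indices = [nearest(w) for w in weights]
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:  # leave empty clusters where they are
                lut[j] = sum(members) / len(members)

    return lut, [nearest(w) for w in weights]


def depalettize(lut, indices):
    """Reconstruct approximate weights from the table and the indices."""
    return [lut[i] for i in indices]


if __name__ == "__main__":
    lut, idx = kmeans_palettize([-1.0, -0.9, 0.9, 1.0], n_bits=1)
    print(lut, idx, depalettize(lut, idx))
```

With `n_bits` bits per index, each weight costs `n_bits` bits plus a shared table of `2**n_bits` values, which is where the compression comes from.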
coreai_opt.palettization.config
Palettization configuration classes.
Configuration class for palettization at the operation level.
coreai_opt.palettization.spec
Palettization specs, granularity classes, and factory functions.
Base class for palettization granularity specifications.
Per-grouped-channel palettization granularity.
Per-tensor palettization granularity.
coreai_opt.pruning
Pruning infrastructure for coreai_opt.
Apply magnitude-based pruning to a model.
Top-level configuration for magnitude pruning.
Module-level pruning configuration.
Specification for pruning tensors.
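The idea behind magnitude-based pruning can be sketched in a few lines: rank the weights by absolute value and zero out the smallest fraction. The example below is a pure-Python illustration of unstructured pruning on a flat list. The `magnitude_prune` name and its `sparsity` parameter are hypothetical and do not reflect the coreai_opt API.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration; returns (pruned_weights, mask),
    where mask[i] is 0 for pruned positions and 1 for kept positions.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


if __name__ == "__main__":
    pruned, mask = magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5)
    print(pruned, mask)
```

Keeping the mask separate from the weights mirrors how a pruning parametrization can mask a layer's weight during training without discarding the underlying values.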
coreai_opt.pruning.config
Pruning configuration exports.
Step function: zero before
Operation-level pruning configuration.
Polynomial schedule from
Abstract base for sparsity schedules used by
coreai_opt.pruning.spec
Pruning spec components: specs, schemes, and parametrizations.
Channel-structured pruning scheme.
Abstract base for pruning parametrizations that mask a layer's weight.
Base class for pruning scheme specifications.
Unstructured pruning scheme.
Return the default pruning spec for weight tensors.
coreai_opt.quantization
Quantization compressor, configuration, specs, and granularity classes.
Enum representing quantization execution modes.
Configuration class for quantization at the module level.
Specification for quantizing tensors in neural networks.
Unified quantizer API that provides a single entry point for various quantization workflows, including:
Top-level configuration class for quantization.
coreai_opt.quantization.config
Quantization configuration classes and execution mode.
Configuration class for quantization at the operation level.
Schedule for controlling observer and fake quantization state in QAT.
coreai_opt.quantization.spec
Quantization specs, schemes, granularity classes, and parameter calculators.
Computes scale and zero point by tracking the running min/max.
Range calculator that computes the range of a given tensor as the min and max values of the tensor.
Computes the scale and zero point using a moving average of the range.
Per-block quantization granularity.
Per-channel quantization granularity.
Per-tensor quantization granularity.
Base class for implementing logic to calculate quantization parameters (scale, zero_point, minval) given min/max values.
Factory class for creating quantization components from QuantizationSpec.
Formula used to map between quantized integers and dequantized values.
Base class for quantization granularity specifications.
Base class and registry for classes used to compute the range of a given tensor.
Mixin for calculators that maintain running min/max range buffers.
Computes scale and zero point using min/max values from the current tensor.
coreai_opt.quantization.spec.fake_quantize
Fake quantization implementation base class and default implementation.
Base class for implementing fake quantization
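Fake quantization generally means quantizing a value to an integer level and immediately dequantizing it, so the result stays in floating point but carries the rounding and clamping error of the target integer type. The sketch below shows that round trip in pure Python. The `fake_quantize` function and its parameters are hypothetical and are not the library's implementation.

```python
def fake_quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize then dequantize each value, returning floats.

    Hypothetical helper for illustration. Python's round() rounds halves
    to the nearest even integer, which may differ from other frameworks.
    """
    out = []
    for x in values:
        q = round(x / scale) + zero_point   # map to an integer level
        q = max(qmin, min(qmax, q))         # clamp to the integer range
        out.append((q - zero_point) * scale)  # map back to a float
    return out


if __name__ == "__main__":
    print(fake_quantize([0.0, 0.26, 100.0], scale=0.5, zero_point=0))
```

Values inside the representable range snap to the nearest multiple of `scale`, and values outside it saturate at the range boundary.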