Integration with Core AI¶

coreai-opt is built to optimize PyTorch models in a way that lets them be easily converted into the Core AI model format. Most of the compression configurations it exposes map directly to compression modes that the Core AI runtime executes efficiently on Apple Silicon, with a smaller set kept available for general experimentation.

Core AI deployment uses ExportBackend.CoreAI with finalize(), followed by conversion with coreai-torch:

import coreai_opt as opt

# After prepare() / calibration / training, finalize for Core AI export
# "backend" defaults to opt.ExportBackend.CoreAI
finalized_model = quantizer.finalize(backend=opt.ExportBackend.CoreAI)

Under the hood, finalize() replaces coreai-opt’s internal fake-quantize/fake-palettize ops with PyTorch custom ops whose definitions match the corresponding compression ops in the Core AI dialect. This allows coreai-torch to recognize the ops and map them correctly in the Core AI representation.

From here, conversion follows the same steps as for any uncompressed model conversion:

from pathlib import Path

from coreai_opt.casting import cast_to_16_bit_precision
import coreai_torch
import torch

# torch.export the finalized model with coreai-torch's decomposition table.
exported_program = torch.export.export(
    finalized_model, example_inputs
).run_decompositions(coreai_torch.get_decomp_table())

# Cast remaining FP32 weights/activations to FP16 (and INT32/INT64 to INT16) so
# the model runs more efficiently.
cast_to_16_bit_precision(exported_program)

# Convert to a Core AI program and save it as an .aimodel.
converter = coreai_torch.TorchConverter()
converter.add_exported_program(exported_program)
ai_program = converter.to_coreai()
ai_program.optimize()
ai_program.save_asset(Path("model.aimodel"))

The cast_to_16_bit_precision step is recommended as part of the default Core AI workflow: it rewrites the exported graph so FP32 tensors are cast to FP16, for faster inference. See 16-bit PyTorch Model Casting for details.

For more on coreai-torch itself — supported ops, the decomposition table, and the AIProgram API — see the coreai-torch documentation.

For compatibility purposes, setting backend in finalize() to ExportBackend.CoreML produces a torch model that can be converted with coremltools and deployed with Core ML. Not all coreai-opt compression configurations are supported by Core ML — for example, FP4/FP8 weight/activation quantization are Core AI-only and will not convert with coremltools.