Core AI PyTorch Extensions (coreai-torch)

Bring PyTorch models to Core AI for on-device execution.

Overview

Core AI PyTorch Extensions (coreai-torch) is a Python package that bridges PyTorch and Core AI. It supports two workflows. For bring-up, it converts an existing PyTorch model, exported as a torch.export.ExportedProgram, into a Core AI AIProgram ready to run on Apple hardware; the converter traverses the FX graph node by node and maps ATen operators to Core AI operations. For authoring, you build Core AI models directly in PyTorch by composing the ops in coreai_torch.composite_ops, defining new ops with register_torch_lowering, and writing inline Metal GPU kernels with TorchMetalKernel and register_custom_kernels. Everything you author this way is expressed as PyTorch nn.Module instances and lowered to Core AI IR that the compiler recognizes and optimizes natively.

The bring-up pipeline has three steps. First, export your PyTorch model with torch.export.export to capture the computation graph. Second, decompose the exported program by passing get_decomp_table() to its run_decompositions method; this lowers composite ATen ops to the primitive set that TorchConverter can map, while preserving the operations that TorchConverter lowers as composite ops. Third, call TorchConverter().add_exported_program(ep).to_coreai() to produce the AIProgram.

For authoring, coreai_torch.composite_ops exposes well-known building blocks as PyTorch modules, including attention, RoPE embeddings, RMSNorm, and gather-matmul (the MoE primitive). Passing these modules to externalize_modules preserves each one's operation boundary as a named composite op that the compiler can recognize and optimize. When a PyTorch op has no built-in lowering rule, register a custom lowering function with register_torch_lowering. For compute-intensive custom operations, register_custom_kernels lets you author Metal kernel source and wire it into the conversion pipeline.
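The idea of a lowering registry can be illustrated with a toy sketch in plain Python. This is not the coreai-torch implementation or its API: register_toy_lowering, TOY_LOWERINGS, lower_graph, and the "target.*" op strings are all hypothetical names chosen for illustration. It only shows the general pattern of registering one rule per source op and applying the rules while walking a graph node by node.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical lowering table: source op name -> rule that emits a target op.
TOY_LOWERINGS: Dict[str, Callable[..., str]] = {}


def register_toy_lowering(op_name: str):
    """Decorator that records a lowering rule for one source op (toy version)."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOY_LOWERINGS[op_name] = fn
        return fn
    return decorator


@register_toy_lowering("aten.add.Tensor")
def lower_add(a: str, b: str) -> str:
    return f"target.add({a}, {b})"


@register_toy_lowering("aten.relu.default")
def lower_relu(x: str) -> str:
    return f"target.relu({x})"


def lower_graph(nodes: Sequence[Tuple[str, Tuple[str, ...]]]) -> List[str]:
    """Walk the nodes in order and apply the registered rule for each op."""
    lowered = []
    for op_name, args in nodes:
        rule = TOY_LOWERINGS.get(op_name)
        if rule is None:
            # An op with no registered rule cannot be converted.
            raise KeyError(f"no lowering registered for {op_name}")
        lowered.append(rule(*args))
    return lowered


toy_graph = [("aten.add.Tensor", ("x", "y")), ("aten.relu.default", ("t0",))]
lowered_ops = lower_graph(toy_graph)
print(lowered_ops)
```

In this toy, an op without a registered rule raises an error, which mirrors why a custom lowering function is needed when no built-in rule exists.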

Quick example

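import torch
from coreai_torch import TorchConverter, get_decomp_table

# MyModel is a placeholder for your own nn.Module.
model = MyModel().eval()

# 1. Capture the computation graph with an example input.
ep = torch.export.export(model, args=(torch.randn(1, 10),))

# 2. Decompose to the op set that TorchConverter can map.
ep = ep.run_decompositions(get_decomp_table())

# 3. Convert to a Core AI AIProgram, then call optimize() on it.
coreai_program = TorchConverter().add_exported_program(ep).to_coreai()
coreai_program.optimize()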

Choosing your workflow

Starting point                               Recommended approach
Already have a decomposed ExportedProgram    TorchConverter().add_exported_program(ep).to_coreai()
Have an nn.Module, no externalization        Either add_exported_program or add_pytorch_module
Have an nn.Module, need externalization      add_pytorch_module(model, ..., externalize_modules=[...])

Externalization lets the Core AI compiler optimize submodules independently or hand them off to specialized backends. See Conversion Workflows for detailed code and a decision guide.

Notices

PyTorch is a trademark of Meta Platforms, Inc.