Externalization¶

Externalization preserves a submodule’s operation boundary during conversion, so the operation stays intact as a recognizable unit in the converted model. When you mark a well-known building block — such as attention, RoPE, or RMSNorm — as a composite op, the compiler recognizes that operation and can apply an implementation optimized for it, producing a faster model.

Use ExternalizeSpec to tag the extracted submodule with a composite op name and the attributes that define it, so the compiler can recognize and optimize the operation.

What externalization produces¶

Externalization has two modes: composite op externalization and simple externalization. Both extract the matching submodule into its own coreai.graph, which @main calls with coreai.invoke. The difference is what that graph carries. Take a model whose forward is linear(norm(x)), with the norm submodule externalized:

Composite op externalization marks the extracted graph as a named composite op — note the private graph and the composite_decl attribute recording the op name and its attributes, which the compiler recognizes and optimizes:

module {
  coreai.graph private noinline @norm.rms_norm(
      %arg0: tensor<1x10xf32> {coreai.name = "input"},
      %arg1: tensor<10xf32> {coreai.name = "scale"}
  ) -> tensor<1x10xf32> attributes {
      composite_decl = #coreai.composite_declaration<"rms_norm" = {
          input_names = ["input", "scale"],
          op_attrs = {axes = -1 : si64, eps = 9.99999974E-6 : f32, version = 1 : si64},
          output_names = ["output"]}>
  } {
    // ... rms-norm body ...
    coreai.output %15 : tensor<1x10xf32>
  }
  coreai.graph @main(%arg0: tensor<1x10xf32>) -> tensor<1x5xf32> {
    %3 = coreai.invoke @norm.rms_norm(%arg0, %0)
        : (tensor<1x10xf32>, tensor<10xf32>) -> tensor<1x10xf32>
    // ... linear ...
    coreai.output %7 : tensor<1x5xf32>
  }
}

Simple externalization extracts the same submodule as a plain graph boundary — no composite_decl, so the compiler sees an opaque subgraph rather than a named op:

module {
  coreai.graph noinline @norm(%arg0: tensor<1x10xf32>) -> tensor<1x10xf32> {
    // ... rms-norm body ...
    coreai.output %16 : tensor<1x10xf32>
  }
  coreai.graph @main(%arg0: tensor<1x10xf32>) -> tensor<1x5xf32> {
    %3 = coreai.invoke @norm(%arg0)
        : (tensor<1x10xf32>) -> tensor<1x10xf32>
    // ... linear ...
    coreai.output %7 : tensor<1x5xf32>
  }
}

Symbol names and constants above are illustrative (the converter appends a hash suffix to each externalized graph name).

Composite Op Externalization¶

Use ExternalizeSpec to mark a submodule as a named composite op:

import torch
import torch.nn as nn

from coreai_torch import ExternalizeSpec


class RMSNormComposite(nn.Module):
    def __init__(self, axes=-1, eps=1e-5, version=1):
        super().__init__()
        self.axes = axes
        self.eps = eps
        self.version = version

    def forward(self, input: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        x_f32 = input.to(torch.float32)
        inv_rms = torch.rsqrt((x_f32 * x_f32).mean(self.axes, keepdim=True) + self.eps)
        return (input * inv_rms).to(input.dtype) * scale

model = RMSNormComposite().eval()
sample = (torch.randn(10), torch.randn(10))

# Convert the model; torch and ExternalizeSpec are already imported above.

import coreai_torch
from coreai_torch import TorchConverter

converter = TorchConverter().add_pytorch_module(
    model,
    export_fn=lambda m: torch.export.export(m, args=sample).run_decompositions(
        coreai_torch.get_decomp_table()
    ),
    externalize_modules=[
        ExternalizeSpec(
            target_class=RMSNormComposite,
            composite_op_name="rms_norm",
            composite_attrs=["axes", "eps", "version"],
        )
    ],
)
coreai_program = converter.to_coreai()
coreai_program.optimize()

Each instance is externalized using its own attribute values, so two instances configured differently are preserved as distinct composite ops.
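To make the per-instance behavior concrete, the sketch below reads the attribute names listed in composite_attrs from two differently configured instances. FakeRMSNorm and collect_composite_attrs are hypothetical stand-ins written for this illustration; they are not part of coreai_torch, and the converter's actual mechanism may differ.

```python
class FakeRMSNorm:
    """Hypothetical stand-in for an nn.Module; only its configuration matters here."""

    def __init__(self, axes=-1, eps=1e-5, version=1):
        self.axes = axes
        self.eps = eps
        self.version = version


def collect_composite_attrs(instance, attr_names):
    """Hypothetical helper: look up each listed attribute on one instance."""
    return {name: getattr(instance, name) for name in attr_names}


attr_names = ["axes", "eps", "version"]
attrs_a = collect_composite_attrs(FakeRMSNorm(eps=1e-5), attr_names)
attrs_b = collect_composite_attrs(FakeRMSNorm(eps=1e-6), attr_names)

# Same class and same op name, but the recorded attribute values differ.
print(attrs_a)
print(attrs_b)
print(attrs_a != attrs_b)
```

Because the values come from each instance rather than from the class, the two instances end up with different attribute sets, which is why they stay distinct.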

Input and output names¶

input_names and output_names for each composite op are derived automatically from the module’s parameters, buffers, and forward() signature — no manual bookkeeping is needed. Optional arguments passed as None at a call site are excluded automatically.
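The None-exclusion rule can be pictured with a small sketch built on the standard inspect module. It covers only the forward() signature, not parameters or buffers, and derive_input_names is a hypothetical helper written for this illustration rather than the converter's implementation. Plain strings stand in for tensors.

```python
import inspect


def derive_input_names(forward, *args, **kwargs):
    """Hypothetical helper: names of forward() arguments that carry a value."""
    bound = inspect.signature(forward).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in omitted optionals with their default (None)
    return [name for name, value in bound.arguments.items() if value is not None]


def forward(query, key, value, mask=None):
    """Placeholder forward(); only its signature matters here."""


# Strings stand in for tensors; the helper only checks for None.
names_without_mask = derive_input_names(forward, "q", "k", "v")
names_with_mask = derive_input_names(forward, "q", "k", "v", mask="m")

print(names_without_mask)  # ['query', 'key', 'value']
print(names_with_mask)     # ['query', 'key', 'value', 'mask']
```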

Requirements for composite op modules¶

Composite op forward methods must follow two rules:

Forward arguments must be tensors — all forward parameters that become inputs must be torch.Tensor. Scalar configuration (e.g., eps, is_causal) should be stored as instance attributes and serialized via composite_attrs in ExternalizeSpec.
Optional arguments must use torch.Tensor | None = None — when an optional is not provided (left as None), it is excluded entirely and does not appear in input_names. There is no support for default tensor values.
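The two rules above can be checked mechanically. The following is a minimal sketch that inspects a forward() signature and reports violations. Tensor is a stand-in class so the example runs without PyTorch, and check_forward is a hypothetical helper, not a coreai_torch API. Optional[Tensor] is used here because it is the same type as Tensor | None and also works on older Python versions.

```python
import inspect
import types
from typing import Optional, Union, get_args, get_origin


class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""


# typing.Union covers Optional[Tensor]; types.UnionType covers `Tensor | None` (3.10+).
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def check_forward(forward):
    """Hypothetical helper: list rule violations in a forward() signature."""
    problems = []
    for name, param in inspect.signature(forward).parameters.items():
        if name == "self":
            continue
        hint = param.annotation
        if hint is Tensor:
            # Required tensor input: a default value is not allowed.
            if param.default is not inspect.Parameter.empty:
                problems.append(f"{name}: default tensor values are not supported")
        elif get_origin(hint) in _UNION_ORIGINS and set(get_args(hint)) == {Tensor, type(None)}:
            # Optional tensor input: the only allowed default is None.
            if param.default is not None:
                problems.append(f"{name}: optional tensors must default to None")
        else:
            problems.append(f"{name}: must be a tensor; store scalars as attributes")
    return problems


def good_forward(self, query: Tensor, key: Tensor, mask: Optional[Tensor] = None):
    """Follows both rules."""


def bad_forward(self, x: Tensor, eps: float = 1e-5):
    """Breaks rule one: eps is scalar configuration, not a tensor."""


good_problems = check_forward(good_forward)
bad_problems = check_forward(bad_forward)
print(good_problems)  # []
print(bad_problems)   # ['eps: must be a tensor; store scalars as attributes']
```

In the failing case, the fix described above applies: move eps into __init__ as an instance attribute and list it in composite_attrs.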

Simple externalization (experimental)¶

Passing a bare module class to externalize_modules — instead of an ExternalizeSpec — extracts the submodule into its own standalone graph with no composite-op metadata. It offers no optimization benefit and simply defines a boundary around the submodule.

Warning

Simple externalization is experimental. Prefer composite-op externalization above.

API Quick Reference¶

ExternalizeSpec¶

| Field | Type | Description |
| --- | --- | --- |
| `target_class` | `type` | The `nn.Module` subclass to match. Every instance found in the model will be externalized. |
| `composite_op_name` | `str \| None` | If set, the submodule is preserved as a named composite op the compiler can recognize and optimize. |
| `composite_attrs` | `list[str] \| None` | Instance attribute names to record as attributes of the composite op. |

Set composite_op_name, and optionally composite_attrs, to preserve the submodule as a composite op the compiler can optimize.

Relevant `add_pytorch_module` parameters¶

| Parameter | Type | Description |
| --- | --- | --- |
| `externalize_modules` | `list[ExternalizeSpec] \| None` | The `ExternalizeSpec` objects describing which submodule classes to externalize as composite ops. |

See the TorchConverter API reference for the full method signature.

Next Steps¶

Composite Ops Guide — the built-in composite op modules you can pass to externalize_modules.
Conversion Workflows — add_pytorch_module workflow, which is required for externalization.
ExternalizeSpec — full API reference for ExternalizeSpec.

Notices¶

PyTorch is a trademark of Meta Platforms, Inc.
