Running an .aimodel with coreai.runtime
This notebook loads an .aimodel asset and runs inference on it. We will use
four public types:
- AIModelAsset: the on-disk representation of a saved program (from coreai.authoring).
- InferenceFunction: a callable function inside the model (here, main).
- NDArray: the runtime’s multi-dimensional array type, used to pass inputs in and read outputs back.
- SpecializationOptions: for advanced device/configuration tuning (covered briefly at the end).
Inside the async with block you will also see an AIModel — the runnable
handle yielded by the asset’s executable() context manager. It is the type
you call load_function on, but you do not import or construct it directly.
We assume you have already worked through the previous tutorial
(Constructing a CoreAI Graph). The setup cells below build a fresh copy of
that model at hello-run.aimodel, so this notebook is also safe to run
standalone.
Setup
Import the asset and runtime types. Inference is asynchronous, so we will
use await and async with directly inside cells (Jupyter runs each cell
in an event loop).
from pathlib import Path
import numpy as np
from coreai.authoring import AIModelAsset
from coreai.runtime import InferenceFunction, NDArray
asset_path = Path("./hello-run.aimodel")
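Top-level await works in Jupyter because the notebook already runs an event loop. In a plain Python script there is no running loop, so the same steps are usually wrapped in a coroutine and handed to asyncio.run. A minimal sketch of that wrapper follows; run_inference_steps is a hypothetical placeholder for this notebook's async cells and does not call coreai.

```python
import asyncio


async def run_inference_steps() -> str:
    """Hypothetical placeholder for the async cells in this notebook."""
    # Stands in for `await function(inputs)`; no model is involved here.
    await asyncio.sleep(0)
    return "done"


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
status = asyncio.run(run_inference_steps())
print(status)
```

In a real script, the body of the coroutine would hold the async with block and the awaited call shown later in this tutorial.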
Ensure hello.aimodel exists
The cell below builds the asset at asset_path, first deleting any existing copy so the notebook always starts from a known state. The build mirrors the construction shown in Constructing a CoreAI Graph.
from shutil import rmtree
from typing import Annotated
from coreai.authoring import AIProgram, Module, TensorSpec
# Pending re-export from coreai.authoring; see the previous tutorial.
from coreai._compiler.dialects import coreai as ops
from coreai._compiler.ir import Value
# Reconstruct asset.
if asset_path.exists():
    rmtree(asset_path)
module = Module.create()
with module:
    @ops.graph
    def main(
        x: Annotated[Value, TensorSpec(shape=[2, 3], dtype=np.float32)],
    ) -> Annotated[Value, TensorSpec(shape=[2, 3], dtype=np.float32, name="y")]:
        return ops.add(x, x)
AIProgram(module).save_asset(asset_path)
print(f"created {asset_path}")
Open the asset
AIModelAsset.load reads the .aimodel directory header from disk so you
can inspect it; it does not yet compile the program for inference. That work
happens lazily inside the executable() async context manager used in the
next section.
Note
Two ways to load a model. This tutorial uses AIModelAsset.load(path)
followed by async with asset.executable() as model: — the resource-managed
form, which gives you explicit control over when the compiled model is
released. There is also a one-shot await AIModel.load(path) that returns
a runnable AIModel directly; reach for it when you want a long-lived
model handle without the async with block (e.g. inside an application
object that owns the model for its full lifetime).
asset = AIModelAsset.load(asset_path)
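The note above mentions an application object that owns the model for its full lifetime. One general Python pattern for holding an async context manager open across method calls is contextlib.AsyncExitStack. The sketch below uses a hypothetical stand-in, open_model, in place of asset.executable(); whether the real context manager is suited to being held open this way is an assumption that this tutorial does not confirm.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []  # records the order of lifecycle steps for illustration


@asynccontextmanager
async def open_model():
    """Hypothetical stand-in for asset.executable(); not the coreai API."""
    events.append("opened")
    try:
        yield "model-handle"
    finally:
        events.append("released")


class App:
    """Owns a model handle from start() until close()."""

    def __init__(self):
        self._stack = AsyncExitStack()
        self.model = None

    async def start(self):
        # Enter the context manager now; its exit is deferred to close().
        self.model = await self._stack.enter_async_context(open_model())

    async def close(self):
        # Runs the deferred exit, which releases the stand-in model.
        await self._stack.aclose()
        self.model = None


async def demo():
    app = App()
    await app.start()
    handle = app.model
    events.append("used")
    await app.close()
    return handle


handle = asyncio.run(demo())
print(events)
```

The exit stack keeps the release step explicit, so the owning object decides when the handle goes away rather than the end of a lexical block.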
Open the model and run inference
asset.executable() returns an async context manager that yields an
AIModel — a runnable handle to the compiled program. Resources are released
when the async with block exits, so all model usage happens inside it.
Inside the block we do five things in order:
1. list the functions exposed by the model,
2. look up main and inspect its desc (name, input names, output names),
3. build the input dict: each name from the signature maps to an NDArray (which accepts a NumPy array, a PyTorch tensor, or a Python list, wrapping the data without a copy where possible),
4. await the call to get back a dict keyed by output name, and
5. copy the result out as NumPy so we can inspect it after exiting the context.
async with asset.executable() as model:
    print(f"functions: {model.function_names}")
    function: InferenceFunction = model.load_function("main")
    desc = function.desc
    print(f"name: {desc.name}")
    print(f"inputs: {desc.input_names}")
    print(f"outputs: {desc.output_names}")
    x = np.full((2, 3), 1.5, dtype=np.float32)
    inputs = {"x": NDArray(x)}
    print(f"input x:\n{x}")
    outputs = await function(inputs)
    print(f"output keys: {list(outputs.keys())}")
    # Materialize the result inside the block: the model's backing buffers
    # are only guaranteed valid until the context exits.
    result = outputs["y"].numpy()
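To see why the result is copied out before the block exits, it can help to model the situation with the standard library alone. The sketch below is an illustration under assumptions, not the coreai implementation: fake_executable is a hypothetical stand-in whose buffer is wiped on exit, a memoryview plays the role of a zero-copy output, and bytes(...) plays the role of materializing the result.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_executable():
    """Hypothetical stand-in for asset.executable(); not the coreai API."""
    buffer = bytearray(b"\x01\x02\x03")  # pretend output buffer owned by the model
    try:
        yield memoryview(buffer)  # zero-copy view, like a runtime-backed output
    finally:
        # Simulate the runtime reclaiming its buffer when the context exits.
        for i in range(len(buffer)):
            buffer[i] = 0


async def demo():
    async with fake_executable() as view:
        live = view             # still refers to the runtime-owned buffer
        snapshot = bytes(view)  # independent copy made inside the block
    # After exit, the view sees the wiped buffer; the snapshot is unchanged.
    return bytes(live), snapshot


stale, snapshot = asyncio.run(demo())
print(stale, snapshot)
```

The view taken inside the block no longer holds the original data after exit, while the copy does, which is the reason for materializing the output before leaving the context.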
Inspect the output
.numpy() was called inside the async with block to ensure the result was
materialized before the model’s buffers could be released. We can now inspect
the array freely. Our main function computes x + x, so for an input of
all 1.5s we expect all 3.0s back.
print(f"shape: {result.shape}")
print(f"dtype: {result.dtype}")
print(f"value:\n{result}")
assert result.shape == (2, 3)
assert result.dtype == np.float32
print("OK — inference produced expected output shape and dtype")
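The assertions above cover shape and dtype only. Because main computes x + x, the values can be checked as well. A minimal sketch using NumPy alone, where result is recomputed locally as a stand-in for the array returned by the model:

```python
import numpy as np

x = np.full((2, 3), 1.5, dtype=np.float32)

# Stand-in for the array read back from the model's "y" output.
result = x + x

expected = np.full((2, 3), 3.0, dtype=np.float32)

# Raises AssertionError with a readable diff if any element differs
# beyond the default relative tolerance.
np.testing.assert_allclose(result, expected)
print("values match")
```

assert_allclose compares with a tolerance, which is usually the safer choice for floating-point outputs that may be computed on different devices.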
What’s next
You now have the core loop: load an asset, enter its executable, build inputs, await the call, and read the outputs back as NumPy.
For more advanced configuration, two types in coreai.runtime are worth
knowing about:
- SpecializationOptions: pass to asset.executable(options) to pin the preferred compute unit (CPU / GPU / Neural Engine) or enable debug mode. (macOS only.)
- StorageKind: passed to NDArray(data, backing=...) to choose byte-backed, IOSurface-backed, or Metal-backed storage. The default (StorageKind.BYTES) is what you want unless you are interoperating with graphics or camera buffers.
Both are importable from coreai.runtime, e.g.:
from coreai.runtime import SpecializationOptions, StorageKind