LanguageModelSession#

This page documents the session class used to manage interactions with the model.

Note

Swift Equivalent: This Python API corresponds to the LanguageModelSession class in the Swift Foundation Models Framework.

LanguageModelSession#

class apple_fm_sdk.LanguageModelSession[source]#

Bases: _ManagedObject

Represents a language model session for foundation model interactions.

A LanguageModelSession manages the lifecycle of a session with a foundation model, maintaining session history (transcript), handling tool calls, and providing both synchronous and streaming response capabilities.

The session does not support concurrent requests: if a request is in progress, a subsequent request on the same session waits for the first to complete.

Session Lifecycle:

  1. Creation: Initialize with optional instructions, model configuration, and tools

  2. Active Use: Make requests via respond() or stream_response()

  3. Cleanup: Automatically handled via context manager or explicit cleanup

Concurrent Request Handling:

Sessions use an internal lock to prevent concurrent requests. If you need to handle multiple requests simultaneously, create multiple session instances.
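The one-session-per-request pattern can be sketched without the SDK. In the self-contained sketch below, FakeSession stands in for fm.LanguageModelSession (an assumption for illustration only); its lock mirrors the session's internal lock, so parallel prompts only overlap when each uses its own session:

```python
import asyncio

# FakeSession is a stand-in for fm.LanguageModelSession (illustration only).
class FakeSession:
    def __init__(self):
        self._lock = asyncio.Lock()   # mirrors the session's internal lock

    async def respond(self, prompt: str) -> str:
        async with self._lock:        # concurrent calls on one session wait here
            await asyncio.sleep(0)    # stands in for model latency
            return f"echo: {prompt}"

async def main():
    prompts = ["a", "b", "c"]
    sessions = [FakeSession() for _ in prompts]   # one session per prompt
    return await asyncio.gather(
        *(s.respond(p) for s, p in zip(sessions, prompts))
    )

results = asyncio.run(main())
```

Because each prompt gets its own session, the gathered requests never contend for the same lock; issuing all three on a single session would serialize them instead.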

Examples

Basic session creation and usage:

import apple_fm_sdk as fm

# Create a simple session
session = fm.LanguageModelSession()
response = await session.respond("Hello, how are you?")
print(response)

Session with instructions:

import apple_fm_sdk as fm

# Guide the model's behavior with instructions
session = fm.LanguageModelSession(
    instructions="You are a helpful bird expert. Provide concise, "
                "accurate information about birds."
)
response = await session.respond("What is a Swift?")

Session with custom model and tools:

import apple_fm_sdk as fm
from my_tools import CalculatorTool, WeatherTool

model = fm.SystemLanguageModel(
    temperature=0.7,
    top_p=0.9
)

session = fm.LanguageModelSession(
    instructions="You are a helpful assistant with access to tools.",
    model=model,
    tools=[CalculatorTool(), WeatherTool()]
)

response = await session.respond("What's the weather like in Cupertino?")

__init__(instructions=None, model=None, tools=None)[source]#

Create a language model session.

Parameters:
  • instructions (Optional[str]) – Optional system instructions to guide the model’s behavior throughout the session. These instructions persist across all requests in the session. Example: “You are a helpful coding assistant.”

  • model (Optional[SystemLanguageModel]) – Optional specialized system model configuration. If not provided, uses default SystemLanguageModel() with standard settings. Use this to customize temperature, top_p, and other generation parameters.

  • tools (Optional[list[Tool]]) – Optional list of Tool instances that the model can invoke during generation. Tools enable the model to perform actions like calculations, API calls, or database queries. The model will automatically decide when to use tools based on the session context.

Raises:

FoundationModelsError – If session creation fails

Note

The session maintains a transcript of all interactions, which can be accessed via the transcript property. This transcript is automatically updated after each request.

property is_responding: bool#

Check if the session is currently responding to a request.

Returns:

True if the session is currently processing a request, False otherwise

Return type:

bool
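The property is useful for avoiding overlapping requests. The sketch below is self-contained: StubSession imitates the documented behavior (an assumption for illustration; real code would use fm.LanguageModelSession):

```python
import asyncio

# StubSession imitates the documented is_responding behavior (illustration only).
class StubSession:
    def __init__(self):
        self._busy = False

    @property
    def is_responding(self) -> bool:
        return self._busy

    async def respond(self, prompt: str) -> str:
        self._busy = True
        try:
            await asyncio.sleep(0)   # stands in for model latency
            return "ok"
        finally:
            self._busy = False

async def main():
    session = StubSession()
    task = asyncio.create_task(session.respond("hi"))
    await asyncio.sleep(0)           # let the request start
    busy_during = session.is_responding
    result = await task
    return busy_during, session.is_responding, result

busy_during, busy_after, result = asyncio.run(main())
```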

async respond(prompt: str) -> str[source]#
async respond(prompt: str, *, generating: type[Generable]) -> Generable
async respond(prompt: str, *, generating: Generable) -> Generable
async respond(prompt: str, *, schema: GenerationSchema) -> GeneratedContent
async respond(prompt: str, *, json_schema: dict) -> GeneratedContent

Get a response to a prompt with optional guided generation.

This function supports multiple response modes:

  1. Basic text response: Returns a plain string

  2. Guided generation with Generable: Returns a typed Python object

  3. Guided generation with schema: Returns structured GeneratedContent

  4. Guided generation with JSON schema: Returns structured GeneratedContent

The session automatically updates its transcript after each response, maintaining the full session history.

Parameters:
  • prompt (str) – The input prompt string to send to the model

  • generating (Optional[Union[Type[Generable], Generable]]) – Optional Generable type or instance for type-safe guided generation. When provided, the response will be constrained to match the structure of the Generable type and automatically converted to an instance of that type.

  • schema (Optional[GenerationSchema]) – Optional GenerationSchema for explicit schema-based guided generation. Use this for custom schemas that don’t map to a Generable type.

  • json_schema (Optional[dict]) – Optional JSON schema dictionary for guided generation. The schema should follow JSON Schema specification.

Returns:

Plain text response if no generation constraints are specified, or instance of generating type if generating parameter is provided, or structured content if schema or json_schema is provided

Return type:

Union[str, Any, GeneratedContent]

Raises:

FoundationModelsError – If the request fails

Examples

Basic text response:

import apple_fm_sdk as fm

session = fm.LanguageModelSession()
response = await session.respond("What is the capital of France?")
print(response)  # Plain string response

Guided generation with Generable type:

import apple_fm_sdk as fm

@fm.generable()
class Cat:
    name: str
    age: int
    profile: str

session = fm.LanguageModelSession()
cat = await session.respond(
    "Generate a cat named Maomao",
    generating=Cat
)
print(f"{cat.name} is {cat.age} years old")
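Guided generation with a JSON schema follows the same shape. This is a hedged sketch: the schema fields are illustrative, and the respond call is shown commented out because it needs a live session:

```python
# Hypothetical JSON schema for guided generation (field names illustrative).
cat_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# With a live session (requires apple_fm_sdk):
# session = fm.LanguageModelSession()
# content = await session.respond("Generate a cat", json_schema=cat_schema)
```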

Multi-turn session:

import apple_fm_sdk as fm

session = fm.LanguageModelSession(
    instructions="You are a helpful expert on architecture."
)

# First turn
response1 = await session.respond("What is the tallest building in the world?")
print(response1)

# Second turn - context is maintained
response2 = await session.respond(
    "What's the architectural style of that building?"
)
print(response2)

Note

  • Only one of generating, schema, or json_schema can be specified

  • The session maintains session context across multiple respond() calls

  • Concurrent calls to respond() on the same session will be serialized

  • For streaming responses, use stream_response() instead

See also

  • stream_response(): For streaming text responses

async stream_response(prompt)[source]#

Stream response chunks for a prompt (text only).

This function provides real-time streaming of the model’s response, yielding text snapshots as they become available. Each yielded value represents the complete response text generated so far, rather than the delta from the previous chunk.

Streaming Behavior:

  • Yields complete text snapshots (not deltas) as generation progresses

  • The final yield contains the complete response

  • Automatically updates the session transcript after completion

  • Does not support guided generation (text responses only)

  • Can be cancelled mid-stream using asyncio cancellation

Parameters:

prompt (str) – The input prompt string to send to the model

Yields:

Progressive snapshots of the response text. Each snapshot contains the full text generated so far, rather than only the new tokens.

Yield type:

str

Raises:

FoundationModelsError – If streaming fails

Examples

Basic streaming:

import apple_fm_sdk as fm

session = fm.LanguageModelSession()

printed = ""
async for snapshot in session.stream_response("Tell me a story"):
    # Each snapshot is the full text so far; print only the new portion
    print(snapshot[len(printed):], end="", flush=True)
    printed = snapshot
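The snapshot semantics can be checked offline. This self-contained sketch uses simulated cumulative snapshots as a stand-in for what stream_response yields, and recovers just the newly generated text at each step:

```python
def new_text(previous: str, snapshot: str) -> str:
    """Return only the portion of the snapshot not yet seen."""
    return snapshot[len(previous):]

# Simulated cumulative snapshots standing in for stream_response output:
snapshots = ["Once", "Once upon", "Once upon a time"]
printed = ""
deltas = []
for snap in snapshots:
    deltas.append(new_text(printed, snap))
    printed = snap
```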

Cancelling a stream:

import asyncio
import apple_fm_sdk as fm

session = fm.LanguageModelSession()

async def stream_with_timeout():
    try:
        async for chunk in session.stream_response("Write a long essay"):
            print(chunk)
            # Simulate some processing
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        print("Stream cancelled")
        raise

# Cancel after 5 seconds
task = asyncio.create_task(stream_with_timeout())
await asyncio.sleep(5)
task.cancel()

Streaming with error handling:

import apple_fm_sdk as fm

session = fm.LanguageModelSession()

try:
    async for chunk in session.stream_response("Hello"):
        print(chunk)
except fm.FoundationModelsError as e:
    print(f"Streaming error: {e}")

Note

  • Streaming currently only supports basic text responses

  • For guided generation, use respond() instead

  • Each snapshot contains the full text, rather than only new tokens

  • The session transcript is updated only after streaming completes

  • Breaking out of the async for loop early will properly clean up resources

See also

  • respond(): For non-streaming responses with guided generation support