Architecture Overview

This page explains the core execution flow and shows where to plug in custom behavior or debug. It covers the minimal sync/async code paths and how streaming, partial, and parallel modes integrate.

High-Level Flow

sequenceDiagram
    autonumber
    participant U as User Code
    participant I as Instructor (patched)
    participant R as Retry Layer (tenacity)
    participant C as Provider Client
    participant D as Dispatcher (process_response)
    participant H as Provider Handler (response/reask)
    participant M as Pydantic Model

    U->>I: chat.completions.create(response_model=..., **kwargs)
    Note right of I: patch() wraps create() with cache/templating and retry
    I->>R: retry_sync/async(func=create, max_retries, strict, mode, hooks)
    loop attempts
        R->>C: create(**prepared_kwargs)
        C-->>R: raw response (provider-specific)
        R->>D: process_response(_async)(response, response_model, mode, stream)
        alt Streaming/Partial
            D->>M: Iterable/Partial.from_streaming_response(_async)
            D-->>R: Iterable/Partial model (or list of items)
        else Standard
            D->>H: provider mode handler (format/parse selection)
            H-->>D: adjusted response_model/new_kwargs if needed
            D->>M: response_model.from_response(...)
            M-->>D: parsed model (with _raw_response attached)
            D-->>R: model (or adapted simple type)
        end
        R-->>I: parsed model
    end
    I-->>U: final model (plus _raw_response on instance)

    rect rgb(255,240,240)
    Note over R,H: On validation/JSON errors → reask path
    R->>H: handle_reask_kwargs(..., exception, failed_attempts)
    H-->>R: new kwargs/messages for next attempt
    end

Key responsibilities:

  • patch(): wraps the provider create with cache lookup/save, templating, strict mode, hooks, and retry.
  • Retry: executes the provider call, emits hooks, updates usage, handles validation/JSON errors with reask, and re-attempts.
  • Dispatcher: selects the correct parsing path by Mode, handles multimodal message conversion, and attaches _raw_response to the returned model.
  • Provider Handlers: provider/mode-specific request shaping and reask preparation.
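Conceptually, the wrapped call composes these pieces roughly as follows. This is a simplified sketch for orientation, not the actual implementation: prepare_kwargs and provider_client are illustrative stand-ins, and the real loop is driven by tenacity.

from json import JSONDecodeError
from pydantic import ValidationError

def patched_create(response_model, max_retries, strict, mode, **kwargs):
    # patch(): kwargs preparation (templating, schema/tool wiring, cache lookup)
    prepared = prepare_kwargs(response_model, strict, mode, **kwargs)  # illustrative helper
    for _attempt in range(max_retries):
        raw = provider_client.create(**prepared)  # provider call
        try:
            # dispatcher: pick the parsing path for the active Mode
            return process_response(raw, response_model=response_model, mode=mode)
        except (ValidationError, JSONDecodeError) as exc:
            # reask: fold error feedback into the next attempt's kwargs
            prepared = handle_reask_kwargs(prepared, mode=mode, response=raw, exception=exc)
    raise InstructorRetryException("retries exhausted")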

Minimal Code Paths

Synchronous

import openai
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_openai(openai.OpenAI())

model = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": '{"name": "Ada", "age": 37}'}],
    response_model=User,            # triggers schema/tool wiring + parsing
    max_retries=3,                  # tenacity-backed validation retries
    strict=True,                    # strict JSON parsing if supported
)

# Access raw provider response if needed
raw = model._raw_response

Asynchronous

import asyncio
import openai
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

async def main():
    aclient = instructor.from_openai(openai.AsyncOpenAI())
    model = await aclient.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "{\"name\": \"Ada\", \"age\": 37}"}],
        response_model=User,
        max_retries=3,
        strict=True,
    )
    print(model)

asyncio.run(main())

Streaming, Partial, Parallel

Streaming Iterable

  • Use client.create_iterable(response_model=Model) (backed by Instructor.create_iterable) when you expect multiple objects from one request.
  • Returns a generator (sync) or async generator (async) of parsed items.
  • Internally sets stream=True, and IterableBase.from_streaming_response(_async) assembles items.
for item in client.create_iterable(model="gpt-4o-mini", messages=..., response_model=MyModel):
    print(item)
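With an async client, the same call returns an async generator instead (a sketch, assuming aclient is the instructor-wrapped openai.AsyncOpenAI() client from the async example above):

async for item in aclient.create_iterable(model="gpt-4o-mini", messages=..., response_model=MyModel):
    print(item)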

Partial Objects

  • Use create_partial(response_model=Model) to receive progressively filled partial models while streaming.
  • Internally wraps the model as Partial[Model] and sets stream=True.
for partial in client.create_partial(model="gpt-4o-mini", messages=..., response_model=MyModel):
    # partial contains fields as they arrive
    pass
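Because Partial[Model] makes every field optional during streaming, fields are None until their values arrive. A defensive sketch (assuming MyModel has a name field):

for partial in client.create_partial(model="gpt-4o-mini", messages=..., response_model=MyModel):
    if partial.name is not None:  # fields default to None until streamed in
        print("name so far:", partial.name)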

Parallel Tools

  • Use Mode.PARALLEL_TOOLS (set when creating the client) with an Iterable union type hint (e.g., Iterable[PersonInfo | EventInfo]) when you need multiple tool calls in one request; the result is an iterable of parsed instances.
  • Streaming is not supported in parallel tools mode.
from typing import Iterable

import instructor
import openai
from instructor.mode import Mode

# Parallel tools mode is set when creating the client, not per call
client = instructor.from_openai(openai.OpenAI(), mode=Mode.PARALLEL_TOOLS)

result = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract person and event info."}],
    response_model=Iterable[PersonInfo | EventInfo],  # union of expected models
)

Hooks and Retry

You can observe and instrument the flow with hooks. Typical events:

  • completion:kwargs: just before the provider call
  • completion:response: after the provider call
  • parse:error: on validation/JSON errors
  • completion:last_attempt: when a retry sequence is about to stop
  • completion:error: non-validation completion errors

from instructor.core.hooks import HookName

client.on(HookName.COMPLETION_KWARGS, lambda *args, **kw: print("KWARGS", kw))
client.on(HookName.PARSE_ERROR, lambda e: print("PARSE", e))
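Handlers can also be deregistered via the same surface, which keeps instrumentation scoped (a brief sketch using off and clear):

def log_parse_error(error: Exception) -> None:
    print("PARSE", error)

client.on(HookName.PARSE_ERROR, log_parse_error)
# ... make calls ...
client.off(HookName.PARSE_ERROR, log_parse_error)  # remove a single handler
client.clear()  # remove all registered handlers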

Where Multimodal Conversion Happens

  • For modes that require it, messages are converted via processing.multimodal.convert_messages.
  • Image/Audio/PDF autodetection can be enabled (by specific handlers/modes) and will convert strings/paths/URLs or data URIs into provider-ready payloads.
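For example, with an OpenAI client and a vision-capable model, image autodetection can turn a bare URL in message content into an image payload. A sketch: ImageDescription is a hypothetical response model, the URL is a placeholder, and autodetect_images assumes a version that exposes this flag.

import instructor
import openai

client = instructor.from_openai(openai.OpenAI())

result = client.chat.completions.create(
    model="gpt-4o",
    response_model=ImageDescription,  # hypothetical Pydantic model
    autodetect_images=True,  # URL/path strings become image payloads
    messages=[
        {"role": "user", "content": ["Describe this image:", "https://example.com/cat.jpg"]}
    ],
)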

Error Handling at a Glance

  • Validation or JSON decode errors trigger the reask path.
  • Reask handlers (handle_reask_kwargs) append/adjust messages with error feedback so the next attempt can correct itself.
  • If all retries fail, InstructorRetryException is raised containing failed_attempts, the last completion, usage totals, and the create kwargs for reproduction.
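A caller can inspect that context when retries are exhausted (a sketch; the import path and attribute names may vary slightly across versions):

from instructor.exceptions import InstructorRetryException

try:
    user = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "no usable data here"}],
        response_model=User,
        max_retries=2,
    )
except InstructorRetryException as e:
    print(e.n_attempts)       # attempts made before giving up
    print(e.total_usage)      # accumulated token usage across attempts
    print(e.last_completion)  # last raw completion, for debugging
    print(e.create_kwargs)    # kwargs to reproduce the failing call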

Extensibility Notes

  • New providers add utils for response and reask handling and register modes used by the dispatcher.
  • Most JSON/tool patterns are shared; prefer reusing existing handlers where possible.
  • Keep provider-specific logic in provider utils; avoid expanding central dispatcher beyond routing and orchestration.
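For orientation, a provider's utils typically export a response handler and a reask handler in roughly this shape. This is purely illustrative: "acme" and every name below are hypothetical, not instructor's actual API.

# Hypothetical sketch: shapes only, not real instructor code.
def handle_acme_tools(response_model, new_kwargs):
    """Request shaping: attach the tool/JSON schema for this mode."""
    new_kwargs["tools"] = [schema_as_tool(response_model)]  # illustrative helper
    return response_model, new_kwargs

def reask_acme_tools(kwargs, response, exception):
    """Reask preparation: fold validation feedback into the next attempt."""
    kwargs["messages"].append(
        {"role": "user", "content": f"Correct these errors and retry: {exception}"}
    )
    return kwargs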