Announcing instructor 1.0.0

Over the past 10 months, we've built up instructor around the principle of "easy to try, easy to delete". We accomplished this by patching the OpenAI client with the instructor package and adding new arguments like response_model, max_retries, and validation_context. As a result, I truly believe instructor is the best way to get structured data out of LLM APIs.

But as a result, we've been a bit stuck on getting typing to work well while giving you more control at development time. I'm excited to launch version 1.0.0, which cleans up the API with respect to typing without compromising ease of use.
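
As a minimal sketch of the new client usage (the model name and prompt here are illustrative), the 1.0 entry point wraps the OpenAI client rather than mutating it in place, and the result of create is typed as your response model:

import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# The 1.0 entry point wraps the client instead of patching it in place
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)

# `user` is typed as `User`, so editors and type checkers can follow along
print(user.name, user.age)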

Matching Language in Multilingual Summarization Tasks

When asking language models to summarize text, there's a risk that the generated summary ends up in English, even if the source text is in another language. This is likely due to the instructions being provided in English, biasing the model towards English output.

In this post, we explore techniques to ensure the language of the generated summary matches the language of the source text. We leverage Pydantic for data validation and the langdetect library for language identification.
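
As a rough sketch of the approach (the model, validator, and context key below are illustrative, not the exact code from the post), we can attach a langdetect-based check to the summary field and pass the expected language through instructor's validation_context, so a mismatch raises an error:

from langdetect import detect
from pydantic import BaseModel, ValidationInfo, field_validator


class Summary(BaseModel):
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_language_matches_source(cls, v: str, info: ValidationInfo) -> str:
        # The expected language code is supplied via validation_context at call time
        expected = (info.context or {}).get("source_language")
        if expected and detect(v) != expected:
            raise ValueError(
                f"Summary was written in {detect(v)!r}, expected {expected!r}"
            )
        return v

Combined with max_retries, the validation error is fed back to the model so it can regenerate the summary in the correct language.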

Structured Outputs with Anthropic

A special shoutout to Shreya for her contributions to Anthropic support. As of now, all features are operational with the exception of streaming support.

For those eager to experiment, simply patch the client with the ANTHROPIC_JSON mode, which enables you to use the Anthropic client for making requests.

pip install instructor[anthropic]

Missing Features

We want to acknowledge that we are still missing partial streaming and better re-asking support for XML. We are working on both and will ship them soon.

from pydantic import BaseModel
from typing import List
import anthropic
import instructor

# Patch the Anthropic client with instructor for structured outputs
anthropic_client = instructor.from_anthropic(
    anthropic.Anthropic(),
    mode=instructor.Mode.ANTHROPIC_JSON,
)

class Properties(BaseModel):
    name: str
    value: str

class User(BaseModel):
    name: str
    age: int
    properties: List[Properties]

user_response = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    max_retries=0,
    messages=[
        {
            "role": "user",
            "content": "Create a user for a model with a name, age, and properties.",
        }
    ],
    response_model=User,
)

print(user_response.model_dump_json(indent=2))
"""
{
    "name": "John",
    "age": 25,
    "properties": [
        {
            "key": "favorite_color",
            "value": "blue"
        }
    ]
}

We're encountering challenges with deeply nested types and eagerly invite the community to test, provide feedback, and suggest necessary improvements as we enhance the anthropic client's support.

Simple Synthetic Data Generation

One thing people have been using instructor for is generating synthetic data rather than extracting it. We can even use the json_schema_extra fields to provide specific examples that control how the data is generated.

Consider the example below. We'll likely generate very simple names.

from typing import Iterable
from pydantic import BaseModel
import instructor
from openai import OpenAI


# Define the UserDetail model
class UserDetail(BaseModel):
    name: str
    age: int


# Patch the OpenAI client to enable the response_model functionality
client = instructor.from_openai(OpenAI())


def generate_fake_users(count: int) -> Iterable[UserDetail]:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=Iterable[UserDetail],
        messages=[
            {"role": "user", "content": f"Generate a {count} synthetic users"},
        ],
    )


for user in generate_fake_users(5):
    print(user)
    #> name='Alice' age=25
    #> name='Bob' age=30
    #> name='Charlie' age=35
    #> name='David' age=40
    #> name='Eve' age=22

Leveraging Simple Examples

We might want to set examples as part of the prompt by leveraging Pydantic's configuration. We can set examples directly in the JSON schema itself.

from typing import Iterable
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI


# Define the UserDetail model
class UserDetail(BaseModel):
    name: str = Field(examples=["Timothee Chalamet", "Zendaya"])
    age: int


# Patch the OpenAI client to enable the response_model functionality
client = instructor.from_openai(OpenAI())


def generate_fake_users(count: int) -> Iterable[UserDetail]:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=Iterable[UserDetail],
        messages=[
            {"role": "user", "content": f"Generate a {count} synthetic users"},
        ],
    )


for user in generate_fake_users(5):
    print(user)
    #> name='John Doe' age=25
    #> name='Jane Smith' age=30
    #> name='Michael Johnson' age=22
    #> name='Emily Davis' age=28
    #> name='David Brown' age=35

By incorporating names of celebrities as examples, we have shifted towards generating synthetic data featuring well-known personalities, moving away from the simplistic, single-word names previously used.

Leveraging Complex Examples

To generate synthetic examples with more nuance, let's upgrade to the "gpt-4-turbo-preview" model and use model-level examples rather than attribute-level examples:

import instructor

from typing import Iterable
from pydantic import BaseModel, ConfigDict
from openai import OpenAI


# Define the UserDetail model
class UserDetail(BaseModel):
    """Old Wizards"""

    name: str
    age: int

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"name": "Gandalf the Grey", "age": 1000},
                {"name": "Albus Dumbledore", "age": 150},
            ]
        }
    )


# Patch the OpenAI client to enable the response_model functionality
client = instructor.from_openai(OpenAI())


def generate_fake_users(count: int) -> Iterable[UserDetail]:
    return client.chat.completions.create(
        model="gpt-4-turbo-preview",
        response_model=Iterable[UserDetail],
        messages=[
            {"role": "user", "content": f"Generate `{count}` synthetic examples"},
        ],
    )


for user in generate_fake_users(5):
    print(user)
    #> name='Merlin' age=1000
    #> name='Saruman the White' age=700
    #> name='Radagast the Brown' age=600
    #> name='Elminster Aumar' age=1200
    #> name='Mordenkainen' age=850

Leveraging Descriptions

By adjusting the descriptions within our Pydantic models, we can subtly influence the nature of the synthetic data generated. This method allows for a more nuanced control over the output, ensuring that the generated data aligns more closely with our expectations or requirements.

For instance, specifying "Fancy French sounding names" as a description for the name field in our UserDetail model directs the generation process to produce names that fit this particular criterion, resulting in a dataset that is both diverse and tailored to specific linguistic characteristics.

import instructor

from typing import Iterable
from pydantic import BaseModel, Field
from openai import OpenAI


# Define the UserDetail model
class UserDetail(BaseModel):
    name: str = Field(description="Fancy French sounding names")
    age: int


# Patch the OpenAI client to enable the response_model functionality
client = instructor.from_openai(OpenAI())


def generate_fake_users(count: int) -> Iterable[UserDetail]:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=Iterable[UserDetail],
        messages=[
            {"role": "user", "content": f"Generate `{count}` synthetic users"},
        ],
    )


for user in generate_fake_users(5):
    print(user)
    #> name='Jean Luc' age=30
    #> name='Claire Belle' age=25
    #> name='Pierre Leclair' age=40
    #> name='Amelie Rousseau' age=35
    #> name='Etienne Lefevre' age=28

Structured Output for Open Source and Local LLMs

Instructor has expanded its capabilities for language models. It started with API interactions via the OpenAI SDK, using Pydantic for structured data validation. Now, Instructor supports multiple models and platforms.

The integration of JSON mode improved adaptability to vision models and open-source alternatives. This allows support for models ranging from GPT and Mistral to models hosted on Ollama and Hugging Face via llama-cpp-python.

Instructor now works with cloud-based APIs and local models for structured data extraction. Developers can refer to our guide on Patching for information on using JSON mode with different models.
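
For instance, because Ollama exposes an OpenAI-compatible endpoint, the same patching pattern works against a local model. A sketch assuming a local Ollama server with a pulled llama3 model (the model, prompt, and fields are illustrative):

import instructor
from openai import OpenAI
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    age: int


# Point the OpenAI SDK at the local Ollama server and force JSON mode
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

character = client.chat.completions.create(
    model="llama3",
    response_model=Character,
    messages=[{"role": "user", "content": "Tell me about Harry Potter"}],
)
print(character)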

For learning about Instructor and Pydantic, we offer a course on Steering language models towards structured outputs.

The following sections show examples of Instructor's integration with platforms and local setups for structured outputs in AI projects.

Seamless Support with Langsmith

It's a common misconception that LangChain's LangSmith is only compatible with LangChain's models. In reality, LangSmith is a unified DevOps platform for developing, collaborating, testing, deploying, and monitoring LLM applications. In this post, we will explore how LangSmith can be used to enhance the OpenAI client alongside instructor.
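
A sketch of the idea, assuming a LangSmith API key is configured in the environment: LangSmith's wrap_openai adds tracing to the raw OpenAI client, and instructor then patches the wrapped client exactly as it would the plain one.

import instructor
from langsmith.wrappers import wrap_openai
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# wrap_openai instruments the client so every call is traced in LangSmith
openai_client = wrap_openai(OpenAI())

# instructor patches the wrapped client as usual
client = instructor.from_openai(openai_client)

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)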

Advanced Caching Strategies for Python LLM Applications (Validated & Tested ✅)

Instructor makes working with language models easy, but they are still computationally expensive. Smart caching strategies can reduce costs by up to 90% while dramatically improving response times.

Update (June 2025) – Instructor now ships native caching support out-of-the-box. Pass a cache adapter directly when you create a client:

from instructor import from_provider
from instructor.cache import AutoCache, RedisCache

client = from_provider(
    "openai/gpt-4o",  # or any other provider
    cache=AutoCache(maxsize=10_000),   # in-process LRU
    # or cache=RedisCache(host="localhost")
)

Under the hood this uses the very same techniques explained below, so you can still roll your own adapter if you need a bespoke backend. The remainder of the post walks through the design rationale in detail and is fully compatible with the built-in implementation.

Built-in cache – feature matrix

| Method / helper | Cached | What is stored | Notes |
| --- | --- | --- | --- |
| create(...) | ✅ Yes | Parsed Pydantic model + raw completion JSON | |
| create_with_completion(...) | ✅ Yes | Same as above | Second tuple element restored from cache |
| create_partial(...) | ❌ No | – | Streaming generators not cached (yet) |
| create_iterable(...) | ❌ No | – | Streaming generators not cached (yet) |
| Any call with stream=True | ❌ No | – | Provider always invoked |
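
As a sketch of what the matrix means in practice (building on the cache-enabled client shown above; the prompt and fields are illustrative), a repeated create_with_completion call with identical arguments is answered from the cache, with both the parsed model and the raw completion restored:

from instructor import from_provider
from instructor.cache import AutoCache
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = from_provider("openai/gpt-4o", cache=AutoCache(maxsize=10_000))

messages = [{"role": "user", "content": "Extract: Jason is 25 years old"}]

# First call reaches the provider and populates the cache
user1, completion1 = client.chat.completions.create_with_completion(
    response_model=User, messages=messages
)

# An identical second call is served from the cache; both tuple elements are restored
user2, completion2 = client.chat.completions.create_with_completion(
    response_model=User, messages=messages
)
assert user1 == user2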

How serialization works

  1. Model – we call model_dump_json() which produces a compact, lossless JSON string. On a cache hit we re-hydrate with model_validate_json() so you get the same BaseModel subclass instance.
  2. Raw completion – Instructor attaches the original ChatCompletion (or provider-specific) object to the model as _raw_response. We serialize this object too (when possible with model_dump_json(), otherwise a plain str() fallback) and restore it on a cache hit so create_with_completion() behaves identically.
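
In plain Pydantic terms, the round-trip for the parsed model looks roughly like this (UserDetail stands in for whatever response_model you used):

from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


user = UserDetail(name="Jason", age=25)

# What gets written to the cache: a compact, lossless JSON string
cached_payload = user.model_dump_json()

# What happens on a cache hit: the same BaseModel subclass is re-hydrated
restored = UserDetail.model_validate_json(cached_payload)
assert restored == user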

Raw Response Reconstruction

For raw completion objects, we use a SimpleNamespace trick to reconstruct the original object structure:

# When caching:
raw_json = completion.model_dump_json()  # Serialize to JSON

# When restoring from cache:
import json
from types import SimpleNamespace
restored = json.loads(raw_json, object_hook=lambda d: SimpleNamespace(**d))

This approach allows us to restore the original dot-notation access patterns (e.g., completion.usage.total_tokens) without requiring the original class definitions. The SimpleNamespace objects behave identically to the original completion objects for attribute access while being much simpler to reconstruct from JSON.

Defensive Handling

The cache implementation includes multiple fallback strategies for different provider response types:

  1. Pydantic models (OpenAI, Anthropic) - Use model_dump_json() for perfect serialization
  2. Plain dictionaries - Use standard json.dumps() with default=str fallback
  3. Unpickleable objects - Fall back to string representation with a warning

This ensures the cache works reliably across all providers, even if they don't follow the same response object patterns.
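
A rough sketch of that fallback chain (the helper name is hypothetical, not instructor's actual internal function):

import json
import warnings

from pydantic import BaseModel


def dump_raw_response(raw) -> str:
    """Hypothetical helper illustrating the fallback order for raw completions."""
    # 1. Pydantic models (OpenAI, Anthropic) serialize losslessly
    if isinstance(raw, BaseModel):
        return raw.model_dump_json()
    # 2. Plain dictionaries go through json.dumps, stringifying anything unusual
    if isinstance(raw, dict):
        return json.dumps(raw, default=str)
    # 3. Anything else falls back to its string representation, with a warning
    warnings.warn(f"Falling back to str() for {type(raw).__name__}")
    return str(raw)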

Streaming limitations

The current implementation opts not to cache streaming helpers (create_partial, create_iterable, or stream=True). Replaying a realistic token-stream requires a dedicated design which is coming in a future release. Until then, those calls always reach the provider.

Today, we're diving deep into optimizing instructor code while maintaining the excellent developer experience offered by Pydantic models. We'll tackle the challenges of caching Pydantic models, typically incompatible with pickle, and explore comprehensive solutions using decorators like functools.cache. Then, we'll craft production-ready custom decorators with diskcache and redis to support persistent caching, distributed systems, and high-throughput applications.
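
As a preview of where we're headed, a minimal diskcache-backed decorator might look like this (the decorator name, cache directory, and key scheme are illustrative):

import functools
import inspect

import diskcache
from pydantic import BaseModel

cache = diskcache.Cache("./my_cache_directory")


def instructor_cache(func):
    """Cache a function that returns a Pydantic model, stored as JSON on disk."""
    return_type = inspect.signature(func).return_annotation
    if not (isinstance(return_type, type) and issubclass(return_type, BaseModel)):
        raise ValueError("The return type must be a Pydantic model")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Key on the function name and its arguments
        key = f"{func.__name__}-{args}-{kwargs}"
        if (cached := cache.get(key)) is not None:
            # Cache hit: re-hydrate the model from its JSON representation
            return return_type.model_validate_json(cached)
        result = func(*args, **kwargs)
        cache.set(key, result.model_dump_json())
        return result

    return wrapper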

Generators and LLM Streaming

Latency is crucial, especially in e-commerce and newer chat applications like ChatGPT. Streaming lets us improve the user experience without needing faster overall response times.

And what makes streaming possible? Generators!
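
As a quick refresher, a generator yields values one at a time instead of building the whole result first, which is exactly what token-by-token streaming needs:

def count_up_to(n: int):
    # Values are produced lazily, one per iteration, instead of all at once
    for i in range(1, n + 1):
        yield i


for number in count_up_to(3):
    print(number)
    #> 1
    #> 2
    #> 3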