Native Caching in Instructor v1.9.1: Zero-Configuration Performance Boost

New in v1.9.1: Instructor now ships with built-in caching support for all providers. Simply pass a cache adapter when creating your client to dramatically reduce API costs and improve response times.

Starting with Instructor v1.9.1, we've introduced native caching support that makes optimization effortless. Instead of implementing complex caching decorators or wrapper functions, you can now pass a cache adapter directly to from_provider() and automatically cache all your structured LLM calls.

The Game Changer: Built-in Caching

Before v1.9.1, caching required custom decorators and manual implementation. Now, it's as simple as:

from instructor import from_provider
from instructor.cache import AutoCache

# Works with any provider - caching flows through automatically
client = from_provider(
    "openai/gpt-4o",
    cache=AutoCache(maxsize=1000)
)

# Your normal calls are now cached automatically
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

first = client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User
)

second = client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User
)

# second call was served from cache - same result, zero cost!
assert first.name == second.name

Universal Provider Support

The beauty of native caching is that it works with every provider through the same simple API:

from instructor.cache import AutoCache, DiskCache

# Works with OpenAI
openai_client = from_provider("openai/gpt-3.5-turbo", cache=AutoCache())

# Works with Anthropic  
anthropic_client = from_provider("anthropic/claude-3-haiku", cache=AutoCache())

# Works with Google
google_client = from_provider("google/gemini-pro", cache=DiskCache())

# Works with any provider in the ecosystem
groq_client = from_provider("groq/llama-3.1-8b", cache=AutoCache())

No provider-specific configuration needed. The cache parameter flows through **kwargs to all underlying implementations automatically.

Built-in Cache Adapters

Instructor v1.9.1 ships with two production-ready cache implementations:

1. AutoCache - In-Process LRU Cache

Perfect for single-process applications and development:

from instructor.cache import AutoCache

# Thread-safe in-memory cache with LRU eviction
cache = AutoCache(maxsize=1000)
client = from_provider("openai/gpt-4o", cache=cache)

When to use:

  • Development and testing
  • Single-process applications
  • When you need maximum speed (200,000x+ faster cache hits)
  • Applications where cache persistence isn't required

2. DiskCache - Persistent Storage

Ideal when you need cache persistence across sessions:

from instructor.cache import DiskCache

# Persistent disk-based cache
cache = DiskCache(directory=".instructor_cache")
client = from_provider("anthropic/claude-3-sonnet", cache=cache)

When to use:

  • Applications that restart frequently
  • Development workflows where you want to preserve cache between sessions
  • When working with expensive or time-intensive API calls
  • Local applications with moderate performance requirements

Smart Cache Key Generation

Instructor automatically generates intelligent cache keys that include:

  • Provider/model name - Different models get different cache entries
  • Complete message history - Full conversation context is hashed
  • Response model schema - Any changes to your Pydantic model automatically bust the cache
  • Mode configuration - JSON vs Tools mode changes are tracked

This means when you update your Pydantic model (adding fields, changing descriptions, etc.), the cache automatically invalidates old entries - no stale data!

from instructor.cache import make_cache_key

# Generate deterministic cache key
key = make_cache_key(
    messages=[{"role": "user", "content": "hello"}],
    model="gpt-3.5-turbo", 
    response_model=User,
    mode="TOOLS"
)
print(key)  # SHA-256 hash: 9b8f5e2c8c9e...

Custom Cache Implementations

Want Redis, Memcached, or a custom backend? Simply inherit from BaseCache:

from instructor.cache import BaseCache
import redis

class RedisCache(BaseCache):
    def __init__(self, host="localhost", port=6379, **kwargs):
        self.redis = redis.Redis(host=host, port=port, **kwargs)

    def get(self, key: str):
        value = self.redis.get(key)
        return value.decode() if value else None

    def set(self, key: str, value, ttl: int | None = None):
        if ttl:
            self.redis.setex(key, ttl, value)
        else:
            self.redis.set(key, value)

# Use your custom cache
redis_cache = RedisCache(host="my-redis-server")
client = from_provider("openai/gpt-4o", cache=redis_cache)

The BaseCache interface is intentionally minimal - just implement get() and set() methods and you're ready to go.

Time-to-Live (TTL) Support

Control cache expiration with per-call TTL overrides:

# Cache this result for 1 hour
result = client.create(
    messages=[{"role": "user", "content": "Generate daily report"}],
    response_model=Report,
    cache_ttl=3600  # 1 hour in seconds
)

TTL support depends on your cache backend:

  • AutoCache: TTL is ignored (no expiration)
  • DiskCache: full TTL support with automatic expiration
  • Custom backends: implement TTL handling in your set() method, as sketched below
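
To make that last point concrete, here is a minimal sketch of how a custom backend might honor ttl in set() by storing an expiry timestamp next to each value. The ExpiringDictCache class is hypothetical and only illustrates the idea:

import time

from instructor.cache import BaseCache


class ExpiringDictCache(BaseCache):
    """Hypothetical dict-backed cache that expires entries lazily on read."""

    def __init__(self):
        self._store: dict[str, tuple[str, float | None]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as cache misses and evict them
        if expires_at is not None and time.monotonic() > expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key: str, value, ttl: int | None = None):
        # Record when the entry should expire; None means it never does
        expires_at = time.monotonic() + ttl if ttl else None
        self._store[key] = (value, expires_at)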

Migration from Manual Caching

If you were using custom caching decorators, migrating is straightforward:

Before v1.9.1:

@functools.cache
def extract_user(text: str) -> User:
    return client.create(
        messages=[{"role": "user", "content": text}],
        response_model=User
    )

With v1.9.1:

# Remove decorator, add cache to client
client = from_provider("openai/gpt-4o", cache=AutoCache())

def extract_user(text: str) -> User:
    return client.create(
        messages=[{"role": "user", "content": text}],
        response_model=User
    )

No more function-level caching logic - just create your client with caching enabled and all calls benefit automatically.

Real-World Performance Impact

Native caching delivers the same dramatic performance improvements you'd expect:

  • AutoCache: 200,000x+ speed improvement for cache hits
  • DiskCache: 5-10x improvement with persistence benefits
  • Cost Reduction: 50-90% API cost savings depending on cache hit rate

For a comprehensive deep-dive into caching strategies and performance analysis, check out our complete caching guide.
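
If you want to see the effect on your own workload, a quick (and admittedly rough) way to check is to time two identical calls against the cached client and User model from the first example:

import time

start = time.perf_counter()
client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User,
)
uncached = time.perf_counter() - start

start = time.perf_counter()
client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User,
)
cached = time.perf_counter() - start

print(f"uncached: {uncached:.3f}s, cached: {cached:.5f}s")

Your exact numbers will depend on provider latency and the cache backend you choose.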

Getting Started

Ready to enable native caching? Here's your quick start:

  1. Upgrade to v1.9.1+:

    pip install "instructor>=1.9.1"
    

  2. Choose your cache backend:

    from instructor.cache import AutoCache, DiskCache
    
    # For development/single-process
    cache = AutoCache(maxsize=1000)
    
    # For persistence
    cache = DiskCache(directory=".cache")
    

  3. Add cache to your client:

    from instructor import from_provider
    
    client = from_provider("your/favorite/model", cache=cache)
    

  4. Use normally - caching happens automatically:

    result = client.create(
        messages=[{"role": "user", "content": "your prompt"}],
        response_model=YourModel
    )
    

Learn More

For detailed information about cache design, custom implementations, and advanced patterns, visit our Caching Concepts documentation.

The native caching feature represents our commitment to making high-performance LLM applications simple and accessible. No more complex caching logic - just fast, cost-effective structured outputs out of the box.


Have questions about native caching or want to share your use case? Join the discussion in our GitHub repository or check out the complete documentation.

Migrating to uv

Why we migrated to uv

We recently migrated to uv from poetry because we wanted to benefit from its many features, such as:

  • Easier dependency management with automatic caching built in
  • Significantly faster CI/CD compared to poetry, especially when we use the caching functionality provided by the Astral team
  • Cargo-style lockfile that makes it easier to adopt new PEP features as they come out

The migration took around 1-2 days and we're happy with the results: on average, our CI/CD jobs are significantly faster.

Here are some timings taken from our CI/CD runs.

In general, we saw roughly a 3x speedup (about a 67% reduction in job time) once we enabled caching for the individual uv GitHub Actions.

Extracting Metadata from Images using Structured Extraction

Multimodal language models like gpt-4o excel at processing multimodal inputs, enabling us to extract rich, structured metadata from images.

This is particularly valuable in areas like fashion where we can use these capabilities to understand user style preferences from images and even videos. In this post, we'll see how to use instructor to map images to a given product taxonomy so we can recommend similar products for users.
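
As a rough sketch of what that looks like with instructor, you can pass an image alongside your prompt and extract directly into a taxonomy model. The ProductAttributes model and the image URL below are placeholders, and the example assumes instructor's multimodal Image helper:

import instructor
from typing import Literal
from pydantic import BaseModel


# Placeholder slice of a product taxonomy - swap in your own categories
class ProductAttributes(BaseModel):
    category: Literal["dress", "top", "pants", "shoes", "accessory"]
    color: str
    style_tags: list[str]


client = instructor.from_provider("openai/gpt-4o")

product = client.create(
    messages=[
        {
            "role": "user",
            "content": [
                "Classify this product photo against our taxonomy",
                instructor.Image.from_url("https://example.com/product.jpg"),
            ],
        }
    ],
    response_model=ProductAttributes,
)
print(product)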

Consistent Stories with GPT-4o

Language models struggle to generate consistent graphs with a large number of nodes. Often, this is because the graph itself is too large for the model to handle, causing it to produce inconsistent graphs with invalid and disconnected nodes, among other issues.

In this article, we'll look at how we can get around this limitation by using a two-phase approach to generate complex DAGs with gpt-4o by looking at a simple example of generating a Choose Your Own Adventure story.
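
As a loose illustration of the two-phase idea (not the article's exact code), you can first ask for a compact outline of the graph and then expand each node in a separate, smaller call:

import instructor
from pydantic import BaseModel

client = instructor.from_provider("openai/gpt-4o")


# Phase 1: generate a compact outline of the story graph
class OutlineNode(BaseModel):
    id: int
    title: str
    children: list[int]  # ids of the nodes this choice leads to


class StoryOutline(BaseModel):
    nodes: list[OutlineNode]


outline = client.create(
    messages=[
        {
            "role": "user",
            "content": "Outline a choose-your-own-adventure story about a haunted ship",
        }
    ],
    response_model=StoryOutline,
)


# Phase 2: expand one node at a time so the model never handles the full graph at once
class StoryNode(BaseModel):
    id: int
    text: str


passages = [
    client.create(
        messages=[
            {
                "role": "user",
                "content": f"Write the passage for outline node {node.id}: {node.title}",
            }
        ],
        response_model=StoryNode,
    )
    for node in outline.nodes
]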

Using Structured Outputs to convert messy tables into tidy data

Why is this a problem?

Messy data exports are a common problem. Whether it's multiple header rows, implicit relationships that make analysis a pain, or merged cells, using instructor with structured outputs makes it easy to convert messy tables into tidy data - even if all you have is an image of the table, as we'll see below.

Let's look at the following table as an example. It makes analysis unnecessarily difficult because it hides data relationships through empty cells and implicit repetition. If we were using it for data analysis, cleaning it manually would be a huge nightmare.
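
As a hedged sketch of the approach (the SalesRecord model and the file name are placeholders), you can hand instructor an image of the table along with a tidy row schema:

import instructor
from pydantic import BaseModel


# Placeholder tidy schema - one record per observation
class SalesRecord(BaseModel):
    region: str
    quarter: str
    revenue: float


class TidyTable(BaseModel):
    records: list[SalesRecord]


client = instructor.from_provider("openai/gpt-4o")

table = client.create(
    messages=[
        {
            "role": "user",
            "content": [
                "Convert this table into tidy data, one record per row",
                instructor.Image.from_path("messy_table.png"),
            ],
        }
    ],
    response_model=TidyTable,
)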

Structured Outputs with Writer now supported

We're excited to announce that instructor now supports Writer's enterprise-grade LLMs, including their latest Palmyra X 004 model. This integration enables structured outputs and enterprise AI workflows with Writer's powerful language models.

Getting Started

First, make sure that you've signed up for an account on Writer and obtained an API key using this quickstart guide. Once you've done so, install instructor with Writer support by running pip install instructor[writer] in your terminal.

Make sure to set the WRITER_API_KEY environment variable with your Writer API key or pass it as an argument to the Writer constructor.
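
From there, a first structured call might look roughly like this - assuming the writerai SDK's Writer client, instructor.from_writer, and the palmyra-x-004 model id:

import instructor
from writerai import Writer
from pydantic import BaseModel


class Company(BaseModel):
    name: str
    industry: str


# Picks up WRITER_API_KEY from the environment; you can also pass api_key=...
client = instructor.from_writer(Writer())

company = client.chat.completions.create(
    model="palmyra-x-004",
    messages=[
        {"role": "user", "content": "Extract: Acme Corp builds industrial robots"}
    ],
    response_model=Company,
)
print(company)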

PDF Processing with Structured Outputs with Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to analyse the Gemini 1.5 Pro Paper and extract a structured summary.

The Problem

Processing PDFs programmatically has always been painful. The typical approaches all have significant drawbacks:

  • PDF parsing libraries require complex rules and break easily
  • OCR solutions are slow and error-prone
  • Specialized PDF APIs are expensive and require additional integration
  • LLM solutions often need complex document chunking and embedding pipelines

What if we could just hand a PDF to an LLM and get structured data back? With Gemini's multimodal capabilities and Instructor's structured output handling, we can do exactly that.

Quick Setup

First, install the required packages:

pip install "instructor[google-generativeai]"

Then, here's all the code you need:

import instructor
import google.generativeai as genai
from google.ai.generativelanguage_v1beta.types.file import File
from pydantic import BaseModel
import time

# Initialize the client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    )
)


# Define your output structure
class Summary(BaseModel):
    summary: str


# Upload the PDF
file = genai.upload_file("path/to/your.pdf")

# Wait for file to finish processing
while file.state != File.State.ACTIVE:
    time.sleep(1)
    file = genai.get_file(file.name)
    print(f"File is still uploading, state: {file.state}")

print(f"File is now active, state: {file.state}")
print(file)

resp = client.chat.completions.create(
    messages=[
        {"role": "user", "content": ["Summarize the following file", file]},
    ],
    response_model=Summary,
)

print(resp.summary)

Raw result:
summary="Gemini 1.5 Pro is a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Gemini 1.5 Pro is built to handle extremely long contexts; it has the ability to recall and reason over fine-grained information from up to at least 10M tokens. This scale is unprecedented among contemporary large language models (LLMs), and enables the processing of long-form mixed-modality inputs including entire collections of documents, multiple hours of video, and almost five days long of audio. Gemini 1.5 Pro surpasses Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide array of benchmarks while requiring significantly less compute to train. It can recall information amidst distractor context, and it can learn to translate a new language from a single set of linguistic documentation. With only instructional materials (a 500-page reference grammar, a dictionary, and ≈ 400 extra parallel sentences) all provided in context, Gemini 1.5 Pro is capable of learning to translate from English to Kalamang, a Papuan language with fewer than 200 speakers, and therefore almost no online presence."

Benefits

The combination of Gemini and Instructor offers several key advantages over traditional PDF processing approaches:

Simple Integration - Unlike traditional approaches that require complex document processing pipelines, chunking strategies, and embedding databases, you can directly process PDFs with just a few lines of code. This dramatically reduces development time and maintenance overhead.

Structured Output - Instructor's Pydantic integration ensures you get exactly the data structure you need. The model's outputs are automatically validated and typed, making it easier to build reliable applications. If the extraction fails, Instructor automatically handles the retries for you with support for custom retry logic using tenacity.
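
For instance, you can pass either a plain integer or a tenacity Retrying object through max_retries - a small sketch building on the code above:

from tenacity import Retrying, stop_after_attempt, wait_fixed

resp = client.chat.completions.create(
    messages=[
        {"role": "user", "content": ["Summarize the following file", file]},
    ],
    response_model=Summary,
    # Retry failed validations up to 3 times, waiting one second between attempts
    max_retries=Retrying(stop=stop_after_attempt(3), wait=wait_fixed(1)),
)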

Multimodal Support - Gemini's multimodal capabilities mean this same approach works for various file types. You can process images, videos, and audio files all in the same API request. Check out our multimodal processing guide to see how we extract structured data from travel videos.

Conclusion

Working with PDFs doesn't have to be complicated.

By combining Gemini's multimodal capabilities with Instructor's structured output handling, we can transform complex document processing into simple, Pythonic code.

No more wrestling with parsing rules, managing embeddings, or building complex pipelines - just define your data model and let the LLM do the heavy lifting.

See Also

If you liked this, give instructor a try and see how much easier structured outputs make working with LLMs. Get started with Instructor today!

Do I Still Need Instructor with Google's New OpenAI Integration?

Google recently launched OpenAI client compatibility for Gemini.

While this is a significant step forward that simplifies working with Gemini models, you absolutely still need instructor.

If you're unfamiliar with instructor, we provide a simple interface to get structured outputs from LLMs across different providers.

This makes it easy to switch between providers, get reliable outputs from language models, and ultimately build production-grade LLM applications.
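
For example, you can point the OpenAI client at Gemini's OpenAI-compatible endpoint and still wrap it with instructor for validated, typed outputs. The base URL and model name below are assumptions based on Google's current documentation, so double-check them before use:

import os
import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# OpenAI client pointed at Gemini's OpenAI-compatible endpoint
client = instructor.from_openai(
    OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key=os.environ["GEMINI_API_KEY"],
    ),
    mode=instructor.Mode.TOOLS,
)

user = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Extract: Jason is 25"}],
    response_model=User,
)
print(user)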