

Understanding Semantic Validation with Structured Outputs

Semantic validation uses LLMs to evaluate content against complex, subjective, and contextual criteria that would be difficult to implement with traditional rule-based validation approaches.

As LLMs become increasingly integrated into production systems, ensuring the quality and safety of their outputs is paramount. Traditional validation methods relying on explicit rules can't keep up with the complexity and nuance of natural language. With the release of Instructor's semantic validation capabilities, we now have a powerful way to validate structured outputs against sophisticated criteria.
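To make the idea concrete, here is a minimal sketch of LLM-backed validation using Instructor's long-standing llm_validator helper; the new semantic validation API announced in the post may differ in name and options, so treat the criteria string and model choice below as illustrative:

from typing import Annotated

import instructor
from instructor import llm_validator
from openai import OpenAI
from pydantic import BaseModel, BeforeValidator

client = instructor.from_openai(OpenAI())

class SupportReply(BaseModel):
    # The criteria string is subjective - exactly the kind of rule that's hard to express in code
    answer: Annotated[
        str,
        BeforeValidator(
            llm_validator("must be polite and must not promise refunds", client=client)
        ),
    ]

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SupportReply,
    messages=[{"role": "user", "content": "Reply to a customer asking about a late order"}],
)

If the generated answer violates the stated criteria, validation fails and Instructor can retry the generation with the validator's feedback.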

Announcing Responses API support

We're excited to announce Instructor's integration with OpenAI's new Responses API. This integration brings a more streamlined approach to working with structured outputs from OpenAI models. Let's see what makes this integration special and how it can improve your LLM applications.

Announcing unified provider interface

We are pleased to introduce a significant enhancement to Instructor: the from_provider() function. While Instructor has always focused on providing robust structured outputs, we've observed that many users work with multiple LLM providers. This often involves repetitive setup for each client.

The from_provider() function aims to simplify this process, making it easier to initialize clients and experiment across different models.

This new feature offers a streamlined, string-based method to initialize an Instructor-enhanced client for a variety of popular LLM providers.
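In practice, initialization becomes a single "provider/model" string (the model names below are just examples):

import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Same extraction code, different providers - only the string changes
openai_client = instructor.from_provider("openai/gpt-4o-mini")
anthropic_client = instructor.from_provider("anthropic/claude-3-5-haiku-latest")

user = openai_client.create(
    messages=[{"role": "user", "content": "Extract: Jason is 25"}],
    response_model=User,
)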

Using Anthropic's Web Search with Instructor for Real-Time Data

Anthropic's new web search tool, when combined with Instructor, provides a powerful way to get real-time, structured data from the web. This allows you to build applications that can answer questions and provide information that is up-to-date, going beyond the knowledge cut-off of large language models.

In this post, we'll explore how to use the web_search tool with Instructor to fetch the latest information and structure it into a Pydantic model. Even a simple structure can be very effective for clarity and further processing.
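As a rough sketch of the pattern (the tool specification and model name below are assumptions based on Anthropic's server-side web search tool; the full post covers the exact setup):

import instructor
from pydantic import BaseModel

class Source(BaseModel):
    url: str
    takeaway: str

class NewsAnswer(BaseModel):
    answer: str
    sources: list[Source]

client = instructor.from_provider("anthropic/claude-3-5-sonnet-latest")

result = client.create(
    messages=[{"role": "user", "content": "What were the biggest AI announcements this week?"}],
    response_model=NewsAnswer,
    # Assumed server-side tool spec; check Anthropic's docs for the current type string
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}],
    max_tokens=2048,
)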

Instructor Adopting Cursor Rules

AI-assisted coding is changing how we use version control. Many developers now use what I call "vibe coding" - coding with AI help. This creates new challenges with Git. Today I'll share how we're using Cursor rules in Instructor to solve these problems.

Native Caching in Instructor v1.9.1: Zero-Configuration Performance Boost

New in v1.9.1: Instructor now ships with built-in caching support for all providers. Simply pass a cache adapter when creating your client to dramatically reduce API costs and improve response times.

Starting with Instructor v1.9.1, we've introduced native caching support that makes optimization effortless. Instead of implementing complex caching decorators or wrapper functions, you can now pass a cache adapter directly to from_provider() and automatically cache all your structured LLM calls.

The Game Changer: Built-in Caching

Before v1.9.1, caching required custom decorators and manual implementation. Now, it's as simple as:

from instructor import from_provider
from instructor.cache import AutoCache

# Works with any provider - caching flows through automatically
client = from_provider(
    "openai/gpt-4o",
    cache=AutoCache(maxsize=1000)
)

# Your normal calls are now cached automatically
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

first = client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User
)

second = client.create(
    messages=[{"role": "user", "content": "Extract: John is 25"}],
    response_model=User
)

# second call was served from cache - same result, zero cost!
assert first.name == second.name

Universal Provider Support

The beauty of native caching is that it works with every provider through the same simple API:

from instructor import from_provider
from instructor.cache import AutoCache, DiskCache

# Works with OpenAI
openai_client = from_provider("openai/gpt-3.5-turbo", cache=AutoCache())

# Works with Anthropic  
anthropic_client = from_provider("anthropic/claude-3-haiku", cache=AutoCache())

# Works with Google
google_client = from_provider("google/gemini-pro", cache=DiskCache())

# Works with any provider in the ecosystem
groq_client = from_provider("groq/llama-3.1-8b", cache=AutoCache())

No provider-specific configuration needed. The cache parameter flows through **kwargs to all underlying implementations automatically.

Built-in Cache Adapters

Instructor v1.9.1 ships with two production-ready cache implementations:

1. AutoCache - In-Process LRU Cache

Perfect for single-process applications and development:

from instructor.cache import AutoCache

# Thread-safe in-memory cache with LRU eviction
cache = AutoCache(maxsize=1000)
client = from_provider("openai/gpt-4o", cache=cache)

When to use:

  • Development and testing
  • Single-process applications
  • When you need maximum speed (200,000x+ faster cache hits)
  • Applications where cache persistence isn't required

2. DiskCache - Persistent Storage

Ideal when you need cache persistence across sessions:

from instructor.cache import DiskCache

# Persistent disk-based cache
cache = DiskCache(directory=".instructor_cache")
client = from_provider("anthropic/claude-3-sonnet", cache=cache)

When to use:

  • Applications that restart frequently
  • Development workflows where you want to preserve cache between sessions
  • When working with expensive or time-intensive API calls
  • Local applications with moderate performance requirements

Smart Cache Key Generation

Instructor automatically generates intelligent cache keys that include:

  • Provider/model name - Different models get different cache entries
  • Complete message history - Full conversation context is hashed
  • Response model schema - Any changes to your Pydantic model automatically bust the cache
  • Mode configuration - JSON vs Tools mode changes are tracked

This means when you update your Pydantic model (adding fields, changing descriptions, etc.), the cache automatically invalidates old entries - no stale data!

from instructor.cache import make_cache_key

# Generate deterministic cache key
key = make_cache_key(
    messages=[{"role": "user", "content": "hello"}],
    model="gpt-3.5-turbo", 
    response_model=User,
    mode="TOOLS"
)
print(key)  # SHA-256 hash: 9b8f5e2c8c9e...

Custom Cache Implementations

Want Redis, Memcached, or a custom backend? Simply inherit from BaseCache:

from instructor.cache import BaseCache
import redis

class RedisCache(BaseCache):
    def __init__(self, host="localhost", port=6379, **kwargs):
        self.redis = redis.Redis(host=host, port=port, **kwargs)

    def get(self, key: str):
        value = self.redis.get(key)
        return value.decode() if value else None

    def set(self, key: str, value, ttl: int | None = None):
        if ttl:
            self.redis.setex(key, ttl, value)
        else:
            self.redis.set(key, value)

# Use your custom cache
redis_cache = RedisCache(host="my-redis-server")
client = from_provider("openai/gpt-4o", cache=redis_cache)

The BaseCache interface is intentionally minimal - just implement get() and set() methods and you're ready to go.

Time-to-Live (TTL) Support

Control cache expiration with per-call TTL overrides:

# Cache this result for 1 hour
result = client.create(
    messages=[{"role": "user", "content": "Generate daily report"}],
    response_model=Report,
    cache_ttl=3600  # 1 hour in seconds
)

TTL support depends on your cache backend:

  • AutoCache: TTL is ignored (no expiration)
  • DiskCache: Full TTL support with automatic expiration
  • Custom backends: Implement TTL handling in your set() method

Migration from Manual Caching

If you were using custom caching decorators, migrating is straightforward:

Before v1.9.1:

import functools

@functools.cache
def extract_user(text: str) -> User:
    return client.create(
        messages=[{"role": "user", "content": text}],
        response_model=User
    )

With v1.9.1:

# Remove decorator, add cache to client
client = from_provider("openai/gpt-4o", cache=AutoCache())

def extract_user(text: str) -> User:
    return client.create(
        messages=[{"role": "user", "content": text}],
        response_model=User
    )

No more function-level caching logic - just create your client with caching enabled and all calls benefit automatically.

Real-World Performance Impact

Native caching delivers the same dramatic performance improvements you'd expect:

  • AutoCache: 200,000x+ speed improvement for cache hits
  • DiskCache: 5-10x improvement with persistence benefits
  • Cost Reduction: 50-90% API cost savings depending on cache hit rate

For a comprehensive deep-dive into caching strategies and performance analysis, check out our complete caching guide.

Getting Started

Ready to enable native caching? Here's your quick start:

  1. Upgrade to v1.9.1+:

    pip install "instructor>=1.9.1"
    

  2. Choose your cache backend:

    from instructor.cache import AutoCache, DiskCache
    
    # For development/single-process
    cache = AutoCache(maxsize=1000)
    
    # For persistence
    cache = DiskCache(directory=".cache")
    

  3. Add cache to your client:

    from instructor import from_provider
    
    client = from_provider("your/favorite/model", cache=cache)
    

  4. Use normally - caching happens automatically:

    result = client.create(
        messages=[{"role": "user", "content": "your prompt"}],
        response_model=YourModel
    )
    

Learn More

For detailed information about cache design, custom implementations, and advanced patterns, visit our Caching Concepts documentation.

The native caching feature represents our commitment to making high-performance LLM applications simple and accessible. No more complex caching logic - just fast, cost-effective structured outputs out of the box.


Have questions about native caching or want to share your use case? Join the discussion in our GitHub repository or check out the complete documentation.

Migrating to uv

Why we migrated to uv

We recently migrated from Poetry to uv because we wanted to benefit from its many features, such as:

  • Easier dependency management with automatic caching built in
  • Significantly faster CI/CD compared to poetry, especially when we use the caching functionality provided by the Astral team
  • Cargo-style lockfile that makes it easier to adopt new PEP features as they come out

The migration took around 1-2 days, and we're happy with the results: on average, our CI/CD jobs run significantly faster.

Comparing timings taken from our CI/CD runs, we saw roughly a 3x speedup - approximately a 67% reduction in job time - once we implemented caching for the individual uv GitHub Actions.

Extracting Metadata from Images using Structured Extraction

Multimodal language models like gpt-4o excel at processing images alongside text, enabling us to extract rich, structured metadata from images.

This is particularly valuable in areas like fashion where we can use these capabilities to understand user style preferences from images and even videos. In this post, we'll see how to use instructor to map images to a given product taxonomy so we can recommend similar products for users.
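As a minimal sketch of what that can look like (the taxonomy fields and image URL here are placeholders, and the standard OpenAI multimodal message format is assumed to pass through unchanged):

from instructor import from_provider
from pydantic import BaseModel

class GarmentMetadata(BaseModel):
    category: str            # e.g. "outerwear" in a product taxonomy
    colors: list[str]
    style_tags: list[str]

client = from_provider("openai/gpt-4o")

metadata = client.create(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Classify this product photo against our taxonomy."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/images/denim-jacket.jpg"},
                },
            ],
        }
    ],
    response_model=GarmentMetadata,
)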