
Structured Outputs and Prompt Caching with Anthropic

Anthropic's ecosystem now offers two powerful features for AI developers: structured outputs and prompt caching. These advancements enable more efficient use of large language models (LLMs). This guide demonstrates how to leverage these features with the Instructor library to enhance your AI applications.

Structured Outputs with Anthropic and Instructor

Instructor now offers seamless integration with Anthropic's powerful language models, allowing developers to easily create structured outputs using Pydantic models. This integration simplifies the process of extracting specific information from AI-generated responses.

To get started, you'll need to install Instructor with Anthropic support:

pip install instructor[anthropic]

Here's a basic example of how to use Instructor with Anthropic:

from pydantic import BaseModel
from typing import List
import anthropic
import instructor

# Patch the Anthropic client with Instructor
anthropic_client = instructor.from_anthropic(
    anthropic.Anthropic()
)

# Define your Pydantic models
class Properties(BaseModel):
    name: str
    value: str

class User(BaseModel):
    name: str
    age: int
    properties: List[Properties]

# Use the patched client to generate structured output
user_response = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Create a user for a model with a name, age, and properties.",
        }
    ],
    response_model=User,
)

print(user_response.model_dump_json(indent=2))
"""
{
  "name": "John Doe",
  "age": 30,
  "properties": [
    { "name": "favorite_color", "value": "blue" }
  ]
}
"""

This approach allows you to easily extract structured data from Claude's responses, making it simpler to integrate AI-generated content into your applications.
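
Because the result is an ordinary Pydantic model, you can also attach validators and let Instructor re-ask Claude when the output fails validation. The sketch below is illustrative rather than canonical: it reuses the patched client from the example above, the ValidatedUser model and the prompt are made up for demonstration, and max_retries tells Instructor how many times to resend the request with the validation error attached.

from pydantic import BaseModel, field_validator

# Illustrative sketch: validate the extracted data and let Instructor
# retry when validation fails (model and prompt are hypothetical).
class ValidatedUser(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("age must be a positive integer")
        return v

user = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    max_retries=2,  # re-ask the model with the validation error appended
    messages=[
        {"role": "user", "content": "Create a user named Alice who is 31."}
    ],
    response_model=ValidatedUser,
)
print(user.name, user.age)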

Prompt Caching: Boosting Performance and Reducing Costs

Anthropic has introduced a new prompt caching feature that can significantly improve response times and reduce costs for applications dealing with large context windows. This feature is particularly useful when making multiple calls with similar large contexts over time.

Here's how you can implement prompt caching with Instructor and Anthropic:

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

# Set up the client with prompt caching
client = instructor.from_anthropic(Anthropic())

# Define your Pydantic model
class Character(BaseModel):
    name: str
    description: str

# Load your large context
with open("./book.txt", "r") as f:
    book = f.read()

# Make multiple calls using the cached context
for _ in range(2):
    resp, completion = client.chat.completions.create_with_completion(
        model="claude-3-haiku-20240307",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "<book>" + book + "</book>",
                        "cache_control": {"type": "ephemeral"},
                    },
                    {
                        "type": "text",
                        "text": "Extract a character from the text given above",
                    },
                ],
            },
        ],
        response_model=Character,
        max_tokens=1000,
    )

In this example, the large context (the book content) is cached after the first request and reused in subsequent requests. This can lead to significant time and cost savings, especially when working with extensive context windows.
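
To confirm that the cache is actually being used, inspect the usage block on the raw completion returned by create_with_completion. The field names below follow Anthropic's prompt caching usage reporting and are read defensively with getattr, since exactly what is reported can vary with SDK version; after the loop above, completion holds the second call, which should read from the cache rather than write to it.

# Inspect the usage on the last completion: the first call reports
# cache_creation_input_tokens, subsequent calls report cache_read_input_tokens.
usage = completion.usage
print("input tokens:", usage.input_tokens)
print("cache write tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache read tokens:", getattr(usage, "cache_read_input_tokens", None))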

Conclusion

By combining Anthropic's Claude with Instructor's structured output capabilities and leveraging prompt caching, developers can create more efficient, cost-effective, and powerful AI applications. These features open up new possibilities for building sophisticated AI systems that can handle complex tasks with ease.

As the AI landscape continues to evolve, staying up-to-date with the latest tools and techniques is crucial. We encourage you to explore these features and share your experiences with the community. Happy coding!

Why should I use prompt caching?

Developers often face two key challenges when working with large contexts: slow response times and high costs. This is especially true when making many such calls over time, which severely impacts the cost and latency of an application. Anthropic's new prompt caching feature addresses both of these issues.

Since the new feature is still in beta, we're going to wait for it to be generally available before we integrate it into Instructor. In the meantime, we've put together a quickstart guide on how to use the feature in your own applications.
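
If you want to try the feature directly against the Anthropic SDK in the meantime, a minimal sketch looks like the one below. It assumes the anthropic-beta: prompt-caching-2024-07-31 header that the beta requires, passed via the SDK's extra_headers option; once the feature is generally available the header may no longer be needed.

import anthropic

client = anthropic.Anthropic()

with open("./book.txt", "r") as f:
    book = f.read()

# Minimal sketch of prompt caching with the raw SDK. While the feature is in
# beta it requires the anthropic-beta header below; once generally available
# the header may no longer be necessary.
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1000,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<book>" + book + "</book>",
                    # Mark the large, reusable block as cacheable
                    "cache_control": {"type": "ephemeral"},
                },
                {
                    "type": "text",
                    "text": "Summarize the book in one sentence.",
                },
            ],
        }
    ],
)
print(response.content[0].text)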

Structured Outputs with Anthropic

A special shoutout to Shreya for her contributions to the Anthropic support. As of now, all features are operational with the exception of streaming support.

For those eager to experiment, simply patch the client with Mode.ANTHROPIC_JSON, which enables you to make structured requests through the Anthropic client.

pip install instructor[anthropic]

Missing Features

We know we are still missing partial streaming and better re-asking support for XML. We are working on both and will ship them soon.

from pydantic import BaseModel
from typing import List
import anthropic
import instructor

# Patch the Anthropic messages.create call with Instructor
anthropic_client = instructor.patch(
    create=anthropic.Anthropic().messages.create,
    mode=instructor.Mode.ANTHROPIC_JSON,
)

class Properties(BaseModel):
    name: str
    value: str

class User(BaseModel):
    name: str
    age: int
    properties: List[Properties]

user_response = anthropic_client(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    max_retries=0,
    messages=[
        {
            "role": "user",
            "content": "Create a user for a model with a name, age, and properties.",
        }
    ],
    response_model=User,
)  # type: ignore

print(user_response.model_dump_json(indent=2))
"""
{
    "name": "John",
    "age": 25,
    "properties": [
        {
            "key": "favorite_color",
            "value": "blue"
        }
    ]
}

We're still encountering challenges with deeply nested types and invite the community to test, provide feedback, and suggest improvements as we enhance the Anthropic client's support.
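
As a concrete illustration of what we mean by deeply nested types, a hypothetical schema like the one below, with several models nested inside one another and wrapped in a list, is the kind of structure worth testing against the integration.

from typing import List
from pydantic import BaseModel

# Hypothetical deeply nested schema for stress-testing the integration
class Address(BaseModel):
    street: str
    city: str

class Company(BaseModel):
    name: str
    address: Address

class Employee(BaseModel):
    name: str
    employer: Company

class Directory(BaseModel):
    employees: List[Employee]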