# Getting Started with Instructor
This guide will walk you through the basics of using Instructor to extract structured data from language models. By the end, you'll understand how to:
- Install and set up Instructor
- Extract basic structured data
- Handle validation and errors
- Work with streaming responses
- Use different LLM providers
## Installation

First, install Instructor. To use a specific provider, install the appropriate extras:
```bash
# For OpenAI (included by default)
pip install instructor

# For Anthropic
pip install "instructor[anthropic]"

# For other providers
pip install "instructor[google-genai]"  # For Google/Gemini
pip install "instructor[vertexai]"      # For Vertex AI
pip install "instructor[cohere]"        # For Cohere
pip install "instructor[litellm]"       # For LiteLLM (multiple providers)
pip install "instructor[mistralai]"     # For Mistral
```
## Setting Up Environment
Set your API keys as environment variables:
```bash
# For OpenAI
export OPENAI_API_KEY=your_openai_api_key

# For Anthropic
export ANTHROPIC_API_KEY=your_anthropic_api_key

# For other providers, set relevant API keys
```
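Before making your first call, you can quickly confirm from Python that the key your provider SDK will read is actually set. A minimal sketch, assuming the standard OPENAI_API_KEY variable shown above:

```python
import os

# The OpenAI SDK (used under the hood for OpenAI models) reads OPENAI_API_KEY from the environment
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before creating a client"
```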
## Your First Structured Output
Let's start with a simple example using OpenAI:
```python
import instructor
from pydantic import BaseModel

# Define your output structure
class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor client with from_provider
client = instructor.from_provider("openai/gpt-5-nano")

# Extract structured data
user_info = client.create(
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
# Output: Name: John Doe, Age: 30
```
This example demonstrates the core workflow:

1. Define a Pydantic model for your output structure
2. Create an Instructor client with from_provider
3. Request structured output using the response_model parameter
## Validation and Error Handling
Instructor leverages Pydantic's validation to ensure your data meets requirements:
```python
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str
    age: int = Field(gt=0, lt=120)  # Age must be greater than 0 and less than 120

    @field_validator('name')
    @classmethod
    def name_must_have_space(cls, v):
        if ' ' not in v:
            raise ValueError('Name must include first and last name')
        return v

# This will make the LLM retry if validation fails
user = client.create(
    response_model=User,
    messages=[
        {"role": "user", "content": "Extract: Tom is 25 years old."}
    ],
)
```
## Working with Complex Models
Instructor works seamlessly with nested Pydantic models:
```python
from pydantic import BaseModel
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

person = client.create(
    response_model=Person,
    messages=[
        {"role": "user", "content": """
        Extract: John Smith is 35 years old.
        He has homes at 123 Main St, Springfield, IL 62704 and
        456 Oak Ave, Chicago, IL 60601.
        """}
    ],
)
```
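The result is an ordinary Pydantic object, so the nested data is plain attribute access (the exact values depend on what the model extracts):

```python
# Walk the nested structure returned above
print(person.name, person.age)
for address in person.addresses:
    print(f"{address.street}, {address.city}, {address.state} {address.zip_code}")
```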
## Streaming Responses
For larger responses or better user experience, use streaming:
```python
from instructor import Partial

# Stream the response as it's being generated
stream = client.create_partial(
    response_model=Person,
    messages=[
        {"role": "user", "content": "Extract a detailed person profile for John Smith, 35, who lives in Chicago and Springfield."}
    ],
)

for partial in stream:
    # This will incrementally show the response being built
    print(partial)
```
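Each yielded value is a progressively more complete snapshot of the model, so if you also need the finished object once streaming ends, keep the last one. A small variation on the loop above:

```python
final_person = None
for partial in stream:
    print(partial)          # render the in-progress object
    final_person = partial  # the last snapshot is the most complete one

print(final_person)
```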
## Using Different Providers
Instructor supports multiple LLM providers. Here's how to use Anthropic:
```python
import instructor
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor client with from_provider
client = instructor.from_provider("anthropic/claude-3-opus-20240229")

user_info = client.create(
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
```
## Frequently Asked Questions
### What's the difference between start-here.md and getting-started.md?
- Start Here: Explains what Instructor is and why you'd use it (conceptual overview)
- Getting Started: This guide, which shows you how to install and use Instructor (practical steps)
### Which provider should I start with?
OpenAI is the most popular choice for beginners due to reliability and wide support. Once comfortable, you can explore Anthropic Claude, Google Gemini, or open-source models.
### Do I need to understand Pydantic?
Basic knowledge helps, but you can start with simple models. Instructor works with any Pydantic BaseModel. Learn more advanced features as you need them.
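One Pydantic feature worth picking up early is Field descriptions: they become part of the schema Instructor sends to the model and often improve extraction quality. A minimal sketch (the Invoice model and its fields are illustrative, not part of Instructor):

```python
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    vendor: str = Field(description="Name of the company that issued the invoice")
    total: float = Field(description="Grand total, in the invoice's currency")
```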
### Can I use Instructor with async code?
Yes! Pass async_client=True when creating your client with from_provider, then await the call to client.create().
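A minimal async sketch, reusing the UserInfo model from the first example:

```python
import asyncio

import instructor
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

async def main() -> None:
    # async_client=True returns a client whose create() is awaitable
    client = instructor.from_provider("openai/gpt-4o", async_client=True)
    user_info = await client.create(
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John Doe is 30 years old."}],
    )
    print(f"Name: {user_info.name}, Age: {user_info.age}")

asyncio.run(main())
```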
### What if validation fails?
Instructor automatically retries with validation feedback. You can configure retry behavior with the max_retries parameter. See retry mechanisms for details.
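For example, to cap retries explicitly (a small sketch using the User model from the validation section above):

```python
# Allow up to 3 validation-driven retries before giving up
user = client.create(
    response_model=User,
    max_retries=3,
    messages=[
        {"role": "user", "content": "Extract: Tom is 25 years old."}
    ],
)
```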
## Next Steps
Now that you've mastered the basics, here are some next steps:
- Learn about client setup with from_provider for different LLM providers
- Explore advanced validation to ensure data quality
- Check out the Cookbook examples for real-world applications
- See how to use hooks for monitoring and debugging
Using older patterns? If you're using instructor.patch() or provider-specific functions like from_openai(), check out the Migration Guide to modernize your code.
New to Instructor? Start with Start Here for a conceptual overview.
For more detailed information on any topic, visit the Concepts section.
If you have questions or need help, join our Discord community or check the GitHub repository.