Start Here: Instructor for Beginners

Welcome! This guide will help you understand what Instructor does and how to start using it in your projects, even if you're new to working with language models.

What is Instructor?

Instructor is a Python library that helps you get structured, predictable data from language models like GPT-4 and Claude. It's like giving the LLM a form to fill out instead of letting it respond however it wants.

Where Instructor Fits

Here's how Instructor fits into your application:

flowchart LR
    A[Your Application] --> B[Instructor]
    B --> C[LLM Provider]
    C --> B
    B --> A

    style B fill:#e2f0fb,stroke:#b8daff,color:#004085

The Problem Instructor Solves

Without Instructor, getting structured data from LLMs can be challenging:

  1. Unpredictable outputs: LLMs might format responses differently each time
  2. Format errors: Getting JSON or specific data structures can be error-prone
  3. Validation headaches: Checking if the response matches what you need

Instructor solves these problems by:

  1. Defining exactly what data you want using Python classes
  2. Making sure the LLM returns data in that structure
  3. Validating the output and automatically re-asking the model when validation fails

A Simple Example

Let's see Instructor in action with a basic example:

# Import the necessary libraries
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define the structure you want
class Person(BaseModel):
    name: str
    age: int
    city: str

# Connect to the LLM with Instructor
client = instructor.from_openai(OpenAI())

# Extract structured data
person = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Person,
    messages=[
        {"role": "user", "content": "Extract a person from: John is 30 years old and lives in New York."}
    ]
)

# Now you have a structured object
print(f"Name: {person.name}")  # Name: John
print(f"Age: {person.age}")    # Age: 30
print(f"City: {person.city}")  # City: New York

That's it! Instructor handled all the complexity of getting the LLM to format the data correctly.

Key Concepts

Here are the main concepts you need to know:

1. Response Models

Response models define the structure you want the LLM to return. They are built using Pydantic, which is a data validation library.

from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The user's full name")
    age: int = Field(description="The user's age in years")
    # The descriptions help the LLM understand what to extract
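
Because response models are ordinary Pydantic models, you can also attach constraints and custom validators. When the model's output fails validation, Instructor can automatically re-ask the LLM by passing max_retries to the create call. Here is a minimal sketch (the validator and retry count are illustrative, not required):

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str = Field(description="The user's full name")
    age: int = Field(ge=0, description="The user's age in years")

    @field_validator("name")
    @classmethod
    def name_must_be_full(cls, v: str) -> str:
        # Reject one-word names so Instructor re-asks the model for a full name
        if len(v.split()) < 2:
            raise ValueError("Please provide the person's full name")
        return v

client = instructor.from_openai(OpenAI())

# max_retries controls how many times Instructor re-asks on validation errors
user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    max_retries=2,
    messages=[{"role": "user", "content": "Extract: John Smith is 30 years old."}],
)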

2. Patching

Patching connects Instructor to your LLM provider (like OpenAI or Anthropic).

# For OpenAI
from openai import OpenAI
client = instructor.from_openai(OpenAI())

# For Anthropic
from anthropic import Anthropic
client = instructor.from_anthropic(Anthropic())
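
Once a client is patched, you keep using that provider's normal completion method and simply add response_model. For Anthropic, a minimal sketch might look like this (the model name is illustrative; use whichever Claude model you have access to):

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_anthropic(Anthropic())

user = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=1024,                   # Anthropic requires max_tokens
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Maria is 27 years old."}],
)
print(user.name, user.age)  # Maria 27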

3. Modes

Modes control how Instructor gets structured data from the LLM. Different providers support different modes.

# Using OpenAI's function calling
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

# Using JSON output directly
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)
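
Whichever mode you choose, the extraction call itself stays the same; the mode only changes how Instructor asks the provider for structured output behind the scenes. A quick sketch:

import instructor
from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Switching modes does not change how you call the client
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Ada is 36 years old."}],
)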

Common Use Cases

Here are some popular ways people use Instructor:

  1. Data extraction: Pull structured information from text documents
  2. Form filling: Convert free-text into form fields
  3. Classification: Sort content into predefined categories (see the sketch after this list)
  4. Content generation: Create structured content like articles or product descriptions
  5. API integration: Format LLM outputs to match API requirements
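
To make the classification use case concrete, here is a minimal sketch that uses a Literal field to restrict the model to a fixed set of categories (the ticket categories are just illustrative):

import instructor
from typing import Literal
from openai import OpenAI
from pydantic import BaseModel

class SupportTicket(BaseModel):
    # The Literal type restricts the model to these categories
    category: Literal["billing", "bug", "feature_request", "other"]

client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=SupportTicket,
    messages=[{"role": "user", "content": "Classify: 'I was charged twice this month.'"}],
)
print(ticket.category)  # billing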

Next Steps

Now that you understand the basics, here are some suggested next steps:

  1. Try the Getting Started Guide for a more in-depth tutorial
  2. Explore the Cookbook Examples for practical use cases
  3. Learn about Validation to ensure data quality
  4. Check out Streaming for handling large responses
  5. Understand Providers to use different LLM services

Common Questions

Do I need to understand Pydantic?

While knowing Pydantic helps, you don't need to be an expert. The basic patterns shown above will get you started. You can learn more advanced features as you need them.
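
As an example of a feature you might reach for early on, Optional fields and defaults let the model leave out information that simply isn't in the text (the field names here are illustrative):

from typing import List, Optional
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: Optional[int] = None   # may be missing from the source text
    hobbies: List[str] = []     # defaults to an empty list when none are mentioned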

Which LLM provider should I use?

OpenAI is the most popular choice for beginners because of its reliability and wide support. As you grow more comfortable, you can explore other providers like Anthropic Claude, Gemini, or open-source models.

Is Instructor hard to learn?

No! If you're familiar with Python classes and working with APIs, you'll find Instructor straightforward. The core concepts are simple, and you can gradually explore advanced features.

How does Instructor compare to other libraries?

Instructor focuses specifically on structured outputs with a simple, clean API. Unlike larger frameworks that try to do everything, Instructor does one thing very well: getting structured data from LLMs.

Getting Help

If you get stuck, revisit the guides linked above or reach out through the project's GitHub repository.

Welcome aboard, and happy extracting!