Validation Basics

Validation ensures that the data extracted by LLMs meets your requirements. This guide covers the essentials of validation with Instructor.

Why Validation Matters

Validation helps ensure:

- Data Integrity: All required fields are present and formatted correctly
- Consistency: Data follows your business rules
- Quality: Outputs meet specific criteria for your application

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│ LLM         │ -> │ Instructor   │ -> │ Validated   │
│ Generates   │    │ Validates    │    │ Structured  │
│ Response    │    │ Structure    │    │ Data        │
└─────────────┘    └──────────────┘    └─────────────┘
                          │
                          │ If validation fails
                          ▼
                   ┌─────────────┐
                   │ Retry with  │
                   │ Feedback    │
                   └─────────────┘

Simple Example

Here's a basic example with validation:

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

# Define a model with validation
class UserProfile(BaseModel):
    name: str
    age: int = Field(ge=13, description="User's age in years")

# Extract validated data
client = instructor.from_openai(OpenAI())
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "My name is Jane Smith and I'm 25 years old."}
    ],
    response_model=UserProfile
)

print(f"User: {response.name}, Age: {response.age}")

In this example:

- The age field has a validation constraint (ge=13) ensuring users are at least 13 years old
- If validation fails, Instructor will automatically retry with feedback
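
To see the `ge=13` constraint in isolation, the sketch below builds `UserProfile` directly with Pydantic, without calling an LLM. The sample names and ages are made up for illustration. In a real Instructor call, a failure like this one is what would be sent back to the model as feedback.

```python
from pydantic import BaseModel, Field, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int = Field(ge=13, description="User's age in years")


# A valid profile passes validation unchanged.
valid = UserProfile(name="Jane Smith", age=25)

# An age below 13 violates the ge=13 constraint and raises ValidationError.
failed_fields = []
try:
    UserProfile(name="Sam Young", age=10)
except ValidationError as exc:
    # Each error entry records which field failed and why.
    failed_fields = [error["loc"][0] for error in exc.errors()]
    print(exc)
```

The printed error names the `age` field and the violated constraint, which is the kind of detail an LLM can act on during a retry.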

Common Validation Types

Here are the most common validations you can use:

Validation	Example	What It Does
Type checking	`age: int`	Ensures value is an integer
Required fields	`name: str`	Field must be present
Optional fields	`middle_name: Optional[str] = None`	Field can be missing
Minimum value	`age: int = Field(ge=18)`	Value must be ≥ 18
Maximum value	`rating: float = Field(le=5.0)`	Value must be ≤ 5.0
String length	`username: str = Field(min_length=3)`	String must be at least 3 chars
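
The constraints in the table can be combined on a single model. The following is a minimal sketch: the `Review` model and its sample values are hypothetical and exist only to trigger each rule once.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Review(BaseModel):
    username: str = Field(min_length=3)  # string length
    middle_name: Optional[str] = None    # optional field
    age: int = Field(ge=18)              # minimum value
    rating: float = Field(le=5.0)        # maximum value


# Valid input: middle_name may be omitted because it has a default.
ok = Review(username="jane", age=25, rating=4.5)

# Invalid input: each constrained field breaks its rule.
bad_fields = set()
try:
    Review(username="jo", age=16, rating=7.0)
except ValidationError as exc:
    # Pydantic reports all failing fields in a single error.
    bad_fields = {error["loc"][0] for error in exc.errors()}

print(sorted(bad_fields))
```

Because Pydantic collects every failure before raising, a single retry can address all of the problems at once.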

How Validation Works

When using validation with Instructor:

1. The LLM generates a response based on your prompt.
2. Instructor tries to parse the response into your model.
3. If validation fails, Instructor captures the errors.
4. The errors are sent back to the LLM for a retry.
5. This continues until validation passes or the maximum number of retries is reached.
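
The steps above can be sketched as a plain loop. This is a conceptual illustration only, not Instructor's actual implementation: `fake_llm` and `extract_with_retries` are hypothetical names, the "LLM" is a stub that corrects itself once it sees feedback, and the code assumes Pydantic v2 (`model_validate_json`).

```python
from pydantic import BaseModel, Field, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int = Field(ge=13)


def fake_llm(messages: list) -> str:
    """Hypothetical stand-in for an LLM: wrong at first, corrected after feedback."""
    has_feedback = any("Validation failed" in m["content"] for m in messages)
    if has_feedback:
        return '{"name": "Jane Smith", "age": 25}'
    return '{"name": "Jane Smith", "age": 2}'


def extract_with_retries(prompt: str, max_retries: int = 2) -> UserProfile:
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        raw = fake_llm(messages)  # generate a response
        try:
            # Try to fit the response into the model.
            return UserProfile.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                raise  # retry budget exhausted, surface the error
            # Send the failed output and the errors back for another attempt.
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation failed:\n{exc}"})
    raise RuntimeError("unreachable")


profile = extract_with_retries("My name is Jane Smith and I'm 25 years old.")
print(profile)
```

In Instructor itself, this loop is handled for you, and the retry budget is typically set with the `max_retries` argument on the `create` call.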

Adding Custom Error Messages

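For clearer feedback, you can raise your own error message from a validator. Pydantic includes that message in the validation error, so it becomes part of the feedback sent back to the LLM on a retry. Note that `json_schema_extra` only adds metadata to the generated JSON schema; it does not change the error messages Pydantic raises.

from pydantic import BaseModel, Field, field_validator

class Product(BaseModel):
    name: str
    price: float = Field(description="Product price in USD")

    # The ValueError text becomes the message in the validation error
    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, value: float) -> float:
        if value <= 0:
            raise ValueError("Price must be greater than zero")
        return value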
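
One way to check which message a failed validation actually carries is to trigger the failure and inspect the error. The sketch below assumes Pydantic v2 and uses a `field_validator` that raises `ValueError`; the `ProductWithValidator` name and the sample values are hypothetical.

```python
from pydantic import BaseModel, ValidationError, field_validator


class ProductWithValidator(BaseModel):
    name: str
    price: float

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, value: float) -> float:
        # The text of this ValueError is carried into the validation error.
        if value <= 0:
            raise ValueError("Price must be greater than zero")
        return value


messages = []
try:
    ProductWithValidator(name="Widget", price=-5)
except ValidationError as exc:
    # Collect the human-readable message of each error entry.
    messages = [error["msg"] for error in exc.errors()]

print(messages)
```

The collected message contains the custom text, so the wording you choose in the validator is what the LLM sees when the error is fed back.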