Validation Basics¶
Validation ensures that the data extracted by LLMs meets your requirements. This guide covers the essentials of validation with Instructor.
Why Validation Matters¶
Validation helps ensure:
- Data Integrity: All required fields are present and formatted correctly
- Consistency: Data follows your business rules
- Quality: Outputs meet specific criteria for your application
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ LLM │ -> │ Instructor │ -> │ Validated │
│ Generates │ │ Validates │ │ Structured │
│ Response │ │ Structure │ │ Data │
└─────────────┘ └──────────────┘ └─────────────┘
│
│ If validation fails
▼
┌─────────────┐
│ Retry with │
│ Feedback │
└─────────────┘
Simple Example¶
Here's a basic example with validation:
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
# Define a model with validation
class UserProfile(BaseModel):
name: str
age: int = Field(ge=13, description="User's age in years")
# Extract validated data
client = instructor.from_openai(OpenAI())
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": "My name is Jane Smith and I'm 25 years old."}
],
response_model=UserProfile
)
print(f"User: {response.name}, Age: {response.age}")
In this example: - The age
field has a validation constraint (ge=13
) ensuring users are at least 13 years old - If validation fails, Instructor will automatically retry with feedback
Common Validation Types¶
Here are the most common validations you can use:
Validation | Example | What It Does |
---|---|---|
Type checking | age: int | Ensures value is an integer |
Required fields | name: str | Field must be present |
Optional fields | middle_name: Optional[str] = None | Field can be missing |
Minimum value | age: int = Field(ge=18) | Value must be ≥ 18 |
Maximum value | rating: float = Field(le=5.0) | Value must be ≤ 5.0 |
String length | username: str = Field(min_length=3) | String must be at least 3 chars |
How Validation Works¶
When using validation with Instructor:
- The LLM generates a response based on your prompt
- Instructor tries to fit the response into your model
- If validation fails, Instructor captures the errors
- The errors are sent back to the LLM for a retry
- This continues until validation passes or max retries is reached
Adding Custom Error Messages¶
For clearer feedback, you can add custom error messages:
from pydantic import BaseModel, Field
class Product(BaseModel):
name: str
price: float = Field(
gt=0,
description="Product price in USD",
json_schema_extra={"error_msg": "Price must be greater than zero"}
)
Next Steps¶
- Learn about Custom Validators
- Explore Retry Mechanisms
- Try Field-level Validation