# Batch Processing

Batch processing lets you send many requests as a single asynchronous job, at up to 50% off standard API pricing. Instructor supports batch processing across multiple providers through a single `BatchProcessor` interface.
## Supported Providers
| Provider | Models | Cost Savings |
|---|---|---|
| OpenAI | gpt-4o, gpt-4.1-mini, gpt-4-turbo | 50% |
| Anthropic | claude-3-5-sonnet, claude-3-opus, claude-3-haiku | 50% |
| Google GenAI | gemini-2.5-flash, gemini-2.0-flash, gemini-pro | 50% |
## Basic Usage
```python
from instructor.batch import BatchProcessor, extract_results
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


processor = BatchProcessor("openai/gpt-4.1-mini", User)

messages_list = [
    [
        {"role": "system", "content": "Extract user information from text."},
        {"role": "user", "content": "Hi, I'm Alice and I'm 28 years old."},
    ],
    [
        {"role": "system", "content": "Extract user information from text."},
        {"role": "user", "content": "Hello, I'm Bob, 35 years old."},
    ],
]

# Create the batch request file
processor.create_batch_from_messages(
    file_path="batch_requests.jsonl",
    messages_list=messages_list,
    max_tokens=200,
    temperature=0.1,
)

# Submit the batch job
batch_id = processor.submit_batch("batch_requests.jsonl")
print(f"Batch job submitted: {batch_id}")

# Check status and retrieve results. The terminal "success" status varies
# by provider: "completed" (OpenAI), "ended" (Anthropic),
# "JOB_STATE_SUCCEEDED" (Google).
status = processor.get_batch_status(batch_id)
if status["status"] in ["completed", "ended", "JOB_STATE_SUCCEEDED"]:
    all_results = processor.retrieve_results(batch_id)
    for user in extract_results(all_results):
        print(f"Name: {user.name}, Age: {user.age}")
```
## In-Memory Processing

For serverless deployments where writing to disk is impractical, use in-memory mode by passing `file_path=None`:
```python
import time

from instructor.batch import BatchProcessor, filter_successful
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


processor = BatchProcessor("openai/gpt-4.1-mini", User)

messages_list = [
    [{"role": "user", "content": "Extract: John is 25 years old"}],
    [{"role": "user", "content": "Extract: Jane is 30 years old"}],
]

# Create an in-memory buffer instead of a file (file_path=None)
buffer = processor.create_batch_from_messages(
    messages_list,
    file_path=None,
    max_tokens=150,
)

# Submit the buffer and poll until the job reaches a terminal state
batch_id = processor.submit_batch(buffer)
while True:
    status = processor.get_batch_status(batch_id)
    if status.get("status") in ["completed", "failed", "cancelled"]:
        break
    time.sleep(10)

if status.get("status") == "completed":
    results = processor.retrieve_results(batch_id)
    for success in filter_successful(results):
        print(f"{success.result.name}, {success.result.age}")
```
## When to Use Each Approach
| Use Case | Approach |
|---|---|
| Serverless (Lambda, Cloud Functions) | In-memory |
| Large batch jobs | File-based |
| Security-sensitive environments | In-memory |
| Debugging/audit requirements | File-based |
## Provider Setup
### OpenAI
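The OpenAI backend uses the standard OpenAI SDK credentials; set `OPENAI_API_KEY`:

```bash
export OPENAI_API_KEY="your-api-key"
```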
### Anthropic
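The Anthropic backend likewise reads the standard `ANTHROPIC_API_KEY`:

```bash
export ANTHROPIC_API_KEY="your-api-key"
```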
### Google GenAI

Google batch jobs additionally need a project, a GCS staging bucket, and service-account credentials:

```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GCS_BUCKET="your-gcs-bucket-name"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```
Required IAM roles: `roles/aiplatform.user` and `roles/storage.objectUser`.
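If those roles are missing, you can grant them with `gcloud`; the project ID and service-account name below are placeholders for your own values:

```bash
# Grant the required roles to the service account used by the batch job
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:batch-runner@your-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:batch-runner@your-project-id.iam.gserviceaccount.com" \
  --role="roles/storage.objectUser"
```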
## Processing Results

Results are returned as a Result-style union for type-safe handling: each entry is either a `BatchSuccess[T]` wrapping a parsed object or a `BatchError` describing the failure:
```python
from instructor.batch import (
    filter_successful,
    filter_errors,
    extract_results,
    get_results_by_custom_id,
)

all_results = processor.retrieve_results(batch_id)

# Filter by type
successful = filter_successful(all_results)  # List[BatchSuccess[T]]
errors = filter_errors(all_results)          # List[BatchError]
objects = extract_results(all_results)       # List[T]

# Access by custom_id
by_id = get_results_by_custom_id(all_results)
if "request-1" in by_id:
    result = by_id["request-1"]
    if result.success:
        print(f"Success: {result.result}")
    else:
        print(f"Error: {result.error_message}")
```
## API Reference
| Method | Description |
|---|---|
| `create_batch_from_messages(messages_list, file_path=None, ...)` | Create a batch file or in-memory buffer |
| `submit_batch(file_path_or_buffer, metadata=None)` | Submit a batch job; returns the job ID |
| `get_batch_status(batch_id)` | Get the current job status |
| `retrieve_results(batch_id)` | Download and parse results |
| `parse_results(content)` | Parse raw results content |
## CLI Commands
```bash
# List batch jobs
instructor batch list --model "openai/gpt-4.1-mini"

# Create batch from file
instructor batch create-from-file --file-path batch.jsonl --model "openai/gpt-4.1-mini"

# Get batch results
instructor batch results --batch-id "batch_abc123" --output-file results.jsonl
```
## Best Practices
- Batch size: Larger jobs amortize overhead, but stay within provider limits (OpenAI, for example, caps a single batch at 50,000 requests); split oversized workloads into multiple jobs, as in the sketch after this list
- Cost optimization: Use batch processing for non-urgent workloads that can tolerate delayed completion
- Error handling: Always check both successful and error results
- Timeouts: Batch jobs have execution limits (24 hours for Google)
- Storage: For Google, ensure the GCS bucket is in the same region as your batch job
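A minimal sketch of that splitting, reusing `processor` and `messages_list` from the examples above; the chunk size is an assumed value you should tune to your provider's limits:

```python
# Split a large workload into several batch jobs, each comfortably
# within the provider's per-batch request limit.
CHUNK_SIZE = 10_000  # illustrative; adjust for your provider

batch_ids = []
for i in range(0, len(messages_list), CHUNK_SIZE):
    chunk = messages_list[i : i + CHUNK_SIZE]
    buffer = processor.create_batch_from_messages(chunk, file_path=None)
    batch_ids.append(processor.submit_batch(buffer))

print(f"Submitted {len(batch_ids)} batch jobs")
```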
## Troubleshooting
| Issue | Solution |
|---|---|
| Missing `GCS_BUCKET` (Google) | Set the `GCS_BUCKET` environment variable |
| Permission denied (Google) | Grant the `roles/aiplatform.user` and `roles/storage.objectUser` roles |
| Invalid model name | Use the `provider/model-name` format, e.g. `openai/gpt-4.1-mini` |
| Authentication error | Verify that the provider API keys are set correctly |