Recursive Schema Implementation Guide¶
This guide demonstrates how to work with recursive schemas in Instructor using Pydantic models. While flat schemas are often simpler to work with, some use cases require recursive structures to represent hierarchical data effectively.
Motivation
Recursive schemas are particularly useful when dealing with: * Nested organizational structures * File system hierarchies * Comment threads with replies * Task dependencies with subtasks * Abstract syntax trees
Defining a Recursive Schema¶
Here's an example of how to define a recursive Pydantic model:
from typing import List, Optional
from pydantic import BaseModel, Field
class RecursiveNode(BaseModel):
"""A node that can contain child nodes of the same type."""
name: str = Field(..., description="Name of the node")
value: Optional[str] = Field(
None, description="Optional value associated with the node"
)
children: List["RecursiveNode"] = Field(
default_factory=list, description="List of child nodes"
)
# Required for recursive Pydantic models
RecursiveNode.model_rebuild()
Example Usage¶
Let's see how to use this recursive schema with Instructor:
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())
def parse_hierarchy(text: str) -> RecursiveNode:
"""Parse text into a hierarchical structure."""
return client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are an expert at parsing text into hierarchical structures.",
},
{
"role": "user",
"content": f"Parse this text into a hierarchical structure: {text}",
},
],
response_model=RecursiveNode,
)
# Example usage
hierarchy = parse_hierarchy(
"""
Company: Acme Corp
- Department: Engineering
- Team: Frontend
- Project: Website Redesign
- Project: Mobile App
- Team: Backend
- Project: API v2
- Project: Database Migration
- Department: Marketing
- Team: Digital
- Project: Social Media Campaign
- Team: Brand
- Project: Logo Refresh
"""
)
Validation and Best Practices¶
When working with recursive schemas:
- Always call
model_rebuild()
after defining the model - Consider adding validation for maximum depth to prevent infinite recursion
- Use type hints properly to maintain code clarity
- Consider implementing custom validators for specific business rules
from pydantic import model_validator
class RecursiveNodeWithDepth(RecursiveNode):
@model_validator(mode='after')
def validate_depth(self) -> "RecursiveNodeWithDepth":
def check_depth(node: "RecursiveNodeWithDepth", current_depth: int = 0) -> int:
if current_depth > 10: # Maximum allowed depth
raise ValueError("Maximum depth exceeded")
return max(
[check_depth(child, current_depth + 1) for child in node.children],
default=current_depth,
)
check_depth(self)
return self
Performance Considerations¶
While recursive schemas are powerful, they can be more challenging for language models to handle correctly. Consider these tips:
- Keep structures as shallow as possible
- Use clear naming conventions
- Provide good examples in your prompts
- Consider breaking very large structures into smaller chunks
Conclusion¶
Recursive schemas provide a powerful way to handle hierarchical data structures in your applications. While they require more careful handling than flat schemas, they can be invaluable for certain use cases.
For more examples of working with complex data structures, check out: 1. Query Planning with Dependencies 2. Knowledge Graph Generation