Generating Structured Output / JSON from LLMs¶
Language models have seen significant growth. Using them effectively often requires complex frameworks. This post discusses how Instructor simplifies this process using Pydantic.
The Problem with Existing LLM Frameworks¶
Current frameworks for Language Learning Models (LLMs) have complex setups. Developers find it hard to control interactions with language models. Some frameworks require complex JSON Schema setups.
The OpenAI Function Calling Game-Changer¶
OpenAI's Function Calling feature provides a constrained interaction model. However, it has its own complexities, mostly around JSON Schema.
Why Pydantic?¶
Instructor uses Pydantic to simplify the interaction between the programmer and the language model.
- Widespread Adoption: Pydantic is a popular tool among Python developers.
- Simplicity: Pydantic allows model definition in Python.
- Framework Compatibility: Many Python frameworks already use Pydantic.
import pydantic
import instructor
from openai import OpenAI
# Enables the response_model
client = instructor.from_openai(OpenAI())
class UserDetail(pydantic.BaseModel):
name: str
age: int
def introduce(self):
return f"Hello I'm {self.name} and I'm {self.age} years old"
user: UserDetail = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
{"role": "user", "content": "Extract Jason is 25 years old"},
],
)
Simplifying Validation Flow with Pydantic¶
Pydantic validators simplify features like re-asking or self-critique. This makes these tasks less complex compared to other frameworks.
from typing_extensions import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator
class QuestionAnswerNoEvil(BaseModel):
question: str
answer: Annotated[
str,
BeforeValidator(llm_validator("don't say objectionable things")),
]
The Modular Approach¶
Pydantic allows for modular output schemas. This leads to more organized code.
Composition of Schemas¶
Defining Relationships¶
class UserDetail(BaseModel):
id: int
age: int
name: str
friends: List[int]
class UserRelationships(BaseModel):
users: List[UserDetail]
Using Enums¶
from enum import Enum, auto
class Role(Enum):
PRINCIPAL = auto()
TEACHER = auto()
STUDENT = auto()
OTHER = auto()
class UserDetail(BaseModel):
age: int
name: str
role: Role
Flexible Schemas¶
from typing import List
class Property(BaseModel):
key: str
value: str
class UserDetail(BaseModel):
age: int
name: str
properties: List[Property]
Chain of Thought¶
class TimeRange(BaseModel):
chain_of_thought: str
start_time: int
end_time: int
class UserDetail(BaseModel):
id: int
age: int
name: str
work_time: TimeRange
leisure_time: TimeRange
Language Models as Microservices¶
The architecture resembles FastAPI. Most code can be written as Python functions that use Pydantic objects. This eliminates the need for prompt chains.
FastAPI Stub¶
import fastapi
from pydantic import BaseModel
class UserDetails(BaseModel):
name: str
age: int
app = fastapi.FastAPI()
@app.get("/user/{user_id}", response_model=UserDetails)
async def get_user(user_id: int) -> UserDetails:
return ...
Using Instructor as a Function¶
def extract_user(str) -> UserDetails:
return client.chat.completions(
response_model=UserDetails,
messages=[]
)
Response Modeling¶
Conclusion¶
Instructor, with Pydantic, simplifies interaction with language models. It is usable for both experienced and new developers.
If you enjoy the content or want to try out instructor
please check out the github and give us a star!