Structured Output for Open Source and Local LLMs¶
Instructor has expanded its capabilities for language models. It started out wrapping API interactions via the OpenAI SDK, using Pydantic for structured data validation; it now supports many models and platforms.
The integration of JSON mode improved its adaptability to vision models and open source alternatives, enabling support for models ranging from GPT and Mistral to models hosted on Ollama and Hugging Face via llama-cpp-python.
Instructor now works with both cloud-based APIs and local models for structured data extraction. Developers can refer to our guide on Patching for details on using JSON mode with different models.
To learn more about Instructor and Pydantic, we offer a course on Steering language models towards structured outputs.
The following sections show examples of Instructor's integration with various platforms and local setups for producing structured outputs in AI projects.
Exploring Different OpenAI Clients with Instructor¶
Different OpenAI-compatible clients serve different needs. Below, we explore the clients that integrate with Instructor and the structured output capabilities each provides. The examples show how to initialize and patch each client.
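All of the examples follow the same basic pattern, sketched minimally below: patch an OpenAI-compatible client with Instructor, pick a mode the backend supports (for example instructor.Mode.TOOLS for tool calling, or instructor.Mode.JSON for plain JSON output), and pass a Pydantic model as response_model. The base_url and model name here are placeholders, not a real endpoint.

import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# Placeholder endpoint: any OpenAI-compatible server works the same way
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder"),
    mode=instructor.Mode.JSON,  # use Mode.TOOLS if the backend supports tool calling
)

user = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Jason is 30 years old"}],
    response_model=User,  # Instructor validates the response against this model
)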
Local Models¶
Ollama: A New Frontier for Local Models¶
Ollama enables structured outputs with local models using JSON schema. See our Ollama documentation for details, and refer to the Ollama website for setup instructions, available models, and support.
from openai import OpenAI
from pydantic import BaseModel

import instructor


class UserDetail(BaseModel):
    name: str
    age: int


# enables `response_model` in create call
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

user = client.chat.completions.create(
    model="llama2",
    messages=[
        {
            "role": "user",
            "content": "Jason is 30 years old",
        }
    ],
    response_model=UserDetail,
)

print(user)
#> name='Jason' age=30
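Local models do not always emit valid JSON on the first try, so it can help to lean on Instructor's retry support. The sketch below assumes the Ollama client patched above; max_retries is Instructor's built-in parameter for re-asking the model with the validation error attached, and the validator itself is just an illustrative example.

from pydantic import BaseModel, field_validator


class ValidatedUser(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_is_reasonable(cls, v: int) -> int:
        # Illustrative check: reject obviously bad extractions
        if v < 0 or v > 130:
            raise ValueError("age must be between 0 and 130")
        return v


user = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Jason is 30 years old"}],
    response_model=ValidatedUser,
    max_retries=2,  # re-ask the model if validation fails
)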
llama-cpp-python¶
llama-cpp-python provides Python bindings for llama.cpp, enabling structured outputs via JSON schema. It relies on constrained sampling and speculative decoding, and its OpenAI-compatible client makes it possible to get structured output in process, without any network dependency.
Example of using llama-cpp-python for structured outputs:
import llama_cpp
import instructor
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding
from pydantic import BaseModel

llama = llama_cpp.Llama(
    model_path="../../models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    chat_format="chatml",
    n_ctx=2048,
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),  # speculative decoding
    logits_all=True,
    verbose=False,
)

# Patch the OpenAI-compatible create function; JSON_SCHEMA mode uses
# constrained sampling to keep the output valid against the schema
create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,
)


class UserDetail(BaseModel):
    name: str
    age: int


user = create(
    messages=[
        {
            "role": "user",
            "content": "Extract `Jason is 30 years old`",
        }
    ],
    response_model=UserDetail,
)

print(user)
#> name='Jason' age=30
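Because the patched create is an ordinary function, it can be reused with any response model against the same in-process model. A brief sketch, using a second, hypothetical schema:

from typing import List


class Tags(BaseModel):
    tags: List[str]


tags = create(
    messages=[
        {
            "role": "user",
            "content": "Extract topic tags: `The game went to overtime`",
        }
    ],
    response_model=Tags,
)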
Alternative Providers¶
Groq¶
Groq's platform, detailed further in our Groq documentation and in Groq's official documentation, takes a distinctive approach to inference with its custom tensor architecture, which significantly speeds up structured output processing.
import os

import groq
import instructor
from pydantic import BaseModel

client = groq.Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

# Patching the client adds support for the `response_model` parameter;
# from_groq is Instructor's Groq-specific constructor
client = instructor.from_groq(client, mode=instructor.Mode.TOOLS)


# Now, we can use the response_model parameter with a plain BaseModel
# rather than having to use the OpenAISchema class
class UserExtract(BaseModel):
    name: str
    age: int


user: UserExtract = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert isinstance(user, UserExtract), "Should be instance of UserExtract"

print(user)
#> name='jason' age=25
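If you also need the raw provider response alongside the parsed model, for example to inspect token usage, Instructor's patched clients expose create_with_completion. A short sketch, assuming the Groq client patched above:

user, completion = client.chat.completions.create_with_completion(
    model="mixtral-8x7b-32768",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

print(completion.usage)  # raw usage stats reported by the provider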
Together AI¶
Together AI, when combined with Instructor, offers a seamless way to use structured outputs with its hosted open source models. For more details, refer to our Together AI documentation and explore the patching guide to enhance your applications.
import os

import instructor
import openai
from pydantic import BaseModel

client = openai.OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

client = instructor.from_openai(client, mode=instructor.Mode.TOOLS)


class UserExtract(BaseModel):
    name: str
    age: int


user: UserExtract = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert isinstance(user, UserExtract), "Should be instance of UserExtract"

print(user)
#> name='jason' age=25
Mistral¶
To explore the capabilities of Mistral Large with Instructor, check out our comprehensive guide on Mistral Large.
import instructor
from mistralai.client import MistralClient
from pydantic import BaseModel

client = MistralClient()

# `instructor.patch` accepts a create function directly,
# so we patch the client's chat method
patched_chat = instructor.patch(
    create=client.chat, mode=instructor.Mode.MISTRAL_TOOLS
)


class UserDetails(BaseModel):
    name: str
    age: int


resp = patched_chat(
    model="mistral-large-latest",
    response_model=UserDetails,
    messages=[
        {
            "role": "user",
            "content": 'Extract the following entities: "Jason is 20"',
        },
    ],
)

print(resp)
#> name='Jason' age=20