Structured outputs with Cortex¶
Cortex.cpp is a runtime that helps you run open-source LLMs out of the box. It supports a wide variety of models and powers the Jan platform. This guide provides a quickstart on how to use Cortex with instructor for structured outputs.
Quick Start¶
Instructor supports the OpenAI client out of the box, so you don't need to install anything extra to talk to Cortex's OpenAI-compatible server.
Before running the examples, make sure you've pulled the model you'd like to use (e.g. cortex pull llama3.2:3b-gguf-q4-km with the Cortex CLI). In this example, we'll be using a quantized Llama 3.2 model.
Let's start by initializing the client below. Note that we need to provide a base URL and an API key here: the API key's value doesn't matter, it just has to be set so the OpenAI client doesn't throw an error.
import openai
from instructor import from_openai

# Point the OpenAI client at the local Cortex server
client = from_openai(
    openai.OpenAI(
        base_url="http://localhost:39281/v1",
        api_key="this is a fake api key that doesn't matter",
    )
)
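If you want to sanity-check that the Cortex server is reachable before making structured calls, you can list the models it serves with a plain OpenAI client. This is a small sketch that assumes Cortex exposes the standard OpenAI-compatible /v1/models route:

import openai

raw_client = openai.OpenAI(
    base_url="http://localhost:39281/v1",
    api_key="not-used",
)

# Print the ids of the models the local Cortex server reports
for model in raw_client.models.list().data:
    print(model.id)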
Simple User Example (Sync)¶
from instructor import from_openai
from pydantic import BaseModel
import openai

client = from_openai(
    openai.OpenAI(
        base_url="http://localhost:39281/v1",
        api_key="this is a fake api key that doesn't matter",
    )
)

class User(BaseModel):
    name: str
    age: int

resp = client.chat.completions.create(
    model="llama3.2:3b-gguf-q4-km",
    messages=[{"role": "user", "content": "Ivan is 27 and lives in Singapore"}],
    response_model=User,
)
print(resp)
#> name='Ivan' age=27
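Local models can take a while to respond, so it's often nicer to stream partial results as they arrive. Instructor's patched client exposes create_partial for this; the sketch below reuses the client and User model from above and assumes the model handles streamed structured output well:

# Each iteration yields a User with whatever fields have been extracted so far
partial_stream = client.chat.completions.create_partial(
    model="llama3.2:3b-gguf-q4-km",
    messages=[{"role": "user", "content": "Ivan is 27 and lives in Singapore"}],
    response_model=User,
)

for partial_user in partial_stream:
    print(partial_user)
    # e.g. name='Ivan' age=None, then name='Ivan' age=27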
Simple User Example (Async)¶
import asyncio

import openai
from instructor import from_openai
from pydantic import BaseModel

# Initialize the async client against the local Cortex server
client = from_openai(
    openai.AsyncOpenAI(
        base_url="http://localhost:39281/v1",
        api_key="this is a fake api key that doesn't matter",
    )
)
class User(BaseModel):
    name: str
    age: int

async def extract_user():
    user = await client.chat.completions.create(
        model="llama3.2:3b-gguf-q4-km",
        messages=[
            {"role": "user", "content": "Extract: Jason is 25 years old"},
        ],
        response_model=User,
    )
    return user

# Run the async function
user = asyncio.run(extract_user())
print(user)
#> name='Jason' age=25
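A nice benefit of the async client is that you can fan out several extractions concurrently with asyncio.gather. Here's a minimal sketch reusing the client and User model above (the second input text is just an illustrative addition):

async def extract_many():
    texts = [
        "Extract: Jason is 25 years old",
        "Extract: Sarah is 30 years old",
    ]
    # Issue all requests concurrently and collect the parsed User objects
    return await asyncio.gather(
        *(
            client.chat.completions.create(
                model="llama3.2:3b-gguf-q4-km",
                messages=[{"role": "user", "content": text}],
                response_model=User,
            )
            for text in texts
        )
    )

users = asyncio.run(extract_many())
print(users)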
Nested Example¶
from instructor import from_openai
from pydantic import BaseModel
import openai

client = from_openai(
    openai.OpenAI(
        base_url="http://localhost:39281/v1",
        api_key="this is a fake api key that doesn't matter",
    )
)

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    age: int
    addresses: list[Address]

user = client.chat.completions.create(
    model="llama3.2:3b-gguf-q4-km",
    messages=[
        {
            "role": "user",
            "content": """
            Extract: Jason is 25 years old.
            He lives at 123 Main St, New York, USA
            and has a summer house at 456 Beach Rd, Miami, USA
            """,
        },
    ],
    response_model=User,
)
print(user.model_dump())
#> {
#>     'name': 'Jason',
#>     'age': 25,
#>     'addresses': [
#>         {'street': '123 Main St', 'city': 'New York', 'country': 'USA'},
#>         {'street': '456 Beach Rd', 'city': 'Miami', 'country': 'USA'},
#>     ],
#> }
In this tutorial we've seen how to run local models with Cortex while letting instructor's simple interface handle the logic around retries and function calling.
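To make the retry handling concrete: if you attach a Pydantic validator to the response model, instructor re-prompts the model with the validation error until the output passes or max_retries is exhausted. A minimal sketch reusing the sync client above, with a hypothetical ValidatedUser model:

from pydantic import BaseModel, field_validator

class ValidatedUser(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_reasonable(cls, v: int) -> int:
        # Reject implausible ages so instructor re-prompts the model
        if v < 0 or v > 150:
            raise ValueError("age must be between 0 and 150")
        return v

user = client.chat.completions.create(
    model="llama3.2:3b-gguf-q4-km",
    messages=[{"role": "user", "content": "Ivan is 27 and lives in Singapore"}],
    response_model=ValidatedUser,
    max_retries=2,  # re-send with the validation error up to 2 more times
)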
We'll be publishing a lot more content on Cortex and working with local models going forward, so keep an eye out for that.
Updates and Compatibility¶
Instructor maintains compatibility with the latest OpenAI API versions and models. Check the changelog for updates.