Bad Schemas could break your LLM Structured Outputs

You might be leaving up to 60% in performance gains on the table with the wrong response model. Response models have a massive impact on performance with Claude and GPT-4o, regardless of whether you're using JSON mode or Tool Calling.

Using the right response model can help ensure your models respond in the right language or prevent hallucinations when extracting video timestamps.

We decided to investigate this by benchmarking Claude and GPT-4o on the GSM8k dataset, and found that:

  1. Field Naming drastically impacts performance - Changing a single field name from final_choice to answer improved model accuracy from 4.5% to 95%. The way we structure and name fields in our response models can fundamentally alter how the model interprets and responds to queries.
  2. Chain Of Thought significantly boosts performance - Adding a reasoning field increased model accuracy by 60% on the GSM8k dataset. Models perform significantly better when they explain their logic step-by-step (see the sketch after this list).
  3. Be careful with JSON mode - JSON mode exhibited 50% more performance variation than Tool Calling when renaming fields. Different response models showed varying levels of performance between JSON mode and Tool Calling, indicating that JSON mode requires more careful optimisation.
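
To make the first two findings concrete, here is a minimal sketch of two hypothetical response models (not the exact benchmark schemas): one with only a vaguely named final_choice field, and one that pairs a clearly named answer field with a reasoning field for chain of thought.

from pydantic import BaseModel, Field


# A weaker schema: a vague field name and no room for reasoning.
class OnlyFinalChoice(BaseModel):
    final_choice: str


# A stronger schema: a chain-of-thought field placed before a clearly
# named answer field, so the model reasons before committing to a result.
class AnswerWithReasoning(BaseModel):
    reasoning: str = Field(description="Step-by-step reasoning for the solution")
    answer: str = Field(description="The final answer to the question")

Because fields are generated in order, placing reasoning before answer gives the model room to work through the problem before it has to commit to a result.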

Instructor Proposal: Integrating Jinja Templating

As the creator of Instructor, I've always aimed to keep our product development streamlined and avoid unnecessary complexity. However, I'm now convinced that it's time to incorporate better templating into our data structure, specifically by integrating Jinja.

This decision serves multiple purposes:

  1. It addresses the growing complexity in my prompt formatting needs
  2. It allows us to differentiate ourselves from the standard library while adding proven utility.
  3. It aligns with the practices I've consistently employed in both production and client code.
  4. It provides an opportunity to introduce API changes that have been tested in private versions of Instructor.

Why Jinja is the Right Choice

  1. Formatting Capabilities

     • Prompt formatting complexity has increased.
     • List iteration and conditionals are necessary for formatting.
     • This improves chunk generation, few-shot examples, and dynamic rules.

  2. Validation

     • Jinja template variables serve both rendering and validation purposes.
     • Pydantic's validation context allows access to template variables in validation functions.

  3. Versioning and Logging

     • Separating render variables from templates enhances prompt versioning and logging.
     • Diffing template variables simplifies comparison of prompt changes.

By integrating Jinja into Instructor, we're not just adding a feature; we're enhancing our ability to handle complex formatting, improve validation processes, and streamline our versioning and logging capabilities. This addition will significantly boost the power and flexibility of Instructor, making it an even more robust tool for our users.

Enhancing Formatting Capabilities

In Instructor, we propose implementing a new context keyword in our create methods. This addition will allow users to render the prompt using a provided context, leveraging Jinja's templating capabilities. Here's how it would work:

  1. Users pass a context dictionary to the create method.
  2. The prompt template, written in Jinja syntax, is defined in the content field of the message.
  3. Instructor renders the prompt using the provided context, filling in the template variables.

This approach offers these benefits:

  • Separation of prompt structure and dynamic content
  • Management of complex prompts with conditionals and loops
  • Reusability of prompt templates across different contexts

Let's look at an example to illustrate this feature:

import instructor
from openai import OpenAI

# An instructor-patched OpenAI client; the `context` keyword below is the proposed API.
client = instructor.from_openai(OpenAI())

client.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user", 
            "content": """
                You are a {{ role }} tasked with the following question 

                <question>
                {{ question }}
                </question>

                Use the following context to answer the question. Make sure to return [id] for every citation:

                <context>
                {% for chunk in context %}
                  <context_chunk>
                    <id>{{ chunk.id }}</id>
                    <text>{{ chunk.text }}</text>
                  </context_chunk>
                {% endfor %}
                </context>

                {% if rules %}
                Make sure to follow these rules:

                {% for rule in rules %}
                  * {{ rule }}
                {% endfor %}
                {% endif %}
            """
        },
    ],
    context={
        "role": "professional educator", 
        "question": "What is the capital of France?", 
        "context": [
            {"id": 1, "text": "Paris is the capital of France."}, 
            {"id": 2, "text": "France is a country in Europe."}
        ], 
        "rules": ["Use markdown."]
    }
)

Validation

Let's consider a scenario where we redact words from text. By passing the same context to both the Jinja template and the Pydantic validators (via ValidationInfo), we can implement a system for handling sensitive information. This approach allows us to:

  1. Validate input to ensure it doesn't contain banned words.
  2. Redact patterns using regular expressions.
  3. Provide instructions to the language model about word usage restrictions.

Here's an example demonstrating this concept using Pydantic validators:

import re

from pydantic import BaseModel, ValidationInfo, field_validator

class Response(BaseModel):
    text: str

    @field_validator('text')
    @classmethod
    def no_banned_words(cls, v: str, info: ValidationInfo):
        context = info.context
        if context:
            banned_words = context.get('banned_words', set())
            banned_words_found = [word for word in banned_words if word.lower() in v.lower()]
            if banned_words_found:
                raise ValueError(f"Banned words found in text: {', '.join(banned_words_found)}, rewrite it but just without the banned words")
        return v

    @field_validator('text')
    @classmethod
    def redact_regex(cls, v: str, info: ValidationInfo):
        context = info.context
        if context:
            redact_patterns = context.get('redact_patterns', [])
            for pattern in redact_patterns:
                v = re.sub(pattern, '****', v)
        return v

response = client.create(
    model="gpt-4o",
    response_model=Response,
    messages=[
        {
            "role": "user", 
            "content": """
                Write about a {{ topic }}

                {% if banned_words %}
                You must not use the following banned words:

                <banned_words>
                {% for word in banned_words %}
                * {{ word }}
                {% endfor %}
                </banned_words>
                {% endif %}
              """
        },
    ],
    context={
        "topic": "jason and now his phone number is 123-456-7890"
        "banned_words": ["jason"],
        "redact_patterns": [
            r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",  # Phone number pattern
            r"\b\d{3}-\d{2}-\d{4}\b",          # SSN pattern
        ],
    },
    max_retries=3,
)

print(response.text)
# > While I can't say his name anymore, his phone number is ****

Better Versioning and Logging

With the separation of prompt templates and variables, we gain several advantages:

  1. Version Control: We can now version the templates and retrieve the appropriate one for a given prompt. This allows for better management of template history, diffing and comparison.

  2. Enhanced Logging: The separation facilitates structured logging, enabling easier debugging and integration with various logging sinks, databases, and observability tools like OpenTelemetry.

  3. Security: Sensitive information in variables can be handled separately from the templates, allowing for better access control and data protection.

This separation of concerns adheres to best practices in software design, resulting in a more maintainable, scalable, and robust system for managing prompts and their associated data.
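
As a rough sketch of what this separation enables (the template registry and names here are hypothetical, not part of the proposal itself), templates can be stored and versioned independently of the variables that get logged:

import logging

from jinja2 import Template

logger = logging.getLogger(__name__)

# Hypothetical in-memory registry of versioned prompt templates.
PROMPT_TEMPLATES = {
    ("qa-with-context", "v2"): "Answer {{ question }} using {{ context }}",
}


def render_prompt(name: str, version: str, variables: dict) -> str:
    template = PROMPT_TEMPLATES[(name, version)]
    # Log the template identity and the variables separately, so prompt
    # changes can be diffed by version and variables can be inspected
    # without re-parsing rendered strings.
    logger.info(
        "rendering prompt",
        extra={"template": name, "version": version, "variables": variables},
    )
    return Template(template).render(**variables)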

Side effect of Context also being Pydantic Models

Since context values are just Python objects, we can use Pydantic models to validate the context and control how it is rendered, so even secret information can be passed in dynamically. Consider using SecretStr to pass sensitive information to the LLM.

import logging

from pydantic import BaseModel, SecretStr

logger = logging.getLogger(__name__)

class UserContext(BaseModel):
    name: str
    address: SecretStr

class Address(BaseModel):
    street: SecretStr
    city: str
    state: str
    zipcode: str

def normalize_address(address: str) -> Address:
    context = UserContext(name="scolvin", address=address)
    normalized = client.create(
        model="gpt-4o",
        response_model=Address,
        messages=[
            {
                "role": "user",
                "content": "{{ user.name }} lives at `{{ user.address.get_secret_value() }}`, normalize it to an address object",
            },
        ],
        context={"user": context},
    )
    print(context)
    #> name='scolvin' address=SecretStr('**********')
    print(normalized)
    #> street=SecretStr('**********') city='Toronto' state='Ontario' zipcode='M5A 0J3'
    logger.info(f"Normalized address: {normalized}", extra={"user_context": context, "address": normalized})
    return normalized
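
For illustration, a hypothetical call might look like this; the secret fields stay masked in anything that gets printed or logged:

normalized = normalize_address("123 Fake Street, Toronto, Ontario, M5A 0J3")
print(normalized.city, normalized.state, normalized.zipcode)
#> Toronto Ontario M5A 0J3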

This approach offers several advantages:

  1. Secure logging: You can confidently log your template variables without risking the exposure of sensitive information.
  2. Type safety: Pydantic models provide type checking and validation, reducing the risk of errors.
  3. Flexibility: You can easily control how different types of data are displayed or used in templates.

Structured Outputs for Gemini now supported

We're excited to announce that instructor now supports structured outputs using tool calling for both the Gemini SDK and the VertexAI SDK.

A special shoutout to Sonal for his contributions to the Gemini Tool Calling support.

Let's walk through a simple example of how to use these new features.

Installation

To get started, install the latest version of instructor. Depending on whether you're using Gemini or VertexAI, install one of the following extras:

pip install "instructor[google-generativeai]"
pip install "instructor[vertexai]"

This ensures that you have the necessary dependencies to use the Gemini or VertexAI SDKs with instructor.

We recommend using the Gemini SDK over the VertexAI SDK for two main reasons:

  1. Unlike the VertexAI SDK, the Gemini SDK comes with a free daily quota of 1.5 billion tokens for developers.
  2. The Gemini SDK is significantly easier to set up; all you need is a GOOGLE_API_KEY, which you can generate in your GCP console. The VertexAI SDK, on the other hand, requires a credentials.json file or an OAuth integration.

Getting Started

With our provider-agnostic API, you can use the same interface to interact with both SDKs; the only thing that changes is how we initialise the client itself.

Before running the following code, you'll need to make sure that you have your Gemini API key set in your shell as the environment variable GOOGLE_API_KEY.

import instructor
import google.generativeai as genai
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest", # (1)!
    )
)

resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

print(resp)
#> name='Jason' age=25
  1. Current Gemini models that support tool calling are gemini-1.5-flash-latest and gemini-1.5-pro-latest.

We can achieve a similar thing with the VertexAI SDK. For this to work, you'll need to authenticate to VertexAI.

There are some instructions here, but the easiest way I found was to simply install the gcloud CLI and run gcloud auth application-default login.

import instructor
import vertexai  # type: ignore
from vertexai.generative_models import GenerativeModel  # type: ignore
from pydantic import BaseModel

vertexai.init()


class User(BaseModel):
    name: str
    age: int


client = instructor.from_vertexai(
    client=GenerativeModel("gemini-1.5-pro-preview-0409"), # (1)!
)


resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

print(resp)
#> name='Jason' age=25
  1. Current Gemini models that support tool calling are gemini-1.5-flash-latest and gemini-1.5-pro-latest.

Why Instructor is the best way to get JSON from LLMs

Large Language Models (LLMs) like GPT are incredibly powerful, but getting them to return well-formatted JSON can be challenging. This is where the Instructor library shines. Instructor allows you to easily map LLM outputs to JSON data using Python type annotations and Pydantic models.

Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including Mistral/Mixtral, Anyscale, Ollama, and llama-cpp-python.

It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. Instructor helps you manage validation context, retries with Tenacity, and streaming Lists and Partial responses.

The Simple Patch for JSON LLM Outputs

Instructor works as a lightweight patch over the OpenAI Python SDK. To use it, you simply apply the patch to your OpenAI client:

import instructor
import openai

client = instructor.from_openai(openai.OpenAI())

Then, you can pass a response_model parameter to the completions.create or chat.completions.create methods. This parameter takes a Pydantic model class that defines the JSON structure you want the LLM output mapped to, just like response_model in FastAPI.

Here's an example of a response_model for a simple user profile:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

client = instructor.from_openai(openai.OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    messages=[
        {
            "role": "user",
            "content": "Extract the user's name, age, and email from this: John Doe is 25 years old. His email is [email protected]"
        }
    ]
)

print(user.model_dump())
# > {'name': 'John Doe', 'age': 25, 'email': 'john.doe@example.com'}

Instructor extracts the JSON data from the LLM output and returns an instance of your specified Pydantic model. You can then use the model_dump() method to serialize the model instance to a dictionary, or model_dump_json() to get a JSON string.

Some key benefits of Instructor:

  • Zero new syntax to learn - it builds on standard Python type hints
  • Seamless integration with existing OpenAI SDK code
  • Incremental, zero-overhead adoption path
  • Direct access to the messages parameter for flexible prompt engineering
  • Broad compatibility with any OpenAI SDK-compatible platform or provider

Pydantic: More Powerful than Plain Dictionaries

You might be wondering, why use Pydantic models instead of just returning a dictionary of key-value pairs? While a dictionary could hold JSON data, Pydantic models provide several powerful advantages:

  1. Type validation: Pydantic models enforce the types of the fields. If the LLM returns an incorrect type (e.g. a string for an int field), it will raise a validation error.

  2. Field requirements: You can mark fields as required or optional. Pydantic will raise an error if a required field is missing.

  3. Default values: You can specify default values for fields that aren't always present.

  4. Advanced types: Pydantic supports more advanced field types like dates, UUIDs, URLs, lists, nested models, and more.

  5. Serialization: Pydantic models can be easily serialized to JSON, which is helpful for saving results or passing them to other systems.

  6. IDE support: Because Pydantic models are defined as classes, IDEs can provide autocompletion, type checking, and other helpful features when working with the JSON data.

So while dictionaries can work for very simple JSON structures, Pydantic models are far more powerful for working with complex, validated JSON in a maintainable way.
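
A small, self-contained sketch (unrelated to any LLM call, and using a hypothetical Order model) shows several of these advantages at once: type coercion and validation, required vs. optional fields, advanced types, and serialization:

from datetime import date
from typing import Optional

from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    id: int                      # type validation: a non-numeric string raises an error
    placed_on: date              # advanced type: parsed from an ISO date string
    email: str                   # required field
    notes: Optional[str] = None  # optional field with a default value


order = Order(id="42", placed_on="2024-02-17", email="a@example.com")
print(order.model_dump_json())   # serialization to JSON
#> {"id":42,"placed_on":"2024-02-17","email":"a@example.com","notes":null}

try:
    Order(id="not-a-number", placed_on="2024-02-17", email="a@example.com")
except ValidationError as e:
    print(e.error_count())       # the bad `id` is reported as a validation error
    #> 1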

JSON from LLMs Made Easy

Instructor and Pydantic together provide a fantastic way to extract and work with JSON data from LLMs. The lightweight patching of Instructor combined with the powerful validation and typing of Pydantic models makes it easy to integrate JSON outputs into your LLM-powered applications. Give Instructor a try and see how much easier it makes getting JSON from LLMs!

Enhancing RAG with Time Filters Using Instructor

Retrieval-augmented generation (RAG) systems often need to handle queries with time-based constraints, like "What new features were released last quarter?" or "Show me support tickets from the past week." Effective time filtering is crucial for providing accurate, relevant responses.

Instructor is a Python library that simplifies integrating large language models (LLMs) with data sources and APIs. It allows defining structured output models using Pydantic, which can be used as prompts or to parse LLM outputs.

Modeling Time Filters

To handle time filters, we can define a Pydantic model representing a time range:

from datetime import datetime
from typing import Optional
from pydantic import BaseModel

class TimeFilter(BaseModel):
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None

The TimeFilter model represents an absolute date range; the LLM resolves relative expressions like "last week" or "previous month" into concrete start and end dates.

We can then combine this with a search query string:

class SearchQuery(BaseModel):
    query: str
    time_filter: TimeFilter

Prompting the LLM

Using Instructor, we can prompt the LLM to generate a SearchQuery object based on the user's query:

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    response_model=SearchQuery,
    messages=[
        {
            "role": "system",
            "content": "You are a query generator for customer support tickets. The current date is 2024-02-17",
        },
        {
            "role": "user", 
            "content": "Show me customer support tickets opened in the past week."
        },
    ],
)

This produces a SearchQuery whose JSON representation looks like this:

{
    "query": "Show me customer support tickets opened in the past week.",
    "time_filter": {
        "start_date": "2024-02-10T00:00:00",
        "end_date": "2024-02-17T00:00:00"
    }
}

Nuances in dates and timezones

When working with time-based queries, it's important to consider the nuances of dates, timezones, and publication times. Depending on the data source, the user's location, and when the content was originally published, the definition of "past week" or "last month" may vary.

To handle this, you'll want to design your TimeFilter model to intelligently reason about these relative time periods. This could involve:

  • Defaulting to the user's local timezone if available, or using a consistent default like UTC
  • Defining clear rules for how to calculate the start and end of relative periods like "week" or "month"
    • e.g. does "past week" mean the last 7 days or the previous Sunday-Saturday range?
  • Allowing for flexibility in how users specify dates (exact datetimes, just dates, natural language phrases)
  • Validating and normalizing user input to fit the expected TimeFilter format
  • Considering the original publication timestamp of the content, not just the current date
    • e.g. "articles published in the last month" should look at the publish date, not the query date

By building this logic into the TimeFilter model, you can abstract away the complexity and provide a consistent interface for the rest of your RAG system to work with standardized absolute datetime ranges.
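
As one possible approach (a sketch that assumes UTC and a "last 7 days" reading of "past week"; the helper name and period labels are hypothetical), relative periods can be resolved into an absolute TimeFilter before running the search:

from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_relative_period(period: str, now: Optional[datetime] = None) -> TimeFilter:
    """Convert a relative period label into an absolute, UTC-based TimeFilter."""
    now = now or datetime.now(timezone.utc)
    if period == "past_week":
        # Interpreted here as the last 7 days rather than Sunday-Saturday.
        return TimeFilter(start_date=now - timedelta(days=7), end_date=now)
    if period == "past_month":
        # Simplification: treat "month" as the last 30 days.
        return TimeFilter(start_date=now - timedelta(days=30), end_date=now)
    # Unrecognized labels fall back to an unbounded filter.
    return TimeFilter()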

Of course, there may be edge cases or ambiguities that are hard to resolve programmatically. In these situations, you may need to prompt the user for clarification or make a best guess based on the available information. The key is to strive for a balance of flexibility and consistency in how you handle time-based queries, factoring in publication dates when relevant.

By modeling time filters with Pydantic and leveraging Instructor, RAG systems can effectively handle time-based queries. Clear prompts, careful model design, and appropriate parsing strategies enable accurate retrieval of information within specific time frames, enhancing the system's overall relevance and accuracy.

Seamless Support with LangSmith

It's a common misconception that LangChain's LangSmith is only compatible with LangChain's models. In reality, LangSmith is a unified DevOps platform for developing, collaborating on, testing, deploying, and monitoring LLM applications. In this post we will explore how LangSmith can be used to enhance the OpenAI client alongside instructor.
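
As a minimal sketch (assuming the langsmith package's wrap_openai wrapper is available and the LangSmith tracing environment variables are configured), the traced OpenAI client can simply be patched with instructor:

import instructor
from langsmith.wrappers import wrap_openai
from openai import OpenAI
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


# Wrap the OpenAI client so every call is traced in LangSmith,
# then patch it with instructor for structured outputs.
client = instructor.from_openai(wrap_openai(OpenAI()))

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)
#> name='Jason' age=25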

Generators and LLM Streaming

Latency is crucial, especially in e-commerce and in newer chat applications like ChatGPT. Streaming lets us improve the perceived user experience without needing faster total response times.

And what makes streaming possible? Generators!
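
To ground that claim, here is a plain Python generator (no LLM involved) that yields tokens one at a time, which is exactly the shape a streamed LLM response takes:

import time
from typing import Iterator


def stream_tokens(text: str, delay: float = 0.05) -> Iterator[str]:
    # Yield one token at a time instead of returning the full string,
    # so the caller can render output as soon as it arrives.
    for token in text.split():
        time.sleep(delay)  # simulate network / decoding latency
        yield token + " "


for token in stream_tokens("Streaming keeps the user engaged while the model works"):
    print(token, end="", flush=True)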