Skip to content

Validation and Reasking

Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-correct.

Pydantic

Pydantic offers an customizable and expressive validation framework for Python. Instructor leverages Pydantic's validation framework to provide a uniform developer experience for both code-based and LLM-based validation, as well as a reasking mechanism for correcting LLM outputs based on validation errors. To learn more check out the Pydantic docs on validators.

Good llm validation is just good validation

If you want to see some more examples on validators checkout our blog post Good LLM validation is just good validation

Code-based Validation Example

First define a Pydantic model with a validator using the Annotation class from typing_extensions.

Enforce a naming rule using Pydantic's built-in validation:

from pydantic import BaseModel, ValidationError
from typing_extensions import Annotated
from pydantic import AfterValidator


def name_must_contain_space(v: str) -> str:
    if " " not in v:
        raise ValueError("Name must contain a space.")
    return v.lower()


class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)]


try:
    person = UserDetail(age=29, name="Jason")
except ValidationError as e:
    print(e)
    """
    1 validation error for UserDetail
    name
      Value error, Name must contain a space. [type=value_error, input_value='Jason', input_type=str]
        For further information visit https://errors.pydantic.dev/2.7/v/value_error
    """

Output for Code-Based Validation

1 validation error for UserDetail
name
   Value error, name must contain a space (type=value_error)

As we can see, Pydantic raises a validation error when the name attribute does not contain a space. This is a simple example, but it demonstrates how Pydantic can be used to validate attributes of a model.

LLM-Based Validation Example

LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.

import instructor
from openai import OpenAI
from instructor import llm_validator
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated


# Apply the patch to the OpenAI client
client = instructor.from_openai(OpenAI())


class QuestionAnswer(BaseModel):
    question: str
    answer: Annotated[
        str,
        BeforeValidator(llm_validator("don't say objectionable things", client=client)),
    ]


try:
    qa = QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
except ValidationError as e:
    print(e)
    """
    1 validation error for QuestionAnswer
    answer
      Assertion failed, The statement promotes objectionable behavior by encouraging evil and stealing. [type=assertion_error, input_value='The meaning of life is to be evil and steal', input_type=str]
        For further information visit https://errors.pydantic.dev/2.7/v/assertion_error
    """

Output for LLM-Based Validation

It is important to note here that the error message is generated by the LLM, not the code, so it'll be helpful for re-asking the model.

1 validation error for QuestionAnswer
answer
   Assertion failed, The statement is objectionable. (type=assertion_error)

Using Reasking Logic to Correct Outputs

Validators are a great tool for ensuring some property of the outputs. When you use the patch() method with the openai client, you can use the max_retries parameter to set the number of times you can reask the model to correct the output.

It is a great layer of defense against bad outputs of two forms:

  1. Pydantic Validation Errors (code or llm based)
  2. JSON Decoding Errors (when the model returns a bad response)

Step 1: Define the Response Model with Validators

Notice that the field validator wants the name in uppercase, but the user input is lowercase. The validator will raise a ValueError if the name is not in uppercase.

import openai
import instructor
from pydantic import BaseModel, field_validator

# Apply the patch to the OpenAI client
client = instructor.from_openai(openai.OpenAI())


class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def validate_name(cls, v):
        if v.upper() != v:
            raise ValueError("Name must be in uppercase.")
        return v

Step 2. Using the Client with Retries

Here, the UserDetails model is passed as the response_model, and max_retries is set to 2.

import instructor
import openai
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI(), mode=instructor.Mode.TOOLS)


class UserDetails(BaseModel):
    name: str
    age: int


model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

print(model.model_dump_json(indent=2))
"""
{
  "name": "Jason",
  "age": 25
}
"""

What happens behind the scenes?

Behind the scenes, the instructor.from_openai() method adds a max_retries parameter to the openai.ChatCompletion.create() method. The max_retries parameter will trigger up to 2 reattempts if the name attribute fails the uppercase validation in UserDetails.

from pydantic import ValidationError


try:
    ...
except ValidationError as e:
    kwargs["messages"].append(response.choices[0].message)
    kwargs["messages"].append(
        {
            "role": "user",
            "content": f"Please correct the function call; errors encountered:\n{e}",
        }
    )

Advanced Validation Techniques

The docs are currently incomplete, but we have a few advanced validation techniques that we're working on documenting better such as model level validation, and using a validation context. Check out our example on verifying citations which covers:

  1. Validate the entire object with all attributes rather than one attribute at a time
  2. Using some 'context' to validate the object: In this case, we use the context to check if the citation existed in the original text.

Optimizing Token usage

Pydantic automatically includes a URL within the error message itself when an error is thrown so that users can learn more about the specific error that was thrown. Some users might want to remove this URL since it adds extra tokens that otherwise might not add much value to the validation process.

We've created a small helper function that you can use below which removes this url in the error message

from instructor.utils import disable_pydantic_error_url
from pydantic import BaseModel, ValidationError
from typing_extensions import Annotated
from pydantic import AfterValidator

disable_pydantic_error_url()  # (1)!


def name_must_contain_space(v: str) -> str:
    if " " not in v:
        raise ValueError("Name must contain a space.")
    return v.lower()


class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)]


try:
    person = UserDetail(age=29, name="Jason")
except ValidationError as e:
    print(e)
    """
    1 validation error for UserDetail
    name
        Value error, Name must contain a space. [type=value_error, input_value='Jason', input_type=str]
    """
  1. We disable the error by setting an environment variable PYDANTIC_ERRORS_INCLUDE_URL to 0. This is valid only for the duration that the script is executed for, once the function is not called, the original behaviour is restored.

Takeaways

By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content, but also pave the way for more autonomous and effective systems.