Prompt Templating

With Instructor's Jinja templating, you can:

  • Dynamically adapt prompts to any context
  • Manage and version your prompts more easily
  • Integrate seamlessly with validation processes
  • Handle sensitive information securely

Our solution offers:

  • Separation of prompt structure and content
  • Complex logic implementation within prompts
  • Template reusability across scenarios
  • Enhanced prompt versioning and logging
  • Pydantic integration for validation and type safety

Context is available to the templating engine

The context parameter is a dictionary of variables that is passed to the templating engine. Jinja uses this single dictionary to render the final prompt.

import openai
import instructor
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """Extract the information from the
        following text: `{{ data }}`""",  # (1)!
        },
    ],
    response_model=User,
    context={"data": "John Doe is thirty years old"},  # (2)!
)

print(resp)
#> name='John Doe' age=30
  1. Declare Jinja-style template variables inside the prompt itself (e.g. {{ data }})
  2. Pass the variables to be rendered in the context parameter
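
To preview what the model will receive, the same template can be rendered locally. The following is a minimal sketch, not Instructor's internal code: it assumes the jinja2 package is installed and uses the SandboxedEnvironment named in the Security section below.

```python
from jinja2.sandbox import SandboxedEnvironment

# Same template and context values as in the example above.
prompt = "Extract the information from the following text: `{{ data }}`"
context = {"data": "John Doe is thirty years old"}

env = SandboxedEnvironment()
rendered = env.from_string(prompt).render(**context)

print(rendered)
```

Rendering locally like this is a quick way to check that every variable in the template has a matching key in the context dictionary.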

Context is available to Pydantic validators

This example passes the same context parameter to Pydantic validators. A validator can read it through info.context and apply validation rules or data transformations that depend on the request, such as checking for banned words or redacting sensitive information with regular expressions.

import openai
import instructor
from pydantic import BaseModel, ValidationInfo, field_validator
import re

client = instructor.from_openai(openai.OpenAI())


class Response(BaseModel):
    text: str

    @field_validator('text')
    @classmethod
    def redact_regex(cls, v: str, info: ValidationInfo):
        context = info.context
        if context:
            redact_patterns = context.get('redact_patterns', [])
            for pattern in redact_patterns:
                v = re.sub(pattern, '****', v)
        return v


response = client.create(
    model="gpt-4o",
    response_model=Response,
    messages=[
        {
            "role": "user",
            "content": """
                Write about a {{ topic }}

                {% if banned_words %}
                You must not use the following banned words:

                <banned_words>
                {% for word in banned_words %}
                * {{ word }}
                {% endfor %}
                </banned_words>
                {% endif %}
              """,
        },
    ],
    context={
        "topic": "jason and now his phone number is 123-456-7890",
        "redact_patterns": [
            r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",  # Phone number pattern
            r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
        ],
    },
    max_retries=3,
)

print(response.text)
#> While i can't say his name anymore, his phone number is ****
  1. Inside the Pydantic validator, read the variables passed in the context parameter from info.context
  2. Pass the variables used for validation and/or rendering in the context parameter
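
Context-aware validators can be exercised without calling a model, because Pydantic's model_validate accepts a context argument directly. The sketch below assumes Pydantic v2. The banned-word check and the validator name apply_context_rules are illustrative additions, not part of the example above.

```python
import re

from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class Response(BaseModel):
    text: str

    @field_validator("text")
    @classmethod
    def apply_context_rules(cls, v: str, info: ValidationInfo) -> str:
        context = info.context or {}
        # Reject the value outright if it contains a banned word.
        for word in context.get("banned_words", []):
            if word.lower() in v.lower():
                raise ValueError(f"banned word used: {word}")
        # Otherwise mask anything that matches a redaction pattern.
        for pattern in context.get("redact_patterns", []):
            v = re.sub(pattern, "****", v)
        return v


rules = {
    "banned_words": ["jason"],
    "redact_patterns": [r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"],
}

redacted = Response.model_validate(
    {"text": "His phone number is 123-456-7890"}, context=rules
)
print(redacted.text)

try:
    Response.model_validate({"text": "Jason's number is 123-456-7890"}, context=rules)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)

# Without a context, the validator leaves the text untouched.
unchanged = Response.model_validate({"text": "123-456-7890"})
print(unchanged.text)
```

Raising an error for banned words and rewriting the value for redaction patterns keeps the two concerns separate: a failed validation can be retried, while redaction always succeeds.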

Jinja Syntax

Prompts are rendered with Jinja, so the familiar Jinja syntax is available: loops over lists, conditionals, and calls to functions and methods.

This keeps prompt formatting and rendering logic simple.

import openai
import instructor
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())


class Citation(BaseModel):
    source_ids: list[int]
    text: str


class Response(BaseModel):
    answer: list[Citation]


resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """
                You are a {{ role }} tasked with the following question

                <question>
                {{ question }}
                </question>

                Use the following context to answer the question, make sure to return [id] for every citation:

                <context>
                {% for chunk in context %}
                  <context_chunk>
                    <id>{{ chunk.id }}</id>
                    <text>{{ chunk.text }}</text>
                  </context_chunk>
                {% endfor %}
                </context>

                {% if rules %}
                Make sure to follow these rules:

                {% for rule in rules %}
                  * {{ rule }}
                {% endfor %}
                {% endif %}
            """,
        },
    ],
    response_model=Response,
    context={
        "role": "professional educator",
        "question": "What is the capital of France?",
        "context": [
            {"id": 1, "text": "Paris is the capital of France."},
            {"id": 2, "text": "France is a country in Europe."},
        ],
        "rules": ["Use markdown."],
    },
)

print(resp)
#> answer=[Citation(source_ids=[1], text='The capital of France is Paris.')]
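
The loop, conditional, and method-call features can be tried in isolation. This is a small sketch, assuming jinja2 is installed. The variable name chunks and the shortened template are illustrative rather than copied from the example above.

```python
from jinja2.sandbox import SandboxedEnvironment

TEMPLATE = """
<question>{{ question.strip() }}</question>
{% for chunk in chunks %}
<chunk id="{{ chunk.id }}">{{ chunk.text | upper }}</chunk>
{% endfor %}
{% if rules %}
Follow these {{ rules | length }} rule(s):
{% for rule in rules %}
* {{ rule }}
{% endfor %}
{% endif %}
"""

env = SandboxedEnvironment()
template = env.from_string(TEMPLATE)

variables = {
    "question": "  What is the capital of France?  ",
    "chunks": [
        {"id": 1, "text": "Paris is the capital of France."},
        {"id": 2, "text": "France is a country in Europe."},
    ],
}

# The rules block is rendered only when a non-empty list is supplied.
with_rules = template.render(**variables, rules=["Use markdown."])
without_rules = template.render(**variables)

print(with_rules)
print(without_rules)
```

Here question.strip() is a method call, upper and length are built-in filters, and the rules section disappears entirely when rules is missing or empty.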

Working with Secrets

Your prompts might need to include sensitive user information when they're sent to your model provider. You probably don't want this information hard-coded into your prompt or captured in your logs. An easy way to avoid both is to use the SecretStr type from Pydantic in your model definitions.

from pydantic import BaseModel, SecretStr
import instructor
import openai


class UserContext(BaseModel):
    name: str
    address: SecretStr


class Address(BaseModel):
    street: SecretStr
    city: str
    state: str
    zipcode: str


client = instructor.from_openai(openai.OpenAI())
context = UserContext(name="scolvin", address="secret address")

address = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "{{ user.name }} is `{{ user.address.get_secret_value() }}`, normalize it to an address object",
        },
    ],
    context={"user": context},
    response_model=Address,
)
print(context)
#> name='scolvin' address=SecretStr('**********')
print(address)
#> street=SecretStr('**********') city='Toronto' state='Ontario' zipcode='M5A 0J3'

This lets you use sensitive information in your prompts while keeping it out of logs and printed output.
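
The masking behaviour can be checked locally. The sketch below assumes jinja2 and Pydantic are installed. It renders the template itself, so it only approximates what Instructor does with the context.

```python
from jinja2.sandbox import SandboxedEnvironment
from pydantic import BaseModel, SecretStr


class UserContext(BaseModel):
    name: str
    address: SecretStr


user = UserContext(name="scolvin", address="secret address")
env = SandboxedEnvironment()

# What ends up in logs when the context object is printed: the secret is masked.
logged = str(user)

# The secret is revealed only where the template asks for it explicitly.
rendered = env.from_string(
    "{{ user.name }} is `{{ user.address.get_secret_value() }}`"
).render(user=user)

# Referencing the field without get_secret_value() renders the mask instead.
masked = env.from_string("{{ user.address }}").render(user=user)

print(logged)
print(rendered)
print(masked)
```

The explicit get_secret_value() call makes every place where the secret enters a prompt easy to find in review.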

Security

We use the jinja2.sandbox.SandboxedEnvironment to prevent security issues with the templating engine. This means that you can't use arbitrary Python code in your prompts. It does not make it safe to pass untrusted input to the templating engine, however, as this could still be abused for things like Denial of Service attacks.

You should always sanitize any input that you pass to the templating engine.
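
The sketch below illustrates, under the assumption that jinja2 is installed, what the sandbox blocks and why untrusted text is better passed as a context value than concatenated into the template source. It uses jinja2 directly and makes no claim about how Instructor wires up its environment.

```python
from jinja2.exceptions import SecurityError
from jinja2.sandbox import SandboxedEnvironment

env = SandboxedEnvironment()
untrusted = "{{ 7 * 7 }}"

# Passed as a context value, the untrusted text is inserted verbatim.
as_value = env.from_string("Summarize: {{ text }}").render(text=untrusted)

# Concatenated into the template source, it is evaluated as Jinja.
as_source = env.from_string("Summarize: " + untrusted).render()

# The sandbox refuses to follow access to internal attributes.
try:
    env.from_string("{{ func.__code__.co_name }}").render(func=lambda: None)
    blocked = False
except SecurityError:
    blocked = True

# The sandbox also caps range(), which limits one obvious way to burn CPU.
try:
    env.from_string("{{ range(10**9) | length }}").render()
    range_blocked = False
except OverflowError:
    range_blocked = True

print(as_value)
print(as_source)
print(blocked, range_blocked)
```

These limits reduce the attack surface but do not remove it, so template source should come only from code you control.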