
Structured Outputs with Google's genai SDK

Recommended SDK

The genai SDK is Google's recommended Python client for working with Gemini models. It provides a unified interface for both the Gemini API and Vertex AI. For detailed setup instructions, including how to use it with Vertex AI, please refer to the official Google AI documentation for the GenAI SDK.

This guide demonstrates how to use Instructor with Google's genai SDK to extract structured data from Gemini models.

We currently support two modes for Gemini:

  • Mode.GENAI_TOOLS: uses function calling under the hood to return a structured response
  • Mode.GENAI_STRUCTURED_OUTPUTS: gives Gemini a JSON Schema that it uses to respond in a structured format

Installation

pip install "instructor[google-genai]"

Basic Usage

Getting started with Instructor and the genai SDK is straightforward. Just create a Pydantic model defining your output structure, patch the genai client, and make your request with a response_model parameter:

from google import genai
import instructor
from pydantic import BaseModel

# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int

# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Extract structured data
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
    response_model=User,
)

print(response)  # > name='Jason' age=25

Message Formatting

Genai supports multiple message formats, and Instructor seamlessly works with all of them. This flexibility allows you to use whichever format is most convenient for your application:

from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types

# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int

# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Single string (converted to user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages="Jason is 25 years old",
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "user", "content": "Jason is 25 years old"}
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="Jason is 25 years old")]
        )
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

System Messages

System messages help set context and instructions for the model. With Gemini models, you can provide system messages in two different ways:

from google import genai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# As a parameter
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="You are a data extraction assistant",
    messages=[{"role": "user", "content": "Jason is 25 years old"}],
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Or as a message with role "system"
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "system", "content": "You are a data extraction assistant"},
        {"role": "user", "content": "Jason is 25 years old"},
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

Template Variables

Template variables make it easy to reuse prompts with different values. This is particularly useful for dynamic content or when testing different inputs:

from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types


# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int


# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# List of strings (each converted to a user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=["{{ name }} is {{ age }} years old"],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "{{ name }} is {{ age }} years old"}],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="{{name}} is {{age}} years old")],
        )
    ],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25

Validation and Retries

Instructor can automatically retry requests when validation fails, ensuring you get properly formatted data. This is especially helpful when enforcing specific data requirements:

from typing import Annotated
from pydantic import AfterValidator, BaseModel
import instructor
from google import genai


def uppercase_validator(v: str) -> str:
    # Reject any name that is not entirely upper case, e.g. "jason" or "Jason"
    if v != v.upper():
        raise ValueError("Name must be ALL CAPS")
    return v


class UserDetail(BaseModel):
    name: Annotated[str, AfterValidator(uppercase_validator)]
    age: int


client = instructor.from_genai(genai.Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: jason is 25 years old"}],
    response_model=UserDetail,
    max_retries=3,
)

print(response)  # > name='JASON' age=25
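Since a retry happens when the model's response fails validation, you can check locally, without calling Gemini, which values would trigger one. The following is a minimal sketch using only Pydantic v2; `would_retry` is a hypothetical helper, and the validator rejects any name that is not already fully upper case.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def uppercase_validator(v: str) -> str:
    # Reject anything that is not already fully upper case ("jason", "Jason", ...)
    if v != v.upper():
        raise ValueError("Name must be ALL CAPS")
    return v


class UserDetail(BaseModel):
    name: Annotated[str, AfterValidator(uppercase_validator)]
    age: int


def would_retry(data: dict) -> bool:
    """Return True if this payload fails validation (and would therefore be retried)."""
    try:
        UserDetail.model_validate(data)
    except ValidationError:
        return True
    return False


print(would_retry({"name": "jason", "age": 25}))  # True
print(would_retry({"name": "JASON", "age": 25}))  # False
```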

Multimodal Capabilities

We've provided a few different sample files for you to use to test out these new features. All examples below use these files.

  • Audio: a recording of the original Gettysburg Address (gettysburg.wav)
  • Image: an image of some blueberry plants (image.jpg)
  • PDF: a sample PDF file containing a fake invoice (invoice.pdf)

Instructor provides a unified, provider-agnostic interface for working with multimodal inputs like images, PDFs, and audio files. With Instructor's multimodal objects, you can easily load media from URLs, local files, or base64 strings using a consistent API that works across different AI providers (OpenAI, Anthropic, Mistral, etc.).

Instructor handles all the provider-specific formatting requirements behind the scenes, ensuring your code remains clean and future-proof as provider APIs evolve.

Let's see how to use the Image, Audio and PDF classes.

Image Processing

Autodetect Images

For convenient handling of images, you can enable automatic image conversion with the autodetect_images parameter. When enabled, Instructor automatically detects file paths and HTTP URLs passed as plain strings and converts them into the format required by the Google GenAI SDK.

Instructor makes it easy to analyse and extract semantic information from images using the Gemini series of models. Check Google's Gemini model documentation to confirm that the model you'd like to use has vision capabilities.

The example below uses the sample image above and loads it with our from_url method.

Note that we support local files and base64 strings too with the from_path and the from_base64 class methods.

from instructor.multimodal import Image
from pydantic import BaseModel, Field
import instructor
from google.genai import Client


class ImageDescription(BaseModel):
    objects: list[str] = Field(..., description="The objects in the image")
    scene: str = Field(..., description="The scene of the image")
    colors: list[str] = Field(..., description="The colors in the image")


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/image.jpg"
# Multiple ways to load an image:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=ImageDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "What is in this image?",
                # Option 1: Direct URL
                Image.from_url(url),
                # Option 2: Local file
                # Image.from_path("path/to/local/image.jpg")
                # Option 3: Base64 string
                # Image.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # Image.autodetect(<url|path|base64>)
            ],
        },
    ],
)

print(response)
# Example output:
# ImageDescription(
#     objects=['blueberries', 'leaves'],
#     scene='A blueberry bush with clusters of ripe blueberries and some unripe ones against a cloudy sky',
#     colors=['green', 'blue', 'purple', 'white']
# )
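The from_base64 option needs an encoded string rather than a file path. Assuming it accepts a data URI of the form `data:<mime type>;base64,<payload>` (verify this against your installed Instructor version), a small stdlib-only helper can build one from local bytes. `to_data_uri` and `file_to_data_uri` are hypothetical names, not part of Instructor.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"Cannot guess MIME type for {filename!r}")
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def file_to_data_uri(path: str) -> str:
    """Read a local file and return its contents as a data URI."""
    return to_data_uri(Path(path).read_bytes(), path)


# Hypothetical usage (path is illustrative):
# Image.from_base64(file_to_data_uri("path/to/local/image.jpg"))
```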

Audio Processing

Instructor makes it easy to analyse and extract semantic information from audio files using the Gemini series of models. The example below uses the sample audio file above and loads it with our from_url method.

Local files and base64 strings are supported too; the example also shows the from_path option for local files.

from instructor.multimodal import Audio
from pydantic import BaseModel
import instructor
from google.genai import Client


class AudioDescription(BaseModel):
    transcript: str
    summary: str
    speakers: list[str]
    key_points: list[str]


url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/gettysburg.wav"

client = instructor.from_genai(Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=AudioDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "Please transcribe and analyze this audio:",
                # Option 1: Direct URL
                Audio.from_url(url),
                # Option 2: Local file
                # Audio.from_path("path/to/local/audio.mp3")
            ],
        },
    ],
)

print(response)
# > transcript='Four score and seven years ago our fathers...' ...

PDF

Instructor makes it easy to analyse and extract semantic information from PDFs using Gemini's new models.

The example below uses the sample PDF above and loads it with our from_url method. This integration passes the raw PDF bytes to Gemini directly; we also support the Gemini Files API through the PDFWithGenaiFile class.

Local files and base64 strings are supported too, through the from_path and from_base64 class methods.

from instructor.multimodal import PDF
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/invoice.pdf"
# Multiple ways to load a PDF:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: Direct URL
                PDF.from_url(url),
                # Option 2: Local file
                # PDF.from_path("path/to/local/invoice.pdf"),
                # Option 3: Base64 string
                # PDF.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # PDF.autodetect(<url|path|base64>)
            ],
        },
    ],
)

print(response)
# > total=220 items=['English Tea', 'Tofu']

We also support PDFs stored with the Gemini Files API through the PDFWithGenaiFile class, which lets you use files you've already uploaded or upload new local files.

Note that PDFWithGenaiFile.from_new_genai_file blocks until the upload has been registered as complete. You can set how long to wait between status checks (retry_delay) and how many checks to make before raising an error (max_retries):

from instructor.multimodal import PDFWithGenaiFile

pdf = PDFWithGenaiFile.from_new_genai_file(
    "./invoice.pdf",
    retry_delay=1,  # Seconds to wait between checks that the file is ready to use
    max_retries=20,  # Number of checks before throwing an error
)
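To make these two knobs concrete, here is a hypothetical, stdlib-only sketch of a poll-until-ready loop. It is not Instructor's implementation; `wait_until_ready` and `is_ready` are illustrative names standing in for "check whether the uploaded file has finished processing".

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    retry_delay: float = 1.0,
    max_retries: int = 20,
) -> int:
    """Poll is_ready() until it returns True.

    Checks at most max_retries times, sleeping retry_delay seconds between
    checks, and returns the number of checks made. Raises TimeoutError if
    the resource never becomes ready.
    """
    for attempt in range(1, max_retries + 1):
        if is_ready():
            return attempt
        if attempt < max_retries:
            time.sleep(retry_delay)
    raise TimeoutError(
        f"Not ready after {max_retries} checks "
        f"(~{retry_delay * (max_retries - 1):.1f}s of waiting)"
    )
```

With retry_delay=1 and max_retries=20, a loop like this gives the upload roughly 20 seconds to finish before raising.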

This makes it easier to work with the Gemini Files API. You can use it in a normal chat completion, as shown below:

from instructor.multimodal import PDFWithGenaiFile
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())
# Ways to load a PDF through the Gemini Files API:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: Upload a new local file
                PDFWithGenaiFile.from_new_genai_file("./invoice.pdf"),
                # Option 2: Use an existing Genai file
                # PDFWithGenaiFile.from_existing_genai_file("invoice.pdf"),
            ],
        },
    ],
)

print(response)

If you'd like more fine-grained control over the files used, you can also use the Files API directly, as shown below.

Using Files

You can also pass files uploaded through the genai Files API directly as messages:

from google import genai
import instructor
from pydantic import BaseModel


class Summary(BaseModel):
    summary: str


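genai_client = genai.Client()
client = instructor.from_genai(genai_client, mode=instructor.Mode.GENAI_TOOLS)

# Upload with the underlying genai client's Files API
file1 = genai_client.files.upload(
    file="./gettysburg.wav",
)

# Pass the uploaded file directly as a message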
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="Summarise the audio file.",
    messages=[
        file1,
    ],
    response_model=Summary,
)

print(response)
# > summary="Abraham Lincoln's Gettysburg Address commences by stating that 87 years prior, the founding fathers created a new nation based on liberty and equality. It goes on to say that the Civil War is testing whether a nation so conceived can survive."

Streaming Responses

Note: Streaming functionality is currently only available when using the Mode.GENAI_STRUCTURED_OUTPUTS mode with Gemini models. Other modes like tools do not support streaming at this time.

Streaming allows you to process responses incrementally rather than waiting for the complete result. This is extremely useful for making UI changes feel instant and responsive.

Partial Streaming

Receive a stream of partially populated objects that fill in as the response is generated:

from pydantic import BaseModel
import instructor
from google import genai


client = instructor.from_genai(
    genai.Client(), mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)


class Person(BaseModel):
    name: str
    age: int


class PersonList(BaseModel):
    people: list[Person]


stream = client.chat.completions.create_partial(
    model="gemini-2.0-flash-001",
    system="You are a helpful assistant. You must return a function call with the schema provided.",
    messages=[
        {
            "role": "user",
            "content": "Ivan is 20 years old, Jason is 25 years old, and John is 30 years old",
        }
    ],
    response_model=PersonList,
)

for extraction in stream:
    print(extraction)
    # > people=[PartialPerson(name='Ivan', age=None)]
    # > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=None)]
    # > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=30)]
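The PartialPerson objects above are versions of Person whose fields may still be missing while the stream is in progress. A rough Pydantic v2 sketch of that idea copies a model and makes every field optional with a default of None. `make_partial` is a hypothetical helper, not Instructor's API, and unlike Instructor's own partial handling it does not convert nested models such as the people list in PersonList.

```python
from typing import Optional

from pydantic import BaseModel, create_model


class Person(BaseModel):
    name: str
    age: int


def make_partial(model: type[BaseModel]) -> type[BaseModel]:
    """Build a copy of `model` where every field is optional and defaults to None."""
    fields = {
        name: (Optional[field.annotation], None)
        for name, field in model.model_fields.items()
    }
    return create_model(f"Partial{model.__name__}", **fields)


PartialPerson = make_partial(Person)

# Early in a stream, only the name may have arrived
print(PartialPerson(name="Ivan"))  # name='Ivan' age=None
```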

Async Support

Instructor provides full async support for the genai SDK, allowing you to make non-blocking requests in async applications:

import asyncio

import instructor
from google import genai
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def extract_user():
    client = genai.Client()
    client = instructor.from_genai(
        client, mode=instructor.Mode.GENAI_TOOLS, use_async=True
    )

    response = await client.chat.completions.create(
        model="gemini-2.0-flash-001",
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
        response_model=User,
    )
    return response


print(asyncio.run(extract_user()))
# > name='Jason' age=25
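Because the async client's create call is awaitable, you can run several extractions concurrently with asyncio.gather. This is a minimal sketch; `extract_users` is a hypothetical helper that accepts any client exposing chat.completions.create, so in practice you would pass the client returned by instructor.from_genai(..., use_async=True).

```python
import asyncio
from typing import Any

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def extract_users(client: Any, texts: list[str]) -> list[User]:
    """Run one extraction per text concurrently; results come back in input order."""
    coroutines = [
        client.chat.completions.create(
            model="gemini-2.0-flash-001",
            messages=[{"role": "user", "content": f"Extract: {text}"}],
            response_model=User,
        )
        for text in texts
    ]
    # asyncio.gather runs the requests concurrently and preserves input order
    return list(await asyncio.gather(*coroutines))


# Usage with the async client (requires a Gemini API key):
# client = instructor.from_genai(genai.Client(), mode=instructor.Mode.GENAI_TOOLS, use_async=True)
# users = asyncio.run(extract_users(client, ["Jason is 25 years old", "Ivan is 20 years old"]))
```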