
Why should I use prompt caching?

Developers often face two key challenges when working with large contexts: slow response times and high costs. This is especially true when we're making many of these calls over time, which severely impacts the cost and latency of our applications. With Anthropic's new prompt caching feature, we can easily solve both of these issues.

Since the new feature is still in beta, we're going to wait for it to become generally available before we integrate it into instructor. In the meantime, we've put together a quickstart guide on how to use the feature in your own applications.
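
In the meantime, here is a minimal sketch of how prompt caching can be used with the raw Anthropic SDK. The model name, beta header, and placeholder document below are assumptions based on the beta documentation, not instructor's API.

import anthropic

client = anthropic.Anthropic()

# Mark the large, reusable part of the prompt as cacheable so that subsequent
# calls reusing it are cheaper and faster. The beta header below was required
# while the feature was in beta.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": "<a large document or long set of instructions>",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarise the document above."}],
)
print(response.usage)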

Structured Outputs for Gemini now supported

We're excited to announce that instructor now supports structured outputs using tool calling for both the Gemini SDK and the VertexAI SDK.

A special shoutout to Sonal for his contributions to the Gemini Tool Calling support.

Let's walk through a simple example of how to use these new features.

Installation

To get started, install the latest version of instructor. Depending on whether you're using Gemini or VertexAI, install one of the following:

pip install "instructor[google-generativeai]"
pip install "instructor[vertexai]"

This ensures that you have the necessary dependencies to use the Gemini or VertexAI SDKs with instructor.

We recommend using the Gemini SDK over the VertexAI SDK for two main reasons.

  1. Compared to the VertexAI SDK, the Gemini SDK comes with a free daily quota of 1.5 billion tokens for developers.
  2. The Gemini SDK is significantly easier to set up: all you need is a GOOGLE_API_KEY, which you can generate in your GCP console. The VertexAI SDK, on the other hand, requires a credentials.json file or an OAuth integration.

Getting Started

With our provider-agnostic API, you can use the same interface to interact with both SDKs; the only thing that changes is how we initialise the client itself.

Before running the following code, you'll need to make sure that you have your Gemini API key set in your shell as the GOOGLE_API_KEY environment variable.
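
For example, in a bash-style shell (the placeholder value is yours to fill in):

export GOOGLE_API_KEY=<your-api-key>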

import instructor
import google.generativeai as genai
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",  # (1)!
    )
)

resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

print(resp)
#> name='Jason' age=25
  1. Current Gemini models that support tool calling are gemini-1.5-flash-latest and gemini-1.5-pro-latest.

We can achieve a similar thing with the VertexAI SDK. For this to work, you'll need to authenticate to VertexAI.

There are some instructions here, but the easiest way I found was to simply download the gcloud CLI and run gcloud auth application-default login.

import instructor
import vertexai  # type: ignore
from vertexai.generative_models import GenerativeModel  # type: ignore
from pydantic import BaseModel

vertexai.init()


class User(BaseModel):
    name: str
    age: int


client = instructor.from_vertexai(
    client=GenerativeModel("gemini-1.5-pro-preview-0409"),  # (1)!
)


resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=User,
)

print(resp)
#> name='Jason' age=25
  1. Current Gemini models that support tool calling are gemini-1.5-flash-latest and gemini-1.5-pro-latest.

Should I Be Using Structured Outputs?

OpenAI recently announced Structured Outputs which ensures that generated responses match any arbitrary provided JSON Schema. In their announcement article, they acknowledged that it had been inspired by libraries such as instructor.

Main Challenges

If you're building complex LLM workflows, you've likely considered OpenAI's Structured Outputs as a potential replacement for instructor.

But before you do so, three key challenges remain:

  1. Limited Validation And Retry Logic: Structured Outputs ensures adherence to the schema but not useful content. You might get perfectly formatted yet unhelpful responses.
  2. Streaming Challenges: Parsing raw JSON objects from streamed responses with the SDK is error-prone and inefficient.
  3. Unpredictable Latency Issues: Structured Outputs suffers from random latency spikes that can result in an almost 20x increase in response time.

Additionally, adopting Structured Outputs locks you into OpenAI's ecosystem, limiting your ability to experiment with diverse models or providers that might better suit specific use-cases.

This vendor lock-in increases vulnerability to provider outages, potentially causing application downtime and SLA violations, which can damage user trust and impact your business reputation.

In this article, we'll show how instructor addresses many of these challenges with features such as automatic reasking when validation fails, automatic support for validated streaming data and more.
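As a quick illustration, here is a minimal sketch of instructor's reask behaviour with an OpenAI client; the model name and the uppercase-name rule are illustrative assumptions, not the exact code from the article.

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI())


class User(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # plain Python validation; failures are sent back to the model as a reask
        if v != v.upper():
            raise ValueError("name must be in uppercase")
        return v


resp = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    max_retries=2,  # reask up to twice when validation fails
    messages=[{"role": "user", "content": "Extract Jason is 25 years old."}],
)
print(resp)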

Parea for Observing, Testing & Fine-tuning of Instructor

Parea is a platform that enables teams to monitor, collaborate on, test & label LLM applications. In this blog we will explore how Parea can be used to enhance the OpenAI client alongside instructor, and to debug and improve instructor calls. Parea has some features which make it particularly useful for instructor:

  • it automatically groups any LLM calls due to retries under a single trace
  • it automatically tracks any validation error counts & fields that occur when using instructor
  • it provides a UI to label JSON responses by filling out a form instead of editing JSON objects
Configure Parea

Before starting this tutorial, make sure that you've registered for a Parea account. You'll also need to create an API key.

Example: Writing Emails with URLs from Instructor Docs

We will demonstrate Parea by using instructor to write emails which only contain URLs from the instructor docs. We'll need to install our dependencies before proceeding, so simply run the command below.
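
The exact command isn't shown in this excerpt; assuming Parea's Python SDK is published as parea-ai and that your API key lives in the PAREA_API_KEY environment variable, the setup would look something like this:

pip install instructor parea-ai openai
export PAREA_API_KEY=<your-parea-api-key>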

Analyzing YouTube Transcripts with Instructor

Extracting Chapter Information

Code Snippets

As always, the code is readily available in the examples/youtube folder of our repo; see the run.py file for reference.

In this post, we'll show you how to summarise YouTube video transcripts into distinct chapters using instructor, before exploring some ways you can adapt the code to different applications.

By the end of this article, you'll be able to build an application as per the video below.
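
To give you a feel for the approach, here is a minimal sketch of the extraction step; the Chapter fields, model name, and transcript source are illustrative assumptions rather than the exact code from the post.

import instructor
from typing import List
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())


class Chapter(BaseModel):
    start_ts: float
    end_ts: float
    title: str
    summary: str


# the transcript itself can come from a package such as youtube-transcript-api
transcript = "..."

chapters = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=List[Chapter],
    messages=[
        {
            "role": "system",
            "content": "Split this transcript into coherent chapters with timestamps.",
        },
        {"role": "user", "content": transcript},
    ],
)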

Why Instructor is the best way to get JSON from LLMs

Large Language Models (LLMs) like GPT are incredibly powerful, but getting them to return well-formatted JSON can be challenging. This is where the Instructor library shines. Instructor allows you to easily map LLM outputs to JSON data using Python type annotations and Pydantic models.

Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including Mistral/Mixtral, Ollama, and llama-cpp-python.

It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. Instructor helps you manage validation context, retries with Tenacity, and streaming Lists and Partial responses.
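For instance, streaming a Partial response looks roughly like this; a sketch assuming an OpenAI client and an illustrative model name.

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())


class User(BaseModel):
    name: str
    age: int


# each iteration yields a partially populated, validated User as tokens arrive
for partial_user in client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=instructor.Partial[User],
    stream=True,
    messages=[{"role": "user", "content": "Extract Jason is 25 years old."}],
):
    print(partial_user)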

Enhancing RAG with Time Filters Using Instructor

Retrieval-augmented generation (RAG) systems often need to handle queries with time-based constraints, like "What new features were released last quarter?" or "Show me support tickets from the past week." Effective time filtering is crucial for providing accurate, relevant responses.

Instructor is a Python library that simplifies integrating large language models (LLMs) with data sources and APIs. It allows defining structured output models using Pydantic, which can be used as prompts or to parse LLM outputs.
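As a rough sketch of the idea (the model name and field names here are illustrative, not the exact code from the post), you can have the LLM rewrite a question into a query plus an explicit time range, then apply that range to your retrieval step:

import instructor
from datetime import datetime
from typing import Optional
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())


class TimeFilter(BaseModel):
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None


class SearchQuery(BaseModel):
    query: str
    time_filter: Optional[TimeFilter] = None


parsed = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SearchQuery,
    messages=[
        {
            "role": "system",
            "content": "Today is 2024-08-01. Rewrite the question as a search query with an explicit time filter.",
        },
        {"role": "user", "content": "What new features were released last quarter?"},
    ],
)
print(parsed.model_dump_json(indent=2))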

Why Logfire is a perfect fit for FastAPI + Instructor

Logfire is a new tool that provides key insights into your application with OpenTelemetry. Instead of using ad-hoc print statements, Logfire helps you profile every part of your application and is integrated directly with Pydantic and FastAPI, two popular libraries amongst Instructor users.

In short, this is the secret sauce to help you get your application to the finish line and beyond. We'll show you how to easily integrate Logfire into FastAPI, one of the most popular choices amongst users of Instructor, using three examples:

  1. Data Extraction from a single User Query
  2. Using asyncio to process multiple users in parallel
  3. Streaming multiple objects using an Iterable so that they're available on demand
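
Before diving into the examples, here is a minimal setup sketch showing how the pieces fit together; the endpoint, model name and Pydantic models are illustrative assumptions.

import instructor
import logfire
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

openai_client = OpenAI()
app = FastAPI()

logfire.configure()                       # picks up your Logfire credentials
logfire.instrument_fastapi(app)           # trace every request FastAPI handles
logfire.instrument_openai(openai_client)  # trace the underlying OpenAI calls

client = instructor.from_openai(openai_client)


class UserQuery(BaseModel):
    query: str


class UserDetail(BaseModel):
    name: str
    age: int


@app.post("/user")
def extract_user(data: UserQuery) -> UserDetail:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserDetail,
        messages=[{"role": "user", "content": data.query}],
    )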

Logfire

Introduction

Logfire is a new observability platform coming from the creators of Pydantic. It integrates almost seamlessly with many of your favourite libraries such as Pydantic, HTTPx and Instructor. In this article, we'll show you how to use Logfire with Instructor to gain visibility into the performance of your entire application.

We'll walk through the following examples:

  1. Classifying scam emails using Instructor
  2. Performing simple validation using the llm_validator (see the sketch after this list)
  3. Extracting data into a markdown table from an infographic with GPT-4V
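
To give a flavour of the second example, here is a minimal sketch of llm_validator; the validation statement and models are illustrative and may differ from the full article.

import instructor
from typing import Annotated
from instructor import llm_validator
from openai import OpenAI
from pydantic import BaseModel, BeforeValidator

client = instructor.from_openai(OpenAI())


class QuestionAnswer(BaseModel):
    question: str
    # the answer is checked by an LLM against the plain-English statement below
    answer: Annotated[
        str,
        BeforeValidator(
            llm_validator("don't say anything objectionable", client=client)
        ),
    ]

Constructing a QuestionAnswer whose answer violates the statement raises a ValidationError containing the model's explanation of why it failed.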