Logfire
Introduction¶
Logfire is a new observability platform coming from the creators of Pydantic. It integrates almost seamlessly with many of your favourite libraries such as Pydantic, HTTPx and Instructor. In this article, we'll show you how to use Logfire with Instructor to gain visibility into the performance of your entire application.
We'll walk through the following examples
- Classifying scam emails using Instructor
- Performing simple validation using the
llm_validator
- Extracting data into a markdown table from an infographic with GPT4V
As usual, all of the code that we refer to here is provided in examples/logfire for you to use in your projects.
classify.py
: Email Classification Exampleimage.py
: GPT4-V Examplevalidate.py
:llm_validator
example
Configure Logfire
Before starting this tutorial, make sure that you've registered for a Logfire account. You'll also need to create a project to track these logs.
We'll need to install our dependencies and configure logfire auth before proceeding so simply run the commands below. Logfire will handle the authentication and configuration of your project.
Classification¶
Now that we've got Logfire setup, let's see how we can get it to help us track a simple classification job.
Logfire is dead simple to integrate - all it takes is 2 lines of code and we have it setup.
from openai import OpenAI
import instructor
import logfire
openai_client = OpenAI()
logfire.configure(pydantic_plugin=logfire.PydanticPlugin(record="all")) # (1)!
logfire.instrument_openai(openai_client) # (2)!
client = instructor.from_openai(openai_client)
- We add Pydantic logging using
logfire
. Note that depending on your use-case, you can configure what you want to log with Pydantic - We use their openai_integration to configure logging for our client before using instructor on it
In this example, we'll be looking at classifying emails as either spam or not spam. To do so, we can define a simple Pydantic model as seen below.
import enum
class Labels(str, enum.Enum):
"""Enumeration for single-label text classification."""
SPAM = "spam"
NOT_SPAM = "not_spam"
class SinglePrediction(BaseModel):
"""
Class for a single class label prediction.
"""
class_label: Labels
We can then use this in a generic instructor function as seen below that simply asks the model to classify text and return it in the form of a SinglePrediction
Pydantic object.
Logfire can help us to log this entire function, and what's happening inside it, even down to the model validation level by using their logfire.instrument
decorator.
@logfire.instrument("classification", extract_args=True) # (1)!
def classify(data: str) -> SinglePrediction:
"""Perform single-label classification on the input text."""
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=SinglePrediction,
messages=[
{
"role": "user",
"content": f"Classify the following text: {data}",
},
],
)
- Logfire allows us to use the
logfire.instrument
decorator and tag a function to a specific name.
Let's see what happens when we run this against a list of different emails
emails = [
"Hello there I'm a Nigerian prince and I want to give you money",
"Meeting with Thomas has been set at Friday next week",
"Here are some weekly product updates from our marketing team",
]
for email in emails:
classify(email)
There are a few important things here that the logs immediately give us
- The duration that each individual portion of our code took to run
- The payload that we sent over to OpenAI
- The exact arguments and results that were passed to each individual portion of our code at each step
LLM Validators¶
For our second example, we'll use the inbuilt llm_validator
that instructor provides out of the box to validate that our statements don't contain unsafe content that we might not want to serve to users. Let's start by defining a simple Pydantic Model that can do so and configure our logfire integration.
from typing import Annotated
from pydantic import BaseModel
from pydantic.functional_validators import AfterValidator
from instructor import llm_validator
import logfire
import instructor
from openai import OpenAI
openai_client = OpenAI()
logfire.configure(pydantic_plugin=logfire.PydanticPlugin(record="all"))
logfire.instrument_openai(openai_client)
client = instructor.from_openai(openai_client)
class Statement(BaseModel):
message: Annotated[
str,
AfterValidator(
llm_validator("Don't allow any objectionable content", client=client)
),
]
We can then test out our new validator with a few sample statements to see how our validator is working in practice.
messages = [
"I think we should always treat violence as the best solution",
"There are some great pastries down the road at this bakery I know",
]
for message in messages:
try:
Statement(message=message)
except ValidationError as e:
print(e)
With Logfire, we can capture the entirety of the validation proccess. As seen below, we have access to not only the original input data, but also the schema that was being used, the errors that were thrown and even the exact field that threw the error.
Vision Models¶
For our last example, let's see how we can use Logfire to extract structured data from an image using GPT-4V with OpenAI. We'll be using a simple bar graph here and using GPT4V
to extract the data from the image from statista below and convert it into a markdown format.
What we want is an output of the combined numbers as seen below
Country | Total Skier Visits (M) |
---|---|
United States | 55.5 |
Austria | 43.6 |
France | 40.7 |
Japan | 26.6 |
Italy | 22.3 |
Switzerland | 22 |
Canada | 18.5 |
China | 17.9 |
Sweden | 9.2 |
Germany | 7 |
This is relatively simple with Pydantic. What we need to do is to define a custom type which will handle the conversion process as seen below
from pydantic import BeforeValidator, InstanceOf, WithJsonSchema
def md_to_df(data: Any) -> Any:
# Convert markdown to DataFrame
if isinstance(data, str):
return (
pd.read_csv(
StringIO(data), # Process data
sep="|",
index_col=1,
)
.dropna(axis=1, how="all")
.iloc[1:]
.applymap(lambda x: x.strip())
)
return data
MarkdownDataFrame = Annotated[
InstanceOf[pd.DataFrame], # (1)!
BeforeValidator(md_to_df), # (2)!
WithJsonSchema( # (3)!
{
"type": "string",
"description": "The markdown representation of the table, each one should be tidy, do not try to join tables that should be seperate",
}
),
]
- We indicate that the type of this type should be a pandas dataframe
- We run a validation step to ensure that we can convert the input into a valid pandas dataframe and return a new pandas Dataframe for our model to use
- We then override the type of the schema so that when we pass it to OpenAI, it knows to generate a table in a markdown format.
We can then use this in a normal instructor call
import instructor
import logfire
openai_client = OpenAI()
logfire.configure(pydantic_plugin=logfire.PydanticPlugin(record="all"))
logfire.instrument_openai(openai_client)
client = instructor.from_openai(openai_client, mode=instructor.Mode.MD_JSON)
@logfire.instrument("extract-table", extract_args=True)
def extract_table_from_image(url: str) -> Iterable[Table]:
return client.chat.completions.create(
model="gpt-4-vision-preview",
response_model=Iterable[Table],
max_tokens=1800,
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract out a table from the image. Only extract out the total number of skiiers.",
},
{"type": "image_url", "image_url": {"url": url}},
],
}
],
)
We can then call it as seen below
url = "https://cdn.statcdn.com/Infographic/images/normal/16330.jpeg"
tables = extract_table_from_image(url)
for table in tables:
print(table.caption, end="\n")
print(table.dataframe.to_markdown())
Logfire is able to capture the stack track of the entire call as seen below, profile each part of our application and most importantly capture the raw inputs of the OpenAI call alongside any potential errors.