
Advanced Topics

  1. Instructor Implements llms.txt
  2. Query Understanding: Beyond Embeddings
  3. Achieving GPT-4 Level Summaries with GPT-3.5-turbo
  4. Basics of Guardrails and Validation in AI Models
  5. Validating Citations in AI-Generated Content
  6. Fine-tuning and Distillation in AI Models
  7. Enhancing OpenAI Client Observability with LangSmith
  8. Logfire Integration with Pydantic


Instructor Adopting Cursor Rules

AI-assisted coding is changing how we use version control. Many developers now use what I call "vibe coding" — coding with AI help. This creates new challenges with Git. Today I'll share how we're using Cursor rules in Instructor to solve these problems.

Migrating to uv

Why we migrated to uv

We recently migrated to uv from poetry because we wanted to benefit from its many features, such as:

  • Easier dependency management with automatic caching built in
  • Significantly faster CI/CD compared to poetry, especially when we use the caching functionality provided by the Astral team
  • Cargo-style lockfile that makes it easier to adopt new PEP features as they come out

We took around 1-2 days to handle the migration and we're happy with the results: our CI/CD jobs are significantly faster on average.

Here are some job timings taken from our CI/CD runs.

In general, we saw roughly a 3x speedup (approximately a 67% reduction in the time needed for the jobs) once we implemented caching for the individual uv GitHub Actions.

Extracting Metadata from Images using Structured Extraction

Multimodal Language Models like gpt-4o excel at processing multimodal inputs, enabling us to extract rich, structured metadata from images.

This is particularly valuable in areas like fashion where we can use these capabilities to understand user style preferences from images and even videos. In this post, we'll see how to use instructor to map images to a given product taxonomy so we can recommend similar products for users.
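
As a rough sketch of what that looks like with instructor (the taxonomy fields, prompt, and image URL below are placeholders rather than the post's real schema), we define a Pydantic model for the metadata and pass the image straight into the chat request:

    from typing import Literal

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # Hypothetical taxonomy for illustration only -- the post maps to a much
    # richer product taxonomy.
    class ProductMetadata(BaseModel):
        category: Literal["tops", "bottoms", "dresses", "shoes", "accessories"]
        colors: list[str]
        style_tags: list[str]

    client = instructor.from_openai(OpenAI())

    metadata = client.chat.completions.create(
        model="gpt-4o",
        response_model=ProductMetadata,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Classify this product against the taxonomy."},
                    # Placeholder URL -- swap in a real product image.
                    {"type": "image_url", "image_url": {"url": "https://example.com/jacket.jpg"}},
                ],
            }
        ],
    )
    print(metadata.model_dump())

Because the response is validated against ProductMetadata, off-taxonomy or malformed answers surface as validation errors rather than silently broken JSON.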

Consistent Stories with GPT-4o

Language Models struggle to generate consistent graphs with a large number of nodes. Often, the graph is simply too large for the model to handle in a single pass, which causes it to generate inconsistent graphs with invalid and disconnected nodes, among other issues.

In this article, we'll look at how to get around this limitation with a two-phase approach to generating complex DAGs with gpt-4o, using a simple example of a Choose Your Own Adventure story.
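
Here is a minimal sketch of the two-phase idea (the models and prompts are illustrative, not the article's actual code): first ask for a compact outline of the graph, then expand each node in its own request so the model never has to hold the full story at once.

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    client = instructor.from_openai(OpenAI())

    # Phase 1: a compact outline of the whole graph -- small enough for the
    # model to keep consistent in a single call.
    class OutlineNode(BaseModel):
        id: str
        summary: str
        children: list[str]  # ids of the nodes this node's choices lead to

    class StoryOutline(BaseModel):
        nodes: list[OutlineNode]

    outline = client.chat.completions.create(
        model="gpt-4o",
        response_model=StoryOutline,
        messages=[
            {
                "role": "user",
                "content": "Outline a short Choose Your Own Adventure story as a DAG of nodes.",
            }
        ],
    )

    # Phase 2: expand each node separately, so no single call has to hold the
    # entire story in context.
    class StoryNode(BaseModel):
        id: str
        text: str
        choices: list[str]

    story = [
        client.chat.completions.create(
            model="gpt-4o",
            response_model=StoryNode,
            messages=[
                {
                    "role": "user",
                    "content": (
                        f"Write the passage for node {node.id} ({node.summary}). "
                        f"Offer choices that lead to these nodes: {node.children}."
                    ),
                }
            ],
        )
        for node in outline.nodes
    ]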

Using Structured Outputs to convert messy tables into tidy data

Why is this a problem?

Messy data exports are a common problem. Whether it's multiple header rows, implicit relationships that make analysis a pain, or merged cells, using instructor with structured outputs makes it easy to convert messy tables into tidy data, even if all you have is an image of the table, as we'll see below.

Let's look at the following table as an example. It makes analysis unnecessarily difficult because it hides data relationships through empty cells and implicit repetition. If we were using it for data analysis, cleaning it manually would be a huge nightmare.
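
As a generic sketch of the pattern (the Row fields and image URL below are invented for illustration, not the table from the post), the idea is to ask for fully denormalized rows so every merged or implied cell gets an explicit value:

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # Illustrative schema only -- the real table's columns will differ.
    class Row(BaseModel):
        region: str      # repeated on every row, even where the source merged cells
        product: str
        units_sold: int

    class TidyTable(BaseModel):
        rows: list[Row]

    client = instructor.from_openai(OpenAI())

    table = client.chat.completions.create(
        model="gpt-4o",
        response_model=TidyTable,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Convert this table into tidy rows. Fill in any values implied by merged or empty cells.",
                    },
                    # Placeholder URL -- point this at an image of the messy table.
                    {"type": "image_url", "image_url": {"url": "https://example.com/messy-table.png"}},
                ],
            }
        ],
    )
    print(table.rows)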

Structured Outputs with Writer now supported

We're excited to announce that instructor now supports Writer's enterprise-grade LLMs, including their latest Palmyra X 004 model. This integration enables structured outputs and enterprise AI workflows with Writer's powerful language models.

Getting Started

First, make sure that you've signed up for an account on Writer and obtained an API key using this quickstart guide. Once you've done so, install instructor with Writer support by running pip install instructor[writer] in your terminal.

Make sure to set the WRITER_API_KEY environment variable with your Writer API key or pass it as an argument to the Writer constructor.
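
Here's a minimal sketch of a first call, assuming instructor exposes a from_writer entry point and that the model identifier is palmyra-x-004 (check Writer's docs for the exact model name available to your account):

    import instructor
    from pydantic import BaseModel
    from writerai import Writer

    class User(BaseModel):
        name: str
        age: int

    # Reads WRITER_API_KEY from the environment; you can also pass
    # api_key="..." to the Writer constructor directly.
    client = instructor.from_writer(Writer())

    user = client.chat.completions.create(
        model="palmyra-x-004",
        response_model=User,
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
    )
    print(user)  # User(name='Jason', age=25)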