Structured Outputs with Multimodal Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to analyze travel videos and extract structured recommendations. Gemini accepts multimodal inputs such as video, and Instructor lets us receive the model's response as Pydantic models instead of free-form text. This post was written in collaboration with Kino.ai, a company that uses Instructor to do structured extraction from multimodal inputs to improve search for filmmakers.
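To make "structured outputs" concrete before touching any API, here is a minimal sketch that uses Pydantic alone. The `Destination` and `TravelSummary` classes and the sample JSON are hypothetical stand-ins chosen for illustration, not the models or data used in this post, and the sketch assumes Pydantic v2 (for `model_validate_json`).

```python
from pydantic import BaseModel, ValidationError


class Destination(BaseModel):
    # Hypothetical schema for illustration only.
    name: str
    description: str


class TravelSummary(BaseModel):
    destinations: list[Destination]


# Stand-in for the JSON a model might return after processing a video.
raw = '{"destinations": [{"name": "Kyoto", "description": "Temples and gardens"}]}'
summary = TravelSummary.model_validate_json(raw)
first_name = summary.destinations[0].name

# A response missing a required field raises an error instead of passing through.
try:
    TravelSummary.model_validate_json('{"destinations": [{"name": "Kyoto"}]}')
    rejected = False
except ValidationError:
    rejected = True
```

The point of the sketch is that the schema acts as a contract: well-formed data becomes typed Python objects, and malformed data fails loudly at the boundary.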

Setting Up the Environment

First, let's set up our environment with the necessary libraries:

from pydantic import BaseModel
import instructor
import google.generativeai as genai