Consistent Stories with GPT-4o¶
Language models struggle to generate consistent graphs with a large number of nodes, often because the full graph is simply too large for the model to handle in one pass. The result is inconsistent output: invalid nodes, disconnected nodes, and other structural issues.
In this article, we'll look at how to get around this limitation with a two-phase approach to generating complex DAGs with gpt-4o, using a Choose Your Own Adventure story as a simple example.
Why do DAGs matter?¶
DAGs are directed acyclic graphs. A graph is a DAG when every edge between nodes is directed (it goes in a single direction) and there are no cycles (no path loops back to a previous node).
graph TD
A --> B
A --> C
B --> D
C --> D
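Since a story graph is just nodes and directed edges, we can represent the diagram above as an adjacency list and verify the acyclic property directly. Here's a minimal sketch using a depth-first search:

# The diagram above, as an adjacency list
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def is_dag(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph contains no cycles."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for child in graph[node]:
            if state[child] == IN_PROGRESS:  # back edge found -> cycle
                return False
            if state[child] == UNVISITED and not visit(child):
                return False
        state[node] = DONE
        return True

    return all(state[n] != UNVISITED or visit(n) for n in graph)

print(is_dag(graph))  # True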
This isn't too far away from a Choose Your Own Adventure story where users have a fixed set of choices at each step and can only move forward in the story. We can see this in action below:
graph TD
A[Story Root] --> B[Choice 1]
A --> C[Choice 2]
A --> D[Choice 3]
B --> E[Choice 1.1]
B --> F[Choice 1.2]
C --> G[Choice 2.1]
C --> H[Choice 2.2]
D --> I[Choice 3.1]
D --> J[Choice 3.2]
The Challenge: Scaling Story Generation¶
When we try to use a language model to generate a story in a single run, we hit several limitations quickly: with just 4 choices at each step, we're already at 20 nodes (4 + 16) by the second level. And if users can only make 2 choices before the story ends, that doesn't leave a very interesting story to play with.
In other words, we'll overflow the model's context window quickly. To get around this, we can use a two-phase approach: first generate an initial story setting, then expand the choices in parallel.
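To see just how quickly the tree grows, here's a quick back-of-the-envelope check (assuming a fixed branching factor of 4):

# How many choice nodes exist at each depth with 4 choices per node?
branching = 4
for depth in range(1, 6):
    nodes = sum(branching**level for level in range(1, depth + 1))
    print(f"depth {depth}: {nodes} choice nodes")

# depth 1: 4
# depth 2: 20
# depth 3: 84
# depth 4: 340
# depth 5: 1364

By depth 5 we'd need well over a thousand coherent, interconnected story beats - far more than fits comfortably in a single generation.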
Parallel Story Generation¶
Generating an Outline¶
First, we generate an outline of the story using gpt-4o. This is important because it gives us a starting setting, a visual style, and an image description (for the banner image). We can then use these down the line to keep the images we generate as consistent as possible.
import instructor
from pydantic import BaseModel
from typing import List


class GeneratedStory(BaseModel):
    setting: str
    plot_summary: str
    choices: List[str]
    visual_style: str
    image_description: str


async def generate_story(
    client: instructor.AsyncInstructor,
    story_input: RestateStoryInput,  # user-supplied title and setting
):
    resp = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": """
                Generate a story with:
                - Setting: {{ story_input.setting }}
                - Title: {{ story_input.title }}

                Rules:
                - Generate 2-4 initial choices that represent actions
                - Choices must move story forward
                - Include brief setting description
                - Generate a visual description for the story

                Required Elements:
                1. Plot Summary: A vivid description of the setting and plot
                2. Initial Choices: 2-4 distinct actions the user can take
                3. Visual Style: Description of art style, color palette
                4. Image Description: One-sentence scene description
                """,
            }
        ],
        model="gpt-4o",
        response_model=GeneratedStory,
        context={"story_input": story_input},
    )
    return resp
This outputs a story with a setting, plot summary, choices, visual style and image description.
# Example generated output
{
"setting": "A neon-lit cyberpunk metropolis in 2150",
"plot_summary": "In the sprawling city of Neo-Tokyo...",
"choices": [
"Investigate the mysterious signal in the abandoned district",
"Meet your contact at the underground hacker hub",
"Follow the corporate executive who seems suspicious"
],
"visual_style": "Vibrant neon colors, detailed cyberpunk architecture",
"image_description": "A towering cyberpunk cityscape at night with neon signs"
}
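To run this end to end, we need a client and an input model. The RestateStoryInput fields below are assumptions inferred from how the prompt uses them; the client setup uses instructor's standard from_openai patching:

import asyncio
import instructor
import openai
from pydantic import BaseModel


class RestateStoryInput(BaseModel):
    # Hypothetical definition -- fields inferred from the template above
    title: str
    setting: str


async def main():
    client = instructor.from_openai(openai.AsyncOpenAI())
    story = await generate_story(
        client,
        RestateStoryInput(
            title="Neon Shadows",
            setting="A neon-lit cyberpunk metropolis in 2150",
        ),
    )
    print(story.model_dump_json(indent=2))


asyncio.run(main())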
Parallel Choice Expansion¶
One of the biggest challenges in generating deep story trees is maintaining consistency as the story branches grow.
Here's how we solve this with parallel generation and state tracking:
graph TD
%% Main nodes
A[Find Door] --> B[Open Door]
A --> C[Walk Away]
B --> D[Read Book]
B --> E[Leave Room]
C --> F[Go Home]
C --> G[Wait Outside]
%% Styling for visual hierarchy
classDef start fill:#ff9999,stroke:#333,stroke-width:2px
classDef decision fill:#99ccff,stroke:#333,stroke-width:2px
classDef outcome fill:#99ffff,stroke:#333,stroke-width:1px
%% Apply styles
class A start
class B,C decision
class D,E,F,G outcome
%% Add tooltips for context
click B "Door context" "Open Door Context"
click C "Away context" "Walk Away Context"
click D "Door and Book context" "Read Book Context"
The key insight is that each path through the story tree has its own unique state. We track this with a simple accumulator that records the previous choices and their consequences alongside the story context.
It's also worth noting that the model has full flexibility to end the story at any point.
Here's how we implement this:
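Before the implementation, note that it references two response models we haven't defined. A minimal sketch of what they might look like, with fields inferred from how they're used below:

import asyncio
import instructor
from pydantic import BaseModel


class RewrittenChoice(BaseModel):
    choice_description: str
    choice_consequences: str
    choices: list[str]  # the next 2-4 options; empty if the story ends here


class FinalStoryChoice(BaseModel):
    choice_description: str
    choice_consequences: str
    choices: list["FinalStoryChoice"]  # fully expanded child branches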
async def rewrite_choice(
    client: instructor.AsyncInstructor,
    choice: str,
    story: GeneratedStory,
    prev_choices: list[dict],  # Accumulator for path state
    max_depth: int,
    sem: asyncio.Semaphore,
) -> FinalStoryChoice:
    # Each choice knows its entire path history
    async with sem:
        rewritten_choice = await client.chat.completions.create(
            model="gpt-4o",
            response_model=RewrittenChoice,
            messages=[
                {
                    "role": "user",
                    "content": """
                    Given this choice: {{ choice }}

                    Story context:
                    Setting: {{ story.setting }}
                    Plot: {{ story.plot_summary }}

                    Previous choices made in this path:
                    {% for prev in prev_choices %}
                    - {{ prev.choice_description }}
                      Result: {{ prev.choice_consequences }}
                    {% endfor %}

                    Generate the next story beat and 2-4 new choices.
                    The story should end in {{ max_depth - prev_choices|length }} more turns.
                    """,
                }
            ],
            context={
                "choice": choice,
                "story": story,
                "prev_choices": prev_choices,
                "max_depth": max_depth,
            },
        )

    # For terminal nodes (at max depth)
    if len(prev_choices) == max_depth - 1:
        return FinalStoryChoice(
            choice_description=rewritten_choice.choice_description,
            choice_consequences=rewritten_choice.choice_consequences,
            choices=[],  # Terminal node
        )

    # Recursively expand child choices in parallel, releasing the
    # semaphore first so children don't deadlock waiting on the parent
    child_choices = await asyncio.gather(
        *[
            rewrite_choice(
                client=client,
                choice=new_choice,
                story=story,
                prev_choices=prev_choices
                + [
                    {
                        "choice_description": rewritten_choice.choice_description,
                        "choice_consequences": rewritten_choice.choice_consequences,
                    }
                ],
                max_depth=max_depth,
                sem=sem,
            )
            for new_choice in rewritten_choice.choices
        ]
    )

    return FinalStoryChoice(
        choice_description=rewritten_choice.choice_description,
        choice_consequences=rewritten_choice.choice_consequences,
        choices=child_choices,
    )
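To tie both phases together, a top-level driver can fan out over the outline's initial choices. This is a sketch under the same assumptions as above; the generate_full_story helper and the semaphore size of 10 are illustrative choices, not from the original:

async def generate_full_story(
    client: instructor.AsyncInstructor,
    story_input: RestateStoryInput,
    max_depth: int = 3,
) -> tuple[GeneratedStory, list[FinalStoryChoice]]:
    # Phase 1: generate the outline
    story = await generate_story(client, story_input)

    # Phase 2: expand every initial choice in parallel
    sem = asyncio.Semaphore(10)  # cap concurrent API calls
    choices = await asyncio.gather(
        *[
            rewrite_choice(
                client=client,
                choice=choice,
                story=story,
                prev_choices=[],
                max_depth=max_depth,
                sem=sem,
            )
            for choice in story.choices
        ]
    )
    return story, choices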
This approach gives us several key benefits:
- Path-Specific Context: Each node maintains the complete history of choices that led to it, ensuring consistency within each branch
- Parallel Generation: Different branches can be generated simultaneously since they each maintain their own state
- Controlled Growth: The max_depth parameter prevents exponential expansion
- Rate Limiting: The semaphore controls concurrent API calls while allowing maximum parallelization
The semaphore isn't just for rate limiting - it keeps the number of in-flight requests bounded, so even deep trees expand at a manageable pace instead of flooding the API all at once.
Each path through the story tree becomes a self-contained narrative with access to its complete history, allowing us to generate coherent stories that are both faster to produce and richer in detail than a single call could manage. It also lets us generate stories that are far broader and deeper than would fit in a single context window.
Beyond Story Generation¶
The success of this approach comes down to three key principles:
- State Isolation: Each node maintains only the context it needs, preventing context window overflow
- Parallel Processing: Generation can happen simultaneously across branches, dramatically reducing total generation time
- Structured Validation: Using Pydantic models ensures each generated component meets your requirements
For example, generating a 20-node story tree sequentially at 3 seconds per node takes 60 seconds. With parallel generation and 10 concurrent requests, each level of the tree only takes as long as its slowest wave of calls, so the same tree completes in around 10 seconds.
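A rough latency model makes that concrete (assuming a constant 3 seconds per call, branching factor 4, depth 2, and 10 concurrent requests):

import math

per_call = 3.0  # assumed constant seconds per API call
branching, depth, concurrency = 4, 2, 10

nodes = sum(branching**level for level in range(1, depth + 1))
sequential = nodes * per_call

# In parallel, each level costs ceil(level_size / concurrency) waves of calls
parallel = sum(
    math.ceil(branching**level / concurrency) * per_call
    for level in range(1, depth + 1)
)
print(f"{nodes} nodes: sequential {sequential:.0f}s, parallel {parallel:.0f}s")
# 20 nodes: sequential 60s, parallel 9s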
This pattern is particularly valuable when:
- Your generation tasks naturally form a tree or graph structure
- Individual nodes need some but not all context from their ancestors
- You need to generate content that exceeds a single context window
- Speed of generation is important
By combining structured outputs with parallel generation, you can reliably generate complex, interconnected content at scale while maintaining consistency and control.
instructor makes it easy to generate complex data structures with language models - whether they're open-source models with Ollama or proprietary models from providers such as OpenAI. Give us a try today!