Leverage Task Specific Systems
Faithful Chain of Thought [1] improves the faithfulness of reasoning chains generated by language models by splitting the process into two stages:
- Translation: We first translate a user query into a series of reasoning steps. These are a task-specific set of steps that we can execute deterministically.
- Problem Solving: We execute these steps to derive the final answer. This ensures that the final answer is consistent with the reasoning steps in our chain of thought.
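The two stages can be sketched without any model call. This is a minimal illustration, not the paper's implementation: the `translate` function is a hypothetical stub that returns hard-coded steps where an LLM would normally produce them, and `solve` is an assumed name for the deterministic execution stage.

```python
def translate(query: str) -> list[str]:
    """Stage 1 (stubbed): in practice an LLM would produce these steps."""
    # Hypothetical hard-coded translation, for illustration only.
    return [
        "apples = 5",
        "eaten = 2",
        "answer = apples - eaten",
    ]


def solve(steps: list[str]) -> object:
    """Stage 2: run the steps deterministically and read out `answer`."""
    namespace: dict[str, object] = {}
    # Execute all steps in an isolated namespace.
    exec("\n".join(steps), {}, namespace)
    return namespace["answer"]


steps = translate("I have 5 apples and eat 2. How many are left?")
result = solve(steps)
print(result)
```

Because the answer is computed from the steps rather than generated alongside them, it cannot disagree with the reasoning that produced it.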
The paper lists a few examples of what these task-specific steps could be:
- Math Word Problems: Python code that an interpreter can execute to derive a final answer.
- Multi-Hop QA: This is a multi-step reasoning process. To solve it, they use a mix of Python and Datalog (a relational and logic programming language) to arrive at a final answer.
- Planning: To generate a plan that solves a user query, they generate a list of symbolic goals in a programming language and then call a PDDL planner to obtain the plan.
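To give a feel for the multi-hop case, here is a rough sketch of a two-hop query written in plain Python in the style of a Datalog rule. It is only an illustration under made-up assumptions: the facts, the `grandparents_of` helper, and the rule are all hypothetical, and the paper's actual pipeline relies on a Datalog interpreter rather than hand-written set comprehensions.

```python
# Facts as (relation, subject, object) tuples; all names are made up.
facts = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
}


def grandparents_of(person: str, facts: set[tuple[str, str, str]]) -> set[str]:
    """Apply the rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    parents = {(x, y) for rel, x, y in facts if rel == "parent"}
    # Join the parent relation with itself on the shared variable Y.
    return {
        x
        for x, y in parents
        for y2, z in parents
        if y == y2 and z == person
    }


answer = grandparents_of("carol", facts)
print(answer)
```

Each hop is an explicit join over known facts, so the final answer can be traced back to the facts that support it.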

In the example below, we show how you can use an LLM to generate Python code that an interpreter can execute to arrive at a final answer.
We can implement it in instructor as seen below:
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
client = instructor.from_openai(OpenAI())
class ReasoningStep(BaseModel):
    id: int = Field(description="Unique ID")
    rationale: list[str] = Field(
        description="""Specific sections from prior reasoning
        steps or the context that ground this reasoning step"""
    )
    dependencies: list[int] = Field(
        description="""IDs of prior reasoning steps that this
        reasoning step depends on"""
    )
    eval_string: str = Field(
        description="""Python Code to execute to generate the
        final evaluation"""
    )
def generate_reasoning_steps(query: str) -> list[ReasoningStep]:
    return client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": """
                You are a world class AI who excels at
                generating reasoning steps to answer a
                question. You will be given a question
                and you will generate a list of reasoning
                steps that are needed to answer the
                question.
                At each point you should either
                - declare a variable to be referenced
                later on
                - combine multiple variables together to
                generate a new result that you should
                store in another variable
                The final answer should be stored in a
                variable called `answer`.
                """,
            },
            {"role": "user", "content": query},
        ],
        model="gpt-4o",
        response_model=list[ReasoningStep],
    )
if __name__ == "__main__":
    steps = generate_reasoning_steps(
        """If there are 3 cars in the parking lot and 2 more
        cars arrive, how many cars are in the parking lot
        after another 2 more arrive?"""
    )
    code = "\n".join([step.eval_string for step in steps])
    print(code)
    """
    initial_cars = 3
    arriving_cars = 2
    cars_after_first_arrival = initial_cars + arriving_cars
    final_car_count = cars_after_first_arrival + 2
    answer = final_car_count
    """
    # Run the generated code in an isolated namespace so that
    # `answer` can be read back afterwards.
    local_vars = {}
    exec(code, {}, local_vars)
    print(local_vars.get("answer"))
    #> 7
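The example above joins the steps in the order the model returned them and does not use the `dependencies` field. One possible extension, sketched here as an assumption rather than as part of the original method, is to order the steps by their declared dependencies before executing them. `Step` is a hypothetical dataclass stand-in for `ReasoningStep` so that the snippet runs without an API call, and `run_in_dependency_order` is an illustrative helper name.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    # Stand-in for ReasoningStep with only the fields used here.
    id: int
    eval_string: str
    dependencies: list[int] = field(default_factory=list)


def run_in_dependency_order(steps: list[Step]) -> dict[str, object]:
    by_id = {step.id: step for step in steps}
    # TopologicalSorter takes {node: predecessors} and yields
    # predecessors before the nodes that depend on them.
    graph = {step.id: step.dependencies for step in steps}
    order = TopologicalSorter(graph).static_order()
    namespace: dict[str, object] = {}
    for step_id in order:
        # Every step shares one namespace, so later steps can read
        # variables declared by earlier ones.
        exec(by_id[step_id].eval_string, {}, namespace)
    return namespace


# Steps deliberately listed out of order.
steps = [
    Step(id=3, eval_string="answer = initial_cars + arriving_cars", dependencies=[1, 2]),
    Step(id=1, eval_string="initial_cars = 3"),
    Step(id=2, eval_string="arriving_cars = 2"),
]
result = run_in_dependency_order(steps)
print(result["answer"])
```

`static_order()` raises `graphlib.CycleError` if the declared dependencies form a cycle, which gives an early signal that the generated steps are inconsistent.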