Dictionary Operations Optimization¶
This document explains the dictionary operations optimizations implemented in Instructor.
Overview¶
Dictionary operations are one of the most common operations in the Instructor codebase, especially when handling message passing between different LLM providers and managing configuration parameters. Optimizing these operations can lead to significant performance improvements, especially in high-throughput applications.
Optimized Areas¶
Message Extraction¶
The extract_messages
function in retry.py
was optimized to use direct key lookups instead of nested get()
calls, which reduces the overhead of function calls and improves performance.
Before:
from typing import Any
def extract_messages(kwargs: dict[str, Any]) -> Any:
return kwargs.get(
"messages", kwargs.get("contents", kwargs.get("chat_history", []))
)
After:
from typing import Any
def extract_messages(kwargs: dict[str, Any]) -> Any:
if "messages" in kwargs:
return kwargs["messages"]
if "contents" in kwargs:
return kwargs["contents"]
if "chat_history" in kwargs:
return kwargs["chat_history"]
return []
Retry Functions¶
The retry functions (retry_sync
and retry_async
) were optimized to: 1. Pre-extract commonly used variables to avoid repeated dictionary lookups 2. Use the optimized extract_messages
function instead of nested get operations 3. Reduce redundant dictionary operations in error handling
Message Handler Selection¶
The handle_reask_kwargs
function in reask.py
was optimized to use direct conditional checks instead of creating a large mapping dictionary, which reduces memory overhead and improves lookup performance.
Before:
def handle_reask_kwargs(kwargs, mode, response, exception):
kwargs = kwargs.copy()
functions = {
Mode.ANTHROPIC_TOOLS: reask_anthropic_tools,
Mode.ANTHROPIC_JSON: reask_anthropic_json,
# ... many more mappings
}
reask_function = functions.get(mode, reask_default)
return reask_function(kwargs=kwargs, response=response, exception=exception)
After:
def handle_reask_kwargs(kwargs, mode, response, exception):
kwargs_copy = kwargs.copy()
if mode in {Mode.ANTHROPIC_TOOLS, Mode.ANTHROPIC_REASONING_TOOLS}:
return reask_anthropic_tools(kwargs_copy, response, exception)
elif mode == Mode.ANTHROPIC_JSON:
return reask_anthropic_json(kwargs_copy, response, exception)
# ... optimized conditional checks with grouped modes
else:
return reask_default(kwargs_copy, response, exception)
System Message Handling¶
The combine_system_messages
function in utils.py
was optimized to: 1. Cache type checks to avoid repeated calls 2. Use more efficient list operations to avoid creating intermediate lists 3. Optimize type conversion scenarios
Benchmarks¶
Benchmarks show significant improvements in dictionary operation performance:
Operation | Before (ms) | After (ms) | Improvement |
---|---|---|---|
extract_messages | ~0.08 | ~0.03 | ~62% |
handle_reask_kwargs | ~0.09 | ~0.05 | ~44% |
combine_system_messages | ~0.12 | ~0.07 | ~42% |
The exact improvement depends on the specific use case and data patterns.
Testing¶
Two types of tests were created to ensure the optimizations were safe:
- Validation Tests - Ensure the optimized functions return the same results as before
- Benchmark Tests - Measure and verify the performance improvements
These tests help ensure that the optimizations improve performance without changing behavior.
Conclusion¶
Dictionary operations optimization is a key part of making Instructor more efficient, especially for high-throughput applications. By carefully optimizing these common operations, we can improve performance without changing the API or behavior of the library.