Reasoning Configuration Guide
This guide explains how to configure reasoning effort for AI models in Radium.
Configuration Methods
1. CLI Flag (Highest Priority)
Override reasoning effort for a single execution:
rad step my-agent "Prompt" --reasoning high
Options:
- --reasoning low: Minimal reasoning
- --reasoning medium: Standard reasoning (default)
- --reasoning high: Maximum reasoning
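For example, using the same placeholder agent and prompt as above:

```bash
# Explicit effort levels on the command line
rad step my-agent "Prompt" --reasoning low
rad step my-agent "Prompt" --reasoning medium

# No flag: falls back to the agent's configured default, or medium if unset
rad step my-agent "Prompt"
```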
2. Agent Configuration
Set the default reasoning effort in the agent's TOML file:
[agent]
id = "my-agent"
name = "My Agent"
reasoning_effort = "high" # low, medium, or high
This becomes the default for all executions unless overridden by the --reasoning CLI flag.
3. Persona Configuration
Configure reasoning through performance profiles:
[agent.persona]
performance.profile = "thinking" # speed, balanced, thinking, or expert
Performance profiles map to reasoning capabilities:
- speed: Fast models, lower reasoning
- balanced: Balanced speed and quality (default)
- thinking: Optimized for deep reasoning
- expert: Expert-level reasoning, highest cost
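For example, an agent that leans on the thinking profile might be configured like this (the id and name below are placeholders; the keys match those shown above):

```toml
[agent]
id = "research-agent"      # placeholder id
name = "Research Agent"    # placeholder name

[agent.persona]
performance.profile = "thinking"  # deep-reasoning profile
```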
Precedence Chain
Reasoning effort is resolved in this order:
1. CLI flag (--reasoning) - Highest priority
2. Agent config (reasoning_effort in TOML)
3. Default (medium) - Lowest priority
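A minimal sketch of this resolution order (the function and argument names are illustrative, not Radium internals):

```python
# Hypothetical sketch of the precedence chain described above.
# cli_flag: value of --reasoning or None; agent_config: parsed agent TOML as a dict.

def resolve_reasoning_effort(cli_flag, agent_config):
    if cli_flag is not None:                    # 1. CLI flag has highest priority
        return cli_flag
    if "reasoning_effort" in agent_config:      # 2. then the agent's TOML setting
        return agent_config["reasoning_effort"]
    return "medium"                             # 3. built-in default, lowest priority

# No CLI flag, agent config sets "high" -> resolves to "high"
print(resolve_reasoning_effort(None, {"reasoning_effort": "high"}))
# CLI flag always wins -> resolves to "low"
print(resolve_reasoning_effort("low", {"reasoning_effort": "high"}))
```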
Examples
Example 1: Agent with High Reasoning
[agent]
id = "math-solver"
name = "Math Problem Solver"
reasoning_effort = "high"
engine = "gemini"
model = "gemini-2.0-flash-thinking"
Usage:
rad step math-solver "Solve: x^2 + 5x + 6 = 0"
# Uses high reasoning from config
Example 2: CLI Override
[agent]
id = "general-agent"
reasoning_effort = "low" # Default to low
Usage:
rad step general-agent "Complex problem" --reasoning high
# CLI flag overrides config, uses high reasoning
Example 3: Default Behavior
[agent]
id = "simple-agent"
# No reasoning_effort specified
Usage:
rad step simple-agent "Simple question"
# Uses default (medium) reasoning
Provider-Specific Configuration
Gemini
Gemini thinking models (e.g., gemini-2.0-flash-thinking) support thinking mode:
[agent]
id = "gemini-thinking-agent"
engine = "gemini"
model = "gemini-2.0-flash-thinking"
reasoning_effort = "high"
The reasoning effort maps to Gemini's thinkingConfig.thinking_budget:
- Low: 0.3 (minimal thinking)
- Medium: 0.6 (standard thinking)
- High: 1.0 (maximum thinking)
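As a rough sketch of the mapping just described (the function name and dict shape are illustrative; the actual request is assembled inside Radium's Gemini engine):

```python
# Illustrative only: restates the budget values listed above and nests
# them under the thinkingConfig key named in this guide.

GEMINI_THINKING_BUDGETS = {"low": 0.3, "medium": 0.6, "high": 1.0}

def gemini_thinking_config(reasoning_effort: str) -> dict:
    budget = GEMINI_THINKING_BUDGETS[reasoning_effort]
    return {"thinkingConfig": {"thinking_budget": budget}}

print(gemini_thinking_config("high"))  # {'thinkingConfig': {'thinking_budget': 1.0}}
```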
Claude
Claude models support extended thinking:
[agent]
id = "claude-thinking-agent"
engine = "claude"
model = "claude-3-opus"
reasoning_effort = "high"
The reasoning effort maps to Claude's thinking.thinking_budget:
- Low: 0.3 (minimal extended thinking)
- Medium: 0.6 (standard extended thinking)
- High: 1.0 (maximum extended thinking)
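The Claude side of the mapping is analogous; this sketch only restates the field names and values given above and is not the literal request Radium sends:

```python
# Illustrative only: same budget scale, nested under the thinking key
# named in this guide.

CLAUDE_THINKING_BUDGETS = {"low": 0.3, "medium": 0.6, "high": 1.0}

def claude_thinking_config(reasoning_effort: str) -> dict:
    budget = CLAUDE_THINKING_BUDGETS[reasoning_effort]
    return {"thinking": {"thinking_budget": budget}}

print(claude_thinking_config("medium"))  # {'thinking': {'thinking_budget': 0.6}}
```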
Cost Considerations
Reasoning effort directly impacts cost:
| Model | Standard (input/output per 1M tokens) | Thinking (High) | Multiplier |
|---|---|---|---|
| Gemini Flash Exp | $0.075/$0.30 | - | - |
| Gemini Flash Thinking | - | $0.20/$0.80 | ~2.7x |
| Claude Sonnet | $3.00/$15.00 | $3.00/$15.00* | ~1.0x* |
| Claude Opus | $15.00/$75.00 | $15.00/$75.00* | ~1.0x* |
*Claude models use extended thinking, which increases token usage and therefore the effective cost per request.
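As a rough back-of-the-envelope example (assuming the table's prices are per one million input/output tokens, and using made-up token counts):

```python
# Made-up request size: 10,000 input tokens and 2,000 output tokens.
# Prices come from the table above and are assumed to be per 1M tokens.

input_tokens, output_tokens = 10_000, 2_000

# Gemini Flash Thinking: $0.20 input / $0.80 output
thinking_cost = input_tokens / 1e6 * 0.20 + output_tokens / 1e6 * 0.80

# Gemini Flash Exp (standard): $0.075 input / $0.30 output
standard_cost = input_tokens / 1e6 * 0.075 + output_tokens / 1e6 * 0.30

print(round(thinking_cost, 6))   # roughly 0.0036
print(round(standard_cost, 6))   # roughly 0.00135
```

The ratio between the two (about 2.7x) matches the multiplier column in the table.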
Best Practices
- Use appropriate reasoning levels:
  - Simple tasks: low or medium
  - Complex problems: high
- Monitor costs:
  - Check token usage with --show-metadata (see the example after this list)
  - Use thinking models only when needed
- Combine with model selection:
  - Use thinking models for complex tasks
  - Use standard models for simple tasks
- Test reasoning levels:
  - Start with medium and adjust based on results
  - Use high only when necessary
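For example, token usage at different reasoning levels can be compared with the --show-metadata flag (the prompt below is arbitrary):

```bash
# Compare token usage at two reasoning levels for the same prompt
rad step my-agent "Summarize this design doc" --reasoning medium --show-metadata
rad step my-agent "Summarize this design doc" --reasoning high --show-metadata
```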
Troubleshooting
Reasoning effort not taking effect
- Verify model supports thinking mode (check model name)
- Check precedence: CLI flag overrides config
- Ensure reasoning effort is spelled correctly (low, medium, or high)
Unexpected costs
- Reduce reasoning effort level
- Use standard models for simple tasks
- Monitor token usage in metadata
Slow performance
- Lower reasoning effort for faster responses
- Use streaming mode for real-time output
- Consider using faster models for time-sensitive tasks