Thinking Mode for Complex Reasoning
Thinking mode enables AI models to show their reasoning process before providing final answers. This feature is particularly useful for complex problems that require deep analysis, mathematical reasoning, or multi-step problem solving.
Overview
When thinking mode is enabled, models perform internal reasoning before generating their final response. This reasoning process is captured and can be displayed to help users understand how the model arrived at its answer.
Benefits
- Better Answers: Models take more time to think through complex problems
- Transparency: See the reasoning steps the model used
- Debugging: Understand model behavior and decision-making process
- Learning: Learn problem-solving approaches from the model's reasoning
Supported Models
Gemini Models
gemini-2.0-flash-thinking: Optimized for deep reasoning with thinking mode
Claude Models
claude-3-opus: Highest capability with extended thinking support
claude-3-sonnet: Balanced performance with extended thinking support
Configuration Methods
Thinking mode can be configured in three ways, with the following precedence (a resolution sketch follows the list):
- CLI Flag (highest priority): --reasoning <level>
- Agent Configuration: reasoning_effort in the agent TOML file
- Default: medium (if not specified)
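How the precedence resolves can be pictured with a short Python sketch; the function and argument names below are illustrative only, not rad's actual internals:

def resolve_reasoning_effort(cli_flag=None, agent_config=None):
    # Illustrative precedence: CLI flag wins, then agent config, then the default.
    if cli_flag is not None:
        return cli_flag
    if agent_config is not None:
        return agent_config
    return "medium"

resolve_reasoning_effort(cli_flag="high", agent_config="low")  # -> "high"
resolve_reasoning_effort(agent_config="low")                   # -> "low"
resolve_reasoning_effort()                                     # -> "medium"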
Reasoning Effort Levels
- Low: Minimal reasoning effort for simple tasks
- Medium: Moderate reasoning for balanced performance (default)
- High: Maximum reasoning effort for complex problems
Usage Examples
CLI Configuration
# Use high reasoning effort for a complex problem
rad step my-agent "Solve this complex math problem" --reasoning high
# Use low reasoning for simple tasks
rad step my-agent "What is 2+2?" --reasoning low
Agent Configuration
[agent]
id = "math-agent"
name = "Math Problem Solver"
reasoning_effort = "high" # Always use high reasoning for this agent
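If you need to read an agent's configured effort outside of rad, a minimal sketch using Python 3.11+'s standard tomllib could look like this; the file path and the fallback value are assumptions for illustration, not part of rad:

import tomllib

# Hypothetical path to the agent definition shown above.
with open("math-agent.toml", "rb") as f:
    config = tomllib.load(f)

# Fall back to the documented default when the key is absent.
effort = config["agent"].get("reasoning_effort", "medium")
print(effort)  # "high" for the example above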
Viewing Thinking Process
To see the model's thinking process, use the --show-metadata flag:
rad step my-agent "Complex problem" --reasoning high --show-metadata
The thinking process will appear in the metadata section, displayed before other metadata like token usage.
Cost Implications
Thinking models typically cost 2-3x more than standard models due to increased token usage:
- Gemini Flash Thinking: $0.20 input / $0.80 output per 1M tokens (vs $0.075 / $0.30 for Flash Exp)
- Claude Opus: $15.00 input / $75.00 output per 1M tokens (supports extended thinking)
- Claude Sonnet: $3.00 input / $15.00 output per 1M tokens (supports extended thinking)
Higher reasoning effort levels result in more thinking tokens, increasing overall cost.
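As a rough worked example of that effect, assuming the listed prices are input/output rates per 1M tokens and that thinking tokens are billed at the output rate (verify both assumptions against your provider's pricing):

# Hypothetical request: 10K input, 2K visible output, 8K thinking tokens.
input_tokens, output_tokens, thinking_tokens = 10_000, 2_000, 8_000

flash_exp = (input_tokens * 0.075 + output_tokens * 0.30) / 1_000_000
flash_thinking = (input_tokens * 0.20 + (output_tokens + thinking_tokens) * 0.80) / 1_000_000

print(f"Flash Exp:      ${flash_exp:.5f}")       # ~$0.00135
print(f"Flash Thinking: ${flash_thinking:.5f}")  # ~$0.01000

The per-token rates alone are roughly 2.7x higher; the extra thinking tokens widen the gap further.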
When to Use Thinking Mode
Use thinking mode when:
- Solving complex mathematical problems
- Performing multi-step reasoning
- Analyzing complex code or systems
- Making important decisions that require careful consideration
- Debugging or understanding model behavior
Don't use thinking mode for:
- Simple questions or straightforward tasks
- High-volume processing where cost matters
- Real-time applications requiring fast responses
- Tasks that don't benefit from deep reasoning
Troubleshooting
Thinking process not appearing
- Check model support: Ensure you're using a thinking model (e.g., gemini-2.0-flash-thinking)
- Verify reasoning effort: Make sure reasoning effort is set (CLI flag or agent config)
- Check metadata flag: Use --show-metadata to display the thinking process
- Non-thinking models: Regular models won't show a thinking process even with reasoning effort set
High costs
- Reduce reasoning effort: Use low or medium instead of high
- Use standard models: Switch to non-thinking models for simple tasks
- Monitor token usage: Check metadata to see actual token consumption
Slow responses
- Expected behavior: Thinking mode takes longer as models perform internal reasoning
- Reduce reasoning effort: Lower reasoning effort levels are faster
- Use streaming: Consider using the --stream flag for real-time output
Technical Details
How It Works
- User specifies reasoning effort (CLI, config, or default)
- Reasoning effort is resolved through precedence chain
- Value is passed to ModelParameters.reasoning_effort
- Provider-specific APIs map reasoning effort to thinking configuration (sketched after this list):
  - Gemini: Maps to thinkingConfig.thinking_budget (0.3/0.6/1.0)
  - Claude: Maps to thinking.thinking_budget (0.3/0.6/1.0)
- Model performs thinking process
- Thinking process is extracted from response and stored in metadata
- CLI displays the thinking process when --show-metadata is used
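The documented effort-to-budget mapping can be sketched as follows; the helper name and return shapes are illustrative, not rad's actual code:

# Illustrative mapping from the documented effort levels to budget fractions.
EFFORT_TO_BUDGET = {"low": 0.3, "medium": 0.6, "high": 1.0}

def build_thinking_config(provider, reasoning_effort):
    budget = EFFORT_TO_BUDGET[reasoning_effort]
    if provider == "gemini":
        # Per the docs: placed under thinkingConfig.thinking_budget
        return {"thinkingConfig": {"thinking_budget": budget}}
    if provider == "claude":
        # Per the docs: placed under the request's thinking field
        return {"thinking": {"thinking_budget": budget}}
    return {}

build_thinking_config("gemini", "high")  # -> {"thinkingConfig": {"thinking_budget": 1.0}}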
Provider-Specific Implementation
Gemini: Uses the thinkingConfig field in the generation config. The thinking process is returned in the response's thinking field.
Claude: Uses the thinking field in the request. The thinking process is returned in the response's thinking field.
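Based only on the field names described here, pulling the thinking text out of a provider response could look roughly like the sketch below; the response shape is an assumption for illustration, not either provider's exact schema:

def extract_thinking(response: dict) -> str | None:
    # Both providers are documented as returning the thinking process
    # in a thinking field; real responses may nest it differently.
    return response.get("thinking")

extract_thinking({"thinking": "Step 1: ...", "text": "42"})  # -> "Step 1: ..."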