# Optimizing Session Costs
This guide provides strategies for reducing costs and improving efficiency in Radium agent sessions.
## Understanding Session Reports
Session reports provide detailed metrics to help you understand where costs are incurred:
### Key Metrics
- Total Cost: Sum of all model API costs for the session
- Input Tokens: Tokens sent to models (typically more expensive)
- Output Tokens: Tokens generated by models
- Cached Tokens: Tokens served from cache (free or reduced cost)
- Tool Calls: Number of tool executions (affects total time)
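As a rough mental model of how these metrics combine into the total, here is a sketch with hypothetical per-million-token prices (actual prices vary by model and provider, and `estimate_cost` is not a Radium API):

```rust
// Rough per-session cost model. Prices are hypothetical placeholders,
// expressed in dollars per million tokens; check your provider's pricing.
fn estimate_cost(
    input_tokens: u64,
    output_tokens: u64,
    cached_tokens: u64,
    input_price: f64,  // $/1M uncached input tokens
    output_price: f64, // $/1M output tokens
    cached_price: f64, // $/1M cached input tokens (often ~10% of input_price)
) -> f64 {
    let uncached_input = input_tokens.saturating_sub(cached_tokens) as f64;
    (uncached_input * input_price
        + output_tokens as f64 * output_price
        + cached_tokens as f64 * cached_price)
        / 1_000_000.0
}

fn main() {
    // 1M input tokens, 40% served from cache, 50k output tokens.
    let cost = estimate_cost(1_000_000, 50_000, 400_000, 3.0, 15.0, 0.3);
    println!("estimated cost: ${:.4}", cost); // prints "estimated cost: $2.6700"
}
```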
### Reading a Session Report

```shell
rad stats session
```
Look for:
- Model Usage: Which models are being used and their token counts
- Cache Hit Rate: Percentage of tokens served from cache
- Tool Success Rate: Percentage of successful tool calls
- Performance Breakdown: Where time is spent (API vs tools)
## Cost Optimization Strategies

### 1. Leverage Caching
Cache effectiveness is shown in session reports:
- Cache Hit Rate: Higher is better (aim for >50%)
- Cached Tokens: These tokens are free or significantly cheaper
Tips:
- Use context files that can be cached
- Reuse prompts and templates when possible
- Enable caching in your agent configuration
#### Context Caching

Context caching reduces token costs by 50% or more for repeated context by caching processed tokens at the provider level. This differs from model instance caching: it caches the actual prompt tokens, not the model objects.
Enable Context Caching:
```rust
use radium_models::{ModelConfig, ModelType};
use std::time::Duration;

let config = ModelConfig::new(ModelType::Claude, "claude-3-sonnet".to_string())
    .with_context_caching(true)
    .with_cache_ttl(Duration::from_secs(300));
```
Provider-Specific Notes:
- Claude: Use cache breakpoints to mark stable context boundaries
- OpenAI: Automatic for GPT-4+ models (just enable caching)
- Gemini: Use cache identifiers to reuse cached content
Monitor Cache Performance:
Check cache_usage in ModelResponse to see cache hit rates and cost savings. Aim for >50% cache hit rate for optimal cost reduction.
See Context Caching Documentation for detailed information.
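To see why the >50% target matters, the sketch below computes the effective input-token price at a given hit rate. The base price is hypothetical, and the assumption that cached tokens bill at 10% of the uncached rate varies by provider:

```rust
// Effective input price as a function of cache hit rate, assuming cached
// tokens are billed at 10% of the uncached rate (provider-dependent).
fn effective_input_price(base_price: f64, hit_rate: f64) -> f64 {
    base_price * (1.0 - hit_rate) + base_price * 0.1 * hit_rate
}

fn main() {
    let base = 3.0; // hypothetical $ per 1M input tokens
    for hit_rate in [0.0, 0.15, 0.5, 0.58] {
        println!(
            "hit rate {:>4.0}% -> effective ${:.2}/1M input tokens",
            hit_rate * 100.0,
            effective_input_price(base, hit_rate)
        );
    }
}
```

At a 50% hit rate the effective price drops from $3.00 to $1.65 per million input tokens, a 45% saving on input cost alone.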
### 2. Model Selection
Different models have different costs:
- Flash models: Lower cost, faster, good for simple tasks
- Pro models: Higher cost, more capable, use for complex reasoning
Strategy:
- Use flash models for routine operations
- Reserve pro models for complex reasoning tasks
- Check `rad stats model` to see which models you're using most
Example:

```
Model                   Requests   Input Tokens   Output Tokens   Cost
──────────────────────────────────────────────────────────────────────
gemini-3-pro-preview         168     31,056,954          44,268   $0.1250
gemini-2.5-flash-lite         28         60,389           2,422   $0.0025
```
In this example, switching more requests to flash-lite could significantly reduce costs.
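Back-of-envelope, using the totals from the table above (averages hide per-request token variance, and the number of rerouted requests is hypothetical):

```rust
// Average cost per request for a model, from session-report totals.
fn cost_per_request(total_cost: f64, requests: f64) -> f64 {
    total_cost / requests
}

fn main() {
    let pro = cost_per_request(0.1250, 168.0); // gemini-3-pro-preview
    let flash = cost_per_request(0.0025, 28.0); // gemini-2.5-flash-lite
    let moved = 100.0; // hypothetical number of requests rerouted to flash-lite
    println!("pro: ${:.5}/req, flash-lite: ${:.5}/req", pro, flash);
    println!("moving {moved} requests saves ~${:.4}", moved * (pro - flash));
}
```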
### 3. Reduce Tool Calls
Tool calls consume time and can increase costs indirectly:
- Success Rate: Higher success rate means fewer retries
- Tool Calls: Fewer calls = faster sessions = lower costs
Strategies:
- Improve tool reliability to reduce failures
- Batch operations when possible
- Use more efficient tools that require fewer calls
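Retries compound this: if failed calls are retried until they succeed, and each attempt succeeds independently with probability p, the expected number of attempts per operation is 1/p:

```rust
// Expected attempts per operation if failures are retried until success,
// assuming each attempt succeeds independently with probability p.
fn expected_attempts(p: f64) -> f64 {
    1.0 / p
}

fn main() {
    for p in [0.90, 0.95, 0.99] {
        println!(
            "success rate {:.0}% -> {:.3} attempts per operation on average",
            p * 100.0,
            expected_attempts(p)
        );
    }
}
```

Raising tool success from 90% to 99% cuts average attempts from about 1.11 to 1.01 per operation, which also removes the extra model turns that follow each failure.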
### 4. Optimize Prompts
Shorter, more focused prompts reduce input tokens:
- Be specific in your instructions
- Remove unnecessary context
- Use structured formats (JSON, YAML) when possible
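For comparing prompt variants before sending, a crude token estimate can help; the ~4-characters-per-token heuristic below is only an approximation and differs from real tokenizers:

```rust
// Crude token estimate (~4 characters per token); useful only for
// comparing prompt variants, not for exact billing.
fn approx_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

fn main() {
    let verbose = "Please could you kindly take a look at the following file and, \
                   if it is not too much trouble, summarize what it does in detail.";
    let focused = "Summarize what this file does.";
    println!(
        "verbose ~{} tokens, focused ~{} tokens",
        approx_tokens(verbose),
        approx_tokens(focused)
    );
}
```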
### 5. Monitor and Compare
Use session comparison to identify improvements:
```shell
rad stats compare <old-session> <new-session>
```
Look for:
- Token Delta: Reduction in total tokens
- Cost Delta: Reduction in total cost
- Success Rate: Improvement in tool success rate
Example Analysis:

```
Token Usage
───────────
Delta: -5200 (-10.0%)      ← Good: 10% reduction

Cost
────
Delta: -0.0150 (-12.0%)    ← Good: 12% cost reduction

Success Rate: 92.6% → 95.2% (+2.6%)   ← Good: Fewer failures
```
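The deltas are plain differences against the baseline; the sketch below reproduces the example's numbers (a -5200 token delta at -10.0% implies a 52,000-token baseline):

```rust
// Delta and percentage change between two sessions' metrics.
fn delta_pct(old: f64, new: f64) -> (f64, f64) {
    let delta = new - old;
    (delta, delta / old * 100.0)
}

fn main() {
    let (tok_delta, tok_pct) = delta_pct(52_000.0, 46_800.0);
    println!("token delta: {} ({:.1}%)", tok_delta, tok_pct);
    let (cost_delta, cost_pct) = delta_pct(0.1250, 0.1100);
    println!("cost delta: {:.4} ({:.1}%)", cost_delta, cost_pct);
}
```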
## Identifying Expensive Operations

### High Token Usage
Check which models are consuming the most tokens:
```shell
rad stats model
```
Look for:
- Models with very high input token counts
- Models with low cache hit rates
- Models used frequently but inefficiently
### Long-Running Sessions
Check performance metrics:
```shell
rad stats session
```
Look for:
- High wall time relative to agent active time (indicates waiting/idle time)
- High API time percentage (indicates model calls are slow)
- High tool time percentage (indicates tools are slow)
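These percentages are just shares of wall time; a minimal sketch with illustrative numbers:

```rust
// Share of wall time spent in a phase, as a percentage.
fn pct(part_secs: f64, wall_secs: f64) -> f64 {
    part_secs / wall_secs * 100.0
}

fn main() {
    let (wall, api, tool) = (600.0, 420.0, 120.0); // illustrative seconds
    let idle = wall - api - tool;
    println!("API:  {:.1}%", pct(api, wall));
    println!("Tool: {:.1}%", pct(tool, wall));
    println!("Idle: {:.1}%", pct(idle, wall));
}
```

In this illustration 70% of wall time is spent waiting on the API, 20% on tools, and the remaining 10% is idle time: the API is the first place to look.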
### Frequent Failures
Check success rates:
```shell
rad stats session
```
Look for:
- Low success rate (<90% indicates problems)
- High number of failed tool calls
- Patterns in failures (specific tools or operations)
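To surface those patterns, group failed calls by tool name; a minimal sketch with hypothetical tool names and log entries:

```rust
use std::collections::HashMap;

// Count failures per tool to find the ones worth fixing first.
fn failure_counts<'a>(failed_calls: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for tool in failed_calls {
        *counts.entry(*tool).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Hypothetical failed tool calls pulled from a session log.
    let failed = ["shell", "edit_file", "shell", "shell", "web_fetch"];
    let mut sorted: Vec<_> = failure_counts(&failed).into_iter().collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    for (tool, n) in sorted {
        println!("{tool}: {n} failures");
    }
}
```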
## Best Practices

### Regular Monitoring
- Weekly Review: Check `rad stats history` weekly to spot trends
- Compare Sessions: Use `rad stats compare` after making changes
- Export Data: Use `rad stats export` to analyze trends over time
### Cost Budgeting
- Set Limits: Monitor costs and set budgets per project
- Track Trends: Export data and track cost trends over time
- Alert on Spikes: Watch for sudden cost increases
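A simple spike check compares the latest session's cost to the recent average; the threshold factor here is an arbitrary choice:

```rust
// Flag a session whose cost exceeds the recent average by a chosen factor.
fn is_cost_spike(recent_costs: &[f64], latest: f64, factor: f64) -> bool {
    if recent_costs.is_empty() {
        return false;
    }
    let mean = recent_costs.iter().sum::<f64>() / recent_costs.len() as f64;
    latest > mean * factor
}

fn main() {
    let history = [0.10, 0.12, 0.09, 0.11]; // recent session costs in dollars
    // 0.35 is more than 2x the 0.105 average, so this flags a spike.
    println!("spike? {}", is_cost_spike(&history, 0.35, 2.0)); // prints "spike? true"
}
```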
### Optimization Workflow

- Baseline: Establish a baseline with `rad stats session`
- Identify Issues: Look for high costs, low cache rates, frequent failures
- Make Changes: Adjust models, prompts, or tools
- Compare: Use `rad stats compare` to verify improvements
- Iterate: Continue optimizing based on results
## Example Optimization Scenario

### Initial State
```
$ rad stats session
Total Cost: $0.1250
Model Usage: gemini-3-pro-preview (168 requests, 31M input tokens)
Cache Hit Rate: 15%
Success Rate: 92.6%
```
### Issues Identified
- Low cache hit rate (15% - should be >50%)
- All requests using expensive pro model
- Some tool failures (7.4% failure rate)
### Changes Made
- Enabled caching for context files
- Switched routine operations to flash-lite model
- Improved tool error handling
### Results
```
$ rad stats compare <old-session> <new-session>
Cost Delta: -0.0350 (-28.0%)   ← 28% cost reduction!
Cache Hit Rate: 15% → 58%      ← Much better caching
Success Rate: 92.6% → 96.1%    ← Fewer failures
```
## Additional Resources
- Session Analytics Documentation - Complete feature documentation
- Agent Configuration - Configure agents for efficiency