Persona System Architecture
This document describes the architecture for the future persona system that will provide intelligent model recommendations, cost optimization, and agent selection capabilities for Radium.
Table of Contents
- Overview
- Enhanced Agent Metadata Schema
- Model Selection Engine
- Cost Estimation and Budget Tracking
- Agent Recommendation Engine
- Backward Compatibility Strategy
- Integration Points
- Implementation Roadmap
Overview
Goals
The persona system extends the current agent configuration system to provide:
- Intelligent Model Selection: Automatically choose the best model based on task requirements and performance profiles
- Cost Optimization: Track and enforce cost budgets while maintaining quality
- Fallback Chains: Gracefully handle model unavailability with automatic fallbacks
- Agent Recommendations: Suggest the best agent for a given task based on capabilities
Design Principles
- Backward Compatible: Existing TOML-based agent configs continue to work
- Progressive Enhancement: Persona features are optional and additive
- Performance First: Model selection prioritizes speed when appropriate
- Cost Aware: Budget tracking and enforcement prevent cost overruns
- Extensible: Architecture supports future enhancements
Enhanced Agent Metadata Schema
YAML Frontmatter Format
Agents can optionally include enhanced metadata in YAML frontmatter at the top of their prompt files:
---
agent_id: arch-agent
name: Architecture Agent
recommended_models:
  primary: gemini-2.0-flash-thinking
  fallback: gemini-2.0-flash-exp
  premium: gemini-1.5-pro
capabilities: [architecture, design, planning, system-design]
performance_profile: thinking
cost_budget:
  max_per_execution: 0.10
  max_daily: 5.00
  max_monthly: 100.00
---
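Before the YAML can be parsed, the frontmatter has to be separated from the prompt text that follows it. The sketch below shows one way to do that with only the standard library. The function name `split_frontmatter` is hypothetical, and the sketch assumes `\n` line endings and that the opening `---` is the very first line of the file.

```rust
/// Splits a prompt file into its YAML frontmatter and the remaining prompt body.
/// Returns `None` when the file does not start with a `---` line or the closing
/// `---` line is missing; such a file would be treated as having no persona metadata.
/// (Hypothetical helper, not part of the existing codebase.)
pub fn split_frontmatter(text: &str) -> Option<(&str, &str)> {
    // The opening delimiter must be the first line of the file.
    let rest = text.strip_prefix("---\n")?;

    // Walk the remaining lines, tracking byte offsets, until the closing delimiter.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let yaml = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((yaml, body));
        }
        offset += line.len();
    }
    None
}
```

The returned YAML slice can then be handed to whichever YAML parser the implementation adopts, while the body is used as the prompt unchanged.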
Schema Definition
agent_id (string, required)
Unique identifier matching the agent's TOML config ID. Used for validation and linking.
name (string, required)
Human-readable name matching the agent's TOML config name.
recommended_models (object, optional)
Model recommendations with fallback chain:
- primary (string, required): Primary model to use for this agent
- fallback (string, optional): Fallback model if primary is unavailable
- premium (string, optional): Premium model for high-priority tasks
Model Selection Priority:
- Primary model (default)
- Fallback model (if primary unavailable)
- Premium model (if explicitly requested or primary/fallback unavailable)
- Mock model (for testing/development)
capabilities (array of strings, optional)
List of agent capabilities for recommendation matching:
- Examples: ["architecture", "design", "planning", "code-generation", "testing", "documentation"]
- Used by the recommendation engine to match agents to tasks
- Case-insensitive matching
- Partial matching supported (e.g., "code" matches "code-generation")
performance_profile (string, optional)
Performance profile that guides model selection:
- speed: Prioritize fast responses (use fastest available model)
- balanced: Balance speed and quality (default)
- thinking: Prioritize quality and reasoning (use thinking models)
- expert: Maximum quality regardless of cost (use premium models)
Default: balanced if not specified
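The default can be expressed directly in the type. The following is a minimal sketch of parsing the profile string, assuming the four values listed above; `resolve_profile` is a hypothetical helper, and accepting mixed-case input is an assumption rather than documented behavior.

```rust
use std::str::FromStr;

/// The four profiles from the schema; `Balanced` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PerformanceProfile {
    Speed,
    #[default]
    Balanced,
    Thinking,
    Expert,
}

impl FromStr for PerformanceProfile {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: input is trimmed and matched case-insensitively.
        match s.trim().to_ascii_lowercase().as_str() {
            "speed" => Ok(Self::Speed),
            "balanced" => Ok(Self::Balanced),
            "thinking" => Ok(Self::Thinking),
            "expert" => Ok(Self::Expert),
            other => Err(format!("unknown performance profile: {}", other)),
        }
    }
}

/// Hypothetical helper: a missing field resolves to the default profile,
/// while an unrecognized value is reported as an error.
pub fn resolve_profile(raw: Option<&str>) -> Result<PerformanceProfile, String> {
    match raw {
        Some(value) => value.parse(),
        None => Ok(PerformanceProfile::default()),
    }
}
```

Rejecting unknown values, instead of silently falling back to balanced, surfaces typos in frontmatter early; the opposite choice would also be defensible.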
cost_budget (object, optional)
Cost budget constraints:
- max_per_execution (float, optional): Maximum cost per agent execution (in USD)
- max_daily (float, optional): Maximum daily cost for this agent (in USD)
- max_monthly (float, optional): Maximum monthly cost for this agent (in USD)
Budget Enforcement:
- Budgets are checked before model selection
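- If a budget would be exceeded, the system rejects the execution by default, or falls back to a cheaper model or the mock model when the violation-handling policy allows it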
- Budgets reset at midnight (daily) and first of month (monthly)
Example Configurations
Speed-Optimized Agent
---
agent_id: quick-responder
name: Quick Response Agent
recommended_models:
  primary: gemini-2.0-flash-exp
  fallback: gemini-1.5-flash
performance_profile: speed
cost_budget:
  max_per_execution: 0.01
---
Quality-Optimized Agent
---
agent_id: expert-architect
name: Expert Architecture Agent
recommended_models:
  primary: gemini-2.0-flash-thinking
  fallback: gemini-1.5-pro
  premium: gemini-1.5-pro
performance_profile: expert
capabilities: [architecture, system-design, technical-leadership]
cost_budget:
  max_per_execution: 0.50
  max_daily: 20.00
---
Model Selection Engine
Selection Algorithm
The model selection engine chooses the best model based on:
- Performance Profile: Determines priority (speed vs quality)
- Model Availability: Checks if model is available and accessible
- Cost Budget: Ensures selection doesn't exceed budget constraints
- Fallback Chain: Uses fallback models if primary is unavailable
Selection Process
1. Load agent configuration (TOML + YAML frontmatter if present)
2. Determine performance profile (from persona or default: balanced)
3. Check cost budgets (per-execution, daily, monthly)
4. Select model based on profile:
   - speed: Use fastest available model from recommended_models
   - balanced: Use primary model, fallback to fallback if needed
   - thinking: Use thinking model (primary or premium)
   - expert: Use premium model, fallback to primary if unavailable
5. Verify model availability (check API connectivity)
6. If unavailable, try fallback chain:
   - primary → fallback → premium → mock
7. If budget exceeded, downgrade to cheaper model or use mock
8. Return selected model with metadata
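Steps 4 to 6 can be sketched as building an ordered candidate list and taking the first available entry. This is an illustration under stated assumptions, not the planned implementation: `candidate_order`, `select_model`, and the `"mock"` model name are hypothetical, the availability check is injected as a closure, and the speed and thinking profiles reuse the default chain because ranking by latency or reasoning ability needs model metadata this document does not define. Budget downgrades (step 7) are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PerformanceProfile {
    Speed,
    Balanced,
    Thinking,
    Expert,
}

pub struct RecommendedModels {
    pub primary: String,
    pub fallback: Option<String>,
    pub premium: Option<String>,
}

/// Assumed name for the mock model that terminates every chain.
pub const MOCK_MODEL: &str = "mock";

/// Orders the configured models for a profile, skipping unset and duplicate
/// entries, and always ends with the mock model.
pub fn candidate_order(models: &RecommendedModels, profile: PerformanceProfile) -> Vec<&str> {
    let primary = Some(models.primary.as_str());
    let fallback = models.fallback.as_deref();
    let premium = models.premium.as_deref();

    let ordered = match profile {
        // expert: premium first, then primary.
        PerformanceProfile::Expert => [premium, primary, fallback],
        // All other profiles use the documented chain in this sketch.
        _ => [primary, fallback, premium],
    };

    let mut out: Vec<&str> = Vec::new();
    for model in ordered.iter().flatten() {
        if !out.contains(model) {
            out.push(*model);
        }
    }
    out.push(MOCK_MODEL);
    out
}

/// Returns the first candidate that the injected check reports as available.
/// The mock model is assumed to always be available.
pub fn select_model<'a>(
    models: &'a RecommendedModels,
    profile: PerformanceProfile,
    is_available: impl Fn(&str) -> bool,
) -> &'a str {
    candidate_order(models, profile)
        .into_iter()
        .find(|model| *model == MOCK_MODEL || is_available(*model))
        .unwrap_or(MOCK_MODEL)
}
```

Passing the availability check as a closure keeps the ordering logic testable without network access; a real selector would presumably back it with the availability cache.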
Model Availability Checking
Caching Strategy:
- Cache model availability status for 5 minutes
- Check availability on-demand if cache expired
- Use health check endpoints when available
- Fallback to trial execution for unknown models
Availability States:
- Available: Model is accessible and ready
- Unavailable: Model is down or rate-limited
- Unknown: Status not yet determined (will check on first use)
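A minimal sketch of the five-minute cache follows, assuming an in-memory map keyed by model name. `AvailabilityCache` and its methods are hypothetical names, and the current time is passed in explicitly so the expiry logic can be exercised without waiting. An expired or missing entry is reported as `Unknown`, which signals that a fresh check is needed.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Availability {
    Available,
    Unavailable,
    Unknown,
}

/// In-memory availability cache with a fixed time-to-live (hypothetical type).
pub struct AvailabilityCache {
    ttl: Duration,
    entries: HashMap<String, (Availability, Instant)>,
}

impl AvailabilityCache {
    pub fn new(ttl: Duration) -> Self {
        AvailabilityCache { ttl, entries: HashMap::new() }
    }

    /// Stores the result of a health check or trial execution.
    pub fn record(&mut self, model: &str, status: Availability, now: Instant) {
        self.entries.insert(model.to_string(), (status, now));
    }

    /// Returns the cached status while it is fresh; otherwise `Unknown`.
    pub fn get(&self, model: &str, now: Instant) -> Availability {
        match self.entries.get(model) {
            Some(&(status, checked_at)) if now.duration_since(checked_at) < self.ttl => status,
            _ => Availability::Unknown,
        }
    }
}
```

With `Duration::from_secs(300)` as the TTL, an entry recorded at time t is served until just before t + 5 minutes and treated as unknown from then on.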
Fallback Chain Execution
Fallback chains execute in this order:
- Primary Model: Default choice
- Fallback Model: Used if primary unavailable
- Premium Model: Used if explicitly requested or both primary/fallback unavailable
- Mock Model: Used for testing or if all models unavailable
Fallback Triggers:
- Model API returns error (4xx, 5xx)
- Model rate-limited (429)
- Model timeout (> 30 seconds)
- Budget constraint violation
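The triggers above can be sketched as a single classification function. The `CallOutcome` and `FallbackTrigger` types are hypothetical, and the order in which the conditions are checked (budget, then timeout, then status code) is an assumption; the document does not specify precedence.

```rust
use std::time::Duration;

/// Timeout threshold taken from the trigger list above.
pub const MODEL_TIMEOUT: Duration = Duration::from_secs(30);

/// Hypothetical summary of one attempted model call.
pub struct CallOutcome {
    pub status: u16,
    pub elapsed: Duration,
    pub within_budget: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FallbackTrigger {
    BudgetViolation,
    Timeout,
    RateLimited,
    ApiError,
}

/// Returns the reason to move to the next model in the chain, or `None`
/// when the call succeeded and no fallback is needed.
pub fn fallback_trigger(outcome: &CallOutcome) -> Option<FallbackTrigger> {
    if !outcome.within_budget {
        Some(FallbackTrigger::BudgetViolation)
    } else if outcome.elapsed > MODEL_TIMEOUT {
        Some(FallbackTrigger::Timeout)
    } else if outcome.status == 429 {
        // 429 is reported separately from other 4xx errors.
        Some(FallbackTrigger::RateLimited)
    } else if (400..=599).contains(&outcome.status) {
        Some(FallbackTrigger::ApiError)
    } else {
        None
    }
}
```

Distinguishing rate limiting from other errors would let the availability cache mark a model as temporarily unavailable rather than broken, though whether that distinction is worth keeping is a design decision for the implementation.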
Cost Estimation and Budget Tracking
Cost Calculation
Costs are calculated based on:
- Input Tokens: Number of tokens in prompt
- Output Tokens: Number of tokens in response
- Model Pricing: Per-token pricing from model provider
- Execution Time: Optional time-based costs for premium models
Cost Formula:
cost = (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token)
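The formula translates directly into code. In this sketch the `ModelPricing` struct is a hypothetical container for the per-token prices, and the optional time-based component is omitted because the formula above does not include it.

```rust
/// Per-token prices in USD for one model (hypothetical type).
pub struct ModelPricing {
    pub input_price_per_token: f64,
    pub output_price_per_token: f64,
}

/// Applies the cost formula: input tokens and output tokens are priced
/// separately and summed. Token counts are converted to f64 for the multiply.
pub fn estimate_cost(input_tokens: u64, output_tokens: u64, pricing: &ModelPricing) -> f64 {
    (input_tokens as f64 * pricing.input_price_per_token)
        + (output_tokens as f64 * pricing.output_price_per_token)
}
```

As a worked example with made-up prices of $0.000001 per input token and $0.000002 per output token, a call with 2,000 input tokens and 500 output tokens costs 2000 × 0.000001 + 500 × 0.000002 = $0.003.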
Budget Tracking
Storage:
- Budgets stored in the ~/.radium/budgets/ directory
- One file per agent: {agent_id}.json
- Tracks daily and monthly spending
Budget File Format:
{
  "agent_id": "arch-agent",
  "daily": {
    "date": "2025-12-07",
    "spent": 2.45,
    "limit": 5.00
  },
  "monthly": {
    "month": "2025-12",
    "spent": 45.20,
    "limit": 100.00
  },
  "last_reset_daily": "2025-12-07T00:00:00Z",
  "last_reset_monthly": "2025-12-01T00:00:00Z"
}
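Because each period in the file carries its own key (`date` or `month`), a reset can be detected by comparing the stored key with the current one. The sketch below assumes that approach; `PeriodBudget` is a hypothetical in-memory counterpart of the `daily` and `monthly` objects, and how the current key is derived (for example, which time zone) is left to the caller.

```rust
/// In-memory form of one budget period from the file, e.g. the `daily` object.
/// `period` holds the key: "2025-12-07" for daily, "2025-12" for monthly.
pub struct PeriodBudget {
    pub period: String,
    pub spent: f64,
    pub limit: f64,
}

impl PeriodBudget {
    /// Resets spending when the period key has changed.
    /// Returns true if a reset happened.
    pub fn roll_over(&mut self, current_period: &str) -> bool {
        if self.period != current_period {
            self.period = current_period.to_string();
            self.spent = 0.0;
            true
        } else {
            false
        }
    }

    /// Budget left in this period, never negative.
    pub fn remaining(&self) -> f64 {
        (self.limit - self.spent).max(0.0)
    }
}
```

Calling `roll_over` before every budget check means no background job is needed to reset counters at midnight or at the start of the month.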
Budget Enforcement
Enforcement Policies:
- Per-Execution Budget: Checked before each execution
  - If exceeded, reject execution or downgrade model
  - User can override with the --force flag
- Daily Budget: Checked before each execution
  - If daily limit reached, reject execution
  - Resets at midnight (local timezone)
- Monthly Budget: Checked before each execution
  - If monthly limit reached, reject execution
  - Resets on first day of month
Budget Violation Handling:
- Reject: Return error, don't execute (default)
- Downgrade: Use cheaper model if available
- Mock: Use mock model for testing
- Warn: Execute but log warning (requires the --allow-budget-exceed flag)
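The three checks can be sketched as one function that runs before each execution. This sketch assumes a limit counts as violated when spending so far plus the estimated cost would exceed it; the text above could also be read as rejecting only once the limit has already been reached. `Spending` and `BudgetDecision` are hypothetical types, and applying the violation-handling policy to the result is left out.

```rust
/// Optional limits in USD; `None` means unlimited.
pub struct CostBudget {
    pub max_per_execution: Option<f64>,
    pub max_daily: Option<f64>,
    pub max_monthly: Option<f64>,
}

/// Spending so far in the current day and month (hypothetical type).
pub struct Spending {
    pub daily: f64,
    pub monthly: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetDecision {
    Allow,
    ExceedsPerExecution,
    ExceedsDaily,
    ExceedsMonthly,
}

/// Checks the per-execution, daily, and monthly limits in that order and
/// reports the first one the estimated cost would violate.
pub fn check_budget(budget: &CostBudget, spent: &Spending, estimated_cost: f64) -> BudgetDecision {
    if let Some(max) = budget.max_per_execution {
        if estimated_cost > max {
            return BudgetDecision::ExceedsPerExecution;
        }
    }
    if let Some(max) = budget.max_daily {
        if spent.daily + estimated_cost > max {
            return BudgetDecision::ExceedsDaily;
        }
    }
    if let Some(max) = budget.max_monthly {
        if spent.monthly + estimated_cost > max {
            return BudgetDecision::ExceedsMonthly;
        }
    }
    BudgetDecision::Allow
}
```

Returning which limit was hit, instead of a plain boolean, lets the caller apply a different handling policy per limit, such as downgrading on a per-execution violation but rejecting on a daily one.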
Cost Reporting
CLI Command:
rad agents budget [agent-id] [--daily] [--monthly] [--reset]
Output Format:
Agent: arch-agent
Daily Budget: $2.45 / $5.00 (49% used)
Monthly Budget: $45.20 / $100.00 (45% used)
Last Reset: Daily: 2025-12-07, Monthly: 2025-12-01
Agent Recommendation Engine
Recommendation Algorithm
The recommendation engine suggests the best agent for a task based on:
- Capability Matching: Match task requirements to agent capabilities
- Performance Profile: Consider task complexity and performance needs
- Cost Constraints: Respect budget limitations
- Historical Performance: Learn from past agent performance (future enhancement)
Capability Matching
Matching Process:
- Extract task requirements from user input or workflow context
- Tokenize and normalize requirements (lowercase, stem words)
- Match against agent capabilities (case-insensitive, partial matching)
- Score agents based on:
- Exact Match: +10 points
- Partial Match: +5 points
- Related Match: +2 points (semantic similarity, future enhancement)
Example:
Task: "Design a REST API for user management"
Requirements: [api, design, rest, user-management]
Agent Capabilities:
- api-design-agent: [api, design, rest, api-design] → Score: 30 (three exact matches: api, design, rest)
- arch-agent: [architecture, design, system-design] → Score: 10 (one exact match: design)
- code-agent: [code, implementation] → Score: 0 (no match)
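The exact and partial match rules can be sketched as follows. The function name `capability_score` is hypothetical, and two choices here are assumptions the text leaves open: each requirement is scored once, by its best match, and a partial match means either string contains the other. Stemming and the related-match bonus are omitted.

```rust
/// Scores an agent's capabilities against task requirements:
/// +10 for an exact match, +5 for a partial (substring) match, 0 otherwise.
/// Matching is case-insensitive.
pub fn capability_score(requirements: &[&str], capabilities: &[&str]) -> u32 {
    let caps: Vec<String> = capabilities.iter().map(|c| c.to_lowercase()).collect();

    requirements
        .iter()
        .map(|req| -> u32 {
            let req = req.to_lowercase();
            if caps.iter().any(|c| *c == req) {
                10
            } else if caps
                .iter()
                .any(|c| c.contains(req.as_str()) || req.contains(c.as_str()))
            {
                5
            } else {
                0
            }
        })
        .sum()
}
```

Under these assumptions, the requirements [api, design, rest, user-management] scored against [api, design, rest, api-design] give three exact matches and a total of 30, and a requirement of "code" earns 5 against a capability of "code-generation".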
Scoring System
Score Components:
- Capability Match Score (0-100): Based on capability matching
- Performance Profile Match (0-20): Task complexity vs agent profile
- Cost Efficiency (0-10): Lower cost = higher score
- Availability (0-10): Model availability bonus
Final Score:
final_score = capability_score + profile_match + cost_efficiency + availability
Recommendation Threshold:
- Score >= 50: Recommended
- Score >= 30: Acceptable
- Score < 30: Not recommended
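A minimal sketch of the final score and the threshold classification is shown below. `ScoreComponents` and `Recommendation` are hypothetical names, and clamping each component to its documented range before summing is an assumption added to keep out-of-range inputs from distorting the total.

```rust
/// The four score components with their documented ranges.
pub struct ScoreComponents {
    pub capability: f64,      // 0-100
    pub profile_match: f64,   // 0-20
    pub cost_efficiency: f64, // 0-10
    pub availability: f64,    // 0-10
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recommendation {
    Recommended,
    Acceptable,
    NotRecommended,
}

/// Sums the components after clamping each to its range.
pub fn final_score(c: &ScoreComponents) -> f64 {
    c.capability.clamp(0.0, 100.0)
        + c.profile_match.clamp(0.0, 20.0)
        + c.cost_efficiency.clamp(0.0, 10.0)
        + c.availability.clamp(0.0, 10.0)
}

/// Applies the recommendation thresholds: >= 50, >= 30, and below 30.
pub fn classify(score: f64) -> Recommendation {
    if score >= 50.0 {
        Recommendation::Recommended
    } else if score >= 30.0 {
        Recommendation::Acceptable
    } else {
        Recommendation::NotRecommended
    }
}
```

For example, a capability score of 30 with a profile match of 15, cost efficiency of 8, and availability of 10 sums to 63, which falls in the recommended band.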
Recommendation API
CLI Command:
rad agents recommend "Design a REST API" [--profile speed|balanced|thinking|expert]
Output:
Recommended Agents:
1. api-design-agent (Score: 85)
- Capabilities: api, design, rest, api-design
- Performance: balanced
- Estimated Cost: $0.05
2. arch-agent (Score: 45)
- Capabilities: architecture, design, system-design
- Performance: thinking
- Estimated Cost: $0.15
Backward Compatibility Strategy
Dual-Format Support
The system supports both formats simultaneously:
- TOML-Only (Current): Agents with only TOML config work as before
- TOML + YAML (Enhanced): Agents with both formats get persona features
- YAML-Only (Future): Eventually support YAML-only configs
Default Values
When persona metadata is missing, the system uses these defaults:
- Performance Profile: balanced
- Recommended Models: Use the TOML engine and model fields
- Capabilities: Empty array (no capability matching)
- Cost Budget: No limits (unlimited)
Migration Path
Phase 1: Additive (Current)
- Persona features are optional
- Existing agents work unchanged
- New agents can use persona features
Phase 2: Enhancement (Future)
- Tools to add persona metadata to existing agents
- CLI command: rad agents enhance <agent-id>
- Auto-generate capabilities from agent descriptions
Phase 3: Deprecation (Future)
- TOML-only configs still supported but deprecated
- Migration guide provided
- Deprecation timeline: 2 major versions
Compatibility Matrix
| Feature | TOML-Only | TOML + YAML | YAML-Only |
|---|---|---|---|
| Basic Execution | ✅ | ✅ | ✅ (future) |
| Model Selection | ✅ (single) | ✅ (with fallback) | ✅ (with fallback) |
| Cost Tracking | ❌ | ✅ | ✅ |
| Agent Recommendations | ❌ | ✅ | ✅ |
| Performance Profiles | ❌ | ✅ | ✅ |
Integration Points
AgentConfig Struct Changes
Current Structure:
pub struct AgentConfig {
    pub id: String,
    pub name: String,
    pub description: String,
    pub prompt_path: PathBuf,
    pub engine: Option<String>,
    pub model: Option<String>,
    // ... other fields
}
Enhanced Structure (Additive):
pub struct AgentConfig {
    // ... existing fields ...

    // Persona fields (optional); None for TOML-only agents
    pub persona: Option<AgentPersona>,
}

pub struct AgentPersona {
    // Optional per the schema; when absent, the TOML engine and model fields apply
    pub recommended_models: Option<RecommendedModels>,
    // Empty when no capabilities are declared
    pub capabilities: Vec<String>,
    // Resolves to Balanced when the frontmatter omits it
    pub performance_profile: PerformanceProfile,
    // None means no budget limits
    pub cost_budget: Option<CostBudget>,
}

pub struct RecommendedModels {
    pub primary: String,
    pub fallback: Option<String>,
    pub premium: Option<String>,
}

pub enum PerformanceProfile {
    Speed,
    Balanced,
    Thinking,
    Expert,
}

// All limits are in USD
pub struct CostBudget {
    pub max_per_execution: Option<f64>,
    pub max_daily: Option<f64>,
    pub max_monthly: Option<f64>,
}
ModelSelector Service Interface
pub trait ModelSelector {
    /// Select the best model for an agent based on persona and context
    fn select_model(
        &self,
        agent: &AgentConfig,
        context: &ExecutionContext,
    ) -> Result<SelectedModel>;

    /// Check model availability
    fn check_availability(&self, model: &str) -> Result<AvailabilityStatus>;

    /// Get model metadata (pricing, capabilities)
    fn get_model_metadata(&self, model: &str) -> Result<ModelMetadata>;
}
CostTracker Service Interface
pub trait CostTracker {
    /// Track cost for an agent execution
    fn track_execution(
        &self,
        agent_id: &str,
        model: &str,
        cost: f64,
    ) -> Result<()>;

    /// Check if budget allows execution
    fn check_budget(
        &self,
        agent_id: &str,
        estimated_cost: f64,
    ) -> Result<BudgetStatus>;

    /// Get budget status for an agent
    fn get_budget_status(&self, agent_id: &str) -> Result<BudgetStatus>;

    /// Reset budget (daily or monthly)
    fn reset_budget(&self, agent_id: &str, period: BudgetPeriod) -> Result<()>;
}
RecommendationEngine Service Interface
pub trait RecommendationEngine {
    /// Recommend agents for a task
    fn recommend_agents(
        &self,
        task_description: &str,
        context: &RecommendationContext,
    ) -> Result<Vec<AgentRecommendation>>;

    /// Get recommendation score for an agent
    fn score_agent(
        &self,
        agent: &AgentConfig,
        task_requirements: &[String],
    ) -> Result<f64>;
}
Implementation Roadmap
Phase 1: Foundation (Estimated: 2-3 weeks)
Goals:
- YAML frontmatter parsing
- Basic persona metadata loading
- Dual-format support (TOML + YAML)
Deliverables:
- YAML parser for prompt files
- Persona metadata extraction
- AgentConfig persona field (optional)
- Unit tests for parsing
Dependencies:
- YAML parsing library (serde-yaml)
- Prompt file reading enhancement
Phase 2: Model Selection (Estimated: 2-3 weeks)
Goals:
- Model selection engine
- Fallback chain execution
- Model availability checking
Deliverables:
- ModelSelector service implementation
- Fallback chain logic
- Model availability cache
- Integration with execution engine
Dependencies:
- Phase 1 complete
- Model metadata database
Phase 3: Cost Tracking (Estimated: 2-3 weeks)
Goals:
- Cost calculation
- Budget tracking
- Budget enforcement
Deliverables:
- CostTracker service implementation
- Budget storage system
- Budget enforcement logic
- CLI budget commands
Dependencies:
- Phase 2 complete
- Model pricing data
Phase 4: Recommendations (Estimated: 2-3 weeks)
Goals:
- Agent recommendation engine
- Capability matching
- Scoring system
Deliverables:
- RecommendationEngine service implementation
- Capability matching algorithm
- Scoring system
- CLI recommend command
Dependencies:
- Phase 1 complete (capabilities)
- Agent capability database
Phase 5: Integration & Polish (Estimated: 1-2 weeks)
Goals:
- Full system integration
- Performance optimization
- Documentation
Deliverables:
- End-to-end integration tests
- Performance benchmarks
- User documentation
- Migration tools
Dependencies:
- All previous phases complete
Total Estimated Effort
Total: 9-14 weeks (2-3.5 months)
Risk Factors:
- Model pricing data availability
- Performance requirements
- Backward compatibility testing
- User adoption curve
Mitigation Strategies:
- Incremental rollout (feature flags)
- Comprehensive testing at each phase
- User feedback loops
- Fallback to current system if issues arise
Conclusion
The persona system architecture provides a comprehensive foundation for intelligent agent management, cost optimization, and enhanced user experience. The design prioritizes backward compatibility, progressive enhancement, and extensibility to support future requirements.
Key benefits:
- Intelligent Model Selection: Automatically choose optimal models
- Cost Control: Track and enforce budgets
- Better Recommendations: Match agents to tasks effectively
- Future-Proof: Extensible architecture for enhancements
The phased implementation approach allows for incremental delivery while maintaining system stability and user experience.