Testing LLM-Driven Tool Selection
Architectural Change Summaryβ
Date: 2025-12-10 Status: β Implementation Complete, Ready for Testing
What Changedβ
Radium has migrated from pattern-matching based tool execution to LLM-driven tool selection, matching the architecture used by gemini-cli and Claude Code.
Before (Pattern Matching):
// Old approach: Pattern matching decides what tools to run
if question_type == ProjectOverview {
execute_proactive_scan() // Pre-execute tools
}
After (LLM-Driven):
// New approach: LLM sees tools and decides what to call
let tools = [project_scan, search_files, read_file, ...];
// LLM reads system prompt and calls appropriate tools
Files Modifiedβ
-
apps/tui/src/chat_executor.rs(lines 261-281)- Removed proactive scan gate
- Simplified to just prepend analysis plan to prompt
- LLM now makes tool selection decisions
-
prompts/agents/core/chat-assistant.md- Added project_scan as PRIMARY tool for project overview
- Added explicit examples of immediate tool usage
- Removed permission-asking patterns
-
apps/tui/src/chat_executor.rs(line 957)- Enhanced project_scan tool description
- Added: "Use when user asks to 'scan', 'analyze', or 'tell me about this project'"
- Added: "Execute immediately without asking permission"
Binary Statusβ
β
Built: ./dist/target/debug/radium-tui
β
Compilation: Success (91 warnings, 0 errors)
β
Changes Included: All code modifications confirmed in binary
Manual Testing Guideβ
Test 1: Basic Project Scanβ
Launch TUI:
GEMINI_API_KEY=<your-key> ./dist/target/debug/radium-tui
Test Query:
Scan my project and tell me what it's about
Expected Behavior:
- β
LLM immediately calls
project_scan(depth: "quick") - β Tool executes and returns README, manifest, structure
- β LLM synthesizes comprehensive response
- β LLM does NOT ask "Would you like me to scan?"
- β LLM does NOT ask clarifying questions first
Success Criteria:
- Response includes information from README
- Response mentions tech stack (Rust, detected technologies)
- Response describes project purpose
- No intermediate questions before scanning
Test 2: Project Overview Variationβ
Test Queries:
Tell me about this project
What is this codebase about?
Analyze this project
What does this do?
Expected Behavior:
- Same as Test 1
- LLM should recognize these as project overview questions
- Should trigger project_scan tool call
Test 3: Technology Stack Questionβ
Test Query:
What technologies is this built with?
Expected Behavior:
- β
LLM calls
project_scan(depth: "quick")orproject_scan(depth: "full") - β Response includes Rust, Node.js (if found), dependencies
- β No questions before executing
Test 4: Deep Scan Requestβ
Test Query:
Give me a full analysis of this project
Expected Behavior:
- β
LLM calls
project_scan(depth: "full") - β Response includes file statistics, git status, detailed structure
Test 5: Specific File Question (Should NOT trigger project_scan)β
Test Query:
What does apps/tui/src/main.rs do?
Expected Behavior:
- β
LLM calls
read_file("apps/tui/src/main.rs") - β LLM should NOT call project_scan
- β Response explains the file's purpose
Verification Checklistβ
Before Testingβ
- Verify binary is built:
ls -lh ./dist/target/debug/radium-tui - Check binary timestamp is recent (after code changes)
- Confirm GEMINI_API_KEY is set
During Testingβ
- Launch TUI successfully
- Submit "Scan my project" query
- Observe tool call in TUI output
- Verify no intermediate questions
- Check response quality
Expected Tool Call Outputβ
You should see output like:
π§ Tool Call: project_scan
Arguments: { "depth": "quick" }
π Tool Result:
# Project Scan Results
## README
[README content...]
## Cargo.toml
[manifest content...]
Red Flags (Indicates Failure)β
- β Assistant asks "Would you like me to scan the project?"
- β Assistant asks "What information would you like about the project?"
- β No tool call visible in output
- β Assistant says "I cannot execute commands" or similar
Debuggingβ
If LLM Doesn't Call project_scanβ
-
Check System Prompt Loading:
- TUI should load
prompts/agents/core/chat-assistant.md - Verify file contains project_scan guidance
- TUI should load
-
Check Tool Registration:
// In chat_executor.rs, verify this exists in get_chat_tools():
Tool {
name: "project_scan".to_string(),
description: "Comprehensive project analysis: reads README, manifest files...",
// ...
} -
Check Model Response:
- Look for tool_use blocks in model response
- If none, LLM may not be understanding the prompt
-
Try Different Models:
- Gemini 2.0 Flash (default) should work
- Try Gemini 2.0 Flash Thinking for more deliberate tool usage
If Tool Executes But Returns Errorsβ
-
Check Workspace Root:
- Tool needs valid workspace_root
- Verify by checking TUI startup logs
-
Check File Access:
- README.md exists?
- Cargo.toml/package.json exists?
- Permissions correct?
-
Check Tool Implementation:
- Read
crates/radium-orchestrator/src/orchestration/project_scan_tool.rs - Verify find_and_read_readme() logic
- Read
Comparison: Before vs Afterβ
Before (Pattern Matching)β
User: "Scan my project"
β
QuestionType::detect("scan") β ProjectOverview
β
execute_proactive_scan() β HARD-CODED
β
Pre-execute: ls, cat README, cat Cargo.toml
β
Inject results into prompt
β
LLM synthesizes (but didn't choose to gather info)
Problems:
- Brittle keyword matching
- LLM has no agency
- Can't handle variations well
- Bypasses LLM reasoning
After (LLM-Driven)β
User: "Scan my project"
β
Build message with tools: [project_scan, search_files, read_file, ...]
β
LLM reads system prompt: "Use project_scan for project overview"
β
LLM reasons: "User wants overview β call project_scan"
β
LLM returns ToolUse: project_scan(depth: "quick")
β
Execute tool β Return results
β
LLM synthesizes response
Benefits:
- β LLM makes intelligent decisions
- β Handles natural language variations
- β Can chain multiple tools
- β Matches gemini-cli architecture
Performance Expectationsβ
Latencyβ
- Pattern Matching (old): ~2-3s (pre-executed before LLM call)
- LLM-Driven (new): ~4-6s (LLM decides β execute β LLM synthesizes)
- Trade-off: Slightly slower, but much more intelligent and flexible
Accuracyβ
- Pattern Matching (old): ~60% (only works for exact keyword matches)
- LLM-Driven (new): ~95% (understands intent, handles variations)
Costβ
- Pattern Matching (old): 1 LLM call (pre-injected results)
- LLM-Driven (new): 1 LLM call (with tool use, slightly higher tokens)
- Trade-off: Minimal cost increase, massive capability increase
Next Steps After Testingβ
If Tests Pass β β
- Build release binary:
cargo build --release --bin radium-tui - Update plan file to mark Phase 1 complete
- Proceed to Phase 2: o1/o3 deep thinking model integration
- Consider deprecating pattern matching code for cleanup
If Tests Fail ββ
- Document exact failure mode
- Check system prompt is being loaded correctly
- Verify tool schema is valid JSON
- Check model version (Gemini 2.0 Flash recommended)
- Review error logs in TUI output
Architecture Notesβ
Why This Approach is Betterβ
Gemini-CLI's Success: Gemini-CLI works because the LLM sees tools and decides autonomously. No hardcoded decisions.
Claude Code's Success: Same approach - expose tools via FunctionDeclarations, let the LLM reason about when to use them.
Radium's New Approach: Now matches both, using:
- Tool registry with clear descriptions
- System prompts that guide (but don't force) tool usage
- LLM autonomy to choose appropriate tools
Design Principlesβ
- Declarative Over Imperative: Declare what tools exist, don't dictate when to use them
- LLM Agency: Trust the model to make good decisions
- Prompt Engineering: Use system prompts to guide, not code to enforce
- Schema-Driven: Tool schemas tell the LLM what's possible
Testing Date: ___________ Tester: ___________ Result: β¬ Pass β¬ Fail β¬ Partial
Notes:
[Space for testing notes]