Agent Configuration for Self-Hosted Models
Overview
This guide explains how to configure Radium agents to use self-hosted model providers (Ollama, vLLM, LocalAI). All self-hosted models are accessed through the Universal provider, which implements the OpenAI Chat Completions API specification.
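Because the Universal provider implements the OpenAI Chat Completions API, any of these servers can be exercised directly with a plain HTTP request before involving an agent. The sketch below targets a local Ollama endpoint; the URL and model name are examples and should be adjusted to your setup.

```bash
# Minimal OpenAI-compatible chat completion request (endpoint and model are examples)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```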
Basic Configuration
Universal Provider Setup
All self-hosted models use the universal engine type. The configuration requires:
- Engine: Set to "universal"
- Model: The model identifier (e.g., "llama3.2", "meta-llama/Llama-3-8B-Instruct")
- Base URL: Configured via environment variables or system configuration
Minimal Configuration
[agent]
id = "self-hosted-agent"
name = "Self-Hosted Agent"
description = "Agent using local Ollama model"
prompt_path = "prompts/agents/my-agents/self-hosted-agent.md"
engine = "universal"
model = "llama3.2"
Note: The base URL for the Universal provider must be configured via environment variables or system settings. See Environment Variables below.
Provider-Specific Configurations
Ollama Configuration
Agent TOML:
[agent]
id = "ollama-agent"
name = "Ollama Agent"
description = "Agent using local Ollama model"
prompt_path = "prompts/agents/my-agents/ollama-agent.md"
engine = "universal"
model = "llama3.2"
Environment Variables:
export UNIVERSAL_BASE_URL="http://localhost:11434/v1"
export UNIVERSAL_MODEL_ID="llama3.2"
Full Example:
[agent]
id = "ollama-agent"
name = "Ollama Agent"
description = "Agent using local Ollama model"
prompt_path = "prompts/agents/my-agents/ollama-agent.md"
engine = "universal"
model = "llama3.2"
reasoning_effort = "medium"
[agent.persona.models]
primary = "llama3.2"
fallback = "llama3.2:13b"
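Before pointing an agent at Ollama, it is worth confirming that the model referenced above is actually installed and that Ollama's OpenAI-compatible endpoint responds. A quick check, assuming a default local install:

```bash
# Pull the model referenced in the agent configuration (no-op if already present)
ollama pull llama3.2

# List locally installed models
ollama list

# Confirm the OpenAI-compatible endpoint the Universal provider will call
curl http://localhost:11434/v1/models
```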
vLLM Configuration
Agent TOML:
[agent]
id = "vllm-agent"
name = "vLLM Agent"
description = "Agent using vLLM for high-performance inference"
prompt_path = "prompts/agents/my-agents/vllm-agent.md"
engine = "universal"
model = "meta-llama/Llama-3-8B-Instruct"
Environment Variables:
export UNIVERSAL_BASE_URL="http://localhost:8000/v1"
export UNIVERSAL_MODEL_ID="meta-llama/Llama-3-8B-Instruct"
Full Example:
[agent]
id = "vllm-agent"
name = "vLLM Agent"
description = "High-performance agent using vLLM"
prompt_path = "prompts/agents/my-agents/vllm-agent.md"
engine = "universal"
model = "meta-llama/Llama-3-8B-Instruct"
reasoning_effort = "high"
[agent.persona.models]
primary = "meta-llama/Llama-3-8B-Instruct"
fallback = "meta-llama/Llama-3-70B-Instruct"
premium = "meta-llama/Llama-3-70B-Instruct"
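The configuration above assumes a vLLM server is already exposing the OpenAI-compatible API on port 8000. One common way to start it looks like the sketch below; the exact command depends on your vLLM version, and the 70B-class models used as fallback/premium tiers need correspondingly larger hardware.

```bash
# Recent vLLM releases: the `vllm serve` CLI exposes the OpenAI-compatible API
vllm serve meta-llama/Llama-3-8B-Instruct --port 8000

# Older releases: module entry point
# python -m vllm.entrypoints.openai.api_server \
#   --model meta-llama/Llama-3-8B-Instruct --port 8000
```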
LocalAI Configuration
Agent TOML:
[agent]
id = "localai-agent"
name = "LocalAI Agent"
description = "Agent using LocalAI for flexible inference"
prompt_path = "prompts/agents/my-agents/localai-agent.md"
engine = "universal"
model = "gpt-3.5-turbo"
Environment Variables:
export UNIVERSAL_BASE_URL="http://localhost:8080/v1"
export UNIVERSAL_MODEL_ID="gpt-3.5-turbo"
Full Example:
[agent]
id = "localai-agent"
name = "LocalAI Agent"
description = "Flexible agent using LocalAI"
prompt_path = "prompts/agents/my-agents/localai-agent.md"
engine = "universal"
model = "gpt-3.5-turbo"
reasoning_effort = "medium"
[agent.persona.models]
primary = "gpt-3.5-turbo"
fallback = "gpt-4"
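LocalAI acts as a drop-in OpenAI replacement, so OpenAI-style names like gpt-3.5-turbo are mapped to whatever local model you configure under that name. If a server is not already running, one common way to start it is via Docker; the image tag below is illustrative and may differ for your version or hardware.

```bash
# Start LocalAI on port 8080 (CPU image shown; tag is illustrative)
docker run -d --name local-ai -p 8080:8080 localai/localai:latest-cpu

# Verify the OpenAI-compatible endpoint and the model names it exposes
curl http://localhost:8080/v1/models
```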
Environment Variables
Universal Provider Variables
The Universal provider uses these environment variables:
| Variable | Description | Example |
|---|---|---|
| UNIVERSAL_BASE_URL | Base URL for the API endpoint | http://localhost:11434/v1 |
| UNIVERSAL_MODEL_ID | Default model ID (optional) | llama3.2 |
| UNIVERSAL_API_KEY | API key (if required) | your-api-key |
| OPENAI_COMPATIBLE_API_KEY | Alternative API key variable | your-api-key |
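Taken together, a secured remote endpoint might be configured as below. The API key variables are only needed when the server actually enforces authentication, which most local servers do not; the hostname and key are placeholders.

```bash
export UNIVERSAL_BASE_URL="https://models.internal.example.com/v1"
export UNIVERSAL_MODEL_ID="llama3.2"
export UNIVERSAL_API_KEY="your-api-key"   # or OPENAI_COMPATIBLE_API_KEY
```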
Provider-Specific Variables
Some providers may use provider-specific environment variables:
| Provider | Variable | Default | Description |
|---|---|---|---|
| Ollama | OLLAMA_HOST | localhost:11434 | Ollama server address |
| vLLM | VLLM_ENDPOINT | http://localhost:8000/v1 | vLLM API endpoint |
| LocalAI | LOCALAI_ENDPOINT | http://localhost:8080/v1 | LocalAI API endpoint |
Note: These provider-specific variables may be used by the engine system to automatically configure the Universal provider. Check your Radium configuration for details.
Multi-Tier Model Strategy
Radium supports a multi-tier model strategy with primary, fallback, and premium models. This is useful for self-hosted models where you want to:
- Use a fast local model as primary
- Fall back to a more capable model if needed
- Use a premium cloud model for critical tasks
Example: Local Primary with Cloud Fallback
[agent]
id = "hybrid-agent"
name = "Hybrid Agent"
description = "Agent with local primary and cloud fallback"
prompt_path = "prompts/agents/my-agents/hybrid-agent.md"
engine = "universal"
model = "llama3.2"
[agent.persona.models]
primary = "llama3.2" # Local Ollama (fast, free)
fallback = "gpt-4o-mini" # Cloud OpenAI (reliable)
premium = "gpt-4o" # Cloud OpenAI (best quality)
Example: Multiple Local Models
[agent]
id = "local-tier-agent"
name = "Local Tier Agent"
description = "Agent with multiple local model tiers"
prompt_path = "prompts/agents/my-agents/local-tier-agent.md"
engine = "universal"
model = "llama3.2"
[agent.persona.models]
primary = "llama3.2" # Fast 3B model
fallback = "llama3.2:13b" # Better 13B model
premium = "mixtral" # Best quality (if available)
Mixed Cloud and Self-Hosted
You can configure agents to use a mix of cloud and self-hosted models:
[agent]
id = "mixed-agent"
name = "Mixed Agent"
description = "Agent mixing cloud and self-hosted models"
prompt_path = "prompts/agents/my-agents/mixed-agent.md"
engine = "gemini" # Default to cloud
model = "gemini-2.0-flash-exp"
[agent.persona.models]
primary = "llama3.2" # Self-hosted (Ollama)
fallback = "gemini-2.0-flash-exp" # Cloud (Gemini)
premium = "gpt-4o" # Cloud (OpenAI)
Note: When mixing providers, ensure the engine system can resolve models from different providers. The multi-tier strategy will attempt to use models in order: primary → fallback → premium.
Model Parameters
Reasoning Effort
Control the reasoning effort level:
[agent]
reasoning_effort = "low" # Fast, less thorough
# reasoning_effort = "medium" # Balanced (default)
# reasoning_effort = "high" # Slow, more thorough
Performance Profile
Configure performance characteristics:
[agent.persona.performance]
profile = "balanced" # speed, balanced, thinking, expert
estimated_tokens = 4000
Configuration Validation
Testing Your Configuration
- Verify Model Server is Running:

  # Ollama
  curl http://localhost:11434/api/tags

  # vLLM
  curl http://localhost:8000/v1/models

  # LocalAI
  curl http://localhost:8080/v1/models

- Test Agent Discovery:

  rad agents list

- Test Agent Execution:

  rad run ollama-agent "Test prompt"
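If any of these steps fail, a quick shell-level check often narrows the problem down before touching the agent configuration. The snippet below assumes the environment variable names used in this guide:

```bash
# Fail fast if the base URL is unset, then query the models endpoint directly
echo "Base URL: ${UNIVERSAL_BASE_URL:?UNIVERSAL_BASE_URL is not set}"
curl -sf "${UNIVERSAL_BASE_URL}/models" || echo "Endpoint did not return a successful response"
```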
Common Configuration Issues
Issue: Agent can't connect to model server
- Solution: Verify environment variables are set correctly
- Solution: Check model server is running and accessible
- Solution: Verify the base URL includes the /v1 path
Issue: Model not found
- Solution: Verify model name matches exactly (case-sensitive)
- Solution: Check model is available on the server
- Solution: For Ollama, run ollama list to see available models
Issue: Authentication errors
- Solution: Most local servers don't require API keys
- Solution: If using UNIVERSAL_API_KEY, ensure it's correct
- Solution: Try removing the API key for local servers
Advanced Configuration
Custom Endpoints
For remote or custom-configured servers:
# Remote Ollama server
export UNIVERSAL_BASE_URL="http://192.168.1.100:11434/v1"
# Custom vLLM endpoint
export UNIVERSAL_BASE_URL="http://vllm.example.com:8000/v1"
# LocalAI with custom port
export UNIVERSAL_BASE_URL="http://localhost:9090/v1"
Multiple Agents with Different Models
Configure different agents to use different self-hosted models:
# agents/my-agents/fast-agent.toml
[agent]
id = "fast-agent"
engine = "universal"
model = "llama3.2" # Fast 3B model
# agents/my-agents/quality-agent.toml
[agent]
id = "quality-agent"
engine = "universal"
model = "llama3.2:13b" # Better 13B model
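Both agents point at the same server through the Universal provider; only the model name differs. With the server running, each can be invoked in the usual way (the prompts are placeholders):

```bash
rad run fast-agent "Summarize this changelog in two sentences."
rad run quality-agent "Review this module for concurrency issues."
```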
Next Steps
- See Configuration Examples for more detailed examples
- Check Troubleshooting Guide for common issues
- Review Advanced Configuration for production setups
- Explore Setup Guides for provider-specific installation