Troubleshooting Guide
Overview
This guide helps you diagnose and resolve common issues when using self-hosted models with Radium. Issues are organized by symptom to help you quickly find solutions.
Quick Diagnostic Commands
Check Model Server Status
# Ollama
curl http://localhost:11434/api/tags
# vLLM
curl http://localhost:8000/v1/models
# LocalAI
curl http://localhost:8080/v1/models
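To run all three server checks in one pass, a minimal sketch that probes each default endpoint in turn (ports are the defaults listed above):
for url in http://localhost:11434/api/tags \
           http://localhost:8000/v1/models \
           http://localhost:8080/v1/models; do
  printf '%-40s ' "$url"
  curl -sf -m 5 -o /dev/null "$url" && echo "OK" || echo "unreachable"
done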
Check Network Connectivity
# Test port accessibility
telnet localhost 11434 # Ollama
telnet localhost 8000 # vLLM
telnet localhost 8080 # LocalAI
# Check if port is in use
netstat -an | grep 11434
lsof -i :11434
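If telnet is not installed, nc (netcat) performs the same reachability check:
# Ollama; use 8000 for vLLM or 8080 for LocalAI
nc -zv localhost 11434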
Check Radium Configuration
# List agents
rad agents list
# Test agent execution
rad run <agent-id> "test"
Common Errors
Connection Refused
Error Message:
RequestError: Connection refused
RequestError: Network error: ... connection refused
Diagnosis:
- Check if the model server is running:
  # Ollama
  ps aux | grep ollama
  docker ps | grep ollama
  # vLLM
  docker ps | grep vllm
  # LocalAI
  docker ps | grep localai
- Verify the port is correct:
  # Ollama: 11434
  # vLLM: 8000
  # LocalAI: 8080
- Check firewall settings:
  # Linux
  sudo ufw status
  sudo iptables -L
  # macOS
  # Check System Preferences → Security & Privacy → Firewall
Solutions:
- Start the model server:
  # Ollama
  ollama serve
  # vLLM (Docker)
  docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest --model <model>
  # LocalAI (Docker)
  docker-compose up -d
- Verify environment variables:
  echo $UNIVERSAL_BASE_URL
  # Should be: http://localhost:11434/v1 (Ollama)
  #            http://localhost:8000/v1 (vLLM)
  #            http://localhost:8080/v1 (LocalAI)
- Check that the base URL includes /v1 (a quick end-to-end check follows this list):
  # Correct
  export UNIVERSAL_BASE_URL="http://localhost:11434/v1"
  # Incorrect (missing /v1)
  export UNIVERSAL_BASE_URL="http://localhost:11434"
- For remote servers, check network access:
  ping <server-ip>
  curl http://<server-ip>:11434/api/tags
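Once the server is up, a quick end-to-end check confirms that the exact URL Radium will use answers with a model list; a minimal sketch, assuming UNIVERSAL_BASE_URL is already exported as shown above:
curl -sf -m 5 "$UNIVERSAL_BASE_URL/models" > /dev/null \
  && echo "endpoint reachable" \
  || echo "endpoint unreachable - recheck URL, server, and firewall"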
Model Not Found
Error Message:
ModelResponseError: Model not found
UnsupportedModelProvider: Model 'xxx' not available
Diagnosis:
- Ollama - list available models:
  ollama list
- vLLM - check loaded models:
  curl http://localhost:8000/v1/models
- LocalAI - check configured models:
  curl http://localhost:8080/v1/models
  ls -la config/ # Check model config files
Solutions:
- Download the model:
  # Ollama
  ollama pull llama3.2
  # vLLM - Model loads automatically from Hugging Face
  # Check server logs for download progress
  # LocalAI - Install via gallery or manually
  curl http://localhost:8080/models/apply -d '{"id": "ggml-gpt4all-j"}'
- Verify that the model name matches exactly (see the comparison sketch after this list):
  # Case-sensitive, must match exactly
  # Correct: llama3.2
  # Incorrect: Llama3.2, llama-3.2, llama3
- Check the agent configuration:
  [agent]
  model = "llama3.2" # Must match the model name on the server
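To take the guesswork out of the name comparison, the following sketch pulls the model name from an agent config and checks it against the server's model list. The config path and the Ollama URL are illustrative; adjust both to your setup, and note that the match is a loose substring check (Ollama, for example, reports names with a :latest tag):
# Hypothetical agent path for illustration
MODEL=$(grep -E '^model' agents/my-agents/my-agent.toml | cut -d'"' -f2)
curl -s http://localhost:11434/v1/models | grep -q "$MODEL" \
  && echo "model '$MODEL' is available on the server" \
  || echo "model '$MODEL' not found - run 'ollama list' to see what is installed"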
Timeout Errors
Error Message:
RequestError: Request timeout
RequestError: Operation timed out
Diagnosis:
- Check server response time (a timing sketch follows this list):
  time curl http://localhost:11434/v1/models
- Check hardware resources:
  # CPU usage
  top
  htop
  # Memory usage
  free -h
  # GPU usage (if applicable)
  nvidia-smi
- Check the model server logs for errors
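For a finer-grained picture than time gives, curl can report connection and total times itself; the URL below assumes Ollama's default port:
curl -s -o /dev/null \
  -w "connect: %{time_connect}s  total: %{time_total}s\n" \
  http://localhost:11434/v1/models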
Solutions:
- Increase the timeout (if configurable):
  export UNIVERSAL_TIMEOUT=120 # Increase from the default 60s
- Reduce request complexity:
  - Use a smaller max_tokens
  - Reduce the context length
  - Use a faster/smaller model
- Optimize hardware:
  - Ensure sufficient RAM/VRAM
  - Use a GPU if available
  - Close other applications
- Check for resource constraints:
  # Check swap usage (indicates memory pressure)
  swapon --show
  free -h
Out of Memory
Error Message:
ModelResponseError: Out of memory
RequestError: Insufficient memory
Diagnosis:
- Check available memory:
  # System memory
  free -h
  # GPU memory (if using GPU)
  nvidia-smi
- Compare the model size against available memory (a sketch for this follows the list):
  - 7B model: ~14GB RAM/VRAM
  - 13B model: ~26GB RAM/VRAM
  - 30B+ model: 40GB+ VRAM
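To put concrete numbers next to that table, this sketch prints available system RAM and free VRAM in one place; the nvidia-smi line assumes an NVIDIA GPU and can be skipped otherwise:
# Available system memory in GB ("available" column of free)
free -g | awk '/^Mem:/ {print "RAM available: " $7 " GB"}'
# Free VRAM per GPU (NVIDIA only)
nvidia-smi --query-gpu=memory.free --format=csv,noheader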
Solutions:
- Use a smaller model:
  # Ollama - use a smaller or more heavily quantized model
  ollama pull llama3.2 # ~2GB
  # Instead of, for example
  ollama pull llama3.1:8b # ~5GB
- Reduce memory usage:
  # vLLM - Reduce GPU memory utilization
  vllm serve <model> --gpu-memory-utilization 0.7
  # LocalAI - Reduce context size
  # Edit the model's config YAML: context_size: 2048
- Close other applications to free memory
- Use CPU-only inference (slower, but it avoids VRAM entirely)
API Compatibility Issues
Error Message:
SerializationError: Failed to parse response
ModelResponseError: Invalid API response format
Diagnosis:
- Test the API endpoint directly:
  curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama3.2",
      "messages": [{"role": "user", "content": "test"}]
    }'
- Check that the response format matches the OpenAI spec (a jq check follows this list)
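One quick way to check the shape is to extract the field Radium ultimately reads; this assumes jq is installed and reuses the request above:
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "test"}]}' \
  | jq -e '.choices[0].message.content' \
  || echo "response is missing choices[0].message.content - not OpenAI-compatible"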
Solutions:
- Verify the endpoint path:
  # Must include /v1
  export UNIVERSAL_BASE_URL="http://localhost:11434/v1"
- Check that the server supports the OpenAI API:
  - Ollama: OpenAI-compatible endpoint (available by default)
  - vLLM: native OpenAI compatibility
  - LocalAI: configured via the model YAML
- Update the server version if you are running outdated software
Authentication Errors
Error Message:
UnsupportedModelProvider: Authentication failed
RequestError: 401 Unauthorized
Diagnosis:
- Check whether the server requires authentication:
  # Most local servers don't require auth
  curl http://localhost:11434/v1/models
- Verify the API key if one is required:
  echo $UNIVERSAL_API_KEY
Solutions:
- Remove the API key for local servers:
  unset UNIVERSAL_API_KEY
  # Or use the without_auth() constructor
- Set the correct API key if one is required (a direct check follows this list):
  export UNIVERSAL_API_KEY="your-api-key"
- Check the server's authentication settings
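If the server does enforce authentication, it is worth confirming the key works before involving Radium; a minimal sketch, assuming the server accepts a standard Bearer token:
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $UNIVERSAL_API_KEY" \
  http://localhost:11434/v1/models
# 200 means the key is accepted; 401 points at the key or the server's auth config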
Performance Issues
Slow Inference
Symptoms:
- Long response times
- Low tokens/second
Diagnosis:
- Check hardware utilization:
  # CPU
  top
  # GPU
  nvidia-smi -l 1
- Check the model server logs for warnings
Solutions:
- Use a GPU if available (a throughput sketch follows this list):
  # Ollama - Automatically uses the GPU if detected
  # vLLM - Requires a GPU
  # LocalAI - Configure gpu_layers in the model config
- Use a faster model:
  - Smaller models are faster
  - Quantized models (Q4, Q8) are faster
- Optimize server settings:
  # vLLM - Increase batch size
  vllm serve <model> --max-num-seqs 512
  # LocalAI - Increase threads
  # Edit config: threads: 8
- Reduce the context length if it is not needed
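To see whether a change actually helps, you can estimate tokens per second from the server's usage report. This sketch assumes the server returns an OpenAI-style usage block (vLLM does; recent Ollama builds do as well) and that jq, bc, and GNU date are available:
START=$(date +%s.%N)   # GNU date; %N is unavailable on stock macOS
TOKENS=$(curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Write a haiku"}], "max_tokens": 128}' \
  | jq '.usage.completion_tokens')
END=$(date +%s.%N)
echo "tokens/sec: $(echo "scale=1; $TOKENS / ($END - $START)" | bc)"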
High Latency
Symptoms:
- Long time to first token
- Slow initial response
Solutions:
- Pre-warm the model (a complete example follows this list):
  # Make a test request to load the model
  curl http://localhost:11434/v1/chat/completions ...
- Use streaming for better perceived performance
- Reduce the model size for faster loading
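For reference, a complete warm-up request might look like the following; the model name is an example, so substitute the one your agent uses:
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1}' \
  > /dev/null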
Network Issues
Cannot Access Remote Server
Symptoms:
- Connection works locally but not remotely
- Timeout when accessing from another machine
Solutions:
- Check the server binding:
  # Ollama - Bind to 0.0.0.0
  OLLAMA_HOST=0.0.0.0:11434 ollama serve
- Configure the firewall (an SSH-tunnel alternative follows this list):
  # Allow the port through the firewall
  sudo ufw allow 11434
- Check network connectivity:
  ping <server-ip>
  telnet <server-ip> 11434
- Verify the base URL:
  # Use the server's IP or hostname
  export UNIVERSAL_BASE_URL="http://192.168.1.100:11434/v1"
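If you would rather not open the port at all, an SSH tunnel forwards the server's port to your machine instead; this assumes SSH access to the host running the model server (the hostname is illustrative):
# Forward remote port 11434 to localhost:11434 (Ctrl+C to close)
ssh -N -L 11434:localhost:11434 user@model-server
# Then point Radium at the tunnel as if the server were local
export UNIVERSAL_BASE_URL="http://localhost:11434/v1"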
Diagnostic Decision Tree
Is the model server running?
├─ No → Start the server
└─ Yes → Can you connect to the endpoint?
   ├─ No → Check firewall/network
   └─ Yes → Is the model available?
      ├─ No → Download/configure the model
      └─ Yes → Check agent configuration
         ├─ Wrong model name → Fix the model name
         ├─ Wrong endpoint → Fix the base URL
         └─ Other → Check the logs
Log Analysis
Radium Logs
Location:
- Default: logs/radium-core.log
- Or check the console output
What to look for:
- Connection errors
- Model creation failures
- Request/response details
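These can be pulled out of the default log file with a quick filter; the log path comes from the default location listed above, and the patterns are illustrative:
grep -iE "connection refused|timeout|not found|error" logs/radium-core.log | tail -n 50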
Model Server Logs
Ollama:
# Check service logs
journalctl -u ollama -f
# Docker logs
docker logs ollama -f
vLLM:
# Docker logs
docker logs vllm -f
LocalAI:
# Docker logs
docker logs localai -f
Common Log Patterns
Connection Refused:
ERROR: Connection refused
ERROR: Failed to connect to http://localhost:11434
Model Not Found:
ERROR: Model 'xxx' not found
ERROR: Model not available
Timeout:
ERROR: Request timeout
ERROR: Operation timed out
Still Stuck?
Additional Resources
- Check Provider Documentation
- Review Setup Guides
- Check Configuration
- Community Support:
  - GitHub Issues
  - Discord/Slack (if available)
  - Stack Overflow
Collecting Debug Information
When seeking help, provide the following (a collection sketch follows this list):
- Error message (exact text)
- Model server logs (last 50 lines)
- Radium logs (relevant sections)
- Configuration:
  # Agent config
  cat agents/my-agents/<agent>.toml
  # Environment variables
  env | grep UNIVERSAL
- System information:
  # OS
  uname -a
  # Docker (if used)
  docker version
  # GPU (if applicable)
  nvidia-smi
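A small sketch that gathers most of this into one file before opening an issue; the log path and the Ollama URL are the defaults used throughout this guide, so adjust them if yours differ:
{
  echo "== system ==";        uname -a
  echo "== environment ==";   env | grep UNIVERSAL
  echo "== radium log ==";    tail -n 50 logs/radium-core.log
  echo "== server models =="; curl -s http://localhost:11434/v1/models
} > radium-debug.txt 2>&1
echo "Debug bundle written to radium-debug.txt"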
Next Steps
- Review Setup Guides for installation issues
- Check Configuration Guide for config problems
- See Advanced Configuration for optimization
- Explore Migration Guide for transition issues