# Ollama Setup Guide

## Overview
Ollama is a local model runner that makes it easy to download and run open-source models. It's the fastest way to get started with self-hosted models in Radium, with setup taking approximately 5-10 minutes.
## Prerequisites
- macOS or Linux (Windows support via WSL)
- 8GB RAM minimum (16GB recommended)
- curl or wget for installation
- Docker (optional, for containerized deployment)
## Installation

### macOS

#### Option 1: Homebrew (Recommended)

```bash
brew install ollama
```

#### Option 2: Direct Download

Download the macOS app from [ollama.com/download](https://ollama.com/download) and move it to your Applications folder.
### Linux

#### Installation Script

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

This script will:

- Download the Ollama binary
- Install it to `/usr/local/bin/ollama`
- Create a systemd service (if systemd is available)
#### Manual Installation

1. Download the binary for your architecture:

   ```bash
   # For x86_64
   curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
   chmod +x /usr/local/bin/ollama
   ```

2. Start the Ollama service:

   ```bash
   ollama serve
   ```
### Docker

For containerized deployment:

```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Note: Models will be stored in the Docker volume `ollama`.
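Once the container is up, models are pulled by running the Ollama CLI inside it. If the host has an NVIDIA GPU and the NVIDIA Container Toolkit installed, adding `--gpus=all` when creating the container enables GPU acceleration (a sketch, assuming the container name `ollama` used above):

```bash
# Pull a model inside the running container
docker exec -it ollama ollama pull llama3.2

# Or create the container with GPU access (requires the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```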
## Starting Ollama

### macOS / Linux

After installation, start the Ollama service:

```bash
ollama serve
```
The service will run in the foreground. For production, you may want to run it as a background service or use systemd.
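If you only need the server in the background without systemd, one simple option is to detach it with `nohup` (a minimal sketch; the log location is arbitrary):

```bash
# Run the server in the background and capture its output in a log file
nohup ollama serve > ~/ollama.log 2>&1 &
```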
### Systemd Service (Linux)

Create `/etc/systemd/system/ollama.service`:

```ini
[Unit]
Description=Ollama Service
After=network.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
sudo systemctl enable ollama
sudo systemctl start ollama
```
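After starting, you can confirm the unit is healthy and follow its logs with standard systemd tooling. Note that `User=ollama` assumes the `ollama` system user exists; the installation script normally creates it.

```bash
# Check service status and tail its logs
sudo systemctl status ollama
journalctl -u ollama -f
```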
### Docker

If using Docker, the service starts automatically with the container. To start a stopped container:

```bash
docker start ollama
```
## Verifying Installation

Test that Ollama is running:

```bash
curl http://localhost:11434/api/tags
```

You should see a JSON response listing available models (initially an empty list, e.g. `{"models":[]}`).
## Model Management

### Downloading Models

Pull models using the `ollama pull` command:

```bash
# Popular models
ollama pull llama3.2      # Llama 3.2 (3B parameters, ~2GB)
ollama pull llama3.1:8b   # Llama 3.1 8B (~4.9GB)
ollama pull codellama     # CodeLlama (7B, optimized for code)
ollama pull mistral       # Mistral 7B
ollama pull mixtral       # Mixtral 8x7B MoE
```
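After pulling a model, a quick smoke test from the terminal confirms it loads and responds (the prompt text is arbitrary):

```bash
ollama run llama3.2 "Reply with the single word: ready"
```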
### Listing Models

View all downloaded models:

```bash
ollama list
```

### Removing Models

Delete a model to free up disk space:

```bash
ollama rm llama3.2
```
## Model Recommendations

| Model | Size | Use Case | RAM Required |
|---|---|---|---|
| `llama3.2` | ~2GB | General purpose, fast | 8GB |
| `llama3.1:8b` | ~4.9GB | Better quality | 16GB |
| `codellama` | ~4GB | Code generation | 8GB |
| `mistral` | ~4GB | Balanced performance | 8GB |
| `mixtral` | ~26GB | High quality, reasoning | 32GB |
## Hardware Requirements

### Minimum Requirements
- CPU: Modern x86_64 or ARM64 processor
- RAM: 8GB (for 3B-7B models)
- Storage: 10GB free space (for models)
### Recommended Requirements
- CPU: Multi-core processor (4+ cores)
- RAM: 16GB+ (for 13B+ models)
- Storage: 50GB+ free space (for multiple models)
- GPU: Optional, but significantly improves performance (NVIDIA with CUDA support)
### GPU Support

Ollama uses a GPU automatically when one is detected:

```bash
# Show loaded models and whether they are running on CPU or GPU
ollama ps
```

For NVIDIA GPUs, ensure the CUDA drivers are installed; no further Ollama configuration is required.
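To rule out driver problems before troubleshooting Ollama itself, `nvidia-smi` (standard NVIDIA tooling, not part of Ollama) should list the GPU:

```bash
# Verify the driver is installed and the GPU is visible to the system
nvidia-smi
```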
## Configuration

### Default Settings

Ollama runs on:

- Host: `localhost`
- Port: `11434`
- API Endpoint: `http://localhost:11434`
### Custom Port

To run on a different port:

```bash
OLLAMA_HOST=127.0.0.1:11435 ollama serve
```
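Clients then need to target the new port, for example (assuming the server was started as above):

```bash
curl http://127.0.0.1:11435/api/tags
```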
### Remote Access

To allow remote access:

```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Security Note: Only enable remote access on trusted networks or with proper firewall rules.
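As an illustration, on a Linux host running `ufw` you could restrict the port to a single trusted subnet (the subnet below is a placeholder; adjust it to your network):

```bash
# Allow only 192.168.1.0/24 to reach Ollama; reject everyone else
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```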
## Using with Radium

### Current Implementation Status

⚠️ Important: While a native `OllamaModel` implementation exists in the Radium codebase, it is not yet integrated into the `ModelFactory`. Use the Universal provider as the recommended approach.
### Configuration via Universal Provider

Ollama provides an OpenAI-compatible API endpoint. Configure Radium to use it:

Agent Configuration (TOML):

```toml
[agent]
id = "my-agent"
name = "My Agent"
description = "Agent using Ollama"
prompt_path = "prompts/agents/my-agent.md"
engine = "universal"
model = "llama3.2"
```

Environment Variables:

```bash
export UNIVERSAL_BASE_URL="http://localhost:11434/v1"
export UNIVERSAL_MODEL_ID="llama3.2"
```

Alternatively, the base URL can be supplied through any Radium configuration file that supports it, rather than environment variables.
### Testing the Connection

Test that Radium can connect to Ollama:

```bash
# Use curl to test the OpenAI-compatible endpoint
curl http://localhost:11434/v1/models
```
You should see a list of available models in OpenAI format.
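To exercise the same path Radium will use, you can also send a minimal chat completion to the OpenAI-compatible endpoint (this assumes `llama3.2` has already been pulled):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
      }'
```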
### Example Agent Configuration

Create `agents/my-agents/ollama-agent.toml`:

```toml
[agent]
id = "ollama-agent"
name = "Ollama Agent"
description = "Agent using local Ollama model"
prompt_path = "prompts/agents/my-agents/ollama-agent.md"
engine = "universal"
model = "llama3.2"

[agent.persona.models]
primary = "llama3.2"
fallback = "llama3.1:8b"
```
Note: The exact configuration format may vary based on how Radium's engine system resolves Universal provider endpoints. Check the agent configuration guide for the latest patterns.
## Troubleshooting

### Connection Refused

Problem: `curl http://localhost:11434/api/tags` returns connection refused.

Solutions:

- Ensure Ollama is running: `ollama serve`
- Check the port: `netstat -an | grep 11434`
- Verify firewall settings
### Model Not Found

Problem: Model not available when making requests.

Solutions:

- Pull the model: `ollama pull llama3.2`
- List available models: `ollama list`
- Verify the model name matches exactly
### Out of Memory

Problem: Model fails to load or runs very slowly.

Solutions:

- Use a smaller model (e.g., `llama3.2` instead of `llama3.1:8b`)
- Close other applications to free RAM
- Consider using a model with quantization (smaller memory footprint)
### Slow Performance

Problem: Model inference is slow.

Solutions:

- Use a GPU if available (Ollama detects it automatically)
- Use a smaller/faster model
- Reduce `max_tokens` in generation parameters
- Ensure sufficient RAM; swap usage indicates memory pressure (see the check below)
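On Linux, a quick way to see whether the system is swapping is shown below; heavy swap usage while a model is loaded is a strong sign the model is too large for the available RAM:

```bash
# Show RAM and swap usage in human-readable units
free -h
```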
## Next Steps
- Configure Your Agent: See the agent configuration guide
- Test Your Setup: Run a simple agent execution to verify connectivity
- Explore Models: Try different models to find the best fit for your use case
- Optimize Performance: Tune model parameters and hardware configuration