Ollama Setup Guide

Overview

Ollama is a local model runner that makes it easy to download and run open-source models. It's the fastest way to get started with self-hosted models in Radium, with setup taking approximately 5-10 minutes.

Prerequisites

  • macOS or Linux (Windows support via WSL)
  • 8GB RAM minimum (16GB recommended)
  • curl or wget for installation
  • Docker (optional, for containerized deployment)

Installation

macOS

Option 1: Homebrew

brew install ollama

Option 2: Direct Download

Download the macOS app from https://ollama.com/download and move it to your Applications folder. (The install.sh script used below is Linux-only.)

Linux

Installation Script

curl -fsSL https://ollama.com/install.sh | sh

This script will:

  • Download the Ollama binary
  • Install it to /usr/local/bin/ollama
  • Create a systemd service (if systemd is available)

Manual Installation

  1. Download the binary for your architecture:

    # For x86_64 (use ollama-linux-arm64 on ARM64 systems)
    sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
    sudo chmod +x /usr/local/bin/ollama
  2. Start the Ollama service:

    ollama serve

Docker

For containerized deployment:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Note: Models will be stored in the Docker volume ollama.
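
If you have an NVIDIA GPU, you can expose it to the container. This assumes the NVIDIA Container Toolkit is already installed on the host (see NVIDIA's documentation for your distribution):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama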

Starting Ollama

macOS / Linux

After installation, start the Ollama service:

ollama serve

The service will run in the foreground. For production, you may want to run it as a background service or use systemd.

Systemd Service (Linux)
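
The install script creates a dedicated ollama system user for the service. If you installed manually, create it first (a sketch mirroring what the official script does; adjust the home directory if you keep models elsewhere):

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama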

Create /etc/systemd/system/ollama.service:

[Unit]
Description=Ollama Service
After=network.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
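
To confirm the service came up and to follow its logs:

sudo systemctl status ollama
sudo journalctl -u ollama -f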

Docker

If you deployed with Docker, the container created by the docker run command above is already running. Restart it later with:

docker start ollama

Verifying Installation

Test that Ollama is running:

curl http://localhost:11434/api/tags

You should see a JSON response with available models (may be empty initially).
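
Once a model is pulled (next section), you can also test generation end to end; this assumes llama3.2 has already been downloaded:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'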

Model Management

Downloading Models

Pull models using the ollama pull command:

# Popular models
ollama pull llama3.2    # Llama 3.2 (3B parameters, ~2GB)
ollama pull llama3.1:8b # Llama 3.1 8B (~5GB)
ollama pull codellama   # CodeLlama (7B, optimized for code)
ollama pull mistral     # Mistral 7B
ollama pull mixtral     # Mixtral 8x7B MoE
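
You can also chat with a model interactively from the terminal (ollama run pulls the model on first use):

ollama run llama3.2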

Listing Models

View all downloaded models:

ollama list

Removing Models

Delete a model to free up disk space:

ollama rm llama3.2

Model Recommendations

Model       | Size  | Use Case                 | RAM Required
------------|-------|--------------------------|-------------
llama3.2    | ~2GB  | General purpose, fast    | 8GB
llama3.1:8b | ~5GB  | Better quality           | 16GB
codellama   | ~4GB  | Code generation          | 8GB
mistral     | ~4GB  | Balanced performance     | 8GB
mixtral     | ~26GB | High quality, reasoning  | 32GB

Hardware Requirements

Minimum Requirements

  • CPU: Modern x86_64 or ARM64 processor
  • RAM: 8GB (for 3B-7B models)
  • Storage: 10GB free space (for models)

Recommended Requirements

  • CPU: Multi-core processor (4+ cores)
  • RAM: 16GB+ (for 13B+ models)
  • Storage: 50GB+ free space (for multiple models)
  • GPU: Optional, but significantly improves performance (NVIDIA with CUDA support)

GPU Support

Ollama automatically uses GPU if available:

# Show loaded models and whether each is running on GPU or CPU
ollama ps

For NVIDIA GPUs, make sure the CUDA drivers are installed; Ollama picks up the GPU automatically once they are present.
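
To confirm the NVIDIA driver itself is healthy before troubleshooting Ollama:

nvidia-smi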

Configuration

Default Settings

Ollama runs on:

  • Host: localhost
  • Port: 11434
  • API Endpoint: http://localhost:11434

Custom Port

To run on a different port while keeping Ollama local-only:

OLLAMA_HOST=127.0.0.1:11435 ollama serve
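
Clients must then target the new port:

curl http://localhost:11435/api/tags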

Remote Access

To allow remote access:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Security Note: Only enable remote access on trusted networks or with proper firewall rules.
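
For example, with ufw on Ubuntu you could limit the port to a trusted subnet (192.168.1.0/24 is a placeholder; substitute your own range):

sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp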

Using with Radium

Current Implementation Status

⚠️ Important: A native OllamaModel implementation exists in the Radium codebase, but it is not yet wired into the ModelFactory. For now, use the Universal provider as described below.

Configuration via Universal Provider

Ollama provides an OpenAI-compatible API endpoint. Configure Radium to use it:

Agent Configuration (TOML):

[agent]
id = "my-agent"
name = "My Agent"
description = "Agent using Ollama"
prompt_path = "prompts/agents/my-agent.md"
engine = "universal"
model = "llama3.2"

Environment Variables:

export UNIVERSAL_BASE_URL="http://localhost:11434/v1"
export UNIVERSAL_MODEL_ID="llama3.2"

Alternatively, set the base URL in an agent or project configuration file that supports it; environment variables are the most portable option.

Testing the Connection

Test that Radium can connect to Ollama:

# Using curl to test the OpenAI-compatible endpoint
curl http://localhost:11434/v1/models

You should see a list of available models in OpenAI format.
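
You can also send a test chat completion through the same OpenAI-compatible API (again assuming llama3.2 is pulled):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hello"}]}'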

Example Agent Configuration

Create agents/my-agents/ollama-agent.toml:

[agent]
id = "ollama-agent"
name = "Ollama Agent"
description = "Agent using local Ollama model"
prompt_path = "prompts/agents/my-agents/ollama-agent.md"
engine = "universal"
model = "llama3.2"

[agent.persona.models]
primary = "llama3.2"
fallback = "llama3.1:8b"

Note: The exact configuration format may vary based on how Radium's engine system resolves Universal provider endpoints. Check the agent configuration guide for the latest patterns.

Troubleshooting

Connection Refused

Problem: curl http://localhost:11434/api/tags returns connection refused.

Solutions:

  1. Ensure Ollama is running: ollama serve
  2. Check the port: netstat -an | grep 11434
  3. Verify firewall settings

Model Not Found

Problem: Model not available when making requests.

Solutions:

  1. Pull the model: ollama pull llama3.2
  2. List available models: ollama list
  3. Verify model name matches exactly

Out of Memory

Problem: Model fails to load or runs very slowly.

Solutions:

  1. Use a smaller model (e.g., llama3.2 instead of llama3.1:8b)
  2. Close other applications to free RAM
  3. Use a more heavily quantized variant of the model (smaller memory footprint)

Slow Performance

Problem: Model inference is slow.

Solutions:

  1. Use GPU if available (Ollama detects automatically)
  2. Use a smaller/faster model
  3. Reduce max_tokens in generation parameters
  4. Ensure sufficient RAM (swap usage indicates memory pressure)
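
If the first request after an idle period is slow, Ollama may be unloading the model between calls. The OLLAMA_KEEP_ALIVE setting controls how long a model stays resident (the 30m value here is just an example):

OLLAMA_KEEP_ALIVE=30m ollama serve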

Next Steps

  1. Configure Your Agent: See the agent configuration guide
  2. Test Your Setup: Run a simple agent execution to verify connectivity
  3. Explore Models: Try different models to find the best fit for your use case
  4. Optimize Performance: Tune model parameters and hardware configuration

Additional Resources