Universal OpenAI-Compatible Provider Guide
Introduction
The Universal OpenAI-Compatible Provider enables Radium to connect to any server that implements the OpenAI Chat Completions API specification. This includes popular local inference servers like vLLM, LocalAI, LM Studio, and Ollama, allowing you to:
- Run models locally for privacy and cost savings
- Use self-hosted inference servers
- Experiment with open-source models
- Avoid vendor lock-in
Quick Start
The simplest way to get started is with a local server that doesn't require authentication:
use radium_models::UniversalModel;
// Connect to LM Studio (no authentication required)
let model = UniversalModel::without_auth(
    "llama-2-7b".to_string(),
    "http://localhost:1234/v1".to_string(),
);
// Generate text
let response = model.generate_text("Say hello", None).await?;
println!("{}", response.content);
Supported Servers
The Universal provider works with any server implementing the OpenAI Chat Completions API, including:
- vLLM: High-performance LLM inference server
- LocalAI: Local inference server with OpenAI-compatible API
- LM Studio: Desktop app for running local models
- Ollama: Local model runner with OpenAI-compatible endpoints
- Any custom server: As long as it implements the OpenAI API spec
Server Setup Guides
vLLM
vLLM is a high-performance inference server optimized for large language models.
Installation
pip install vllm
Starting the Server
# Serve a model (replace with your model name)
vllm serve meta-llama/Llama-3-8B-Instruct --port 8000
# With API key authentication (optional)
vllm serve meta-llama/Llama-3-8B-Instruct --port 8000 --api-key your-api-key
Usage with Radium
use radium_models::UniversalModel;
// Without authentication
let model = UniversalModel::without_auth(
    "meta-llama/Llama-3-8B-Instruct".to_string(),
    "http://localhost:8000/v1".to_string(),
);
// With authentication
let model = UniversalModel::with_api_key(
    "meta-llama/Llama-3-8B-Instruct".to_string(),
    "http://localhost:8000/v1".to_string(),
    "your-api-key".to_string(),
);
API Endpoint
- Default: http://localhost:8000/v1
- Chat completions: http://localhost:8000/v1/chat/completions
LocalAI
LocalAI is a drop-in replacement for OpenAI that runs locally using consumer-grade hardware.
Installation (Docker)
docker run -p 8080:8080 localai/localai
Configuration
LocalAI uses YAML configuration files. Create a models.yaml file:
models:
  - name: gpt-3.5-turbo
    backend: llama
    parameters:
      model: /path/to/model.gguf
Usage with Radium
use radium_models::UniversalModel;
// Without authentication (default)
let model = UniversalModel::without_auth(
    "gpt-3.5-turbo".to_string(),
    "http://localhost:8080/v1".to_string(),
);
// With authentication (if configured)
let model = UniversalModel::with_api_key(
    "gpt-3.5-turbo".to_string(),
    "http://localhost:8080/v1".to_string(),
    "local-api-key".to_string(),
);
API Endpoint
- Default: http://localhost:8080/v1
- Chat completions: http://localhost:8080/v1/chat/completions
LM Studio
LM Studio is a user-friendly desktop application for running local models.
Installation
- Download LM Studio from lmstudio.ai
- Install and launch the application
- Download a model through the UI
Enabling the Local Server
- Open LM Studio
- Go to Settings → Local Server
- Enable "Local Server"
- Note the port (default: 1234)
Usage with Radium
use radium_models::UniversalModel;
// LM Studio doesn't require authentication
let model = UniversalModel::without_auth(
    "llama-2-7b".to_string(), // Use the model name from LM Studio
    "http://localhost:1234/v1".to_string(),
);
let response = model.generate_text("Say hello", None).await?;
API Endpoint
- Default: http://localhost:1234/v1
- Chat completions: http://localhost:1234/v1/chat/completions
Ollama
Ollama is a simple tool for running large language models locally.
Installation
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com
Pulling a Model
ollama pull llama2
Usage with Radium
use radium_models::UniversalModel;
// Ollama uses OpenAI-compatible endpoints
let model = UniversalModel::without_auth(
    "llama2".to_string(),
    "http://localhost:11434/v1".to_string(),
);
API Endpoint
- Default: http://localhost:11434/v1
- Chat completions: http://localhost:11434/v1/chat/completions
Constructor Patterns
The Universal provider supports three constructor patterns:
1. new() - Environment Variable Authentication
Loads the API key from environment variables:
// Set environment variable
std::env::set_var("UNIVERSAL_API_KEY", "your-api-key");
// Or use OPENAI_COMPATIBLE_API_KEY
std::env::set_var("OPENAI_COMPATIBLE_API_KEY", "your-api-key");
let model = UniversalModel::new(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
)?;
2. with_api_key() - Explicit API Key
Provides the API key directly:
let model = UniversalModel::with_api_key(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
    "your-api-key".to_string(),
);
3. without_auth() - No Authentication
For servers that don't require authentication:
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:1234/v1".to_string(),
);
Environment Variables
The Universal provider supports the following environment variables:
- UNIVERSAL_API_KEY: Primary API key environment variable
- OPENAI_COMPATIBLE_API_KEY: Fallback API key environment variable (see the sketch after this list)
- UNIVERSAL_BASE_URL: Default base URL (not currently used; specify the base URL in the constructor)
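A minimal sketch of the fallback behavior described above (the model name and URL are placeholders):
use radium_models::UniversalModel;
// Only the fallback variable is set here; UNIVERSAL_API_KEY is left unset.
std::env::set_var("OPENAI_COMPATIBLE_API_KEY", "your-api-key");
// new() still finds a key because it falls back to OPENAI_COMPATIBLE_API_KEY.
let model = UniversalModel::new(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
)?;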
Factory Integration
You can also create Universal models through the ModelFactory:
use radium_models::{ModelConfig, ModelFactory, ModelType};
let config = ModelConfig::new(
    ModelType::Universal,
    "model-name".to_string(),
)
.with_base_url("http://localhost:8000/v1".to_string())
.with_api_key("your-api-key".to_string()); // Optional
let model = ModelFactory::create(config)?;
Streaming Support
The Universal provider supports Server-Sent Events (SSE) streaming:
use futures::StreamExt;
use radium_models::UniversalModel;
use radium_abstraction::ChatMessage;
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
let messages = vec![ChatMessage {
    role: "user".to_string(),
    content: "Tell me a story".to_string(),
}];
let mut stream = model.generate_chat_completion_stream(&messages, None).await?;
while let Some(result) = stream.next().await {
    let content = result?;
    print!("{}", content);
}
Troubleshooting
Connection Refused Errors
Error: Network error: Connection refused
Solutions:
- Verify the server is running: curl http://localhost:8000/v1/models
- Check the port number matches your server configuration
- Ensure the base URL includes the /v1 suffix
- Check firewall settings
Authentication Failures
Error: Authentication failed (401)
Solutions:
- Verify your API key is correct
- Check if the server requires authentication (some don't)
- Use without_auth() for servers that don't require keys
- Check server logs for authentication errors
Timeout Issues
Error: Request timeout
Solutions:
- The default timeout is 60 seconds
- Large models or slow hardware may need more time
- Check server logs for processing delays
- Consider using a faster model or hardware
Malformed Response Errors
Error: Failed to parse response
Solutions:
- Verify the server implements the OpenAI API specification correctly
- Check server logs for error responses
- Test the server directly with curl:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"test","messages":[{"role":"user","content":"hello"}]}'
Empty or Missing Content
Error: No content in API response
Solutions:
- Check that the model is loaded and ready
- Verify the model name matches what the server expects
- Check server logs for generation errors
- Try a simpler prompt to test basic functionality
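For that last check, a minimal smoke test along the lines of the Quick Start example (the model name below is a placeholder for whatever your server has loaded):
use radium_models::UniversalModel;
// Simple prompt to confirm the server returns any content at all.
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
let response = model.generate_text("Say hello", None).await?;
println!("{}", response.content);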
Common Configuration Mistakes
- Missing /v1 suffix: Base URL should be http://localhost:8000/v1, not http://localhost:8000 (see the sketch after this list)
- Wrong model name: Use the exact model identifier the server expects
- Port mismatch: Verify the port matches your server configuration
- Authentication when not needed: Some servers (LM Studio, Ollama) don't require API keys
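As a sketch of the first mistake above (the model name and port are placeholders):
use radium_models::UniversalModel;
// Correct: the base URL ends with the /v1 suffix.
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
// Incorrect: "http://localhost:8000".to_string() would point at the server root
// rather than the OpenAI-compatible /v1 routes, and requests would fail.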
Migration from OpenAI Provider
If you're currently using the OpenAI provider and want to switch to Universal:
Before (OpenAI)
use radium_models::OpenAIModel;
let model = OpenAIModel::new("gpt-4".to_string())?;
After (Universal with OpenAI)
use radium_models::UniversalModel;
let model = UniversalModel::with_api_key(
    "gpt-4".to_string(),
    "https://api.openai.com/v1".to_string(),
    std::env::var("OPENAI_API_KEY")?,
);
Benefits
- Same API, works with any OpenAI-compatible server
- Can switch between local and cloud servers easily
- No code changes needed when switching providers
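A sketch of that last point, switching between a local server and the OpenAI cloud endpoint by changing only the constructor arguments (the local model name and port follow the LM Studio example above):
use radium_models::UniversalModel;
// Local server, no API key required (e.g. LM Studio on port 1234).
let local = UniversalModel::without_auth(
    "llama-2-7b".to_string(),
    "http://localhost:1234/v1".to_string(),
);
// OpenAI cloud endpoint, same type and same call sites.
let cloud = UniversalModel::with_api_key(
    "gpt-4".to_string(),
    "https://api.openai.com/v1".to_string(),
    std::env::var("OPENAI_API_KEY")?,
);
// Everything downstream stays the same regardless of which model you pick.
let response = local.generate_text("Say hello", None).await?;
println!("{}", response.content);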
Advanced Usage
Custom Parameters
use radium_abstraction::ModelParameters;
let params = ModelParameters {
    temperature: Some(0.7),
    top_p: Some(0.9),
    max_tokens: Some(100),
    stop_sequences: Some(vec!["\n\n".to_string()]),
};
let response = model.generate_chat_completion(&messages, Some(params)).await?;
Multiple Messages
let messages = vec![
    ChatMessage {
        role: "system".to_string(),
        content: "You are a helpful assistant".to_string(),
    },
    ChatMessage {
        role: "user".to_string(),
        content: "What is the weather?".to_string(),
    },
];
let response = model.generate_chat_completion(&messages, None).await?;
Performance Tips
- Use streaming for long responses to see output incrementally
- Batch requests when possible to reduce overhead
- Choose appropriate models - smaller models are faster but less capable
- Monitor server resources - local servers are limited by your hardware
- Use connection pooling - the HTTP client reuses connections automatically
Security Considerations
- API Keys: Never commit API keys to version control (see the sketch after this list)
- Local Servers: Local servers may not have the same security as cloud providers
- Network: Use HTTPS in production, HTTP is acceptable for localhost
- Authentication: Enable authentication on production servers
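Picking up the first point, a sketch of keeping the key out of source code by reading it from the environment at runtime (the HTTPS URL is a placeholder for your production server):
use radium_models::UniversalModel;
// Read the key from the environment instead of hardcoding it in the source.
let api_key = std::env::var("UNIVERSAL_API_KEY")?;
// Use HTTPS for anything that isn't localhost.
let model = UniversalModel::with_api_key(
    "model-name".to_string(),
    "https://inference.example.com/v1".to_string(),
    api_key,
);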
Examples
See the integration tests in crates/radium-models/tests/universal_integration_test.rs for complete examples with real servers.