Universal OpenAI-Compatible Provider Guide
Introduction
The Universal OpenAI-Compatible Provider enables Radium to connect to any server that implements the OpenAI Chat Completions API specification. This includes popular local inference servers like vLLM, LocalAI, LM Studio, and Ollama, allowing you to:
- Run models locally for privacy and cost savings
- Use self-hosted inference servers
- Experiment with open-source models
- Avoid vendor lock-in
Quick Start
The simplest way to get started is with a local server that doesn't require authentication:
use radium_models::UniversalModel;
// Connect to LM Studio (no authentication required)
let model = UniversalModel::without_auth(
    "llama-2-7b".to_string(),
    "http://localhost:1234/v1".to_string(),
);
// Generate text
let response = model.generate_text("Say hello", None).await?;
println!("{}", response.content);
Supported Servers
The Universal provider works with any server implementing the OpenAI Chat Completions API, including:
- vLLM: High-performance LLM inference server
- LocalAI: Local inference server with OpenAI-compatible API
- LM Studio: Desktop app for running local models
- Ollama: Local model runner with OpenAI-compatible endpoints
- Any custom server: As long as it implements the OpenAI API spec
Server Setup Guides
vLLM
vLLM is a high-performance inference server optimized for large language models.
Installation
pip install vllm
Starting the Server
# Serve a model (replace with your model name)
vllm serve meta-llama/Llama-3-8B-Instruct --port 8000
# With API key authentication (optional)
vllm serve meta-llama/Llama-3-8B-Instruct --port 8000 --api-key your-api-key
Usage with Radium
use radium_models::UniversalModel;
// Without authentication
let model = UniversalModel::without_auth(
    "meta-llama/Llama-3-8B-Instruct".to_string(),
    "http://localhost:8000/v1".to_string(),
);
// With authentication
let model = UniversalModel::with_api_key(
    "meta-llama/Llama-3-8B-Instruct".to_string(),
    "http://localhost:8000/v1".to_string(),
    "your-api-key".to_string(),
);
API Endpoint
- Default: http://localhost:8000/v1
- Chat completions: http://localhost:8000/v1/chat/completions
LocalAI
LocalAI is a drop-in replacement for OpenAI that runs locally using consumer-grade hardware.
Installation (Docker)
docker run -p 8080:8080 localai/localai
Configuration
LocalAI uses YAML configuration files. Create a models.yaml file:
models:
  - name: gpt-3.5-turbo
    backend: llama
    parameters:
      model: /path/to/model.gguf
Usage with Radium
use radium_models::UniversalModel;
// Without authentication (default)
let model = UniversalModel::without_auth(
    "gpt-3.5-turbo".to_string(),
    "http://localhost:8080/v1".to_string(),
);
// With authentication (if configured)
let model = UniversalModel::with_api_key(
    "gpt-3.5-turbo".to_string(),
    "http://localhost:8080/v1".to_string(),
    "local-api-key".to_string(),
);
API Endpoint
- Default: http://localhost:8080/v1
- Chat completions: http://localhost:8080/v1/chat/completions
LM Studio
LM Studio is a user-friendly desktop application for running local models.
Installation
- Download LM Studio from lmstudio.ai
- Install and launch the application
- Download a model through the UI
Enabling the Local Server
- Open LM Studio
- Go to Settings → Local Server
- Enable "Local Server"
- Note the port (default: 1234)
Usage with Radium
use radium_models::UniversalModel;
// LM Studio doesn't require authentication
let model = UniversalModel::without_auth(
    "llama-2-7b".to_string(), // Use the model name from LM Studio
    "http://localhost:1234/v1".to_string(),
);
let response = model.generate_text("Say hello", None).await?;
API Endpoint
- Default: http://localhost:1234/v1
- Chat completions: http://localhost:1234/v1/chat/completions
Ollama
Ollama is a simple tool for running large language models locally.
Installation
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com
Pulling a Model
ollama pull llama2
Usage with Radium
use radium_models::UniversalModel;
// Ollama uses OpenAI-compatible endpoints
let model = UniversalModel::without_auth(
    "llama2".to_string(),
    "http://localhost:11434/v1".to_string(),
);
API Endpoint
- Default: http://localhost:11434/v1
- Chat completions: http://localhost:11434/v1/chat/completions
Constructor Patterns
The Universal provider supports three constructor patterns:
1. new() - Environment Variable Authentication
Loads the API key from environment variables:
// Set environment variable
std::env::set_var("UNIVERSAL_API_KEY", "your-api-key");
// Or use OPENAI_COMPATIBLE_API_KEY
std::env::set_var("OPENAI_COMPATIBLE_API_KEY", "your-api-key");
let model = UniversalModel::new(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
)?;
2. with_api_key() - Explicit API Key
Provides the API key directly:
let model = UniversalModel::with_api_key(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
    "your-api-key".to_string(),
);
3. without_auth() - No Authentication
For servers that don't require authentication:
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:1234/v1".to_string(),
);
Environment Variables
The Universal provider supports the following environment variables:
- UNIVERSAL_API_KEY: Primary API key environment variable
- OPENAI_COMPATIBLE_API_KEY: Fallback API key environment variable (see the sketch after this list)
- UNIVERSAL_BASE_URL: Default base URL (not currently used; specify the base URL in the constructor)
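A minimal sketch of the fallback behavior described above (the model name and URL are placeholders):
use radium_models::UniversalModel;
// Only the fallback variable is set here; UNIVERSAL_API_KEY is left unset.
std::env::set_var("OPENAI_COMPATIBLE_API_KEY", "your-api-key");
// new() still finds a key because it falls back to OPENAI_COMPATIBLE_API_KEY.
let model = UniversalModel::new(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
)?;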
Factory Integration
You can also create Universal models through the ModelFactory:
use radium_models::{ModelConfig, ModelFactory, ModelType};
let config = ModelConfig::new(
    ModelType::Universal,
    "model-name".to_string(),
)
.with_base_url("http://localhost:8000/v1".to_string())
.with_api_key("your-api-key".to_string()); // Optional
let model = ModelFactory::create(config)?;
Streaming Support
The Universal provider supports Server-Sent Events (SSE) streaming:
use futures::StreamExt;
use radium_models::UniversalModel;
use radium_abstraction::ChatMessage;
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
let messages = vec![ChatMessage {
    role: "user".to_string(),
    content: "Tell me a story".to_string(),
}];
let mut stream = model.generate_chat_completion_stream(&messages, None).await?;
while let Some(result) = stream.next().await {
    let content = result?;
    print!("{}", content);
}
Troubleshooting
Connection Refused Errors
Error: Network error: Connection refused
Solutions:
- Verify the server is running: curl http://localhost:8000/v1/models
- Check the port number matches your server configuration
- Ensure the base URL includes the /v1 suffix
- Check firewall settings
Authentication Failures
Error: Authentication failed (401)
Solutions:
- Verify your API key is correct
- Check if the server requires authentication (some don't)
- Use without_auth() for servers that don't require keys
- Check server logs for authentication errors
Timeout Issues
Error: Request timeout
Solutions:
- The default timeout is 60 seconds
- Large models or slow hardware may need more time
- Check server logs for processing delays
- Consider using a faster model or hardware
Malformed Response Errors
Error: Failed to parse response
Solutions:
- Verify the server implements the OpenAI API specification correctly
- Check server logs for error responses
- Test the server directly with curl:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"test","messages":[{"role":"user","content":"hello"}]}'
Empty or Missing Content
Error: No content in API response
Solutions:
- Check that the model is loaded and ready
- Verify the model name matches what the server expects
- Check server logs for generation errors
- Try a simpler prompt to test basic functionality
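For that last check, a minimal smoke test along the lines of the Quick Start example (the model name below is a placeholder for whatever your server has loaded):
use radium_models::UniversalModel;
// Simple prompt to confirm the server returns any content at all.
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
let response = model.generate_text("Say hello", None).await?;
println!("{}", response.content);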
Common Configuration Mistakes
- Missing /v1 suffix: Base URL should be http://localhost:8000/v1, not http://localhost:8000 (see the sketch after this list)
- Wrong model name: Use the exact model identifier the server expects
- Port mismatch: Verify the port matches your server configuration
- Authentication when not needed: Some servers (LM Studio, Ollama) don't require API keys
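As a sketch of the first mistake above (the model name and port are placeholders):
use radium_models::UniversalModel;
// Correct: the base URL ends with the /v1 suffix.
let model = UniversalModel::without_auth(
    "model-name".to_string(),
    "http://localhost:8000/v1".to_string(),
);
// Incorrect: "http://localhost:8000".to_string() would point at the server root
// rather than the OpenAI-compatible /v1 routes, and requests would fail.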
Migration from OpenAI Provider
If you're currently using the OpenAI provider and want to switch to Universal:
Before (OpenAI)
use radium_models::OpenAIModel;
let model = OpenAIModel::new("gpt-4".to_string())?;
After (Universal with OpenAI)
use radium_models::UniversalModel;
let model = UniversalModel::with_api_key(
    "gpt-4".to_string(),
    "https://api.openai.com/v1".to_string(),
    std::env::var("OPENAI_API_KEY")?,
);
Benefits
- Same API, works with any OpenAI-compatible server
- Can switch between local and cloud servers easily
- No code changes needed when switching providers
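A sketch of that last point, switching between a local server and the OpenAI cloud endpoint by changing only the constructor arguments (the local model name and port follow the LM Studio example above):
use radium_models::UniversalModel;
// Local server, no API key required (e.g. LM Studio on port 1234).
let local = UniversalModel::without_auth(
    "llama-2-7b".to_string(),
    "http://localhost:1234/v1".to_string(),
);
// OpenAI cloud endpoint, same type and same call sites.
let cloud = UniversalModel::with_api_key(
    "gpt-4".to_string(),
    "https://api.openai.com/v1".to_string(),
    std::env::var("OPENAI_API_KEY")?,
);
// Everything downstream stays the same regardless of which model you pick.
let response = local.generate_text("Say hello", None).await?;
println!("{}", response.content);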
Advanced Usage
Custom Parameters
use radium_abstraction::ModelParameters;
let params = ModelParameters {
    temperature: Some(0.7),
    top_p: Some(0.9),
    max_tokens: Some(100),
    stop_sequences: Some(vec!["\n\n".to_string()]),
};
let response = model.generate_chat_completion(&messages, Some(params)).await?;
Multiple Messages
let messages = vec![
    ChatMessage {
        role: "system".to_string(),
        content: "You are a helpful assistant".to_string(),
    },
    ChatMessage {
        role: "user".to_string(),
        content: "What is the weather?".to_string(),
    },
];
let response = model.generate_chat_completion(&messages, None).await?;
Performance Tips
- Use streaming for long responses to see output incrementally
- Batch requests when possible to reduce overhead
- Choose appropriate models - smaller models are faster but less capable
- Monitor server resources - local servers are limited by your hardware
- Use connection pooling - the HTTP client reuses connections automatically
Security Considerations
- API Keys: Never commit API keys to version control (see the sketch after this list)
- Local Servers: Local servers may not have the same security as cloud providers
- Network: Use HTTPS in production, HTTP is acceptable for localhost
- Authentication: Enable authentication on production servers
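Picking up the first point, a sketch of keeping the key out of source code by reading it from the environment at runtime (the HTTPS URL is a placeholder for your production server):
use radium_models::UniversalModel;
// Read the key from the environment instead of hardcoding it in the source.
let api_key = std::env::var("UNIVERSAL_API_KEY")?;
// Use HTTPS for anything that isn't localhost.
let model = UniversalModel::with_api_key(
    "model-name".to_string(),
    "https://inference.example.com/v1".to_string(),
    api_key,
);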
Examples
See the integration tests in crates/radium-models/tests/universal_integration_test.rs for complete examples with real servers.