LLM Providers¶
MCPOmni Connect uses LiteLLM to provide unified access to 100+ AI models across all major providers. This page covers configuration for each supported provider.
Supported Providers¶
Provider | Models | API Key Required | Local/Remote |
---|---|---|---|
OpenAI | GPT-4, GPT-3.5, etc. | Yes | Remote |
Anthropic | Claude 3.5, Claude 3 | Yes | Remote |
Google | Gemini Pro, Flash | Yes | Remote |
Groq | Llama, Mixtral, Gemma | Yes | Remote |
DeepSeek | DeepSeek-V3, Coder | Yes | Remote |
Azure OpenAI | GPT models | Yes | Remote |
OpenRouter | 200+ models | Yes | Remote |
Ollama | Local models | No | Local |
OpenAI¶
Configuration¶
```json
{
  "LLM": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 128000,
    "top_p": 0.9
  }
}
```
Available Models¶
Model | Context Length | Use Case |
---|---|---|
gpt-4o | 128K | Most capable, latest |
gpt-4o-mini | 128K | Fast, cost-effective |
gpt-4-turbo | 128K | High performance |
gpt-4 | 8K | Standard GPT-4 |
gpt-3.5-turbo | 16K | Fast, affordable |
Environment Setup¶
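MCPOmni Connect reads the key from the shared `LLM_API_KEY` variable (the same variable is used for every remote provider, as in the Azure example further down). In `.env`:

```bash
LLM_API_KEY=your-openai-api-key
```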
Advanced Configuration¶
```json
{
  "LLM": {
    "provider": "openai",
    "model": "gpt-4o",
    "temperature": 0.3,
    "max_tokens": 8000,
    "max_context_length": 128000,
    "top_p": 0.8,
    "frequency_penalty": 0.1,
    "presence_penalty": 0.1,
    "stop": ["</end>"]
  }
}
```
Anthropic (Claude)¶
Configuration¶
```json
{
  "LLM": {
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 200000,
    "top_p": 0.95
  }
}
```
Available Models¶
Model | Context Length | Strengths |
---|---|---|
claude-3-5-sonnet-20241022 | 200K | Best overall, coding |
claude-3-5-haiku-20241022 | 200K | Fast, efficient |
claude-3-opus-20240229 | 200K | Most capable (legacy) |
claude-3-sonnet-20240229 | 200K | Balanced (legacy) |
claude-3-haiku-20240307 | 200K | Fast (legacy) |
Environment Setup¶
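In `.env`:

```bash
LLM_API_KEY=your-anthropic-api-key
```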
Example: Code Analysis Setup¶
```json
{
  "LLM": {
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.1,
    "max_tokens": 8000,
    "max_context_length": 200000,
    "top_p": 0.9
  }
}
```
Google (Gemini)¶
Configuration¶
```json
{
  "LLM": {
    "provider": "google",
    "model": "gemini-1.5-pro",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 1000000,
    "top_p": 0.9
  }
}
```
Available Models¶
Model | Context Length | Strengths |
---|---|---|
gemini-1.5-pro | 1M | Largest context window |
gemini-1.5-flash | 1M | Fast, efficient |
gemini-pro | 32K | Standard model |
Environment Setup¶
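In `.env`:

```bash
LLM_API_KEY=your-google-api-key
```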
Long Context Configuration¶
```json
{
  "LLM": {
    "provider": "google",
    "model": "gemini-1.5-pro",
    "temperature": 0.3,
    "max_tokens": 8000,
    "max_context_length": 1000000,
    "top_p": 0.8
  }
}
```
Groq (Fast Inference)¶
Configuration¶
```json
{
  "LLM": {
    "provider": "groq",
    "model": "llama-3.1-8b-instant",
    "temperature": 0.5,
    "max_tokens": 2000,
    "max_context_length": 8000,
    "top_p": 0.9
  }
}
```
Available Models¶
Model | Context Length | Speed |
---|---|---|
llama-3.1-8b-instant | 8K | Very Fast |
llama-3.1-70b-versatile | 8K | Fast |
mixtral-8x7b-32768 | 32K | Fast |
gemma-7b-it | 8K | Fast |
Environment Setup¶
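In `.env`:

```bash
LLM_API_KEY=your-groq-api-key
```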
High-Speed Configuration¶
```json
{
  "LLM": {
    "provider": "groq",
    "model": "llama-3.1-8b-instant",
    "temperature": 0.1,
    "max_tokens": 1000,
    "max_context_length": 8000,
    "top_p": 0.8
  }
}
```
DeepSeek¶
Configuration¶
```json
{
  "LLM": {
    "provider": "deepseek",
    "model": "deepseek-chat",
    "temperature": 0.5,
    "max_tokens": 4000,
    "max_context_length": 32000,
    "top_p": 0.8
  }
}
```
Available Models¶
Model | Context Length | Specialization |
---|---|---|
deepseek-chat | 32K | General chat |
deepseek-coder | 32K | Code generation |
deepseek-reasoner | 32K | Reasoning tasks |
Environment Setup¶
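In `.env`:

```bash
LLM_API_KEY=your-deepseek-api-key
```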
Azure OpenAI¶
Configuration¶
```json
{
  "LLM": {
    "provider": "azure",
    "model": "gpt-4",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 8000,
    "top_p": 0.9,
    "azure_endpoint": "https://your-resource.openai.azure.com",
    "azure_api_version": "2024-02-01",
    "azure_deployment": "your-deployment-name"
  }
}
```
Environment Setup¶
In `.env`:

```bash
LLM_API_KEY=your-azure-openai-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
```
Enterprise Configuration¶
```json
{
  "LLM": {
    "provider": "azure",
    "model": "gpt-4-turbo",
    "temperature": 0.3,
    "max_tokens": 8000,
    "max_context_length": 128000,
    "top_p": 0.8,
    "azure_endpoint": "${AZURE_OPENAI_ENDPOINT}",
    "azure_api_version": "${AZURE_OPENAI_API_VERSION}",
    "azure_deployment": "gpt-4-turbo-deployment"
  }
}
```
OpenRouter¶
Access to 200+ models through a single API.
Configuration¶
```json
{
  "LLM": {
    "provider": "openrouter",
    "model": "anthropic/claude-3.5-sonnet",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 200000,
    "top_p": 0.95
  }
}
```
Popular Models¶
Model | Provider | Strengths |
---|---|---|
anthropic/claude-3.5-sonnet | Anthropic | Best overall |
openai/gpt-4o | OpenAI | Latest GPT-4 |
google/gemini-pro-1.5 | Google | Large context |
meta-llama/llama-3.1-8b-instruct | Meta | Open source |
mistralai/mixtral-8x7b-instruct | Mistral | Efficient |
Environment Setup¶
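In `.env`:

```bash
LLM_API_KEY=your-openrouter-api-key
```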
Ollama (Local Models)¶
Run models locally for privacy and offline usage.
Configuration¶
```json
{
  "LLM": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "temperature": 0.7,
    "max_tokens": 4000,
    "max_context_length": 8000,
    "top_p": 0.9,
    "ollama_host": "http://localhost:11434"
  }
}
```
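Since `ollama_host` is an ordinary URL, a reachable remote Ollama instance should also work; the address below is a hypothetical example, not a default:

```json
{
  "LLM": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "ollama_host": "http://192.168.1.50:11434"
  }
}
```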
Popular Local Models¶
Model | Size | Use Case |
---|---|---|
llama3.1:8b | 4.7GB | General purpose |
llama3.1:13b | 7.3GB | Better quality |
codellama:7b | 3.8GB | Code generation |
mistral:7b | 4.1GB | Efficient |
qwen2:7b | 4.4GB | Multilingual |
Setup Ollama¶
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.1:8b

# Start the Ollama service
ollama serve
```
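Before pointing MCPOmni Connect at Ollama, it is worth confirming the model was pulled and the service answers. These are standard Ollama commands; `/api/tags` is the HTTP endpoint that lists local models:

```bash
# List locally available models
ollama list

# The HTTP API should return the same list as JSON
curl http://localhost:11434/api/tags
```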
No API Key Required¶
```json
{
  "LLM": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "temperature": 0.5,
    "max_tokens": 2000,
    "ollama_host": "http://localhost:11434"
  }
}
```
Configuration Parameters¶
Common Parameters¶
Parameter | Description | Typical Range | Default |
---|---|---|---|
temperature | Response creativity | 0.0 - 2.0 | 0.7 |
max_tokens | Response length limit | 1 - 8000 | 1000 |
top_p | Nucleus sampling | 0.1 - 1.0 | 1.0 |
frequency_penalty | Repetition penalty | -2.0 - 2.0 | 0.0 |
presence_penalty | Topic diversity | -2.0 - 2.0 | 0.0 |
Provider-Specific Parameters¶
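The parameters below apply only to the `LLM` block of the corresponding provider, as shown in the configuration examples above:

Parameter | Provider | Description |
---|---|---|
azure_endpoint | Azure OpenAI | Resource endpoint URL |
azure_api_version | Azure OpenAI | API version (e.g. 2024-02-01) |
azure_deployment | Azure OpenAI | Deployment name for the model |
ollama_host | Ollama | URL of the Ollama service (default http://localhost:11434) |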
Model Selection Guide¶
By Use Case¶
Use Case | Recommended Models |
---|---|
General Chat | GPT-4o-mini, Claude 3.5 Sonnet |
Code Generation | Claude 3.5 Sonnet, DeepSeek Coder |
Long Documents | Gemini 1.5 Pro, Claude 3.5 Sonnet |
Fast Responses | Groq Llama 3.1, GPT-3.5 Turbo |
Cost Effective | GPT-4o-mini, Groq models |
Privacy/Local | Ollama Llama 3.1, Mistral |
By Performance¶
Priority | Models | Trade-offs |
---|---|---|
Quality | Claude 3.5 Sonnet, GPT-4o | Higher cost |
Speed | Groq models, GPT-3.5 | Lower accuracy |
Context | Gemini 1.5 Pro | Google ecosystem |
Cost | GPT-4o-mini, DeepSeek | Some capability limits |
Switching Between Providers¶
You can switch providers by updating the configuration and restarting, or by overriding it with environment variables:

```bash
# Update configuration and restart
vim servers_config.json

# Or use environment variables
export LLM_PROVIDER=anthropic
export LLM_MODEL=claude-3-5-sonnet-20241022
```
Cost Optimization¶
Token Usage Monitoring¶
Check current usage with the `/api_stats` client command:

```
/api_stats
```

Set usage limits in the configuration:

```json
{
  "AgentConfig": {
    "total_tokens_limit": 50000,
    "request_limit": 1000
  }
}
```
Cost-Effective Models¶
`gpt-4o-mini` is the most cost-effective GPT-4-class model; capping `max_tokens` limits response length, and a lower `temperature` produces more focused responses:

```json
{
  "LLM": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "max_tokens": 1000,
    "temperature": 0.3
  }
}
```
Troubleshooting¶
Common Issues¶
Invalid API Key

Error: `Authentication failed`

Solutions:
- Verify the API key in your `.env` file
- Check that the key has the proper permissions
- Ensure the key belongs to the correct provider
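To check whether the key itself is valid, independent of MCPOmni Connect, you can query the provider's model list directly. OpenAI is shown as an example; other providers expose analogous endpoints:

```bash
# A valid key returns a JSON list of models; an invalid one returns a 401 error
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $LLM_API_KEY"
```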
Model Not Found

Error: `Model not available`

Solutions:
- Check the model name spelling
- Verify the model is available to your account
- Try an alternative model
Rate Limit

Error: `Rate limit exceeded`

Solutions:
- Reduce request frequency
- Upgrade your API plan
- Switch to a different provider
Testing Configuration¶
```
# Test with a simple query
> Hello, can you respond?

# Check model info
/api_stats

# Enable debug mode for detailed logs
/debug
```
Next: Troubleshooting →