© 2026 Muvon Un Limited (Hong Kong) | Website | Product Page
Octolib is a comprehensive, self-sufficient AI provider library offering a unified, type-safe interface for interacting with multiple AI services. It provides intelligent model selection, robust error handling, and advanced features such as cross-provider tool calling and vision support.
- 🌐 Multi-Provider Support: OpenAI, Anthropic, OpenRouter, Cerebras, NVIDIA NIM, Groq, BytePlus, Ollama, Together, Featherless, Google, Amazon, Cloudflare, DeepSeek, MiniMax, Moonshot AI (Kimi), Z.ai, OctoHub, Local, CLI proxies
- 🛡️ Unified Interface: Consistent API across different providers
- 🔍 Intelligent Model Validation: Strict `provider:model` format parsing with case-insensitive model support (sketched after this list)
- 📋 Structured Output: JSON and JSON Schema support for OpenAI, OpenRouter, DeepSeek, Together, and Z.ai
- 💰 Cost Tracking: Automatic token usage and cost calculation
- 🖼️ Vision Support: Image and video attachment handling for compatible models (Moonshot Kimi K2.5)
- 🧰 Tool Calling: Cross-provider tool call standardization
- 🧩 CLI Provider: Use `cli:<backend>/<model>` (e.g. `cli:codex/gpt-5.2-codex`). Proxy-only: tools/MCP are not used or controllable.
- ⏱️ Retry Management: Configurable exponential backoff
- 🔒 Secure Design: Environment-based API key management
- 🎯 Embedding Support: Multi-provider embedding generation with Jina, Voyage, Google, OpenAI, Together, OctoHub, FastEmbed, and HuggingFace
- 🔄 Reranking: Document relevance scoring with cross-encoder models (Voyage AI, Cohere, Jina AI, Mixedbread, HuggingFace)
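As a quick illustration of the strict model-string parsing, the sketch below feeds a mixed-case and a prefix-less identifier to `ProviderFactory::get_provider_for_model`; the rejection of the prefix-less string is an assumption based on the "strict parsing" claim above, not documented behavior.

```rust
use octolib::ProviderFactory;

fn model_parsing_sketch() -> anyhow::Result<()> {
    // `provider:model` strings are parsed strictly; per the feature list,
    // the model part is matched case-insensitively.
    let (_provider, model) = ProviderFactory::get_provider_for_model("openai:GPT-4o")?;
    println!("Resolved model: {}", model);

    // Assumption: a string without a provider prefix is rejected rather
    // than silently guessed.
    assert!(ProviderFactory::get_provider_for_model("gpt-4o").is_err());
    Ok(())
}
```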
```toml
# Add to Cargo.toml
octolib = { git = "https://github.com/muvon/octolib" }
```

```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn example() -> anyhow::Result<()> {
    // Parse model and get provider
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Create messages
    let messages = vec![
        Message::user("Hello, how are you?"),
    ];

    // Create completion parameters
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);

    // Get completion (requires OPENAI_API_KEY environment variable)
    let response = provider.chat_completion(params).await?;
    println!("Response: {}", response.content);

    Ok(())
}
```
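Cost tracking needs no extra setup: token usage comes back on every response. A minimal sketch reading the usage fields shown in the Thinking/Reasoning section further below:

```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn usage_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;
    let messages = vec![Message::user("Hello!")];
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);
    let response = provider.chat_completion(params).await?;

    // Usage is optional on the exchange; print the per-call token counts.
    if let Some(usage) = &response.exchange.usage {
        println!(
            "Input: {}, Output: {}, Total: {}",
            usage.input_tokens, usage.output_tokens, usage.total_tokens
        );
    }
    Ok(())
}
```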
Get structured JSON responses with schema validation:

```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message, StructuredOutputRequest};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct PersonInfo {
    name: String,
    age: u32,
    skills: Vec<String>,
}

async fn structured_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Check if provider supports structured output
    if !provider.supports_structured_output(&model) {
        return Err(anyhow::anyhow!("Provider does not support structured output"));
    }

    let messages = vec![
        Message::user("Tell me about a software engineer in JSON format"),
    ];

    // Request structured JSON output
    let structured_request = StructuredOutputRequest::json();
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_structured_output(structured_request);

    let response = provider.chat_completion(params).await?;
    if let Some(structured) = response.structured_output {
        let person: PersonInfo = serde_json::from_value(structured)?;
        println!("Person: {:?}", person);
    }

    Ok(())
}
```

Use local CLIs as a lightweight proxy. This mode is prompt-only; tool calling/MCP integration is not used or controllable.
```rust
let (provider, model) = ProviderFactory::get_provider_for_model("cli:codex/gpt-5.2-codex")?;
// or: "cli:claude/claude-sonnet-4-5"
// or: "cli:gemini/gemini-2.5-pro"
// or: "cli:cursor/auto"
```

Set a backend-specific command if it is not on PATH:

```bash
CLI_CODEX_COMMAND=/path/to/codex
CLI_CLAUDE_COMMAND=/path/to/claude
CLI_GEMINI_COMMAND=/path/to/gemini
CLI_CURSOR_COMMAND=/path/to/cursor-agent
```
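Once parsed, a CLI backend behaves like any other provider. A minimal end-to-end sketch: the prompt is forwarded to the local binary and the reply comes back as a normal chat completion.

```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn cli_proxy_example() -> anyhow::Result<()> {
    // Requires the `claude` binary on PATH, or CLI_CLAUDE_COMMAND set.
    let (provider, model) =
        ProviderFactory::get_provider_for_model("cli:claude/claude-sonnet-4-5")?;
    let messages = vec![Message::user("Summarize this repository in one sentence.")];
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);

    // Prompt-only proxying: no tools/MCP are involved in this mode.
    let response = provider.chat_completion(params).await?;
    println!("{}", response.content);
    Ok(())
}
```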
Use AI models to call functions with automatic parameter extraction:
```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message, FunctionDefinition, ToolCall};
use serde_json::json;

async fn tool_calling_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Define available tools/functions
    let tools = vec![
        FunctionDefinition {
            name: "get_weather".to_string(),
            description: "Get the current weather for a location".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }),
            cache_control: None,
        },
        FunctionDefinition {
            name: "calculate".to_string(),
            description: "Perform a mathematical calculation".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }),
            cache_control: None,
        },
    ];

    let mut messages = vec![
        Message::user("What's the weather in Tokyo and calculate 15 * 23?"),
    ];

    // Initial request with tools
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_tools(tools.clone());
    let response = provider.chat_completion(params).await?;

    // Check if model wants to call tools
    if let Some(tool_calls) = response.tool_calls {
        println!("Model requested {} tool calls", tool_calls.len());

        // Add assistant's response with tool calls to conversation
        let mut assistant_msg = Message::assistant(&response.content);
        assistant_msg.tool_calls = Some(serde_json::to_value(&tool_calls)?);
        messages.push(assistant_msg);

        // Execute each tool call and add results
        for tool_call in tool_calls {
            println!("Calling tool: {} with args: {}", tool_call.name, tool_call.arguments);

            // Execute the tool (your implementation)
            let result = match tool_call.name.as_str() {
                "get_weather" => {
                    let location = tool_call.arguments["location"].as_str().unwrap_or("Unknown");
                    json!({
                        "location": location,
                        "temperature": 22,
                        "unit": "celsius",
                        "condition": "sunny"
                    })
                }
                "calculate" => {
                    let expr = tool_call.arguments["expression"].as_str().unwrap_or("0");
                    // Simple calculation (in a real app, use a proper evaluator)
                    json!({
                        "expression": expr,
                        "result": 345 // 15 * 23
                    })
                }
                _ => json!({"error": "Unknown tool"}),
            };

            // Add tool result to conversation
            messages.push(Message::tool(
                &serde_json::to_string(&result)?,
                &tool_call.id,
                &tool_call.name,
            ));
        }

        // Get final response with tool results
        let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
            .with_tools(tools);
        let final_response = provider.chat_completion(params).await?;
        println!("Final response: {}", final_response.content);
    } else {
        println!("Direct response: {}", response.content);
    }

    Ok(())
}
```

Tool Calling Features:
- ✅ Cross-provider support (OpenAI, Anthropic, Google, Amazon, OpenRouter)
- ✅ Automatic parameter validation via JSON Schema
- ✅ Multi-turn conversations with tool results
- ✅ Parallel tool execution support (see the sketch after this list)
- ✅ Standardized `ToolCall` and `GenericToolCall` formats across all providers
- ✅ Provider-specific metadata preservation (e.g., Gemini thought signatures)
- ✅ Clean conversion API with the `to_generic_tool_calls()` method
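Tool execution itself is application code, so parallelism is yours to add. A minimal sketch of running all requested calls concurrently using the `futures` crate (an extra dependency, not part of octolib); `execute_tool` is a hypothetical user-side executor:

```rust
use futures::future::join_all; // external `futures` crate
use octolib::{Message, ToolCall};
use serde_json::{json, Value};

// Hypothetical user-side executor; octolib only transports the calls.
async fn execute_tool(call: &ToolCall) -> Value {
    match call.name.as_str() {
        "get_weather" => json!({"temperature": 22, "unit": "celsius"}),
        _ => json!({"error": "Unknown tool"}),
    }
}

async fn run_tools_in_parallel(tool_calls: &[ToolCall]) -> anyhow::Result<Vec<Message>> {
    // Launch every requested tool at once instead of awaiting them one by one.
    let results = join_all(tool_calls.iter().map(execute_tool)).await;
    tool_calls
        .iter()
        .zip(results)
        .map(|(call, result)| {
            Ok(Message::tool(
                &serde_json::to_string(&result)?,
                &call.id,
                &call.name,
            ))
        })
        .collect()
}
```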
Generate embeddings using multiple providers:
```rust
use octolib::embedding::{generate_embeddings, generate_embeddings_batch, InputType};

async fn embedding_example() -> anyhow::Result<()> {
    // Single embedding generation
    let embedding = generate_embeddings(
        "Hello, world!",
        "voyage",          // provider
        "voyage-3.5-lite"  // model
    ).await?;
    println!("Embedding dimension: {}", embedding.len());

    // Batch embedding generation
    let texts = vec![
        "First document".to_string(),
        "Second document".to_string(),
    ];
    let embeddings = generate_embeddings_batch(
        texts,
        "jina",                // provider
        "jina-embeddings-v4",  // model
        InputType::Document,   // input type for better embeddings
        16,                    // batch size
        100_000,               // max tokens per batch
    ).await?;
    println!("Generated {} embeddings", embeddings.len());

    Ok(())
}

// Supported embedding providers:
// - Jina: jina-embeddings-v4, jina-clip-v2, etc.
// - Voyage: voyage-3.5, voyage-code-2, etc.
// - Google: gemini-embedding-001, text-embedding-005
// - OpenAI: text-embedding-3-small, text-embedding-3-large
// - FastEmbed: Local models (feature-gated)
// - HuggingFace: sentence-transformers models
```
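The returned vectors can be compared directly. A small, octolib-independent helper for cosine similarity; that embeddings come back as `f32` vectors is an assumption about the return type:

```rust
// Cosine similarity between two embedding vectors.
// Assumption: octolib returns embeddings as f32 vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```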
Improve search results by scoring document relevance with cross-encoder models:

```rust
use octolib::reranker::rerank;

async fn reranking_example() -> anyhow::Result<()> {
    let query = "What is machine learning?";
    let documents = vec![
        "Machine learning is a subset of AI.".to_string(),
        "Cooking recipes for beginners.".to_string(),
        "Deep learning uses neural networks.".to_string(),
    ];

    // Rerank documents by relevance to the query
    let response = rerank(
        query,
        documents,
        "voyage",     // provider: voyage, cohere, jina, fastembed
        "rerank-2.5", // model
        Some(2)       // top_k: return top 2 results
    ).await?;

    for (rank, result) in response.results.iter().enumerate() {
        println!("Rank {}: Score {:.4}", rank + 1, result.relevance_score);
        println!("  Document: {}", result.document);
    }
    println!("Total tokens used: {}", response.total_tokens);

    Ok(())
}

// Supported providers:
//
// API-based (require API keys):
// - Voyage AI (VOYAGE_API_KEY): rerank-2.5, rerank-2.5-lite, rerank-2, rerank-2-lite
// - Cohere (COHERE_API_KEY): rerank-english-v3.0, rerank-multilingual-v3.0
// - Jina AI (JINA_API_KEY): jina-reranker-v3, jina-reranker-v2-base-multilingual
//
// Local (no API keys, requires features):
// - FastEmbed (fastembed feature): bge-reranker-base, bge-reranker-large, jina-reranker-v1-turbo-en
```
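A common pattern is two-stage search: use embeddings for a cheap recall pass, then rerank the shortlist with a cross-encoder. A sketch combining the two APIs above; the shortlisting step is elided and assumed to come from an embedding similarity pass:

```rust
use octolib::reranker::rerank;

// Two-stage search sketch: embed for recall (elided), rerank for precision.
async fn two_stage_search(query: &str, shortlist: Vec<String>) -> anyhow::Result<Vec<String>> {
    // Assume `shortlist` was produced by an embedding similarity pass
    // (see the cosine-similarity helper above).
    let response = rerank(query, shortlist, "voyage", "rerank-2.5", Some(5)).await?;
    // Keep only the reranked documents, best first.
    Ok(response.results.into_iter().map(|r| r.document).collect())
}
```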
Octolib supports OAuth authentication for ChatGPT subscriptions and Anthropic:

OpenAI OAuth (ChatGPT Plus/Pro/Team/Enterprise):

```bash
export OPENAI_OAUTH_ACCESS_TOKEN="your_oauth_token"
export OPENAI_OAUTH_ACCOUNT_ID="your_account_id"
```

Anthropic OAuth:

```bash
export ANTHROPIC_OAUTH_TOKEN="your_bearer_token"
```

The library automatically detects OAuth credentials and prefers them over API keys. See `examples/openai_oauth.rs` and `examples/anthropic_oauth.rs` for full usage examples.
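Since detection is automatic, application code never handles the tokens directly; at most you may want a startup check that credentials exist. A trivial sketch over the environment variables listed above:

```rust
use std::env;

// Octolib reads these variables itself; this check only produces a
// clearer startup error in your own application.
fn oauth_configured() -> bool {
    let openai = env::var("OPENAI_OAUTH_ACCESS_TOKEN").is_ok()
        && env::var("OPENAI_OAUTH_ACCOUNT_ID").is_ok();
    let anthropic = env::var("ANTHROPIC_OAUTH_TOKEN").is_ok();
    openai || anthropic
}
```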
| Provider | Structured Output | Vision | Tool Calls | Caching |
|---|---|---|---|---|
| OpenAI | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes |
| OpenRouter | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes |
| DeepSeek | ✅ JSON Mode | ❌ No | ❌ No | ✅ Yes |
| Moonshot AI (Kimi) | ✅ JSON Mode | ✅ kimi-k2.5 | ✅ Yes | ✅ Yes |
| MiniMax | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes |
| Anthropic | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Z.ai | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes |
| NVIDIA NIM | ✅ JSON + Schema | Per-model | ✅ Yes | ❌ No |
| Groq | ✅ JSON + Schema | Per-model | ❌ No | ✅ Select models |
| BytePlus | ✅ JSON + Schema | Per-model | ❌ No | ✅ Yes |
| Cerebras | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No |
| Featherless | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No |
| Google Vertex | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| Amazon Bedrock | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| OctoHub | Per-model | Per-model | ✅ Yes | ✅ Yes |
| Together | Per-model | Per-model | ✅ Yes | ❌ No |
| Cloudflare | ❌ No | ❌ No | ❌ No | ❌ No |
| Local | Per-model | Per-model | Per-model | ❌ No |
| Ollama | Per-model | Per-model | Per-model | ❌ No |
- JSON Mode: Basic JSON object output
- JSON Schema: Full schema validation with strict mode
- Provider Detection: Use `provider.supports_structured_output(&model)` to check capability
Octolib provides first-class support for models that produce thinking/reasoning content. Thinking is stored separately from the main response content, similar to how `tool_calls` are kept separate from `content`.
```rust
use octolib::{ProviderFactory, ChatCompletionParams, Message, ThinkingBlock};

async fn thinking_example() -> anyhow::Result<()> {
    // MiniMax and OpenAI o-series models support thinking
    let (provider, model) = ProviderFactory::get_provider_for_model("minimax:MiniMax-M2")?;

    let messages = vec![
        Message::user("Solve this complex math problem step by step"),
    ];
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);
    let response = provider.chat_completion(params).await?;

    // Access thinking content (separate from response.content)
    if let Some(ref thinking) = response.thinking {
        println!("=== MODEL THINKING ({}) ===", thinking.tokens);
        println!("{}", thinking.content);
        println!("==========================");
    }

    // Final response (clean, no thinking prefix)
    println!("Response: {}", response.content);

    // Token usage breakdown
    if let Some(usage) = &response.exchange.usage {
        println!("Input tokens: {}", usage.input_tokens);
        println!("Cache read tokens: {}", usage.cache_read_tokens);
        println!("Cache write tokens: {}", usage.cache_write_tokens);
        println!("Output tokens: {}", usage.output_tokens);
        println!("Reasoning tokens: {}", usage.reasoning_tokens);
    }

    Ok(())
}
```

| Provider | Thinking Format | Notes |
|---|---|---|
| MiniMax | Content blocks (`{"type": "thinking"}`) | Full thinking block extraction |
| OpenAI o-series | `reasoning_content` field | o1, o3, o4 models |
| OpenRouter | `reasoning_details` | Gemini and other providers |
Thinking tokens are tracked separately in `TokenUsage.reasoning_tokens`:
```rust
if let Some(usage) = &response.exchange.usage {
    println!("Total tokens: {}", usage.total_tokens);
    println!("  - Input: {}", usage.input_tokens);
    println!("  - Cache Read: {}", usage.cache_read_tokens);
    println!("  - Cache Write: {}", usage.cache_write_tokens);
    println!("  - Output: {}", usage.output_tokens);
    println!("  - Reasoning: {}", usage.reasoning_tokens);
}
```

📚 Quick Navigation
- Overview - Library introduction and core concepts
- Installation Guide - Setup and configuration
- Advanced Usage - Advanced features and customization
- Advanced Guide - Comprehensive usage patterns
- Embedding Guide - Embedding generation with multiple providers
- Reranking Guide - Document relevance scoring
- Tool Calling - Cross-provider tool calling
- Thinking/Reasoning - Reasoning model support
| Provider | Status | Capabilities |
|---|---|---|
| OpenAI | ✅ Full Support | Chat, Vision, Tools, Structured Output, Caching |
| Anthropic | ✅ Full Support | Claude Models, Vision, Tools, Caching |
| OpenRouter | ✅ Full Support | Multi-Provider Proxy, Vision, Caching, Structured Output |
| Groq | ✅ Full Support | Fast Inference, Structured Output, Caching |
| BytePlus | ✅ Full Support | Seed Models, Structured Output, Caching |
| DeepSeek | ✅ Full Support | Open-Source AI Models, Structured Output, Caching |
| Moonshot AI (Kimi) | ✅ Full Support | Kimi K2 Series, Vision (kimi-k2.5), Tools, Structured Output, Caching, Thinking |
| MiniMax | ✅ Full Support | Anthropic-Compatible API, Tools, Caching, Thinking, Structured Output |
| Z.ai | ✅ Full Support | GLM Models, Caching, Structured Output |
| NVIDIA NIM | ✅ Full Support | 100+ Hosted Models, Tools, Structured Output, Reference Pricing |
| Together AI | ✅ Full Support | Multi-Provider Proxy, Vision, Tools, Structured Output |
| Cerebras | ✅ Full Support | Fast Inference, Structured Output |
| Featherless | ✅ Full Support | Open-Weight Models (Qwen, Llama, Mistral, DeepSeek, RWKV), Subscription Billing |
| OctoHub | ✅ Supported | Local AI Serving |
| Google Vertex AI | ✅ Supported | Enterprise AI Integration |
| Amazon Bedrock | ✅ Supported | Cloud AI Services |
| Cloudflare Workers AI | ✅ Supported | Edge AI Compute |
| Local LLM | ✅ Supported | Ollama, LM Studio, LocalAI, Jan, vLLM |
| Ollama | ✅ Supported | Local LLM Runner |
| CLI Proxy | ✅ Supported | Codex, Claude, Gemini, Cursor |
- 🏠 Local-first design
- 🔐 Secure API key management
- 📁 Respects .gitignore
- 🛡️ Comprehensive error handling

- 🐛 Issues: GitHub Issues
- 📧 Email: opensource@muvon.io
- 🏢 Company: Muvon Un Limited (Hong Kong)
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with ❤️ by the Muvon team in Hong Kong