Octolib: Self-Sufficient AI Provider Library

© 2026 Muvon Un Limited (Hong Kong) | Website | Product Page

🚀 Overview

Octolib is a comprehensive, self-sufficient AI provider library that exposes a unified, type-safe interface to multiple AI services. It offers intelligent model selection, robust error handling, and advanced features such as cross-provider tool calling and vision support.

✨ Key Features

  • 🔌 Multi-Provider Support: OpenAI, Anthropic, OpenRouter, Cerebras, NVIDIA NIM, Groq, BytePlus, Ollama, Together, Featherless, Google, Amazon, Cloudflare, DeepSeek, MiniMax, Moonshot AI (Kimi), Z.ai, OctoHub, Local, CLI proxies
  • 🛡️ Unified Interface: Consistent API across different providers
  • 🔍 Intelligent Model Validation: Strict provider:model format parsing with case-insensitive model support (see the sketch after this list)
  • 📋 Structured Output: JSON and JSON Schema support for OpenAI, OpenRouter, DeepSeek, Together, and Z.ai
  • 💰 Cost Tracking: Automatic token usage and cost calculation
  • 🖼️ Vision Support: Image and video attachment handling for compatible models (Moonshot Kimi K2.5)
  • 🧰 Tool Calling: Cross-provider tool call standardization
  • 🧩 CLI Provider: Use cli:<backend>/<model> (e.g. cli:codex/gpt-5.2-codex). Proxy-only: tool calling and MCP are neither used nor controllable.
  • ⏱️ Retry Management: Configurable exponential backoff
  • 🔒 Secure Design: Environment-based API key management
  • 🎯 Embedding Support: Multi-provider embedding generation with Jina, Voyage, Google, OpenAI, Together, OctoHub, FastEmbed, and HuggingFace
  • 🔄 Reranking: Document relevance scoring with cross-encoder models (Voyage AI, Cohere, Jina AI, Mixedbread, HuggingFace)
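
A minimal sketch of the strict provider:model parsing (the exact error behavior on malformed input is an assumption; this README only documents the format itself):

use octolib::ProviderFactory;

fn parsing_example() -> anyhow::Result<()> {
    // Valid: strict provider:model format; model names are case-insensitive.
    let (_provider, model) = ProviderFactory::get_provider_for_model("openai:GPT-4o")?;
    println!("Parsed model: {}", model);

    // A bare model name without a provider prefix is expected to be rejected.
    assert!(ProviderFactory::get_provider_for_model("gpt-4o").is_err());
    Ok(())
}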

📦 Quick Installation

# Add to Cargo.toml under [dependencies]
octolib = { git = "https://github.com/muvon/octolib" }

🚀 Quick Start

use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn example() -> anyhow::Result<()> {
    // Parse model and get provider
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Create messages
    let messages = vec![
        Message::user("Hello, how are you?"),
    ];

    // Create completion parameters
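    // Numeric arguments (0.7, 1.0, 50, 1000) are assumed here to be temperature,
    // top_p, top_k and max_tokens; this README does not spell out the signature.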
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);

    // Get completion (requires OPENAI_API_KEY environment variable)
    let response = provider.chat_completion(params).await?;
    println!("Response: {}", response.content);

    Ok(())
}
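
The example above is an async fn and needs a runtime to execute. A minimal entry point, assuming the Tokio runtime (any async executor works):

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    example().await
}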

📋 Structured Output

Get structured JSON responses with schema validation:

use octolib::{ProviderFactory, ChatCompletionParams, Message, StructuredOutputRequest};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct PersonInfo {
    name: String,
    age: u32,
    skills: Vec<String>,
}

async fn structured_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Check if provider supports structured output
    if !provider.supports_structured_output(&model) {
        return Err(anyhow::anyhow!("Provider does not support structured output"));
    }

    let messages = vec![
        Message::user("Tell me about a software engineer in JSON format"),
    ];

    // Request structured JSON output
    let structured_request = StructuredOutputRequest::json();
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_structured_output(structured_request);

    let response = provider.chat_completion(params).await?;

    if let Some(structured) = response.structured_output {
        let person: PersonInfo = serde_json::from_value(structured)?;
        println!("Person: {:?}", person);
    }

    Ok(())
}
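
For strict validation, providers with JSON Schema support can enforce a schema at generation time. This README only shows StructuredOutputRequest::json(), so the json_schema constructor below is a hypothetical name for illustration; check the crate documentation for the actual API.

use serde_json::json;

// Hypothetical constructor; the name json_schema is assumed, not confirmed by this README.
let schema = json!({
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" },
        "skills": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["name", "age", "skills"]
});
let structured_request = StructuredOutputRequest::json_schema(schema);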

🧩 CLI Provider (Proxy Mode)

Use local CLIs as a lightweight proxy. This mode is prompt-only; tool calling and MCP integration are neither used nor controllable.

let (provider, model) = ProviderFactory::get_provider_for_model("cli:codex/gpt-5.2-codex")?;
// or: "cli:claude/claude-sonnet-4-5"
// or: "cli:gemini/gemini-2.5-pro"
// or: "cli:cursor/auto"

Set a backend-specific command if it is not on PATH:

CLI_CODEX_COMMAND=/path/to/codex
CLI_CLAUDE_COMMAND=/path/to/claude
CLI_GEMINI_COMMAND=/path/to/gemini
CLI_CURSOR_COMMAND=/path/to/cursor-agent
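
Once the backend command resolves, the CLI provider behaves like any other provider. A minimal end-to-end sketch reusing the Quick Start API (prompt-only, per the proxy limitation above):

use octolib::{ProviderFactory, ChatCompletionParams, Message};

async fn cli_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("cli:claude/claude-sonnet-4-5")?;
    let messages = vec![Message::user("Summarize this repository in one sentence.")];
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);
    let response = provider.chat_completion(params).await?;
    println!("{}", response.content);
    Ok(())
}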

🧰 Tool Calling

Use AI models to call functions with automatic parameter extraction:

use octolib::{ProviderFactory, ChatCompletionParams, Message, FunctionDefinition, ToolCall};
use serde_json::json;

async fn tool_calling_example() -> anyhow::Result<()> {
    let (provider, model) = ProviderFactory::get_provider_for_model("openai:gpt-4o")?;

    // Define available tools/functions
    let tools = vec![
        FunctionDefinition {
            name: "get_weather".to_string(),
            description: "Get the current weather for a location".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }),
            cache_control: None,
        },
        FunctionDefinition {
            name: "calculate".to_string(),
            description: "Perform a mathematical calculation".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }),
            cache_control: None,
        },
    ];

    let mut messages = vec![
        Message::user("What's the weather in Tokyo and calculate 15 * 23?"),
    ];

    // Initial request with tools
    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
        .with_tools(tools.clone());

    let response = provider.chat_completion(params).await?;

    // Check if model wants to call tools
    if let Some(tool_calls) = response.tool_calls {
        println!("Model requested {} tool calls", tool_calls.len());

        // Add assistant's response with tool calls to conversation
        let mut assistant_msg = Message::assistant(&response.content);
        assistant_msg.tool_calls = Some(serde_json::to_value(&tool_calls)?);
        messages.push(assistant_msg);

        // Execute each tool call and add results
        for tool_call in tool_calls {
            println!("Calling tool: {} with args: {}", tool_call.name, tool_call.arguments);

            // Execute the tool (your implementation)
            let result = match tool_call.name.as_str() {
                "get_weather" => {
                    let location = tool_call.arguments["location"].as_str().unwrap_or("Unknown");
                    json!({
                        "location": location,
                        "temperature": 22,
                        "unit": "celsius",
                        "condition": "sunny"
                    })
                }
                "calculate" => {
                    let expr = tool_call.arguments["expression"].as_str().unwrap_or("0");
                    // Simple calculation (in real app, use proper eval)
                    json!({
                        "expression": expr,
                        "result": 345  // 15 * 23
                    })
                }
                _ => json!({"error": "Unknown tool"}),
            };

            // Add tool result to conversation
            messages.push(Message::tool(
                &serde_json::to_string(&result)?,
                &tool_call.id,
                &tool_call.name,
            ));
        }

        // Get final response with tool results
        let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000)
            .with_tools(tools);

        let final_response = provider.chat_completion(params).await?;
        println!("Final response: {}", final_response.content);
    } else {
        println!("Direct response: {}", response.content);
    }

    Ok(())
}

Tool Calling Features:

  • ✅ Cross-provider support (OpenAI, Anthropic, Google, Amazon, OpenRouter)
  • ✅ Automatic parameter validation via JSON Schema
  • ✅ Multi-turn conversations with tool results
  • ✅ Parallel tool execution support
  • ✅ Standardized ToolCall and GenericToolCall formats across all providers
  • ✅ Provider-specific metadata preservation (e.g., Gemini thought signatures)
  • ✅ Clean conversion API with the to_generic_tool_calls() method

🎯 Embedding Generation

Generate embeddings using multiple providers:

use octolib::embedding::{generate_embeddings, generate_embeddings_batch, InputType};

async fn embedding_example() -> anyhow::Result<()> {
    // Single embedding generation
    let embedding = generate_embeddings(
        "Hello, world!",
        "voyage",  // provider
        "voyage-3.5-lite"  // model
    ).await?;

    println!("Embedding dimension: {}", embedding.len());

    // Batch embedding generation
    let texts = vec![
        "First document".to_string(),
        "Second document".to_string(),
    ];

    let embeddings = generate_embeddings_batch(
        texts,
        "jina",  // provider
        "jina-embeddings-v4",  // model
        InputType::Document,  // input type for better embeddings
        16,  // batch size
        100_000,  // max tokens per batch
    ).await?;

    println!("Generated {} embeddings", embeddings.len());

    Ok(())
}

// Supported embedding providers:
// - Jina: jina-embeddings-v4, jina-clip-v2, etc.
// - Voyage: voyage-3.5, voyage-code-2, etc.
// - Google: gemini-embedding-001, text-embedding-005
// - OpenAI: text-embedding-3-small, text-embedding-3-large
// - FastEmbed: Local models (feature-gated)
// - HuggingFace: sentence-transformers models
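
Embeddings are typically compared with cosine similarity. A small, self-contained helper for that, assuming the generated vectors are Vec<f32> of equal length:

// Cosine similarity between two embedding vectors (1.0 = identical direction).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}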

🎯 Document Reranking

Improve search results by scoring document relevance with cross-encoder models:

use octolib::reranker::rerank;

async fn reranking_example() -> anyhow::Result<()> {
    let query = "What is machine learning?";
    let documents = vec![
        "Machine learning is a subset of AI.".to_string(),
        "Cooking recipes for beginners.".to_string(),
        "Deep learning uses neural networks.".to_string(),
    ];

    // Rerank documents by relevance to query
    let response = rerank(
        query,
        documents,
        "voyage",           // provider: voyage, cohere, jina, fastembed
        "rerank-2.5",       // model
        Some(2)             // top_k: return top 2 results
    ).await?;

    for (rank, result) in response.results.iter().enumerate() {
        println!("Rank {}: Score {:.4}", rank + 1, result.relevance_score);
        println!("  Document: {}", result.document);
    }

    println!("Total tokens used: {}", response.total_tokens);

    Ok(())
}

// Supported Providers:
//
// API-Based (require API keys):
// - Voyage AI (VOYAGE_API_KEY): rerank-2.5, rerank-2.5-lite, rerank-2, rerank-2-lite
// - Cohere (COHERE_API_KEY): rerank-english-v3.0, rerank-multilingual-v3.0
// - Jina AI (JINA_API_KEY): jina-reranker-v3, jina-reranker-v2-base-multilingual
//
// Local (no API keys, requires features):
// - FastEmbed (fastembed feature): bge-reranker-base, bge-reranker-large, jina-reranker-v1-turbo-en

πŸ” OAuth Authentication

Octolib supports OAuth authentication for ChatGPT subscriptions and Anthropic:

OpenAI OAuth (ChatGPT Plus/Pro/Team/Enterprise):

export OPENAI_OAUTH_ACCESS_TOKEN="your_oauth_token"
export OPENAI_OAUTH_ACCOUNT_ID="your_account_id"

Anthropic OAuth:

export ANTHROPIC_OAUTH_TOKEN="your_bearer_token"

The library automatically detects OAuth credentials and prefers them over API keys. See examples/openai_oauth.rs and examples/anthropic_oauth.rs for full usage examples.

🎯 Provider Support Matrix

Provider | Structured Output | Vision | Tool Calls | Caching
OpenAI | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes
OpenRouter | ✅ JSON + Schema | ✅ Yes | ✅ Yes | ✅ Yes
DeepSeek | ✅ JSON Mode | ❌ No | ❌ No | ✅ Yes
Moonshot AI (Kimi) | ✅ JSON Mode | ✅ kimi-k2.5 | ✅ Yes | ✅ Yes
MiniMax | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes
Anthropic | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes
Z.ai | ✅ JSON Mode | ❌ No | ✅ Yes | ✅ Yes
NVIDIA NIM | ✅ JSON + Schema | Per-model | ✅ Yes | ❌ No
Groq | ✅ JSON + Schema | Per-model | ❌ No | ✅ Select models
BytePlus | ✅ JSON + Schema | Per-model | ❌ No | ✅ Yes
Cerebras | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No
Featherless | ✅ JSON + Schema | ❌ No | ❌ No | ❌ No
Google Vertex | ❌ No | ✅ Yes | ✅ Yes | ❌ No
Amazon Bedrock | ❌ No | ✅ Yes | ✅ Yes | ❌ No
OctoHub | Per-model | Per-model | ✅ Yes | ✅ Yes
Together | Per-model | Per-model | ✅ Yes | ❌ No
Cloudflare | ❌ No | ❌ No | ❌ No | ❌ No
Local | Per-model | Per-model | Per-model | ❌ No
Ollama | Per-model | Per-model | Per-model | ❌ No

Structured Output Details

  • JSON Mode: Basic JSON object output
  • JSON Schema: Full schema validation with strict mode
  • Provider Detection: Use provider.supports_structured_output(&model) to check capability

🧠 Thinking/Reasoning Support

Octolib provides first-class support for models that produce thinking/reasoning content. Thinking is stored separately from the main response content, similar to how tool_calls are separate from content.

use octolib::{ProviderFactory, ChatCompletionParams, Message, ThinkingBlock};

async fn thinking_example() -> anyhow::Result<()> {
    // MiniMax and OpenAI o-series models support thinking
    let (provider, model) = ProviderFactory::get_provider_for_model("minimax:MiniMax-M2")?;

    let messages = vec![
        Message::user("Solve this complex math problem step by step"),
    ];

    let params = ChatCompletionParams::new(&messages, &model, 0.7, 1.0, 50, 1000);
    let response = provider.chat_completion(params).await?;

    // Access thinking content (separate from response.content)
    if let Some(ref thinking) = response.thinking {
        println!("=== MODEL THINKING ({}) ===", thinking.tokens);
        println!("{}", thinking.content);
        println!("==========================");
    }

    // Final response (clean, no thinking prefix)
    println!("Response: {}", response.content);
    // Token usage breakdown
    if let Some(usage) = &response.exchange.usage {
        println!("Input tokens: {}", usage.input_tokens);
        println!("Cache read tokens: {}", usage.cache_read_tokens);
        println!("Cache write tokens: {}", usage.cache_write_tokens);
        println!("Output tokens: {}", usage.output_tokens);
        println!("Reasoning tokens: {}", usage.reasoning_tokens);
    }
    Ok(())
}

Supported Providers

Provider | Thinking Format | Notes
MiniMax | Content blocks ({"type": "thinking"}) | Full thinking block extraction
OpenAI o-series | reasoning_content field | o1, o3, o4 models
OpenRouter | reasoning_details | Gemini and other providers

Token Tracking

Thinking tokens are tracked separately in TokenUsage.reasoning_tokens:

if let Some(usage) = &response.exchange.usage {
    println!("Total tokens: {}", usage.total_tokens);
    println!("  - Input: {}", usage.input_tokens);
    println!("  - Cache Read: {}", usage.cache_read_tokens);
    println!("  - Cache Write: {}", usage.cache_write_tokens);
    println!("  - Output: {}", usage.output_tokens);
    println!("  - Reasoning: {}", usage.reasoning_tokens);
}

📚 Complete Documentation

📖 Quick Navigation

🌐 Supported Providers

Provider | Status | Capabilities
OpenAI | ✅ Full Support | Chat, Vision, Tools, Structured Output, Caching
Anthropic | ✅ Full Support | Claude Models, Vision, Tools, Caching
OpenRouter | ✅ Full Support | Multi-Provider Proxy, Vision, Caching, Structured Output
Groq | ✅ Full Support | Fast Inference, Structured Output, Caching
BytePlus | ✅ Full Support | Seed Models, Structured Output, Caching
DeepSeek | ✅ Full Support | Open-Source AI Models, Structured Output, Caching
Moonshot AI (Kimi) | ✅ Full Support | Kimi K2 Series, Vision (kimi-k2.5), Tools, Structured Output, Caching, Thinking
MiniMax | ✅ Full Support | Anthropic-Compatible API, Tools, Caching, Thinking, Structured Output
Z.ai | ✅ Full Support | GLM Models, Caching, Structured Output
NVIDIA NIM | ✅ Full Support | 100+ Hosted Models, Tools, Structured Output, Reference Pricing
Together AI | ✅ Full Support | Multi-Provider Proxy, Vision, Tools, Structured Output
Cerebras | ✅ Full Support | Fast Inference, Structured Output
Featherless | ✅ Full Support | Open-Weight Models (Qwen, Llama, Mistral, DeepSeek, RWKV), Subscription Billing
OctoHub | ✅ Supported | Local AI Serving
Google Vertex AI | ✅ Supported | Enterprise AI Integration
Amazon Bedrock | ✅ Supported | Cloud AI Services
Cloudflare Workers AI | ✅ Supported | Edge AI Compute
Local LLM | ✅ Supported | Ollama, LM Studio, LocalAI, Jan, vLLM
Ollama | ✅ Supported | Local LLM Runner
CLI Proxy | ✅ Supported | Codex, Claude, Gemini, Cursor

🔒 Privacy & Security

  • 🏠 Local-first design
  • 🔑 Secure API key management
  • 📁 Respects .gitignore
  • 🛡️ Comprehensive error handling

🤝 Support & Community

βš–οΈ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ by the Muvon team in Hong Kong