84 changes: 37 additions & 47 deletions api-reference/api-spec.mdx
@@ -1,34 +1,15 @@
---
title: Introduction
description: Venice API reference covering authentication, debugging, OpenAI compatibility, response headers, error handling, and the full list of supported endpoints.
description: Reference documentation for the Venice API
"og:title": "API Reference | Venice API Docs"
"og:description": "Complete API reference including authentication, debugging, OpenAI compatibility, and response headers"
---

The Venice API offers HTTP-based REST and streaming interfaces for building AI applications with uncensored models and private inference. You can build with text generation, image creation, embeddings, and more, all without restrictive content policies. Integration examples and SDKs are available in the [documentation](/overview/getting-started). Our API reference is also available as an [OpenAPI YAML spec](https://api.venice.ai/doc/api/swagger.yaml).
The Venice API is a REST API at `https://api.venice.ai/api/v1` for private inference across uncensored, frontier models. It implements the OpenAI specification, so existing OpenAI clients and SDKs work by changing the base URL — then adds Venice-native features like server-side web search, characters, and wallet payments. The full spec is also available as an [OpenAPI YAML file](https://api.venice.ai/doc/api/swagger.yaml).

## Authentication

The Venice API uses API keys for authentication. Create and manage your API keys in your [API settings](https://venice.ai/settings/api).


All API requests require HTTP Bearer authentication:
## Quickstart

```
Authorization: Bearer VENICE_API_KEY
```

<Note>
Your API key is a secret. Do not share it or expose it in any client-side code.
</Note>

## OpenAI Compatibility

Venice's API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This allows you to integrate with Venice using the familiar OpenAI interface while accessing Venice's unique features and uncensored models.

### Setup

Configure your client to use Venice's base URL (`https://api.venice.ai/api/v1`) and make your first request:
Point any OpenAI-compatible client at Venice's base URL (`https://api.venice.ai/api/v1`). Create and manage keys in your [API settings](https://venice.ai/settings/api).

<CodeGroup>
```bash curl
@@ -75,7 +56,28 @@
```
</CodeGroup>

## Venice-Specific Features
## Authentication

All API requests require HTTP Bearer authentication:

```
Authorization: Bearer VENICE_API_KEY
```

<Note>
Your API key is a secret. Do not share it or expose it in any client-side code.
</Note>
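
As a minimal sketch of the Bearer scheme above, the helper below builds request headers from a key kept on the server. The function name `buildAuthHeaders` and the environment variable name `VENICE_API_KEY` are illustrative assumptions, not part of the Venice API.

```javascript
// Hypothetical helper (not part of any Venice SDK): builds the headers
// for an authenticated request. Assumes the key is stored server-side,
// for example in process.env.VENICE_API_KEY.
function buildAuthHeaders(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('Missing Venice API key');
  }
  return {
    Authorization: `Bearer ${apiKey.trim()}`,
    'Content-Type': 'application/json',
  };
}

// Usage (server-side only, never in browser code):
// const headers = buildAuthHeaders(process.env.VENICE_API_KEY);
```

Failing fast on an empty key surfaces a misconfigured environment before any request is sent.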

## Differences from OpenAI

Venice is OpenAI-compatible. The main additions and differences:

- **`venice_parameters`** — Venice-only request options (web search, scraping, character personas, thinking controls). [Reference below](#venice-parameters).
- **System prompts** — Venice appends defaults tuned for natural, uncensored output; disable with `include_venice_system_prompt: false`. [Details below](#system-prompts).
- **Models** — Use Venice model IDs directly rather than OpenAI mappings. [Browse models](/models/overview).
- **Response headers** — Balance, rate-limit, model, and content-safety metadata on every response. [Reference below](#response-headers).
- **Private inference** — TEE-backed and end-to-end-encrypted model options. [Privacy models](/guides/features/tee-e2ee-models).
- **Payments** — Credits, a daily DIEM allowance, or per-request USDC via x402. [x402 guide](/guides/integrations/x402-venice-api).
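
To illustrate how the Venice-only options sit next to the standard OpenAI fields, here is a hedged sketch of a request-body builder. The helper name `buildChatBody` and its option names are hypothetical; `venice-uncensored` is used only as an example model ID and may not match the current catalog.

```javascript
// Hypothetical helper: builds an OpenAI-style chat completion body with
// Venice-only options nested under `venice_parameters`.
function buildChatBody(userMessage, options = {}) {
  const {
    model = 'venice-uncensored', // example ID only; check the models page
    webSearch = 'off', // documented values: 'off', 'on', 'auto'
    includeVeniceSystemPrompt = true,
  } = options;

  if (!['off', 'on', 'auto'].includes(webSearch)) {
    throw new RangeError(`Unsupported enable_web_search value: ${webSearch}`);
  }

  return {
    model,
    messages: [{ role: 'user', content: userMessage }],
    venice_parameters: {
      enable_web_search: webSearch,
      include_venice_system_prompt: includeVeniceSystemPrompt,
    },
  };
}
```

The resulting object can be serialized with `JSON.stringify` and sent by any OpenAI-compatible client that allows extra body fields.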

### System Prompts

@@ -99,7 +101,7 @@
{"role": "system", "content": "Your custom system prompt"},
{"role": "user", "content": "Why is the sky blue?"}
],
"venice_parameters": {

"include_venice_system_prompt": false
}
}'
@@ -151,7 +153,7 @@
| `disable_thinking` | boolean | On supported reasoning models, disable thinking and strip the `<think></think>` blocks from the response | `false` |
| `enable_web_search` | string | Enable web search for this request (`off`, `on`, `auto` - auto enables based on model's discretion)<br/>Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping). | `off` |
| `enable_web_scraping` | boolean | Enable web scraping of up to 5 URLs detected in the user message. Scraped content augments responses and bypasses web search. Only successfully scraped URLs are billed.<br/>Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping). | `false` |
| `enable_x_search` | boolean | Enable xAI's native search (web + X/Twitter) for supported Grok models (e.g., `grok-4-20-beta`). Provides higher quality search results by using xAI's search infrastructure. When enabled, Venice's standard web search is bypassed.<br/>Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping). | `false` |
| `enable_web_citations` | boolean | When web search is enabled, request that the LLM cite its sources using `[REF]0[/REF]` format | `false` |
| `include_search_results_in_stream` | boolean | Experimental: Include search results in the stream as the first emitted chunk | `false` |
| `return_search_results_as_documents` | boolean | Surface search results in an OpenAI-compatible tool call named `venice_web_search_documents` for LangChain integration | `false` |
@@ -171,14 +173,12 @@

See [Prompt Caching](/guides/features/prompt-caching) for details on how caching works, billing, and best practices.

## Response Headers Reference
## Response Headers

All Venice API responses include HTTP headers that provide metadata about the request, rate limits, model information, and account balance. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular API request, monitor rate limiting, and track your account balance.
All Venice API responses include HTTP headers with request, rate-limit, model, and account-balance metadata. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular request, monitor rate limiting, and track your account balance.

Venice recommends logging request IDs (`CF-RAY` header) in production deployments for more efficient troubleshooting with our support team, should the need arise.

The table below provides a comprehensive reference of all headers you may encounter:

| Header | Type | Purpose | When Returned |
|--------|------|---------|---------------|
| **Standard HTTP Headers** ||||
@@ -190,7 +190,7 @@
| `CF-RAY` | string | Unique identifier for this API request, used for troubleshooting and support requests | Always |
| `x-venice-version` | string | Current version/revision of the Venice API service (e.g., `20250828.222653`) | Always |
| `x-venice-timestamp` | string | Server timestamp when the request was processed (ISO 8601 format) | When timestamp tracking is enabled |
| `x-venice-host-name` | string | Hostname of the server that processed the request | Error responses and debugging scenarios |
| **Model Information** ||||
| `x-venice-model-id` | string | Unique identifier of the AI model used for the request (e.g., `venice-01-lite`) | Inference endpoints using AI models |
| `x-venice-model-name` | string | Friendly/display name of the AI model used (e.g., `Venice Lite`) | Inference endpoints using AI models |
@@ -219,7 +219,7 @@
| `x-venice-is-adult-model-content-violation` | string | Indicates if content violates adult model content policies (`true`/`false`) | Image generation endpoints |
| `x-venice-contains-minor` | string | Indicates if image contains minors (`true`/`false`) | Image analysis endpoints with age detection |
| **Client Information** ||||
| `x-venice-middleface-version` | string | Version of the Venice middleface client | Requests from Venice middleface clients |
| `x-venice-mobile-version` | string | Version of the Venice mobile app client | Requests from mobile applications |
| `x-venice-request-timestamp-ms` | number | Client-provided request timestamp in milliseconds | When client provides timestamp in request |
| `x-venice-control-instance` | string | Control instance identifier for debugging | Image generation endpoints for debugging |
@@ -227,16 +227,14 @@
| `x-auth-refreshed` | string | Indicates authentication token was refreshed during request (`true`/`false`) | When authentication tokens are auto-refreshed |
| `x-retry-count` | number | Number of retry attempts for the request | When request retries occur |

### Important Notes
<Accordion title="Notes and an example of accessing headers">

- **Header Name Case**: HTTP headers are case-insensitive, but Venice uses lowercase with hyphens for consistency
- **String Values**: Boolean values in headers are returned as strings (`"true"` or `"false"`)
- **Numeric Values**: Large numbers and balance values may be returned as strings to prevent precision loss
- **Optional Headers**: Not all headers are returned in every response; presence depends on the endpoint and request context
- **Compression**: Use `Accept-Encoding: gzip, br` in requests to receive compressed responses where supported

### Example: Accessing Response Headers

```javascript
// After making an API request, access headers from the response object
const requestId = response.headers.get('CF-RAY');
@@ -250,37 +248,29 @@
console.warn(`Model Deprecation: ${deprecationWarning}`);
}
```
</Accordion>
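
Because the notes above say boolean and numeric header values arrive as strings, a small normalizing step can help. The sketch below is one way to do it; the helper name `readVeniceHeaders` is hypothetical, and it accepts any object with a `get(name)` method, such as the `headers` of a `fetch` response.

```javascript
// Hypothetical helper: normalizes a few Venice response headers.
// `headers` is anything with a get(name) method (fetch Headers, Map, ...).
function readVeniceHeaders(headers) {
  const raw = (name) => {
    const value = headers.get(name);
    return value == null ? null : String(value);
  };
  const asBool = (name) => {
    const value = raw(name);
    return value === null ? null : value.toLowerCase() === 'true';
  };
  const asInt = (name) => {
    const value = raw(name);
    if (value === null) return null;
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? null : parsed;
  };

  return {
    requestId: raw('CF-RAY'),
    remainingRequests: asInt('x-ratelimit-remaining-requests'),
    containsMinor: asBool('x-venice-contains-minor'),
    // Kept as a string on purpose: balances may be sent as strings
    // to avoid precision loss.
    balanceUsd: raw('x-venice-balance-usd'),
  };
}
```

Headers that are absent come back as `null`, which matches the note that not every header is present on every response.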

## Best Practices

1. **Rate Limiting**: Monitor `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers and implement exponential backoff
2. **Balance Monitoring**: Track `x-venice-balance-usd` and `x-venice-balance-diem` headers to avoid service interruptions
3. **System Prompts**: Test with and without Venice's system prompts to find the best fit for your use case
4. **API Keys**: Keep your API keys secure and rotate them regularly
5. **Request Logging**: Log `CF-RAY` header values for troubleshooting with support
6. **Model Deprecation**: Check for `x-venice-model-deprecation-warning` headers when using models

## Differences from OpenAI's API

While Venice maintains high compatibility with the OpenAI API specification, there are some key differences:

1. **venice_parameters**: Additional configurations like `enable_web_search`, `character_slug`, and `strip_thinking_response` for extended functionality
2. **System Prompts**: Venice appends your system prompts to defaults that optimize for uncensored responses (disable with `include_venice_system_prompt: false`)
3. **Model Ecosystem**: Venice offers its own [model lineup](/overview/models) including uncensored and reasoning models - use Venice model IDs rather than OpenAI mappings
4. **Response Headers**: Unique headers for balance tracking (`x-venice-balance-usd`, `x-venice-balance-diem`), model deprecation warnings, and content safety flags
5. **Content Policies**: More permissive policies with dedicated uncensored models and optional content filtering

## API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policy, deprecation notices, and migration guidance, see [Deprecations](/overview/deprecations).

## OpenAPI Specification & Raw Data
## Next Steps

For programmatic access to Venice API docs and data — including use with RAG (Retrieval-Augmented Generation) — the following resources are available:
- [Quickstart guide](/overview/getting-started) — from API key to a working integration
- [Endpoints](/api-reference/endpoint/chat/completions) — full reference with an interactive playground
- [Models](/models/overview) — the full model catalog with pricing and capabilities
- [Rate limits](/api-reference/rate-limiting) — per-model request and token limits
- [OpenAPI spec (YAML)](https://api.venice.ai/doc/api/swagger.yaml) — the complete specification for codegen and RAG

* [OpenAPI Spec (YAML)](https://api.venice.ai/doc/api/swagger.yaml) — the full API specification in YAML format
* [API Docs Source](https://github.com/veniceai/api-docs/archive/refs/heads/main.zip) — all documentation pages (`.mdx` format) as a downloadable archive

---
Questions or feedback? Join us on [Discord](https://discord.gg/askvenice).

<sub>Request fields not listed in this documentation may be passed through but are not validated or guaranteed to work.</sub>
82 changes: 43 additions & 39 deletions api-reference/rate-limiting.mdx
@@ -1,10 +1,18 @@
---
title: "Rate Limits"
description: "Venice API rate limits — per-tier request and token quotas, model-specific limits, headers exposing remaining capacity, and how to handle 429 responses."
description: "Request and token rate limits for the Venice API."
"og:title": "Rate Limits | Venice API Docs"
---

Rate limits vary by model and tier. The default limits below are a useful reference, but the `/api_keys/rate_limits` API endpoint is the canonical way to fetch your current limits. You can check your exact limits anytime:
The limits on this page apply to the **standard paid tier** — any funded Venice account using the API lands here. There is no separate lower API tier; [partners](#partner-tier) sit above this tier with higher limits.

How limits are applied:

- **Per model.** Each model resolves to a model-specific override if one exists, otherwise a default based on the model's **size class** and **type**.
- **Text models enforce both limits.** A requests-per-minute (RPM) *and* a tokens-per-minute (TPM) limit apply — whichever you hit first returns a `429`.
- **Video and music models are not rate-limited.** They're priced by usage/credits instead.
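
The "whichever you hit first" rule for text models can be modeled on the client side to avoid sending requests that are likely to return a `429`. This is an illustrative sketch only, not Venice's server-side logic; the function name `checkLimits` and the shape of its arguments are assumptions.

```javascript
// Illustrative client-side model of the dual RPM/TPM check.
// usage:      { requests, tokens } already consumed in the current minute
// nextTokens: estimated tokens for the request about to be sent
// limits:     { rpm, tpm }, where tpm is null if no token limit applies
function checkLimits(usage, nextTokens, limits) {
  if (usage.requests + 1 > limits.rpm) {
    return { allowed: false, reason: 'rpm' };
  }
  if (limits.tpm !== null && usage.tokens + nextTokens > limits.tpm) {
    return { allowed: false, reason: 'tpm' };
  }
  return { allowed: true, reason: null };
}

// Example limits: 150 requests/min and 3,000,000 tokens/min.
// checkLimits({ requests: 10, tokens: 2999500 }, 1000, { rpm: 150, tpm: 3000000 })
// reports the token limit as the one that would be hit first.
```

In practice the authoritative numbers should come from the `/api_keys/rate_limits` endpoint rather than hard-coded values.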

The default limits below are a useful reference, but the `/api_keys/rate_limits` endpoint is the canonical way to fetch your current limits:

<CardGroup cols={2}>
<Card title="View Your Limits" icon="gauge-high" href="/api-reference/endpoint/api_keys/rate_limits?playground=open">
@@ -22,42 +30,53 @@

## Default Limits

### Text Models
### Text & Embedding Models

Text models are grouped into tiers based on size. Each model card on the [Models page](/models/text) displays its tier badge.
Text, reasoning, and embedding models are grouped into size classes. Each model card on the [Models page](/models/text) displays its size badge.

| Tier | Requests/min | Tokens/min |
|:-----|-------------:|-----------:|
| XS | 500 | 1,000,000 |
| S | 75 | 750,000 |
| M | 50 | 750,000 |
| L | 20 | 500,000 |
| Size class | Requests/min | Tokens/min |
|:-----------|-------------:|-----------:|
| X-Small | 500 | 5,000,000 |
| Small | 150 | 3,000,000 |
| Medium | 100 | 2,000,000 |
| Large | 100 | 2,000,000 |

<Accordion title="Which models are in each tier?">
Some upstream providers run on their own classes instead of the size-class defaults:

**XS** `qwen3-4b` `llama-3.2-3b`
| Provider class | Requests/min | Tokens/min |
|:---------------|-------------:|-----------:|
| DeepInfra | 150 | 10,000,000 |
| Anthropic (direct) | 500 | 5,000,000 |
| xAI (direct) | 500 | 10,000,000 |
| OpenRouter | 1,000 | None |
| Parasail | 1,000 | None |

**S** `mistral-31-24b` `venice-uncensored`
### Image Models

**M** `zai-org-glm-5` `qwen3-next-80b` `google-gemma-3-27b-it`
Covers image generation, upscaling, and inpainting.

**L** `qwen3-235b-a22b-instruct-2507` `qwen3-235b-a22b-thinking-2507` `deepseek-ai-DeepSeek-R1` `grok-41-fast` `kimi-k2-thinking` `gemini-3-pro-preview` `hermes-3-llama-3.1-405b` `qwen3-coder-480b-a35b-instruct` `zai-org-glm-4.7` `openai-gpt-oss-120b`
| Class | Requests/min |
|:------|-------------:|
| Default | 20 |
| Bytedance | 50 |
| Fal | 50 |
| xAI grok-imagine | 120 |
| xAI grok-imagine (pro) | 20 |

</Accordion>

### Other Models
### Audio Models

| Type | Requests/min |
|:-----|-------------:|
| Image | 20 |
| Audio | 60 |
| Embedding | 500 |
| Video (queue) | 40 |
| Video (retrieve) | 120 |
| Text-to-speech (TTS) | 60 |
| Speech-to-text (ASR) | 60 |

### Video & Music Models

Not rate-limited — these are billed by usage/credits.

## Handling Errors

Failed requests (500, 503, 429) should be retried with exponential backoff.

For 429 errors specifically, check the `x-ratelimit-reset-requests` header for the exact Unix timestamp when you can retry. Most HTTP libraries have built-in retry mechanisms that handle this automatically.
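
For clients without built-in retries, the retry guidance above could be sketched as a single delay calculation. The helper name `retryDelayMs`, the 1 second base, and the 30 second cap are illustrative choices, and the code assumes `x-ratelimit-reset-requests` carries a Unix timestamp in seconds; verify the unit against a real response before relying on it.

```javascript
// Hypothetical helper: returns how many milliseconds to wait before
// retrying, or null if the status should not be retried.
// Assumption: the reset header is a Unix timestamp in SECONDS.
function retryDelayMs(status, resetHeader, attempt, nowMs = Date.now()) {
  const retryable = [429, 500, 503];
  if (!retryable.includes(status)) return null;

  // For 429, prefer the server-provided reset time when it is usable.
  if (status === 429 && resetHeader != null && String(resetHeader).trim() !== '') {
    const resetMs = Number(resetHeader) * 1000;
    if (Number.isFinite(resetMs)) return Math.max(0, resetMs - nowMs);
  }

  // Otherwise use exponential backoff: 1s, 2s, 4s, ... capped at 30s.
  const baseMs = 1000;
  const capMs = 30000;
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Jitter is left out to keep the sketch deterministic; adding a small random offset to the backoff branch helps avoid many clients retrying at the same instant.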

@@ -84,21 +103,6 @@

## Partner Tier

Partners get significantly higher rate limits:

| Tier | Requests/min | Tokens/min |
|:-----|-------------:|-----------:|
| XS | 500 | 2,000,000 |
| S | 150 | 1,500,000 |
| M | 100 | 1,500,000 |
| L | 60 | 1,000,000 |

| Type | Requests/min |
|:-----|-------------:|
| Image | 60 |
| Audio | 120 |
| Embedding | 500 |
Partners (`partner-tier-1`) sit above the standard tier with significantly higher limits, tuned to their specific usage.

If you're consistently hitting your rate limits and your usage patterns show **sustained demand over time**, reach out to discuss partner access: [api@venice.ai](mailto:api@venice.ai).

Partner tier limits can be adjusted based on your specific needs.