chore(pricing): Update vertex-ai pricing #704

Closed
siddharthsambharia-portkey wants to merge 2 commits into main from pricing-update/vertex-ai-24398271045

@siddharthsambharia-portkey
Collaborator

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
| --- | --- |
| ➕ Models added | 7 |
| 🔄 Models updated (merged) | 16 |

➕ New Models

  • gemini-2.5-pro-computer-use
  • gemini-2.5-pro-tts
  • gemini-2.5-flash-tts
  • veo-3.1-lite-generate-001
  • translate-llm
  • gemma-4-26b-a4b-it-maas
  • gpt-oss

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash-image
  • gemini-3-flash-preview
  • gemini-3-pro-image-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3.1-flash-image-preview
  • veo-3.1-fast-generate-001
  • text-embedding-005
  • text-multilingual-embedding-002
  • text-embedding-large-exp-03-07
  • textembedding-gecko@003
  • textembedding-gecko-multilingual@001
  • multimodalembedding@001

Model-to-Pricing-Page Mapping

Google – Gemini (text/multimodal)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard ≤200K: $1.25/$10; cache read $0.3125; batch $0.625/$5; web_search $3.5; enterprise_web_search $4.5 |
| gemini-2.5-flash | Google – Gemini 2.5 | API | $0.30/$2.50; cache read $0.075; batch $0.15/$1.25; web_search $3.5 |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | $0.10/$0.40; cache read $0.025; batch $0.05/$0.20; web_search $3.5 |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | $0.30/$2.50 + image_token $30/1M; batch $0.15/$1.25, image $15 |
| gemini-2.5-pro-computer-use | Google – Gemini 2.5 | API | Same as Gemini 2.5 Pro pricing |
| gemini-2.5-pro-tts | Google – Gemini 2.5 | API | TTS variant; uses Gemini 2.5 Pro standard pricing |
| gemini-2.5-flash-tts | Google – Gemini 2.5 | API | TTS variant; uses Gemini 2.5 Flash standard pricing |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | $0.15/$0.60; batch $0.075/$0.30; web_search $3.5 |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | $0.075/$0.30; batch $0.0375/$0.15; web_search $3.5 |
| gemini-3-pro-preview | Google – Gemini 3 | API | $2.00/$12.00 ≤200K; batch $1/$6; web_search $1.4 |
| gemini-3-flash-preview | Google – Gemini 3 | API | $0.50/$3.00; batch $0.25/$1.50; web_search $1.4 |
| gemini-3-pro-image-preview | Google – Gemini 3 | API | $2.00/$12.00 + image_token $120/1M; batch $1/$6 |
| gemini-3.1-pro-preview | Google – Gemini 3.1 | API | $2.00/$12.00 ≤200K; batch $1/$6; web_search $1.4 |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 | API | $0.25/$1.50; batch $0.13/$0.75; web_search $1.4 |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 | API | $0.50/$3.00 + image_token $60/1M; batch $0.25/$1.50 |
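
For reviewers sanity-checking the numbers above, here is a minimal sketch of how a per-request cost could be derived from these rows. It assumes the `$in/$out` pairs and `cache read` values are USD per 1M tokens (the table only states the `/1M` unit explicitly for image tokens). The `TokenPrice` type and `request_cost` function are hypothetical names used for illustration and are not part of this PR.

```python
from dataclasses import dataclass

# Assumption: prices in the table are USD per 1M tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical container for one pricing row."""
    input_usd: float
    output_usd: float
    cache_read_usd: float = 0.0


# Values copied from the gemini-2.5-flash row.
STANDARD = TokenPrice(input_usd=0.30, output_usd=2.50, cache_read_usd=0.075)
BATCH = TokenPrice(input_usd=0.15, output_usd=1.25)


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Return the USD cost of one request.

    cached_tokens is the portion of input_tokens billed at the
    cache-read rate instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_usd
             + cached_tokens * price.cache_read_usd
             + output_tokens * price.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    print(f"standard: ${request_cost(STANDARD, 100_000, 10_000, cached_tokens=20_000):.4f}")
    print(f"batch:    ${request_cost(BATCH, 100_000, 10_000):.4f}")
```

The sketch ignores the `web_search` surcharges and the `≤200K` tier boundary, since the table does not say how those are billed.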

Google – Imagen (image generation)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| imagen-4.0-generate-001 | Google – Imagen 4.0 Generate | API | $0.04/image |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4.0 Ultra Generate | API | $0.06/image |
| imagen-4.0-fast-generate-001 | Google – Imagen 4.0 Fast Generate | API | $0.02/image |
| imagen-3.0-generate-002 | Google – Imagen 3.0 Generate | API | $0.04/image |
| imagen-3.0-capability-001 | Google – Imagen (capability) | API | Capability model; uses equivalent generate pricing $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen (capability) | API | Capability model; uses equivalent generate pricing $0.04/image |

Google – Veo (video generation)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| veo-2.0-generate-001 | Google – Veo 2 | API | $0.50/sec (video); 8s default duration |
| veo-3.0-generate-001 | Google – Veo 3 | API | $0.20/sec (video-only 720/1080p); 8s default |
| veo-3.0-fast-generate-001 | Google – Veo 3 Fast | API | $0.10/sec (video-only 720/1080p); 8s default |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | $0.20/sec (video-only 720/1080p); 8s default |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | $0.10/sec (video-only 720/1080p); 8s default |
| veo-3.1-lite-generate-001 | Google – Veo 3.1 Lite | API | $0.03/sec (video-only 720p); 8s default |
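
Because Veo is priced per second with an 8s default duration, the per-video cost follows by multiplication. The sketch below only illustrates that arithmetic with the rates from the table; `video_cost` is a hypothetical helper, and it assumes the per-second rate applies linearly to the clip length.

```python
# Per-second rates (USD) copied from the Veo table above.
VEO_USD_PER_SECOND = {
    "veo-2.0-generate-001": 0.50,
    "veo-3.0-generate-001": 0.20,
    "veo-3.0-fast-generate-001": 0.10,
    "veo-3.1-generate-001": 0.20,
    "veo-3.1-fast-generate-001": 0.10,
    "veo-3.1-lite-generate-001": 0.03,
}

# The table lists 8 seconds as the default duration for every Veo model.
DEFAULT_DURATION_SECONDS = 8


def video_cost(model_id: str, seconds: int = DEFAULT_DURATION_SECONDS) -> float:
    """Return the USD cost of one generated video (assumes linear per-second billing)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return VEO_USD_PER_SECOND[model_id] * seconds


if __name__ == "__main__":
    for model_id in VEO_USD_PER_SECOND:
        print(f"{model_id}: ${video_cost(model_id):.2f} per default-length video")
```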

Google – Embedding

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-embedding-001 | Google – Gemini Embedding | API | $0.00015/1K tokens |
| gemini-embedding-2-preview | Google – Gemini Embedding 2 | API | $0.0002/1K tokens (text); multimodal variant |
| text-embedding-005 | Google – Text Embedding | API | $0.000025/1K chars |
| text-multilingual-embedding-002 | Google – Text Multilingual Embedding | API | $0.000025/1K chars |
| text-embedding-large-exp-03-07 | Google – Text Embedding (experimental) | API | $0.000025/1K chars; same family as text-embedding-005 |
| textembedding-gecko@003 | Google – Text Embedding (legacy) | API | Legacy model; $0.000025/1K chars |
| textembedding-gecko-multilingual@001 | Google – Text Embedding (legacy) | API | Legacy model; $0.000025/1K chars |
| multimodalembedding@001 | Google – Multimodal Embedding | API | $0.0002/1K chars text input |
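
Note that the embedding rows mix two billing units: the Gemini embedding models are priced per 1K tokens, while the older text embedding models are priced per 1K characters. A small sketch of how a consumer of this data might keep the two apart is shown below; the `EmbeddingPrice` type and `embedding_cost` function are hypothetical, not part of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingPrice:
    """Hypothetical pricing row: USD per 1K units, where unit is 'tokens' or 'chars'."""
    usd_per_1k: float
    unit: str


# A subset of rows copied from the embedding table above.
EMBEDDING_PRICES = {
    "gemini-embedding-001": EmbeddingPrice(0.00015, "tokens"),
    "text-embedding-005": EmbeddingPrice(0.000025, "chars"),
    "multimodalembedding@001": EmbeddingPrice(0.0002, "chars"),
}


def embedding_cost(model_id: str, quantity: int, unit: str) -> float:
    """Return the USD cost for `quantity` units, rejecting a mismatched billing unit."""
    price = EMBEDDING_PRICES[model_id]
    if unit != price.unit:
        raise ValueError(f"{model_id} is billed per {price.unit}, not per {unit}")
    return quantity / 1000 * price.usd_per_1k


if __name__ == "__main__":
    print(embedding_cost("gemini-embedding-001", 1_000_000, "tokens"))
    print(embedding_cost("text-embedding-005", 2_000_000, "chars"))
```

Raising on a unit mismatch is a design choice for the sketch: silently treating characters as tokens would misprice requests by whatever the chars-per-token ratio happens to be.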

Google – Other

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| translate-llm | Google – Translation LLM | API | $10/1M chars input and output |
| gemma-4-26b-a4b-it-maas | Google – Gemma 4 | API | MaaS model; $0.15/$0.60/1M tokens |

Anthropic – Claude

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| claude-opus-4-6 | Anthropic – Claude Opus 4.6 | API | $5/$25; cache write 5m $6.25; cache read $0.50; batch $2.5/$12.5 |
| claude-sonnet-4-6 | Anthropic – Claude Sonnet 4.6 | API | $3/$15; cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5 |
| claude-opus-4-5@20251101 | Anthropic – Claude Opus 4.5 | API | $5/$25; cache write 5m $6.25; cache read $0.50; batch $2.5/$12.5 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude Sonnet 4.5 | API | $3/$15 (≤200K); cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5 |
| claude-haiku-4-5@20251001 | Anthropic – Claude Haiku 4.5 | API | $1/$5; cache write 5m $1.25; cache read $0.10 |
| claude-opus-4-1@20250805 | Anthropic – Claude Opus 4.1 | API | $15/$75; cache write 5m $18.75; cache read $1.50; batch $7.5/$37.5 |
| claude-opus-4@20250514 | Anthropic – Claude Opus 4 | API | $15/$75; cache write 5m $18.75; cache read $1.50; batch $7.5/$37.5 |
| claude-sonnet-4@20250514 | Anthropic – Claude Sonnet 4 | API | $3/$15; cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5 |
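
The Claude rows above appear to follow fixed ratios relative to the base prices: 5-minute cache write at 1.25x input, cache read at 0.1x input, and batch at 0.5x of both input and output. That is an observation from this table, not a rule stated in the PR. A reviewer could check it mechanically with something like the following sketch, where `ratio_violations` is a hypothetical helper.

```python
import math

# (model, input, output, cache_write_5m, cache_read, batch) in USD,
# copied from the Claude table above. batch is None where the table lists none.
CLAUDE_ROWS = [
    ("claude-opus-4-6", 5, 25, 6.25, 0.50, (2.5, 12.5)),
    ("claude-sonnet-4-6", 3, 15, 3.75, 0.30, (1.5, 7.5)),
    ("claude-opus-4-5@20251101", 5, 25, 6.25, 0.50, (2.5, 12.5)),
    ("claude-sonnet-4-5@20250929", 3, 15, 3.75, 0.30, (1.5, 7.5)),
    ("claude-haiku-4-5@20251001", 1, 5, 1.25, 0.10, None),
    ("claude-opus-4-1@20250805", 15, 75, 18.75, 1.50, (7.5, 37.5)),
    ("claude-opus-4@20250514", 15, 75, 18.75, 1.50, (7.5, 37.5)),
    ("claude-sonnet-4@20250514", 3, 15, 3.75, 0.30, (1.5, 7.5)),
]


def ratio_violations(rows):
    """Return (model, field) pairs whose price breaks the observed ratios."""
    problems = []
    for model, inp, out, cache_write, cache_read, batch in rows:
        if not math.isclose(cache_write, 1.25 * inp):
            problems.append((model, "cache write 5m"))
        if not math.isclose(cache_read, 0.10 * inp):
            problems.append((model, "cache read"))
        if batch is not None:
            batch_in, batch_out = batch
            if not (math.isclose(batch_in, 0.5 * inp)
                    and math.isclose(batch_out, 0.5 * out)):
                problems.append((model, "batch"))
    return problems


if __name__ == "__main__":
    print(ratio_violations(CLAUDE_ROWS))
```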

OpenAI – GPT

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gpt-oss-120b-maas | OpenAI – GPT OSS 120B | API | $0.09/$0.36; batch $0.045/$0.18 |
| gpt-oss | OpenAI – GPT OSS 20B | API | $0.07/$0.25; cache read $0.007; batch $0.035/$0.125 |
| clip-vit-base-patch32 | OpenAI | API – excluded | Non-generative vision model |
| openclip | OpenAI | API – excluded | Non-generative vision model |
| whisper-large | OpenAI | API – excluded | Audio transcription; not generative inference |

Meta – Llama

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| llama-3.3-70b-instruct-maas | Meta – Llama 3.3 70B | API | $0.72/$0.72; batch $0.36/$0.36 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 Maverick | API | $0.35/$1.15; batch $0.175/$0.575 |
| faster-r-cnn | Meta | API – excluded | Non-generative CV (object detection) |
| retinanet | Meta | API – excluded | Non-generative CV (object detection) |
| mask-r-cnn | Meta | API – excluded | Non-generative CV (segmentation) |
| segment-anything | Meta | API – excluded | Non-generative CV (segmentation), self-deploy |
| sam3 | Meta | API – excluded | Non-generative CV (segmentation) |
| xlm-roberta-large | Meta | API – excluded | Non-generative NLP, self-deploy |
| roberta-large | Meta | API – excluded | Non-generative NLP, self-deploy |
| codellama-7b-hf | Meta | API – excluded | Self-deploy, no -maas |
| llama2 | Meta | API – excluded | Self-deploy, no -maas |
| nllb | Meta | API – excluded | Non-generative translation, self-deploy |
| imagebind | Meta | API – excluded | Embedding/multimodal understanding, self-deploy |
| llama-2-quantized | Meta | API – excluded | Self-deploy, no -maas |
| llama3 | Meta | API – excluded | Self-deploy, no -maas |
| llama-guard | Meta | API – excluded | Guard model |
| llama4 | Meta | API – excluded | Self-deploy, no -maas |
| llama3_1 | Meta | API – excluded | Self-deploy, no -maas |
| prompt-guard | Meta | API – excluded | Guard model |
| llama3-2 | Meta | API – excluded | Self-deploy, no -maas |
| llama3-3 | Meta | API – excluded | Self-deploy, no -maas |

AI21

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| jamba-large-1.6 | AI21 | API – excluded | Self-deploy (has_deploy: true, no -maas) |

Qwen

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3 235B | API | $0.22/$0.88; batch $0.11/$0.44 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3 Coder 480B | API | $0.22/$1.80; cache read $0.022; batch $0.11/$0.90 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3 Next 80B Instruct | API | $0.15/$1.20 |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3 Next 80B Thinking | API | $0.15/$1.20 |
| qwq | Qwen | API – excluded | Self-deploy |
| qwen3 | Qwen | API – excluded | Self-deploy |
| qwen3-embedding | Qwen | API – excluded | Self-deploy |
| qwen3-5 | Qwen | API – excluded | Self-deploy |
| qwen2 | Qwen | API – excluded | Self-deploy |
| qwen3-coder-next | Qwen | API – excluded | Self-deploy |
| qwen3-coder | Qwen | API – excluded | Self-deploy |
| qwen-image | Qwen | API – excluded | Policy exclude (qwen-image) |
| qwen3-next | Qwen | API – excluded | Self-deploy |
| qwen3-vl | Qwen | API – excluded | Self-deploy |

Mistral

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| mistral-small-2503 | Mistral – Mistral Small 3.1 | API | $0.10/$0.30 |
| mistral-medium-3 | Mistral – Mistral Medium 3 | API | $0.40/$2.00 |
| codestral-2 | Mistral – Codestral 2 | API | $0.30/$0.90 |
| mistral | Mistral | API – excluded | Self-deploy (mistral-ai publisher) |
| mixtral | Mistral | API – excluded | Self-deploy (mistral-ai publisher) |
| codestral-2501-self-deploy | Mistral | API – excluded | Self-deploy (name contains self-deploy) |
| mistral-ocr-2505 | Mistral | API – excluded | OCR model |
| ministral-3 | Mistral | API – excluded | Self-deploy |
| mistral-large-3 | Mistral | API – excluded | Self-deploy |

DeepSeek

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| deepseek-r1-0528-maas | DeepSeek – DeepSeek R1 0528 | API | $1.35/$5.40; batch $0.675/$2.70 |
| deepseek-v3.1-maas | DeepSeek – DeepSeek V3.1 | API | $0.60/$1.70; cache read $0.06; batch $0.30/$0.85 |
| deepseek-v3.2-maas | DeepSeek – DeepSeek V3.2 | API | $0.56/$1.68; cache read $0.056; batch $0.28/$0.84 |
| deepseek-r1 | DeepSeek | API – excluded | Self-deploy |
| deepseek-v3 | DeepSeek | API – excluded | Self-deploy |
| deepseek-ocr-2 | DeepSeek | API – excluded | Self-deploy + OCR |
| deepseek-v3-1 | DeepSeek | API – excluded | Self-deploy |
| deepseek-v3-2 | DeepSeek | API – excluded | Self-deploy |
| deepseek-ocr | DeepSeek | API – excluded | Self-deploy + OCR |
| deepseek-ocr-maas | DeepSeek | API – excluded | OCR model |

Moonshot / Kimi

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| kimi-k2-thinking-maas | Moonshot – Kimi K2 Thinking | API | $0.60/$2.50; cache read $0.06 |
| kimi-k2-5 | Moonshot | API – excluded | Self-deploy |
| kimi-k2 | Moonshot | API – excluded | Self-deploy |

MiniMax

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| minimax-m2-maas | MiniMax – MiniMax M2 | API | $0.30/$1.20; cache read $0.03 |
| minimax-m2 | MiniMax | API – excluded | Self-deploy |

ZAI.org / GLM

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| glm-4.7-maas | ZAI.org – GLM 4.7 | API | $0.60/$2.20 |
| glm-5-maas | ZAI.org – GLM 5 | API | $1.00/$3.20; cache read $0.10 |
| glm-4.7 | ZAI.org | API – excluded | Self-deploy |
| glm-5 | ZAI.org | API – excluded | Self-deploy |
| glm-ocr | ZAI.org | API – excluded | Self-deploy + OCR |
| glm-4.5 | ZAI.org | API – excluded | Self-deploy |
| glm-image | ZAI.org | API – excluded | Policy exclude (glm-image) |
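
The "API – excluded" notes in the third-party tables above suggest a small set of name- and flag-based rules: policy excludes, OCR models, guard models, names containing `self-deploy`, and deployable models without a `-maas` suffix. The sketch below is one guess at how such a filter could look; it is not the Pricing Agent's actual logic. The `exclusion_reason` function is hypothetical, and `has_deploy` mirrors the `has_deploy: true` flag mentioned only in the jamba-large-1.6 row, so its value for other models is an assumption.

```python
from typing import Optional

# Model IDs the tables mark as "Policy exclude".
POLICY_EXCLUDED = {"qwen-image", "glm-image"}


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Return why a model would be skipped, or None if it would be priced.

    Rules are inferred from the Notes column and checked in this order:
    policy exclude, OCR, guard, explicit self-deploy name, then
    deployable without a -maas suffix.
    """
    name = model_id.lower()
    if name in POLICY_EXCLUDED:
        return "policy exclude"
    if "ocr" in name:
        return "OCR model"
    if "guard" in name:
        return "guard model"
    if "self-deploy" in name:
        return "self-deploy (name contains self-deploy)"
    if has_deploy and not name.endswith("-maas"):
        return "self-deploy (has_deploy, no -maas)"
    return None


if __name__ == "__main__":
    # has_deploy values below are assumed for illustration,
    # except jamba-large-1.6, which the table states explicitly.
    samples = [
        ("jamba-large-1.6", True),
        ("deepseek-ocr-maas", False),
        ("glm-5-maas", False),
        ("llama-guard", True),
    ]
    for model_id, has_deploy in samples:
        print(model_id, "->", exclusion_reason(model_id, has_deploy))
```

The sketch does not cover the "Non-generative CV/NLP" exclusions (for example faster-r-cnn or whisper-large), which cannot be derived from the model name alone.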

Pricing-page-only models (not returned by API)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| llama-3.1-405b (approx) | Meta – Llama 3.1 405B | Pricing page only | Listed at $5/$16 but not returned by get_vertex_models |
| llama-4-scout (approx) | Meta – Llama 4 Scout | Pricing page only | Listed at $0.25/$0.70 but not returned by get_vertex_models |
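
Conceptually, this section is the set of models present on the pricing page but absent from the API listing. A minimal sketch of that comparison is shown below; `pricing_page_only` is a hypothetical helper, and the example ID lists stand in for the pricing page and the output of `get_vertex_models`, whose real return shape is not shown in this PR.

```python
from typing import Iterable, List


def pricing_page_only(pricing_page_ids: Iterable[str],
                      api_ids: Iterable[str]) -> List[str]:
    """Return model IDs listed on the pricing page but missing from the API listing."""
    return sorted(set(pricing_page_ids) - set(api_ids))


if __name__ == "__main__":
    # Illustrative inputs only; the two Llama IDs are the approximate ones from the table.
    page_ids = ["llama-3.1-405b", "llama-4-scout", "llama-3.3-70b-instruct-maas"]
    api_ids = ["llama-3.3-70b-instruct-maas", "llama-4-maverick-17b-128e-instruct-maas"]
    print(pricing_page_only(page_ids, api_ids))
```

Since the IDs in the table are marked "(approx)", an exact string comparison like this would likely need a name-normalization step in practice.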

Generated by Pricing Agent on 2026-04-14
