chore(pricing): Update vertex-ai pricing by siddharthsambharia-portkey · Pull Request #704 · Portkey-AI/models

siddharthsambharia-portkey · 2026-04-14T12:24:22Z

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

Change Type	Count
➕ Models added	7
🔄 Models updated (merged)	16

➕ New Models

gemini-2.5-pro-computer-use
gemini-2.5-pro-tts
gemini-2.5-flash-tts
veo-3.1-lite-generate-001
translate-llm
gemma-4-26b-a4b-it-maas
gpt-oss

🔄 Updated Models

gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.5-flash-image
gemini-3-flash-preview
gemini-3-pro-image-preview
gemini-3.1-pro-preview
gemini-3.1-flash-lite-preview
gemini-3.1-flash-image-preview
veo-3.1-fast-generate-001
text-embedding-005
text-multilingual-embedding-002
text-embedding-large-exp-03-07
textembedding-gecko@003
textembedding-gecko-multilingual@001
multimodalembedding@001

Model-to-Pricing-Page Mapping

Google – Gemini (text/multimodal)

Model ID	Publisher / Section	Source	Notes (token prices in USD per 1M tokens, input/output)
`gemini-2.5-pro`	Google – Gemini 2.5	API	Standard ≤200K: $1.25/$10; cache read $0.3125; batch $0.625/$5; web_search $3.5; enterprise_web_search $4.5
`gemini-2.5-flash`	Google – Gemini 2.5	API	$0.30/$2.50; cache read $0.075; batch $0.15/$1.25; web_search $3.5
`gemini-2.5-flash-lite`	Google – Gemini 2.5	API	$0.10/$0.40; cache read $0.025; batch $0.05/$0.20; web_search $3.5
`gemini-2.5-flash-image`	Google – Gemini 2.5	API	$0.30/$2.50 + image_token $30/1M; batch $0.15/$1.25 + image_token $15/1M
`gemini-2.5-pro-computer-use`	Google – Gemini 2.5	API	Same as Gemini 2.5 Pro pricing
`gemini-2.5-pro-tts`	Google – Gemini 2.5	API	TTS variant; uses Gemini 2.5 Pro standard pricing
`gemini-2.5-flash-tts`	Google – Gemini 2.5	API	TTS variant; uses Gemini 2.5 Flash standard pricing
`gemini-2.0-flash-001`	Google – Gemini 2.0	API	$0.15/$0.60; batch $0.075/$0.30; web_search $3.5
`gemini-2.0-flash-lite-001`	Google – Gemini 2.0	API	$0.075/$0.30; batch $0.0375/$0.15; web_search $3.5
`gemini-3-pro-preview`	Google – Gemini 3	API	$2.00/$12.00 ≤200K; batch $1/$6; web_search $1.4
`gemini-3-flash-preview`	Google – Gemini 3	API	$0.50/$3.00; batch $0.25/$1.50; web_search $1.4
`gemini-3-pro-image-preview`	Google – Gemini 3	API	$2.00/$12.00 + image_token $120/1M; batch $1/$6
`gemini-3.1-pro-preview`	Google – Gemini 3.1	API	$2.00/$12.00 ≤200K; batch $1/$6; web_search $1.4
`gemini-3.1-flash-lite-preview`	Google – Gemini 3.1	API	$0.25/$1.50; batch $0.13/$0.75; web_search $1.4
`gemini-3.1-flash-image-preview`	Google – Gemini 3.1	API	$0.50/$3.00 + image_token $60/1M; batch $0.25/$1.50
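
As a rough illustration of how the rates in this table translate into a per-request charge, the sketch below computes a cost from two of the rows. It assumes the Notes column quotes USD per 1M tokens as input/output, and that a request stays within the ≤200K tier. `request_cost` and `GEMINI_RATES` are hypothetical names, not part of the pricing config.

```python
from decimal import Decimal

# Rates copied from the Notes column above, assumed to be USD per 1M tokens.
GEMINI_RATES = {
    "gemini-2.5-pro": {
        "input": Decimal("1.25"), "output": Decimal("10"),
        "batch_input": Decimal("0.625"), "batch_output": Decimal("5"),
    },
    "gemini-2.5-flash": {
        "input": Decimal("0.30"), "output": Decimal("2.50"),
        "batch_input": Decimal("0.15"), "batch_output": Decimal("1.25"),
    },
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> Decimal:
    """Hypothetical helper: USD cost of one request at the listed rates."""
    rates = GEMINI_RATES[model]
    prefix = "batch_" if batch else ""
    total = (rates[prefix + "input"] * input_tokens
             + rates[prefix + "output"] * output_tokens)
    return total / TOKENS_PER_UNIT
```

Under these assumptions, a `gemini-2.5-pro` request with 100,000 input tokens and 10,000 output tokens comes to $0.225 at standard rates and $0.1125 at batch rates.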

Google – Imagen (image generation)

Model ID	Publisher / Section	Source	Notes
`imagen-4.0-generate-001`	Google – Imagen 4.0 Generate	API	$0.04/image
`imagen-4.0-ultra-generate-001`	Google – Imagen 4.0 Ultra Generate	API	$0.06/image
`imagen-4.0-fast-generate-001`	Google – Imagen 4.0 Fast Generate	API	$0.02/image
`imagen-3.0-generate-002`	Google – Imagen 3.0 Generate	API	$0.04/image
`imagen-3.0-capability-001`	Google – Imagen (capability)	API	Capability model; uses equivalent generate pricing $0.04/image
`imagen-3.0-capability-002`	Google – Imagen (capability)	API	Capability model; uses equivalent generate pricing $0.04/image

Google – Veo (video generation)

Model ID	Publisher / Section	Source	Notes
`veo-2.0-generate-001`	Google – Veo 2	API	$0.50/sec (video); 8s default duration
`veo-3.0-generate-001`	Google – Veo 3	API	$0.20/sec (video-only 720/1080p); 8s default
`veo-3.0-fast-generate-001`	Google – Veo 3 Fast	API	$0.10/sec (video-only 720/1080p); 8s default
`veo-3.1-generate-001`	Google – Veo 3.1	API	$0.20/sec (video-only 720/1080p); 8s default
`veo-3.1-fast-generate-001`	Google – Veo 3.1 Fast	API	$0.10/sec (video-only 720/1080p); 8s default
`veo-3.1-lite-generate-001`	Google – Veo 3.1 Lite	API	$0.03/sec (video-only 720p); 8s default
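
Veo rows are priced per second of generated video rather than per token, so the cost of a clip is the rate times its duration. The sketch below assumes the "8s default" in the Notes column is the duration used when none is given; `clip_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-second rates copied from the table above (USD, video-only output).
VEO_RATE_PER_SECOND = {
    "veo-2.0-generate-001": Decimal("0.50"),
    "veo-3.1-generate-001": Decimal("0.20"),
    "veo-3.1-fast-generate-001": Decimal("0.10"),
    "veo-3.1-lite-generate-001": Decimal("0.03"),
}

DEFAULT_DURATION_SECONDS = 8  # "8s default" in the Notes column


def clip_cost(model: str, seconds: int = DEFAULT_DURATION_SECONDS) -> Decimal:
    """Hypothetical helper: USD cost of one generated clip."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return VEO_RATE_PER_SECOND[model] * seconds
```

With the default duration, a `veo-3.1-generate-001` clip works out to $1.60 and a `veo-3.1-lite-generate-001` clip to $0.24.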

Google – Embedding

Model ID	Publisher / Section	Source	Notes
`gemini-embedding-001`	Google – Gemini Embedding	API	$0.00015/1K tokens
`gemini-embedding-2-preview`	Google – Gemini Embedding 2	API	$0.0002/1K tokens (text); multimodal variant
`text-embedding-005`	Google – Text Embedding	API	$0.000025/1K chars
`text-multilingual-embedding-002`	Google – Text Multilingual Embedding	API	$0.000025/1K chars
`text-embedding-large-exp-03-07`	Google – Text Embedding (experimental)	API	$0.000025/1K chars; same family as text-embedding-005
`textembedding-gecko@003`	Google – Text Embedding (legacy)	API	Legacy model; $0.000025/1K chars
`textembedding-gecko-multilingual@001`	Google – Text Embedding (legacy)	API	Legacy model; $0.000025/1K chars
`multimodalembedding@001`	Google – Multimodal Embedding	API	$0.0002/1K chars text input

Google – Other

Model ID	Publisher / Section	Source	Notes
`translate-llm`	Google – Translation LLM	API	$10/1M chars input and output
`gemma-4-26b-a4b-it-maas`	Google – Gemma 4	API	MaaS model; $0.15/$0.60/1M tokens

Anthropic – Claude

Model ID	Publisher / Section	Source	Notes
`claude-opus-4-6`	Anthropic – Claude Opus 4.6	API	$5/$25; cache write 5m $6.25; cache read $0.50; batch $2.5/$12.5
`claude-sonnet-4-6`	Anthropic – Claude Sonnet 4.6	API	$3/$15; cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5
`claude-opus-4-5@20251101`	Anthropic – Claude Opus 4.5	API	$5/$25; cache write 5m $6.25; cache read $0.50; batch $2.5/$12.5
`claude-sonnet-4-5@20250929`	Anthropic – Claude Sonnet 4.5	API	$3/$15 (≤200K); cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5
`claude-haiku-4-5@20251001`	Anthropic – Claude Haiku 4.5	API	$1/$5; cache write 5m $1.25; cache read $0.10
`claude-opus-4-1@20250805`	Anthropic – Claude Opus 4.1	API	$15/$75; cache write 5m $18.75; cache read $1.50; batch $7.5/$37.5
`claude-opus-4@20250514`	Anthropic – Claude Opus 4	API	$15/$75; cache write 5m $18.75; cache read $1.50; batch $7.5/$37.5
`claude-sonnet-4@20250514`	Anthropic – Claude Sonnet 4	API	$3/$15; cache write 5m $3.75; cache read $0.30; batch $1.5/$7.5
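
The Claude rows above appear to follow fixed ratios: the 5-minute cache write is 1.25x the input price, the cache read is 0.1x the input price, and batch is half of standard. The PR does not state these as rules; they are only an observation from the table. A reviewer could spot-check rows with a small script like this sketch, where `ratio_mismatches` and `CLAUDE_ROWS` are hypothetical names.

```python
from decimal import Decimal

# A few rows copied from the table above; "batch" is None where no batch price is listed.
CLAUDE_ROWS = {
    "claude-opus-4-6": {"input": "5", "output": "25", "cache_write_5m": "6.25",
                        "cache_read": "0.50", "batch": ("2.5", "12.5")},
    "claude-sonnet-4-6": {"input": "3", "output": "15", "cache_write_5m": "3.75",
                          "cache_read": "0.30", "batch": ("1.5", "7.5")},
    "claude-haiku-4-5@20251001": {"input": "1", "output": "5", "cache_write_5m": "1.25",
                                  "cache_read": "0.10", "batch": None},
    "claude-opus-4-1@20250805": {"input": "15", "output": "75", "cache_write_5m": "18.75",
                                 "cache_read": "1.50", "batch": ("7.5", "37.5")},
}


def ratio_mismatches(rows: dict) -> list:
    """Return a description of every row that breaks the observed ratios."""
    problems = []
    for model, row in rows.items():
        inp, out = Decimal(row["input"]), Decimal(row["output"])
        if Decimal(row["cache_write_5m"]) != inp * Decimal("1.25"):
            problems.append(f"{model}: cache write is not 1.25x input")
        if Decimal(row["cache_read"]) != inp * Decimal("0.1"):
            problems.append(f"{model}: cache read is not 0.1x input")
        if row["batch"] is not None:
            batch_in, batch_out = (Decimal(p) for p in row["batch"])
            if (batch_in, batch_out) != (inp / 2, out / 2):
                problems.append(f"{model}: batch is not half of standard")
    return problems
```

An empty result means every checked row matches the observed ratios; any entry in the list points at a row worth comparing against the pricing page by hand.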

OpenAI – GPT

Model ID	Publisher / Section	Source	Notes
`gpt-oss-120b-maas`	OpenAI – GPT OSS 120B	API	$0.09/$0.36; batch $0.045/$0.18
`gpt-oss`	OpenAI – GPT OSS 20B	API	$0.07/$0.25; cache read $0.007; batch $0.035/$0.125
`clip-vit-base-patch32`	OpenAI	API – excluded	Non-generative vision model
`openclip`	OpenAI	API – excluded	Non-generative vision model
`whisper-large`	OpenAI	API – excluded	Audio transcription; not generative inference

Meta – Llama

Model ID	Publisher / Section	Source	Notes
`llama-3.3-70b-instruct-maas`	Meta – Llama 3.3 70B	API	$0.72/$0.72; batch $0.36/$0.36
`llama-4-maverick-17b-128e-instruct-maas`	Meta – Llama 4 Maverick	API	$0.35/$1.15; batch $0.175/$0.575
`faster-r-cnn`	Meta	API – excluded	Non-generative CV (object detection)
`retinanet`	Meta	API – excluded	Non-generative CV (object detection)
`mask-r-cnn`	Meta	API – excluded	Non-generative CV (segmentation)
`segment-anything`	Meta	API – excluded	Non-generative CV (segmentation), self-deploy
`sam3`	Meta	API – excluded	Non-generative CV (segmentation)
`xlm-roberta-large`	Meta	API – excluded	Non-generative NLP, self-deploy
`roberta-large`	Meta	API – excluded	Non-generative NLP, self-deploy
`codellama-7b-hf`	Meta	API – excluded	Self-deploy, no -maas
`llama2`	Meta	API – excluded	Self-deploy, no -maas
`nllb`	Meta	API – excluded	Non-generative translation, self-deploy
`imagebind`	Meta	API – excluded	Embedding/multimodal understanding, self-deploy
`llama-2-quantized`	Meta	API – excluded	Self-deploy, no -maas
`llama3`	Meta	API – excluded	Self-deploy, no -maas
`llama-guard`	Meta	API – excluded	Guard model
`llama4`	Meta	API – excluded	Self-deploy, no -maas
`llama3_1`	Meta	API – excluded	Self-deploy, no -maas
`prompt-guard`	Meta	API – excluded	Guard model
`llama3-2`	Meta	API – excluded	Self-deploy, no -maas
`llama3-3`	Meta	API – excluded	Self-deploy, no -maas

AI21

Model ID	Publisher / Section	Source	Notes
`jamba-large-1.6`	AI21	API – excluded	Self-deploy (has_deploy: true, no -maas)

Qwen

Model ID	Publisher / Section	Source	Notes
`qwen3-235b-a22b-instruct-2507-maas`	Qwen – Qwen3 235B	API	$0.22/$0.88; batch $0.11/$0.44
`qwen3-coder-480b-a35b-instruct-maas`	Qwen – Qwen3 Coder 480B	API	$0.22/$1.80; cache read $0.022; batch $0.11/$0.90
`qwen3-next-80b-a3b-instruct-maas`	Qwen – Qwen3 Next 80B Instruct	API	$0.15/$1.20
`qwen3-next-80b-a3b-thinking-maas`	Qwen – Qwen3 Next 80B Thinking	API	$0.15/$1.20
`qwq`	Qwen	API – excluded	Self-deploy
`qwen3`	Qwen	API – excluded	Self-deploy
`qwen3-embedding`	Qwen	API – excluded	Self-deploy
`qwen3-5`	Qwen	API – excluded	Self-deploy
`qwen2`	Qwen	API – excluded	Self-deploy
`qwen3-coder-next`	Qwen	API – excluded	Self-deploy
`qwen3-coder`	Qwen	API – excluded	Self-deploy
`qwen-image`	Qwen	API – excluded	Policy exclude (qwen-image)
`qwen3-next`	Qwen	API – excluded	Self-deploy
`qwen3-vl`	Qwen	API – excluded	Self-deploy

Mistral

Model ID	Publisher / Section	Source	Notes
`mistral-small-2503`	Mistral – Mistral Small 3.1	API	$0.10/$0.30
`mistral-medium-3`	Mistral – Mistral Medium 3	API	$0.40/$2.00
`codestral-2`	Mistral – Codestral 2	API	$0.30/$0.90
`mistral`	Mistral	API – excluded	Self-deploy (mistral-ai publisher)
`mixtral`	Mistral	API – excluded	Self-deploy (mistral-ai publisher)
`codestral-2501-self-deploy`	Mistral	API – excluded	Self-deploy (name contains self-deploy)
`mistral-ocr-2505`	Mistral	API – excluded	OCR model
`ministral-3`	Mistral	API – excluded	Self-deploy
`mistral-large-3`	Mistral	API – excluded	Self-deploy

DeepSeek

Model ID	Publisher / Section	Source	Notes
`deepseek-r1-0528-maas`	DeepSeek – DeepSeek R1 0528	API	$1.35/$5.40; batch $0.675/$2.70
`deepseek-v3.1-maas`	DeepSeek – DeepSeek V3.1	API	$0.60/$1.70; cache read $0.06; batch $0.30/$0.85
`deepseek-v3.2-maas`	DeepSeek – DeepSeek V3.2	API	$0.56/$1.68; cache read $0.056; batch $0.28/$0.84
`deepseek-r1`	DeepSeek	API – excluded	Self-deploy
`deepseek-v3`	DeepSeek	API – excluded	Self-deploy
`deepseek-ocr-2`	DeepSeek	API – excluded	Self-deploy + OCR
`deepseek-v3-1`	DeepSeek	API – excluded	Self-deploy
`deepseek-v3-2`	DeepSeek	API – excluded	Self-deploy
`deepseek-ocr`	DeepSeek	API – excluded	Self-deploy + OCR
`deepseek-ocr-maas`	DeepSeek	API – excluded	OCR model

Moonshot / Kimi

Model ID	Publisher / Section	Source	Notes
`kimi-k2-thinking-maas`	Moonshot – Kimi K2 Thinking	API	$0.60/$2.50; cache read $0.06
`kimi-k2-5`	Moonshot	API – excluded	Self-deploy
`kimi-k2`	Moonshot	API – excluded	Self-deploy

MiniMax

Model ID	Publisher / Section	Source	Notes
`minimax-m2-maas`	MiniMax – MiniMax M2	API	$0.30/$1.20; cache read $0.03
`minimax-m2`	MiniMax	API – excluded	Self-deploy

ZAI.org / GLM

Model ID	Publisher / Section	Source	Notes
`glm-4.7-maas`	ZAI.org – GLM 4.7	API	$0.60/$2.20
`glm-5-maas`	ZAI.org – GLM 5	API	$1.00/$3.20; cache read $0.10
`glm-4.7`	ZAI.org	API – excluded	Self-deploy
`glm-5`	ZAI.org	API – excluded	Self-deploy
`glm-ocr`	ZAI.org	API – excluded	Self-deploy + OCR
`glm-4.5`	ZAI.org	API – excluded	Self-deploy
`glm-image`	ZAI.org	API – excluded	Policy exclude (glm-image)
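
The "API – excluded" notes in the tables above suggest a small set of exclusion reasons: policy excludes, OCR models, guard models, and self-deploy models without a `-maas` suffix. The sketch below is one way such a filter might look. It is an assumption reconstructed from the Notes column, not the Pricing Agent's actual logic, and it does not cover the "non-generative" exclusions, which cannot be inferred from the model ID alone. `exclusion_reason` is a hypothetical name; `has_deploy` mirrors the field mentioned in the `jamba-large-1.6` row.

```python
from typing import Optional

# Model IDs marked "Policy exclude" in the tables above.
POLICY_EXCLUDED = {"qwen-image", "glm-image"}


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Hypothetical filter: return why a model is skipped, or None if it is kept."""
    if model_id in POLICY_EXCLUDED:
        return "policy exclude"
    if "ocr" in model_id:
        return "OCR model"
    if "guard" in model_id:
        return "guard model"
    # Self-deploy: flagged in the name, or deployable and lacking the -maas suffix.
    if "self-deploy" in model_id or (has_deploy and not model_id.endswith("-maas")):
        return "self-deploy"
    return None
```

For example, `exclusion_reason("jamba-large-1.6", True)` returns `"self-deploy"`, while `exclusion_reason("glm-5-maas", False)` returns `None`, matching the rows above.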

Pricing-page-only models (not returned by API)

Model ID	Publisher / Section	Source	Notes
`llama-3.1-405b` (approx)	Meta – Llama 3.1 405B	Pricing page only	Listed at $5/$16 but not returned by get_vertex_models
`llama-4-scout` (approx)	Meta – Llama 4 Scout	Pricing page only	Listed at $0.25/$0.70 but not returned by get_vertex_models
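
The two rows above are models that appear on the pricing page but are not returned by `get_vertex_models`. Conceptually this is a set difference between the two sources. The following is a minimal sketch under that assumption, with made-up input lists standing in for the real API response and pricing page; `pricing_page_only` is a hypothetical name, and the IDs used are the approximate ones from the table.

```python
def pricing_page_only(page_ids, api_ids):
    """Return IDs listed on the pricing page but absent from the API listing."""
    return sorted(set(page_ids) - set(api_ids))


# Illustrative inputs only; the real lists come from the pricing page and the API.
page_ids = ["llama-3.1-405b", "llama-4-scout", "llama-3.3-70b-instruct-maas"]
api_ids = ["llama-3.3-70b-instruct-maas", "llama-4-maverick-17b-128e-instruct-maas"]
missing_from_api = pricing_page_only(page_ids, api_ids)
```

Here `missing_from_api` contains `llama-3.1-405b` and `llama-4-scout`, the same two entries flagged in the table.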

Generated by Pricing Agent on 2026-04-14

siddharthsambharia-portkey added 2 commits April 14, 2026 17:54

chore(pricing): Update vertex-ai pricing

1847103

chore(general): Add 5 new vertex-ai model configs

ed48927

siddharthsambharia-portkey closed this Apr 15, 2026

siddharthsambharia-portkey wants to merge 2 commits into main from pricing-update/vertex-ai-24398271045
