Vertex AI pricing
Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
This page covers pricing for Generative AI on Vertex AI. For all other Vertex AI pricing, including ML Platform and MLOps services, refer to the Vertex AI pricing page.
Google models
Gemini
With the Multimodal models in Vertex AI, you can input either text or media (images, video). Text input is charged by every 1,000 characters of input (prompt) and every 1,000 characters of output (response). Characters are counted by UTF-8 code points and white space is excluded from the count, resulting in approximately 4 characters per token. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent. Media input is charged per image or per second (video).
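The billing rules in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not Google's billing implementation: the function names are hypothetical, `str.isspace()` stands in for whatever whitespace definition the billing system actually applies, and rounding up to the next whole cent is one reading of the rounding rule. The two rates are the Gemini 1.5 Flash text rates from the table that follows.

```python
from decimal import Decimal, ROUND_CEILING

def billable_chars(text: str) -> int:
    """Count Unicode code points, skipping whitespace (assumed: str.isspace)."""
    return sum(1 for ch in text if not ch.isspace())

def request_cost(prompt: str, response: str,
                 input_rate_per_1k: Decimal, output_rate_per_1k: Decimal,
                 filtered: bool = False) -> Decimal:
    """Cost in USD of one text request; a filtered response is charged for input only."""
    cost = Decimal(billable_chars(prompt)) * input_rate_per_1k / 1000
    if not filtered:
        cost += Decimal(billable_chars(response)) * output_rate_per_1k / 1000
    return cost

def round_up_to_cent(total: Decimal) -> Decimal:
    """Round a billing-cycle total up to the next whole cent (assumed reading)."""
    return total.quantize(Decimal("0.01"), rounding=ROUND_CEILING)

# Gemini 1.5 Flash text rates (<= 128K input tokens), USD per 1,000 characters.
FLASH_INPUT = Decimal("0.00001875")
FLASH_OUTPUT = Decimal("0.000075")

cost = request_cost("Hello, world!", "Hi there.", FLASH_INPUT, FLASH_OUTPUT)
print(cost, round_up_to_cent(cost))
```

`Decimal` is used instead of `float` so that very small per-request amounts accumulate without binary rounding error before the cycle total is rounded.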
Model | Feature | Type | Price (<= 128K input tokens) | Price (> 128K input tokens) |
---|---|---|---|---|
Gemini 1.5 Flash | Multimodal | Image Input | $0.00002 / image | $0.00004 / image |
Gemini 1.5 Flash | Multimodal | Video Input | $0.00002 / second | $0.00004 / second |
Gemini 1.5 Flash | Multimodal | Text Input | $0.00001875 / 1k characters | $0.0000375 / 1k characters |
Gemini 1.5 Flash | Multimodal | Audio Input | $0.000002 / second | $0.000004 / second |
Gemini 1.5 Flash | Multimodal | Text Output | $0.000075 / 1k characters | $0.00015 / 1k characters |
Gemini 1.5 Flash | Tuning* | Training Token | $8 / M tokens | |
Gemini 1.5 Pro | Multimodal | Image Input | $0.00032875 / image | $0.0006575 / image |
Gemini 1.5 Pro | Multimodal | Video Input | $0.00032875 / second | $0.0006575 / second |
Gemini 1.5 Pro | Multimodal | Text Input | $0.0003125 / 1k characters | $0.000625 / 1k characters |
Gemini 1.5 Pro | Multimodal | Audio Input | $0.00003125 / second | $0.0000625 / second |
Gemini 1.5 Pro | Multimodal | Text Output | $0.00125 / 1k characters | $0.0025 / 1k characters |
Gemini 1.5 Pro | Tuning* | Training Token | $80 / M tokens | |
Gemini 1.0 Pro | Multimodal | Image Input | $0.0025 / image | |
Gemini 1.0 Pro | Multimodal | Video Input | $0.002 / second | |
Gemini 1.0 Pro | Multimodal | Text Input | $0.000125 / 1k characters | |
Gemini 1.0 Pro | Multimodal | Text Output | $0.000375 / 1k characters | |
Grounding with Google Search | Text | Grounding requests | $35 / 1k requests (for up to 1M requests per day). Please contact your account team if you require more than 1M requests per day. | |
* If a query context is longer than 128K, all tokens are charged at long context rates.
* Gemini models are available in batch mode at 50% discount.
* Gemini 1.0 Pro supports only up to a 32K context window.
* PDFs are billed as image input, with one PDF page equivalent to one image.
* Tuned model endpoint has the same prediction price as the base model.
* Grounding with Google Search: If you are using dynamic retrieval to optimize costs, only requests that contain at least one grounding support URL from the web in their response are charged for Grounding with Google Search. Costs for Gemini always apply.
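The long-context and batch notes above combine as follows in a rough Python estimate for a Gemini 1.5 Pro text request. This is a sketch under stated assumptions: `text_cost` and its parameters are hypothetical names, "128K" is read as 128,000 tokens, and the caller supplies the input token count, which is used only to select the price tier while billing stays per character.

```python
from decimal import Decimal

LONG_CONTEXT_THRESHOLD = 128_000  # assumption: "128K" read as 128,000 tokens

# Gemini 1.5 Pro text rates, USD per 1,000 characters: (standard, long context).
PRO_TEXT_INPUT = (Decimal("0.0003125"), Decimal("0.000625"))
PRO_TEXT_OUTPUT = (Decimal("0.00125"), Decimal("0.0025"))

def text_cost(input_tokens: int, input_chars: int, output_chars: int,
              batch: bool = False) -> Decimal:
    """Estimate one Gemini 1.5 Pro text request in USD.

    input_tokens only selects the price tier; once the context exceeds the
    threshold, both input and output are charged at the long-context rates.
    """
    tier = 1 if input_tokens > LONG_CONTEXT_THRESHOLD else 0
    cost = (Decimal(input_chars) * PRO_TEXT_INPUT[tier]
            + Decimal(output_chars) * PRO_TEXT_OUTPUT[tier]) / 1000
    if batch:
        cost *= Decimal("0.5")  # batch mode is listed at a 50% discount
    return cost

print(text_cost(1_000, 4_000, 400))              # standard tier
print(text_cost(200_000, 800_000, 400))          # long-context tier
print(text_cost(1_000, 4_000, 400, batch=True))  # standard tier, batch mode
```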
Imagen
With Imagen on Vertex AI, you can generate novel images and edit images based on text prompts you provide, or edit only parts of images using a mask area you define, among a host of other capabilities.
Model | Feature | Description | Input | Output | Price |
---|---|---|---|---|---|
Imagen 3 | Image generation | Generate an image | Text prompt | Image | $0.04 per image |
Imagen 3 Fast | Image generation | Generate an image | Text prompt | Image | $0.02 per image |
Imagen 2, Imagen | Image generation | Generate an image | Text prompt | Image | $0.020 per image |
Imagen 2, Imagen | Image editing | Edit an image using a mask-free or mask approach | Image/Text prompt | Image | $0.020 per image |
Imagen 2, Imagen | Upscaling | Increase resolution of a generated image to 2k and 4k | Image | Image | $0.003 per image |
Imagen 2, Imagen | Fine-tuning | Enable a "subject" provided by the user to be used in Imagen prompts (few-shot training) | Subject(s) with text identifier and 4-8 images per subject | Fine-tuned model (after training with user-provided subjects) | $ per node hour (Vertex AI custom training pricing) |
Imagen 2, Imagen | Visual Captioning | Generate a short or long text caption for an image | Image | Text caption | $0.0015 / image |
Imagen 2, Imagen | Visual Q&A | Provide an answer based on a question referencing an image | Image/Text prompt | Text answer | $0.0015 / image |
Embedding
Model | Feature | Description | Input | Output | Price |
---|---|---|---|---|---|
multimodalembedding | Embeddings for Multimodal: Text | Generate embeddings using text as an input | Text | Embeddings | $0.0002 / 1k characters input |
multimodalembedding | Embeddings for Multimodal: Image | Generate embeddings using image as an input | Image | Embeddings | $0.0001 / image input |
multimodalembedding | Embeddings for Multimodal: Video Plus | Video Plus | Video | Embeddings (up to 15 embeddings per min of video) | $0.0020 per second of video |
multimodalembedding | Embeddings for Multimodal: Video Standard | Video Standard | Video | Embeddings (up to 8 embeddings per min of video) | $0.0010 per second of video |
multimodalembedding | Embeddings for Multimodal: Video Essential | Video Essential | Video | Embeddings (up to 4 embeddings per min of video) | $0.0005 per second of video |
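To compare the three video embedding tiers for a given clip, a small Python sketch such as the following may help. The mode keys (`"plus"`, `"standard"`, `"essential"`) and the function name are hypothetical labels chosen for illustration; only the per-second rates come from the table above.

```python
from decimal import Decimal

# USD per second of video, from the multimodalembedding table.
VIDEO_RATE_PER_SECOND = {
    "plus": Decimal("0.0020"),
    "standard": Decimal("0.0010"),
    "essential": Decimal("0.0005"),
}

def video_embedding_cost(seconds: int, mode: str) -> Decimal:
    """Cost in USD of embedding `seconds` of video in the given mode."""
    return Decimal(seconds) * VIDEO_RATE_PER_SECOND[mode]

# Compare all three modes for a 90-second clip.
for mode in VIDEO_RATE_PER_SECOND:
    print(mode, video_embedding_cost(90, mode))
```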
Model | Type | Region | Price per 1,000 characters |
---|---|---|---|
Embeddings for Text | Input | Global | |
Embeddings for Text | Output | Global | |
Code completion
Generative AI on Vertex AI charges by every 1,000 characters of input (prompt) and every 1,000 characters of output (response). Characters are counted by UTF-8 code points and white space is excluded from the count. During the Preview stage, charges are 100% discounted. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent.
Model | Type | Region | Price per 1,000 characters |
---|---|---|---|
Codey for Code Completion | Input | Global | |
Codey for Code Completion | Output | Global | |
Translation (Text)
Use the Vertex AI API and the translation LLM to translate text. LLM translations tend to be more fluent and human-sounding than those from classic translation models, but they support fewer languages.
Model | Method | Usage | Price per million characters |
---|---|---|---|
LLM | Text translation (Preview)* | The number of input characters per month | $10 per million characters* |
LLM | Text translation (Preview)* | The number of output characters per month | $10 per million characters* |
* Price is per character processed by the model. For details about counted characters, see Charged characters.
Context Caching
With context caching, you can reduce the cost of Gemini input token processing by 75% and latency of content generation by caching the context portion of your input text or media to Gemini models. The amount of time data is stored in the cache, which can be controlled by the user, determines the "Context Cache Storage" charges. When creating a cached context, users will be charged the standard input token cost. Cache hits on input data are charged at a reduced rate, "Cached Input", instead of the normal input cost. The data size for both storage and input is calculated in the same way as Gemini input pricing.
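Model | Feature | Type | Price (<= 128K input tokens) | Price (> 128K input tokens) |
---|---|---|---|---|
Gemini 1.5 Flash | Cached Input | Image Input | $0.000005 / image | $0.00001 / image |
Gemini 1.5 Flash | Cached Input | Video Input | $0.000005 / second | $0.00001 / second |
Gemini 1.5 Flash | Cached Input | Text Input | $0.0000046875 / 1k characters | $0.000009375 / 1k characters |
Gemini 1.5 Flash | Cached Input | Audio Input | $0.0000005 / second | $0.000001 / second |
Gemini 1.5 Flash | Context Cache Storage | Image Input | $0.000263 / image / hr | |
Gemini 1.5 Flash | Context Cache Storage | Video Input | $0.000263 / second / hr | |
Gemini 1.5 Flash | Context Cache Storage | Text Input | $0.00025 / 1k characters / hr | |
Gemini 1.5 Flash | Context Cache Storage | Audio Input | $0.000025 / second / hr | |
Gemini 1.5 Pro | Cached Input | Image Input | $0.0000821875 / image | $0.000164375 / image |
Gemini 1.5 Pro | Cached Input | Video Input | $0.0000821875 / second | $0.000164375 / second |
Gemini 1.5 Pro | Cached Input | Text Input | $0.000078125 / 1k characters | $0.00015625 / 1k characters |
Gemini 1.5 Pro | Cached Input | Audio Input | $0.0000078125 / second | $0.000015625 / second |
Gemini 1.5 Pro | Context Cache Storage | Image Input | $0.0011835 / image / hr | |
Gemini 1.5 Pro | Context Cache Storage | Video Input | $0.0011835 / second / hr | |
Gemini 1.5 Pro | Context Cache Storage | Text Input | $0.001125 / 1k characters / hr | |
Gemini 1.5 Pro | Context Cache Storage | Audio Input | $0.0001125 / second / hr | |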
Example cached cost calculation
If a user creates a 250,000 character cached context with a TTL of 2 hours and subsequently sends twenty separate requests to the Gemini 1.5 Pro model during those 2 hours, and each request has a 200-character query added to the cached context and 400 character output, the total charge is calculated as follows:
Cache creation cost:
250,000 input characters x ($0.0003125 / 1000) = $0.078125 cache creation cost.
Cache storage cost:
250,000 characters x 2 hours = 500,000 total character hours;
500,000 total character hours x ($0.001125 / 1000) = $0.5625 storage cost.
Requests using cache cost:
200 characters x 20 requests = 4,000 total character inputs;
250,000 cached characters x 20 requests = 5,000,000 total cached character inputs;
4,000 total character inputs x ($0.0003125 / 1000) = $0.00125 character input cost;
5,000,000 total cached character inputs x ($0.000078125 / 1000) = $0.390625 cached input cost;
$0.00125 character input cost + $0.390625 cached input cost = $0.391875 total input cost.
Output cost:
400 output characters x 20 requests = 8,000 total output characters;
8,000 total output characters x ($0.00125 / 1000) = $0.01 output cost.
Total cost:
$0.078125 cache creation cost + $0.5625 cache storage cost + $0.391875 input cost + $0.01 output cost = $1.0425 total cost.
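The worked example above can be reproduced, or rerun with other inputs, using a short Python sketch. The function and key names are hypothetical. The rates are the Gemini 1.5 Pro text rates for contexts of up to 128K input tokens from the tables on this page. The output rate is a parameter so that a different rate can be substituted; it defaults to the Gemini 1.5 Pro text output rate from the Gemini pricing table.

```python
from decimal import Decimal

# Gemini 1.5 Pro text rates, USD per 1,000 characters (<= 128K input tokens).
INPUT = Decimal("0.0003125")
CACHED_INPUT = Decimal("0.000078125")
STORAGE_PER_HOUR = Decimal("0.001125")
OUTPUT = Decimal("0.00125")

def cached_session_cost(cached_chars, ttl_hours, requests,
                        query_chars, output_chars, output_rate=OUTPUT):
    """Return a per-component USD breakdown for one cached-context session."""
    def per_1k(chars, rate):
        return Decimal(chars) * rate / 1000

    parts = {
        # Creating the cache is charged at the standard input rate.
        "creation": per_1k(cached_chars, INPUT),
        # Storage is charged per character-hour for the cache lifetime.
        "storage": per_1k(cached_chars * ttl_hours, STORAGE_PER_HOUR),
        # Each request pays the standard rate for its new query text...
        "fresh_input": per_1k(query_chars * requests, INPUT),
        # ...and the reduced "Cached Input" rate for the cached context.
        "cached_input": per_1k(cached_chars * requests, CACHED_INPUT),
        "output": per_1k(output_chars * requests, output_rate),
    }
    parts["total"] = sum(parts.values())
    return parts

breakdown = cached_session_cost(250_000, 2, 20, 200, 400)
for name, usd in breakdown.items():
    print(f"{name}: ${usd}")
```

Returning the breakdown rather than a single number makes it easy to see that, in this scenario, storage and repeated cached reads dominate the bill.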
Example cost calculation
If a user sends five separate requests to the PaLM Text Bison model, and each request has a 200-character input and 400-character output, the total charge is calculated as follows:
Input cost:
200 input characters x 5 prompts = 1,000 total input characters;
1,000 total input characters x ($0.00025 / 1000) = $0.00025 input cost.
Output cost:
400 output characters x 5 prompts = 2,000 total output characters;
2,000 total output characters x ($0.0005 / 1000) = $0.001 output cost.
Total cost:
$0.00025 input cost + $0.001 output cost = $0.00125 total cost.
Partner models
Partner models are a curated list of generative AI models developed by Google partners. Partner models are offered as managed APIs. For more information, see Overview of partner models. The following sections list pricing details for Google partner models.
AI21 Labs' models
Model | Pricing |
---|---|
Jamba 1.5 Large | Input: $2 / million tokens Output: $8 / million tokens |
Jamba 1.5 Mini | Input: $0.20 / million tokens Output: $0.40 / million tokens |
Anthropic’s Claude models
Model | Pricing |
---|---|
Claude 3.5 Haiku | Input: $1.00 / million tokens Output: $5.00 / million tokens |
Claude 3.5 Sonnet v2 | Input: $3 / million tokens Output: $15 / million tokens |
Claude 3.5 Sonnet | Input: $3 / million tokens Output: $15 / million tokens |
Claude 3 Haiku | Input: $0.25 / million tokens Output: $1.25 / million tokens |
Claude 3 Sonnet | Input: $3 / million tokens Output: $15 / million tokens |
Claude 3 Opus | Input: $15 / million tokens Output: $75 / million tokens |
Meta's Llama models
Model | Pricing |
---|---|
Llama 3.1 405B | Input: $5.00 / million tokens Output: $16.00 / million tokens |
Mistral AI’s models
Model | Pricing |
---|---|
Mistral Large (24.11) | Input: $2.00 / million tokens Output: $6.00 / million tokens |
Mistral Large (24.07) | Input: $2.00 / million tokens Output: $6.00 / million tokens |
Mistral Nemo | Input: $0.15 / million tokens Output: $0.15 / million tokens |
Codestral (24.05) | Input: $0.20 / million tokens Output: $0.60 / million tokens |
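Partner models are priced per million tokens rather than per 1,000 characters, so estimating a request needs token counts instead of character counts. The following Python sketch shows the arithmetic for a few of the models listed above. The `PARTNER_RATES` dictionary and `partner_cost` function are hypothetical names, and the token counts are assumed to come from the caller (for example, from the usage metadata returned with a response).

```python
from decimal import Decimal

# (input, output) rates in USD per million tokens, copied from the tables above.
PARTNER_RATES = {
    "Jamba 1.5 Mini": (Decimal("0.20"), Decimal("0.40")),
    "Claude 3.5 Sonnet v2": (Decimal("3"), Decimal("15")),
    "Llama 3.1 405B": (Decimal("5.00"), Decimal("16.00")),
    "Mistral Nemo": (Decimal("0.15"), Decimal("0.15")),
}

def partner_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request to a partner model."""
    in_rate, out_rate = PARTNER_RATES[model]
    return (Decimal(input_tokens) * in_rate
            + Decimal(output_tokens) * out_rate) / 1_000_000

print(partner_cost("Claude 3.5 Sonnet v2", 10_000, 2_000))
print(partner_cost("Mistral Nemo", 2_000, 2_000))
```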