Vertex AI pricing

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

This page covers pricing for Generative AI on Vertex AI. For all other Vertex AI pricing, including ML Platform and MLOps services, see the Vertex AI pricing page.

Google models

Gemini

With the multimodal models in Vertex AI, you can input text or media (images, video, audio). Text is charged per 1,000 characters of input (prompt) and per 1,000 characters of output (response). Characters are counted as UTF-8 code points, and whitespace is excluded from the count; this works out to approximately 4 characters per token. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent. Media input is charged per image or per second (video and audio).
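The character-counting rule above can be sketched in Python. This is a minimal illustration, not Google's billing code: it assumes that Python's `str.isspace()` matches the set of whitespace characters Vertex AI excludes (the page does not list them), the function names are hypothetical, and the example rate is the Gemini 1.5 Flash text input price for contexts of 128K tokens or fewer, taken from the table below.

```python
def billable_characters(text: str) -> int:
    """Count Unicode code points, skipping whitespace.

    Assumption: str.isspace() approximates the whitespace Vertex AI excludes.
    Iterating over a Python str yields one code point at a time.
    """
    return sum(1 for ch in text if not ch.isspace())


def approx_tokens(text: str) -> float:
    """Rough token estimate using the ~4 billable characters per token ratio."""
    return billable_characters(text) / 4


def text_input_cost(text: str, usd_per_1k_chars: float) -> float:
    """Cost of a text prompt billed per 1,000 characters, before any rounding."""
    return billable_characters(text) / 1000 * usd_per_1k_chars


# Gemini 1.5 Flash text input rate for contexts of 128K tokens or fewer.
FLASH_TEXT_INPUT_USD_PER_1K = 0.00001875

if __name__ == "__main__":
    prompt = "Summarize this caf\u00e9 review in one sentence."
    print(billable_characters(prompt), text_input_cost(prompt, FLASH_TEXT_INPUT_USD_PER_1K))
```

Spaces, tabs, and newlines do not count toward the total, while an accented character such as "é" counts as one character.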

Model | Feature | Type | Price (≤ 128K input tokens) | Price (> 128K input tokens)
Gemini 1.5 Flash | Multimodal | Image input | $0.00002 / image | $0.00004 / image
Gemini 1.5 Flash | Multimodal | Video input | $0.00002 / second | $0.00004 / second
Gemini 1.5 Flash | Multimodal | Text input | $0.00001875 / 1K characters | $0.0000375 / 1K characters
Gemini 1.5 Flash | Multimodal | Audio input | $0.000002 / second | $0.000004 / second
Gemini 1.5 Flash | Multimodal | Text output | $0.000075 / 1K characters | $0.00015 / 1K characters
Gemini 1.5 Flash | Tuning* | Training tokens | $8 / M tokens
Gemini 1.5 Pro | Multimodal | Image input | $0.00032875 / image | $0.0006575 / image
Gemini 1.5 Pro | Multimodal | Video input | $0.00032875 / second | $0.0006575 / second
Gemini 1.5 Pro | Multimodal | Text input | $0.0003125 / 1K characters | $0.000625 / 1K characters
Gemini 1.5 Pro | Multimodal | Audio input | $0.00003125 / second | $0.0000625 / second
Gemini 1.5 Pro | Multimodal | Text output | $0.00125 / 1K characters | $0.0025 / 1K characters
Gemini 1.5 Pro | Tuning* | Training tokens | $80 / M tokens
Gemini 1.0 Pro | Multimodal | Image input | $0.0025 / image
Gemini 1.0 Pro | Multimodal | Video input | $0.002 / second
Gemini 1.0 Pro | Multimodal | Text input | $0.000125 / 1K characters
Gemini 1.0 Pro | Multimodal | Text output | $0.000375 / 1K characters
Grounding with Google Search | Text | Grounding requests | $35 / 1K requests (for up to 1M requests per day)

Please contact your account team if you require more than 1M requests per day.

* If a query context is longer than 128K tokens, all tokens in that request are charged at the long-context rates.
* Gemini models are available in batch mode at a 50% discount.
* Gemini 1.0 Pro supports a context window of up to 32K tokens only.
* PDFs are billed as image input, with one PDF page equivalent to one image.
* A tuned model endpoint is charged the same prediction price as its base model.
* Grounding with Google Search: If you use dynamic retrieval to optimize costs, only requests whose response contains at least one grounding support URL from the web are charged for Grounding with Google Search. Standard Gemini model costs always apply.
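The long-context rule and the batch discount in the notes above can be combined into a small text-only estimator. This is a sketch, not an official calculator. It assumes that "128K" means 128,000 tokens, that you already know the request's context size in tokens, and that the 50% batch discount applies to the whole request; the dictionary keys and function name are illustrative. The rates are copied from the Gemini table above.

```python
from dataclasses import dataclass

# Assumption: "128K" is read as 128,000 tokens.
LONG_CONTEXT_THRESHOLD_TOKENS = 128_000
# "Batch mode at a 50% discount", assumed to apply to both input and output.
BATCH_MULTIPLIER = 0.5


@dataclass(frozen=True)
class TextRates:
    input_usd_per_1k_chars: float
    output_usd_per_1k_chars: float


# (standard rates, long-context rates) from the Gemini table; keys are illustrative.
GEMINI_TEXT_RATES = {
    "gemini-1.5-flash": (TextRates(0.00001875, 0.000075), TextRates(0.0000375, 0.00015)),
    "gemini-1.5-pro": (TextRates(0.0003125, 0.00125), TextRates(0.000625, 0.0025)),
}


def estimate_text_cost(model: str, input_chars: int, output_chars: int,
                       context_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of one text-only Gemini request."""
    standard, long_context = GEMINI_TEXT_RATES[model]
    # Above the threshold, the whole request is billed at the long-context rates.
    rates = long_context if context_tokens > LONG_CONTEXT_THRESHOLD_TOKENS else standard
    cost = (input_chars / 1000 * rates.input_usd_per_1k_chars
            + output_chars / 1000 * rates.output_usd_per_1k_chars)
    return cost * BATCH_MULTIPLIER if batch else cost
```

For example, a Gemini 1.5 Pro request with 1,000 input and 1,000 output characters comes to about $0.0015625 at standard rates, twice that once the context exceeds the threshold, and half of it in batch mode.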

Imagen

With Imagen on Vertex AI, you can generate new images from text prompts, edit images based on text prompts, edit only parts of an image using a mask area that you define, and use a range of other capabilities.

Model | Feature | Description | Input | Output | Price
Imagen 3 | Image generation | Generate an image; edit an image; customize an image | Text prompt | Image | $0.04 per image
Imagen 3 Fast | Image generation | Generate an image | Text prompt | Image | $0.02 per image
Imagen 2, Imagen | Image generation | Generate an image | Text prompt | Image | $0.020 per image
Imagen 2, Imagen | Image editing | Edit an image using a mask-free or mask approach | Image/text prompt | Image | $0.020 per image
Imagen 2, Imagen | Upscaling | Increase the resolution of a generated image to 2K or 4K | Image | Image | $0.003 per image
Imagen 2, Imagen | Fine-tuning | Enable a user-provided "subject" to be used in Imagen prompts (few-shot training) | Subject(s) with a text identifier and 4-8 images per subject | Fine-tuned model (after training with user-provided subjects) | Billed per node hour (Vertex AI custom training pricing)
Imagen 2, Imagen | Visual captioning | Generate a short or long text caption for an image | Image | Text caption | $0.0015 / image
Imagen 2, Imagen | Visual Q&A | Provide an answer based on a question referencing an image | Image/text prompt | Text answer | $0.0015 / image


Embedding

Model | Feature | Description | Input | Output | Price
multimodalembedding | Embeddings for Multimodal: Text | Generate embeddings using text as an input | Text | Embeddings | $0.0002 / 1K characters input
multimodalembedding | Embeddings for Multimodal: Image | Generate embeddings using an image as an input | Image | Embeddings | $0.0001 / image input
multimodalembedding | Embeddings for Multimodal: Video Plus | Video Plus | Video | Embeddings (up to 15 embeddings per minute of video) | $0.0020 per second of video
multimodalembedding | Embeddings for Multimodal: Video Standard | Video Standard | Video | Embeddings (up to 8 embeddings per minute of video) | $0.0010 per second of video
multimodalembedding | Embeddings for Multimodal: Video Essential | Video Essential | Video | Embeddings (up to 4 embeddings per minute of video) | $0.0005 per second of video

Model | Type | Region | Price per 1,000 characters
Embeddings for Text | Input | Global | Online requests: $0.000025; batch requests: $0.00002
Embeddings for Text | Output | Global | Online requests: no charge; batch requests: no charge
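As a rough sketch, the video tiers and text-embedding rates above translate into per-job costs as follows. The tier keys and function names are illustrative, and billing by exact (including fractional) seconds of video is an assumption; the page does not say how partial seconds are handled.

```python
# multimodalembedding price per second of video, by tier (from the table above).
VIDEO_USD_PER_SECOND = {"plus": 0.0020, "standard": 0.0010, "essential": 0.0005}

# Embeddings for Text: input price per 1,000 characters; output is not charged.
TEXT_EMBEDDING_USD_PER_1K_CHARS = {"online": 0.000025, "batch": 0.00002}


def video_embedding_cost(seconds: float, tier: str) -> float:
    """Cost of embedding a video of the given length at one tier."""
    return seconds * VIDEO_USD_PER_SECOND[tier]


def text_embedding_cost(chars: int, mode: str = "online") -> float:
    """Cost of embedding text; only input characters are billed."""
    return chars / 1000 * TEXT_EMBEDDING_USD_PER_1K_CHARS[mode]


if __name__ == "__main__":
    for tier in ("plus", "standard", "essential"):
        print(tier, round(video_embedding_cost(120, tier), 4))
    print("1M characters, batch:", text_embedding_cost(1_000_000, "batch"))
```

A two-minute video therefore costs about $0.24 at Video Plus, $0.12 at Video Standard, and $0.06 at Video Essential; higher tiers buy more embeddings per minute, not just a higher price.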


Code completion

Generative AI on Vertex AI charges for every 1,000 characters of input (prompt) and every 1,000 characters of output (response). Characters are counted as UTF-8 code points, and whitespace is excluded from the count. During the Preview stage, charges are 100% discounted. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent.

Model | Type | Region | Price per 1,000 characters
Codey for Code Completion | Input | Global | Online requests: $0.00025
Codey for Code Completion | Output | Global | Online requests: $0.0005


Translation (Text)

Use the Vertex AI API and the translation LLM to translate text. LLM translations tend to be more fluent and human-sounding than those from classic translation models, but they support fewer languages.

Model | Method | Usage | Price per million characters
LLM | Text translation (Preview)* | Number of input characters per month | $10 per million characters*
LLM | Text translation (Preview)* | Number of output characters per month | $10 per million characters*

*Price is per character processed by the model. For details about which characters are counted, see Charged characters.
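Because input and output characters are both billed at $10 per million, a month's LLM translation cost reduces to one multiplication. A minimal sketch; the function name is hypothetical, and the character counts are assumed to already follow the Charged characters rules.

```python
# LLM text translation (Preview): same rate for input and output characters.
USD_PER_MILLION_CHARS = 10.0


def monthly_translation_cost(input_chars: int, output_chars: int) -> float:
    """Monthly USD cost of LLM text translation for the given character counts."""
    return (input_chars + output_chars) / 1_000_000 * USD_PER_MILLION_CHARS
```

For example, 2 million input characters that produce 2.5 million output characters come to about $45 for the month.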

Context Caching

Context caching lets you cut the cost of Gemini input token processing by 75%, and reduce content-generation latency, by caching the context portion of your text or media input to Gemini models. The "Context Cache Storage" charge depends on how long data is kept in the cache, which you can control. When you create a cached context, you are charged the standard input cost. Cache hits on input data are charged at the reduced "Cached Input" rate instead of the normal input rate. The data size for both storage and input is calculated the same way as for Gemini input pricing.

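Model | Feature | Type | Price (≤ 128K input tokens) | Price (> 128K input tokens)
Gemini 1.5 Flash | Cached input | Image input | $0.000005 / image | $0.00001 / image
Gemini 1.5 Flash | Cached input | Video input | $0.000005 / second | $0.00001 / second
Gemini 1.5 Flash | Cached input | Text input | $0.0000046875 / 1K characters | $0.000009375 / 1K characters
Gemini 1.5 Flash | Cached input | Audio input | $0.0000005 / second | $0.000001 / second
Gemini 1.5 Flash | Context cache storage | Image input | $0.000263 / image / hour
Gemini 1.5 Flash | Context cache storage | Video input | $0.000263 / second / hour
Gemini 1.5 Flash | Context cache storage | Text input | $0.00025 / 1K characters / hour
Gemini 1.5 Flash | Context cache storage | Audio input | $0.000025 / second / hour
Gemini 1.5 Pro | Cached input | Image input | $0.0000821875 / image | $0.000164375 / image
Gemini 1.5 Pro | Cached input | Video input | $0.0000821875 / second | $0.000164375 / second
Gemini 1.5 Pro | Cached input | Text input | $0.000078125 / 1K characters | $0.00015625 / 1K characters
Gemini 1.5 Pro | Cached input | Audio input | $0.0000078125 / second | $0.000015625 / second
Gemini 1.5 Pro | Context cache storage | Image input | $0.0011835 / image / hour
Gemini 1.5 Pro | Context cache storage | Video input | $0.0011835 / second / hour
Gemini 1.5 Pro | Context cache storage | Text input | $0.001125 / 1K characters / hour
Gemini 1.5 Pro | Context cache storage | Audio input | $0.0001125 / second / hour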
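Whether caching pays off depends on how many requests reuse the context before it expires. The sketch below works under simplifying assumptions: a text-only context, the ≤ 128K rates, and per-request query and output costs that are the same with or without caching, so they cancel out. Caching then costs the creation charge at the standard input rate, plus storage for the TTL, plus the cached input rate on every reuse, while not caching costs the standard input rate on every request. The function name is hypothetical.

```python
import math


def cache_break_even_requests(standard_per_1k: float, cached_per_1k: float,
                              storage_per_1k_per_hour: float, hours: float) -> int:
    """Smallest number of requests for which caching a context is cheaper
    than resending it with every request.

    Per 1,000 context characters:
      with cache:    standard + storage * hours + n * cached
      without cache: n * standard
    Caching is cheaper when n > (standard + storage * hours) / (standard - cached).
    """
    threshold = ((standard_per_1k + storage_per_1k_per_hour * hours)
                 / (standard_per_1k - cached_per_1k))
    # Strict inequality: the next whole number above the threshold.
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Gemini 1.5 Pro text rates (≤ 128K input tokens) from the table above, 2-hour TTL.
    print(cache_break_even_requests(0.0003125, 0.000078125, 0.001125, 2))
```

With Gemini 1.5 Pro text rates and a 2-hour TTL this gives 11: with ten requests, resending the context uncached is still slightly cheaper, and from eleven requests onward caching wins. Because the context size cancels out, the answer does not depend on how large the cached context is.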


Example cached cost calculation

If a user creates a 250,000-character cached context with a TTL of 2 hours and then sends twenty separate requests to the Gemini 1.5 Pro model during those 2 hours, and each request adds a 200-character query to the cached context and produces a 400-character output, the total charge is calculated as follows. At roughly 4 characters per token, the context is about 62,500 tokens, so the ≤ 128K rates apply.

Cache creation cost:
250,000 input characters x ($0.0003125 / 1000) = $0.078125 cache creation cost.

Cache storage cost:
250,000 characters x 2 hours = 500,000 total character-hours;
500,000 total character-hours x ($0.001125 / 1000) = $0.5625 storage cost.

Requests using cache cost:
200 characters x 20 requests = 4,000 total character inputs;
250,000 cached characters x 20 requests = 5,000,000 total cached character inputs;
4,000 total character inputs x ($0.0003125 / 1000) = $0.00125 character input cost;
5,000,000 total cached character inputs x ($0.000078125 / 1000) = $0.390625 cached input cost;
$0.00125 character input cost + $0.390625 cached input cost = $0.391875 total input cost.

Output cost:
400 output characters x 20 requests = 8,000 total output characters;
8,000 total output characters x ($0.00125 / 1000) = $0.01 output cost.

Total cost:
$0.078125 cache creation cost + $0.5625 cache storage cost + $0.391875 input cost + $0.01 output cost = $1.0425 total cost.
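The cached-context calculation can be reproduced with exact decimal arithmetic, which avoids floating-point noise in sub-cent amounts. This sketch uses the Gemini 1.5 Pro rates for contexts of 128K tokens or fewer from the tables above, including the listed text output rate of $0.00125 per 1,000 characters, which gives a total of $1.0425; the variable names are illustrative.

```python
from decimal import Decimal

PER_1K = Decimal(1000)

# Gemini 1.5 Pro rates (≤ 128K input tokens), USD per 1,000 characters.
TEXT_INPUT = Decimal("0.0003125")
CACHED_TEXT_INPUT = Decimal("0.000078125")
CACHE_STORAGE_PER_HOUR = Decimal("0.001125")
TEXT_OUTPUT = Decimal("0.00125")

context_chars, ttl_hours = 250_000, 2
requests, query_chars, output_chars = 20, 200, 400

creation = context_chars * TEXT_INPUT / PER_1K  # standard input rate, charged once
storage = context_chars * ttl_hours * CACHE_STORAGE_PER_HOUR / PER_1K
query_input = requests * query_chars * TEXT_INPUT / PER_1K  # uncached part of each prompt
cached_input = requests * context_chars * CACHED_TEXT_INPUT / PER_1K  # cache hits
output = requests * output_chars * TEXT_OUTPUT / PER_1K

total = creation + storage + query_input + cached_input + output
print(f"creation={creation} storage={storage} input={query_input + cached_input} "
      f"output={output} total={total}")
```

Storage is the largest component here; a shorter TTL or more requests per cached context shifts the balance toward the cheaper cached input rate.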


Example cost calculation

If a user sends five separate requests to the PaLM Text Bison model, and each request has a 200-character input and 400-character output, the total charge is calculated as follows:

Input cost:
200 input characters x 5 prompts = 1,000 total input characters;
1,000 total input characters x ($0.00025 / 1000) = $0.00025 input cost.

Output cost:
400 output characters x 5 prompts = 2,000 total output characters;
2,000 total output characters x ($0.0005 / 1000) = $0.001 output cost.

Total cost:
$0.00025 input cost + $0.001 output cost = $0.00125 total cost.
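The same per-1,000-character arithmetic can be wrapped in a reusable helper. A minimal sketch: the function name is hypothetical, and the rates are passed in as strings so that `Decimal` represents them exactly rather than inheriting float rounding.

```python
from decimal import Decimal


def text_request_cost(requests: int, input_chars: int, output_chars: int,
                      input_usd_per_1k: str, output_usd_per_1k: str) -> Decimal:
    """Total cost of identical text requests billed per 1,000 characters."""
    per_1k = Decimal(1000)
    input_cost = requests * input_chars * Decimal(input_usd_per_1k) / per_1k
    output_cost = requests * output_chars * Decimal(output_usd_per_1k) / per_1k
    return input_cost + output_cost


# Five requests, 200 input and 400 output characters each.
print(text_request_cost(5, 200, 400, "0.00025", "0.0005"))
```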

Partner models

Partner models are a curated list of generative AI models developed by Google partners. Partner models are offered as managed APIs. For more information, see Overview of partner models. The following sections list pricing details for Google partner models.

AI21 Labs' models

Model Pricing
Jamba 1.5 Large Input: $2 / million tokens
Output: $8 / million tokens
Jamba 1.5 Mini Input: $0.20 / million tokens
Output: $0.40 / million tokens

Anthropic’s Claude models

Model Pricing
Claude 3.5 Haiku Input: $0.80 / million tokens
Output: $4.00 / million tokens
Claude 3.5 Sonnet v2 Input: $3 / million tokens
Output: $15 / million tokens
Claude 3.5 Sonnet Input: $3 / million tokens
Output: $15 / million tokens
Claude 3 Haiku Input: $0.25 / million tokens
Output: $1.25 / million tokens
Claude 3 Sonnet Input: $3 / million tokens
Output: $15 / million tokens
Claude 3 Opus Input: $15 / million tokens
Output: $75 / million tokens

Meta's Llama models

Model Pricing
Llama 3.1 405B Input: $5.00 / million tokens
Output: $16.00 / million tokens

Mistral AI’s models

Model Pricing
Mistral Large (24.11) Input: $2.00 / million tokens
Output: $6.00 / million tokens
Mistral Large (24.07) Input: $2.00 / million tokens
Output: $6.00 / million tokens
Mistral Nemo Input: $0.15 / million tokens
Output: $0.15 / million tokens
Codestral (24.05) Input: $0.20 / million tokens
Output: $0.60 / million tokens
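Unlike the Gemini rates earlier on this page, which are billed per 1,000 characters, the partner models above are priced per million tokens. A sketch of a per-workload estimate using a few rates copied from the tables above; the dictionary keys and function name are illustrative, and the token counts are assumed to be known already (for example, from the usage figures a model reports).

```python
# (input USD, output USD) per million tokens, from the partner model tables above.
PARTNER_RATES = {
    "claude-3.5-sonnet-v2": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
    "jamba-1.5-mini": (0.20, 0.40),
    "mistral-nemo": (0.15, 0.15),
}


def partner_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token counts on one partner model."""
    input_rate, output_rate = PARTNER_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, 2 million input tokens and 500,000 output tokens on Claude 3.5 Sonnet v2 come to about $13.50, with output accounting for more than half of the cost despite the smaller token count.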

Request a custom quote

With Google Cloud's pay-as-you-go pricing, you only pay for the services you use. Connect with our sales team to get a custom quote for your organization.