Vertex AI pricing
The costs for Vertex AI remain the same as they are for the existing products that Vertex AI supersedes. For example, the cost of training an AutoML image classification model is the same whether you train it with Vertex AI or with AutoML Vision.
Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
Vertex AI pricing compared to Legacy AI Platform pricing
Pricing for Vertex AI operations and the equivalent "legacy" operations is the same for each operation. For example, the cost of training a model using AI Platform Training is the same as training that model using Vertex AI Training.
If you are using legacy AI Platform products, your billing might be expressed in terms of "ML units".
Vertex AutoML models
For Vertex AutoML models, you pay for three main activities:
- Training the model
- Deploying the model to an endpoint
- Using the model to make predictions
Vertex AI uses predefined machine configurations for Vertex AutoML models, and the hourly rate for these activities reflects the resource usage.
The time required to train your model depends on the size and complexity of your training data. Models must be deployed before they can provide online predictions or online explanations.
You pay for each model deployed to an endpoint, even if no prediction is made. You must undeploy your model to stop incurring further charges. Models that are not deployed or have failed to deploy are not charged.
You pay only for the compute hours used. If training fails for any reason other than a user-initiated cancellation, you are not billed for that time. If you cancel the operation, you are charged for the training time that elapsed before the cancellation.
Select a model type below for pricing information.
| Operation | Price per node hour (classification) | Price per node hour (object detection) |
| --- | --- | --- |
| Training (Edge on-device model) | $18.00 | $18.00 |
| Deployment and online prediction | $1.375 | $2.002 |
| Operation | Price per node hour (classification, object tracking) | Price per node hour (action recognition) |
| --- | --- | --- |
| Training (Edge on-device model) | $10.78 | $11.00 |
| Operation | Price per node hour (classification/regression) | Price (forecasting) |
| --- | --- | --- |
| Prediction | Same price as predictions for custom-trained models | $1.00 per 1,000 forecasts (batch only) |
| Operation | Price |
| --- | --- |
| Legacy data upload (PDF only) | First 1,000 pages free each month; $1.50 per 1,000 pages; $0.60 per 1,000 pages over 5,000,000 |
| Training | $3.30 per hour |
| Deployment | $0.05 per hour |
| Prediction | $5.00 per 1,000 text records; $25.00 per 1,000 document pages, such as PDF files (legacy only) |
Prices for Vertex AutoML text prediction requests are computed based on the number of text records you send for analysis. A text record is plain text of up to 1,000 Unicode characters (including whitespace and any markup such as HTML or XML tags).
If the text provided in a prediction request contains more than 1,000 characters, it counts as one text record for each 1,000 characters. For example, if you send three requests that contain 800, 1,500, and 600 characters respectively, you would be charged for four text records: one for the first request (800), two for the second request (1,500), and one for the third request (600).
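The record-counting rule above can be sketched in Python (a minimal illustration, not an official API):

```python
import math

def text_records(num_chars: int) -> int:
    """Each 1,000 Unicode characters (or any part thereof) counts as one text record."""
    return max(1, math.ceil(num_chars / 1000))

# The three requests from the example above: 800, 1,500, and 600 characters.
total = sum(text_records(n) for n in [800, 1500, 600])
print(total)  # 4 text records billed
```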
Prediction charges for Vertex Explainable AI
Compute associated with Vertex Explainable AI is charged at the same rate as prediction. However, explanations take longer to process than normal predictions, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase prediction charges.
The tables below provide the approximate price per hour of various training configurations. You can choose a custom configuration of selected machine types. To calculate pricing, sum the costs of the virtual machines you use.
If you use Compute Engine machine types and attach accelerators, the cost of the accelerators is separate. To calculate this cost, multiply the prices in the table of accelerators below by how many machine hours of each type of accelerator you use.
* The price for training using a Cloud TPU Pod is based on the number of cores in the Pod. The number of cores in a Pod is always a multiple of 32. To determine the price of training on a Pod that has more than 32 cores, take the price for a 32-core Pod and multiply it by the number of cores divided by 32. For example, for a 128-core Pod, the price is (32-core Pod price) * (128/32). For information on which Cloud TPU Pods are available for a specific region, see System Architecture in the Cloud TPU documentation.
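The Pod pricing rule can be expressed as a small helper (the rate passed in is a placeholder; substitute the 32-core Pod price for your region):

```python
def pod_price_per_hour(price_32_core: float, num_cores: int) -> float:
    """TPU Pod pricing scales linearly with core count, always a multiple of 32."""
    if num_cores % 32 != 0:
        raise ValueError("Pod core counts are always a multiple of 32")
    return price_32_core * (num_cores / 32)

# A 128-core Pod costs 4x the 32-core Pod price.
quadruple = pod_price_per_hour(10.0, 128)  # placeholder rate of $10.00/hour
```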
- All use is subject to the Vertex AI quota policy.
- You are required to store your data and program files in Google Cloud Storage buckets during the Vertex AI lifecycle. See more about Cloud Storage usage.
You are charged for training your models from the moment when resources are provisioned for a job until the job finishes.
Scale tiers for predefined configurations (AI Platform Training)
You can control the type of processing cluster to use when training your model. The simplest way is to choose from one of the predefined configurations called scale tiers. Read more about scale tiers.
Machine types for custom configurations
If you use Vertex AI or select CUSTOM as your scale tier for AI Platform Training, you have control over the number and type of virtual machines to use for the cluster's master, worker, and parameter servers. Read more about machine types for Vertex AI and machine types for AI Platform Training.
The cost of training with a custom processing cluster is the sum of all the machines you specify. You are charged for the total time of the job, not for the active processing time of individual machines.
Calculate training cost using "Consumed ML units"
The Consumed ML units (Consumed Machine Learning units) shown on your Job details page are equivalent to training units with the duration of the job factored in. When using Consumed ML units in your calculations, use the following formula:
(Consumed ML units) * (Machine type cost)
A data scientist runs a training job on an e2-standard-4 machine instance in the us-west1 (Oregon) region. The Consumed ML units field on their Job details page shows 55.75. The calculation is as follows:

55.75 consumed ML units * $0.154114 per hour = $8.59

for a total of $8.59 for the job.
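The same calculation in Python (the rate shown is the example e2-standard-4 rate from above; substitute the rate for your machine type and region):

```python
def training_cost(consumed_ml_units: float, machine_hourly_rate: float) -> float:
    """Consumed ML units already factor in job duration, so cost is a simple product."""
    return consumed_ml_units * machine_hourly_rate

# e2-standard-4 in us-west1 at $0.154114 per hour, per the example above.
cost = training_cost(55.75, 0.154114)
print(f"${cost:.2f}")  # $8.59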
To find your Job details page, go to the Jobs list and click the link for a specific job.
Prediction and explanation
This table provides the prices of batch prediction, online prediction, and online explanation per node hour. A node hour represents the time a virtual machine spends running your prediction job or waiting in a ready state to handle prediction or explanation requests.
|Predictions and explanations||
Each machine type is charged as two separate SKUs on your Google Cloud bill:
- vCPU cost, measured in vCPU hours
- RAM cost, measured in GB hours
The prices for machine types in the previous table approximate the
total hourly cost for each prediction node of a model version
using that machine type. For example, since an
n1-highcpu-32 machine type includes 32 vCPUs and
28.8 GB of RAM, the hourly pricing per node is equal to 32 vCPU
hours + 28.8 GB hours.
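A sketch of how the two SKUs combine into a per-node hourly cost (the SKU rates below are placeholders, not published prices):

```python
def node_hourly_cost(vcpus: int, ram_gb: float,
                     vcpu_hour_price: float, gb_hour_price: float) -> float:
    """Per-node hourly cost is the sum of the vCPU-hour and GB-hour SKU charges."""
    return vcpus * vcpu_hour_price + ram_gb * gb_hour_price

# n1-highcpu-32 includes 32 vCPUs and 28.8 GB of RAM.
# The SKU rates here are placeholders; use the rates for your region.
estimate = node_hourly_cost(32, 28.8, vcpu_hour_price=0.03, gb_hour_price=0.004)
```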
The prices in the previous table are provided to help you estimate prediction costs. The following table shows the vCPU and RAM pricing for prediction machine types, which more precisely reflect the SKUs that you will be charged for:
|Prediction machine type SKUs|
You can optionally use GPU accelerators for prediction. GPUs incur an additional charge, separate from those described in the previous table. The following table describes the pricing for each type of GPU:
|Accelerators - price per hour|
Pricing is per GPU, so if you use multiple GPUs per prediction node (or if your version scales to use multiple nodes), then costs scale accordingly.
AI Platform Prediction serves predictions from your model by running a number of virtual machines ("nodes"). By default, Vertex AI automatically scales the number of nodes running at any time. For online prediction, the number of nodes scales to meet demand. Each node can respond to multiple prediction requests. For batch prediction, the number of nodes scales to reduce the total time it takes to run a job. You can customize how prediction nodes scale.
You are charged for the time that each node runs for your model, including:
- When the node is processing a batch prediction job.
- When the node is processing an online prediction request.
- When the node is in a ready state for serving online predictions.
The cost of one node running for one hour is a node hour. The table of prediction prices describes the price of a node hour, which varies across regions and between online prediction and batch prediction.
You can consume node hours in fractional increments. For example, one node running for 30 minutes costs 0.5 node hours.
Cost calculations for legacy (MLS1) machine types and batch prediction
- The running time of a node is measured in one-minute increments, rounded up to the nearest minute. For example, if a node runs for 20.1 minutes, calculate its cost as if it ran for 21 minutes.
- The running time for nodes that run for less than 10 minutes is rounded up to 10 minutes. For example, if a node runs for only 3 minutes, calculate its cost as if it ran for 10 minutes.
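The two MLS1 rounding rules can be sketched as:

```python
import math

def billable_minutes_mls1(run_minutes: float) -> int:
    """Legacy (MLS1) batch prediction: round up to the nearest minute,
    with a 10-minute minimum charge per node."""
    return max(10, math.ceil(run_minutes))

billable_minutes_mls1(20.1)  # 21 minutes billed
billable_minutes_mls1(3)     # 10 minutes billed
```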
Cost calculations for Compute Engine (N1) machine types
- The running time of a node is billed in 30-second increments. This means that every 30 seconds, your project is billed for 30 seconds worth of whatever vCPU, RAM, and GPU resources that your node is using at that moment.
More about automatic scaling of prediction nodes
| Online prediction | Batch prediction |
| --- | --- |
| The priority of the scaling is to reduce the latency of individual requests. The service keeps your model in a ready state for a few idle minutes after servicing a request. | The priority of the scaling is to reduce the total elapsed time of the job. |
| Scaling affects your total charges each month: the more numerous and frequent your requests, the more nodes are used. | Scaling should have little effect on the price of your job, though there is some overhead involved in bringing up a new node. |
| You can choose to let the service scale in response to traffic (automatic scaling) or you can specify a number of nodes to run constantly to avoid latency (manual scaling). | You can affect scaling by setting a maximum number of nodes to use for a batch prediction job, and by setting the number of nodes to keep running for a model when you deploy it. |
Minimum 10-minute charge
Recall that if a node runs for less than 10 minutes, you are charged as if it ran for 10 minutes. For example, suppose you use automatic scaling. During a period with no traffic, if you are using a legacy (MLS1) machine type in AI Platform Prediction, zero nodes are in use. (If you use other machine types in AI Platform Prediction or if you use Vertex AI, then at least one node is always in use.) If you receive a single online prediction request, one node scales up to handle the request. After it handles the request, the node continues running for a few minutes in a ready state, then stops. Even though the node ran for less than 10 minutes, you are charged for 10 node minutes (0.17 node hours) for this node's work.
Alternatively, if a single node scales up and handles many online prediction requests within a 10 minute-period before shutting down, you are also charged for 10 node minutes.
You can use manual scaling to control exactly how many nodes run for a certain amount of time. However, if a node runs for less than 10 minutes you are still charged as if it ran for 10 minutes.
Learn more about node allocation and scaling.
Batch prediction jobs are charged after job completion
Batch prediction jobs are charged after job completion, not incrementally during the job. Any Cloud Billing budget alerts that you have configured aren't triggered while a job is running. Before starting a large job, consider running some cost benchmark jobs with small input data first.
Example of a prediction calculation
A real-estate company in an Americas region runs a weekly prediction of housing values in areas it serves. In one month, it runs predictions for four weeks in batches of 3920, 4277, 3849, and 3961 instances. Jobs are limited to one node, and each instance takes an average of 0.72 seconds of processing.
First calculate the length of time that each job ran:

3920 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.04 minutes
4277 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 51.324 minutes
3849 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 46.188 minutes
3961 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.532 minutes
Each job ran for more than ten minutes, so it is charged for each minute of processing, rounded up to the next minute:

($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 52 minutes * 1 node = $0.0685711
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 47 minutes * 1 node = $0.0619777
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964
The total charge for the month is $0.26.
This example assumed jobs ran on a single node and took a consistent amount of time per input instance. In real usage, make sure to account for multiple nodes and use the actual amount of time each node spends running for your calculations.
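The whole example can be reproduced in a few lines (the node-hour rate is the one implied by the per-job charges above; substitute the batch prediction rate for your region):

```python
import math

PRICE_PER_NODE_HOUR = 0.0791205     # rate consistent with the per-job totals above
SECONDS_PER_INSTANCE = 0.72
batches = [3920, 4277, 3849, 3961]  # instances processed in each weekly job

total = 0.0
for instances in batches:
    run_minutes = instances * SECONDS_PER_INSTANCE / 60   # e.g. 3920 -> 47.04 minutes
    billable_minutes = max(10, math.ceil(run_minutes))    # per-minute billing, 10-minute minimum
    total += PRICE_PER_NODE_HOUR / 60 * billable_minutes
print(f"${total:.2f}")  # $0.26
```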
Charges for Vertex Explainable AI
Vertex Explainable AI comes at no extra charge to prediction prices. However, explanations take longer to process than normal predictions, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase prediction charges.
Vertex AI Pipelines
Vertex AI Pipelines charges a run execution fee of $0.03 per Pipeline Run. You are not charged the execution fee during the Preview release. You also pay for Google Cloud resources you use with Vertex AI Pipelines, such as Compute Engine resources consumed by pipeline components (charged at the same rate as for Vertex AI training). Finally, you are responsible for the cost of any services (such as Dataflow) called by your pipeline.
Vertex AI Feature Store
Prices for Vertex AI Feature Store are based on the amount of feature data in online and offline storage as well as the availability of online serving. A node hour represents the time a virtual machine spends serving feature data or waiting in a ready state to handle feature data requests.
| Operation | Price |
| --- | --- |
| Online storage | $0.25 per GB-month |
| Offline storage | $0.023 per GB-month |
| Online serving | $0.94 per node per hour |
| Batch export | $0.005 per GB |
When you enable feature value monitoring, billing includes the applicable charges above in addition to the following:
- $3.50 per GB for all data analyzed. With snapshot analysis enabled, snapshots taken of data in Vertex AI Feature Store are included. With import feature analysis enabled, batches of ingested data are included.
- Additional charges for other Vertex AI Feature Store operations used with feature value monitoring:
- The snapshot analysis feature periodically takes a snapshot of the feature values based on your configured monitoring interval.
- The charge for a snapshot export is the same as for a regular batch export operation.
Snapshot Analysis Example
A data scientist enables feature value monitoring for their Vertex AI Feature Store and turns on monitoring for a daily snapshot analysis. A pipeline runs daily for the entity types monitoring. The pipeline scans 2GB of data in Vertex AI Feature Store and exports a snapshot containing 0.1GB of data. The total charge for one day's analysis is:
(0.1 GB * $3.50) + (2 GB * $0.005) = $0.36
Ingestion Analysis Example
A data scientist enables feature value monitoring for their Vertex AI Feature Store and turns on monitoring for ingestion operations. An ingestion operation imports 1GB of data into Vertex AI Feature Store. The total charge for feature value monitoring is:
(1 GB * $3.50) = $3.50
Vertex ML Metadata
Metadata storage is measured in binary gigabytes (GiB), where 1 GiB is 1,073,741,824 bytes. This unit of measurement is also known as a gibibyte. Vertex ML Metadata charges $10 per gibibyte (GiB) per month for metadata storage.
Vertex AI TensorBoard
To use Vertex AI TensorBoard, request that the IAM administrator of the project assign you to the role "Vertex AI TensorBoard Web App User". The Vertex AI Administrator role also has access.
Vertex AI TensorBoard charges a monthly fee of $300 per unique active user. Active users are measured through the Vertex AI TensorBoard UI. You also pay for Google Cloud resources you use with Vertex AI TensorBoard, such as TensorBoard logs stored in Cloud Storage.
Vertex AI Vizier
Vertex AI Vizier is a black-box optimization service inside Vertex AI. The Vertex AI Vizier pricing model consists of the following:
- There is no charge for trials that use
GRID_SEARCH. Learn more about the search algorithms.
- The first 100 Vertex AI Vizier trials per calendar month are available at no charge (trials using
GRID_SEARCHdo not count against this total).
- After 100 Vertex AI Vizier trials, subsequent trials during the same calendar month are charged at $1 per trial (trials that use
GRID_SEARCHincur no charges).
Vertex AI Matching Engine
Pricing for Vertex AI Matching Engine Approximate Nearest Neighbor service consists of:
- Per node hour pricing for each VM used to host a deployed index.
- A cost for building new indexes and updating existing indexes.
Data processed during building and updating indexes is measured in binary gigabytes (GiB), where 1 GiB is 1,073,741,824 bytes. This unit of measurement is also known as a gibibyte. Vertex AI Matching Engine charges $3.00 per gibibyte (GiB) of data processed in all regions.
The following tables summarize the pricing of index serving in each region where Matching Engine is available.
|Machine Type - Region - Price per node hour|
Matching Engine pricing examples
Vertex AI Matching Engine pricing is determined by the size of your data, the number of queries per second (QPS) you want to run, and the number of nodes you use. To get your estimated serving cost, first calculate your total data size: the number of embeddings/vectors * the number of dimensions * 4 bytes per dimension. Once you have your data size, you can calculate the serving cost and the building cost. The serving cost plus the building cost equals your monthly total cost.
- Serving cost: # replicas/shard * # shards (~data size/20GB) * $1.064/hr * 24 hrs/day * 30 days/month
- Building cost: data size(in GB) * $3/GB * # of updates/month
The monthly index building cost is the size of the data * 3.00 per gigabyte. The update frequency does not affect serving cost, just the building cost.
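The serving and building formulas above can be combined into one estimate (a sketch: the $1.064 node-hour rate and the ~20 GB-per-shard figure come from the serving formula above and vary by machine type and region):

```python
import math

def monthly_matching_engine_cost(num_vectors: int, dims: int,
                                 replicas_per_shard: int, updates_per_month: int,
                                 node_hour_rate: float = 1.064) -> float:
    """Serving cost plus index building cost, per the formulas above."""
    data_gb = num_vectors * dims * 4 / 1e9        # 4 bytes per dimension
    shards = math.ceil(data_gb / 20)              # roughly one shard per 20 GB of data
    serving = replicas_per_shard * shards * node_hour_rate * 24 * 30
    building = data_gb * 3.00 * updates_per_month
    return serving + building

# Hypothetical index: 20M vectors x 128 dims (10.24 GB), 2 replicas, rebuilt monthly.
estimate = monthly_matching_engine_cost(20_000_000, 128, replicas_per_shard=2,
                                        updates_per_month=1)
```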
| Number of embeddings/vectors | Number of dimensions | Queries per second (QPS) | Update frequency | Estimated monthly index building cost | Nodes | Estimated monthly serving cost |
| --- | --- | --- | --- | --- | --- | --- |
All examples are based on
The cost you incur will vary with recall rate and latency requirements. The estimated monthly serving cost is directly related to the number of nodes used in the console. To learn more about configuration parameters that affect cost, see Configuration parameters which impact recall and latency.
If you have high queries per second (QPS), batching these queries can reduce total costs up to 30%-40%.
Vertex AI Model Monitoring
Vertex AI enables you to monitor the continued effectiveness of your model after you deploy it to production. For more information, see Introduction to Vertex AI Model Monitoring.
When you use Vertex AI Model Monitoring, you are billed for the following:
- $3.50 per GB for all data analyzed, including the training data provided and prediction data logged in a BigQuery table.
- Charges for other Google Cloud products that you use with Model Monitoring, such as BigQuery storage or Batch Explain when attribution monitoring is enabled.
Vertex AI Model Monitoring is supported in the following regions:
asia-southeast1. Prices are the same for all supported regions.
Data sizes are measured after they are converted to TFRecord format.
Training datasets incur a one-time charge when you set up a Vertex AI Model Monitoring job.
Prediction datasets consist of logs collected from the online prediction service. As prediction requests arrive during different time windows, the data for each time window is collected, and the sum of the data analyzed across prediction windows is used to calculate the charge.
Example: A data scientist runs model monitoring on the prediction traffic belonging to their model.
- The model is trained from a BigQuery dataset. The data size after conversion to TFRecord format is 1.5 GB.
- Prediction data logged between 1:00 - 2:00 p.m. is 0.1 GB, between 3:00 - 4:00 p.m. is 0.2 GB.
The total price for setting up the model monitoring job is:
(1.5 GB * $3.50) + ((0.1 GB + 0.2 GB) * $3.50) = $6.30
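The setup charge can be computed as follows (an illustrative helper, not part of any SDK):

```python
def monitoring_cost(training_gb, prediction_window_gbs, rate_per_gb=3.50):
    """One-time charge on the training data plus the sum of the
    prediction data analyzed in each time window, at $3.50/GB."""
    return (training_gb + sum(prediction_window_gbs)) * rate_per_gb

# The example above: 1.5 GB of training data, two prediction windows.
cost = monitoring_cost(1.5, [0.1, 0.2])
print(f"${cost:.2f}")  # $6.30
```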
Vertex AI Workbench, Deep Learning Containers, Deep Learning VM, and AI Platform Pipelines
For Deep Learning Containers, Deep Learning VM Images, and AI Platform Pipelines, pricing is calculated based on the compute resources that you use. These resources will be charged at the same rate you currently pay for Compute Engine and Cloud Storage.
For Vertex AI Workbench, there is a management fee in addition to your infrastructure usage, captured in the tables below.
Select either managed notebooks or user-managed notebooks for pricing information.
Managed notebooks

| SKU | Management fee per hour |
| --- | --- |
| vCPU | $0.05 per vCore |
| T4, K80, and P4 (Standard GPU) | $0.35 per instance |
| P100, V100, and A100 GPU (Premium GPU) | $2.48 per instance |

User-managed notebooks

| SKU | Management fee per core hour |
| --- | --- |
| T4, K80, and P4 (Standard GPU) | $0.035 |
| P100, V100, and A100 GPU (Premium GPU) | $0.25 |
In addition to the compute costs, you also pay for any Google Cloud resources you use. For example:
Data analysis services: You incur BigQuery costs when you issue SQL queries within a notebook (see BigQuery pricing).
Customer-managed encryption keys: You incur costs when you use customer-managed encryption keys. Each time your managed notebooks or user-managed notebooks instance uses a Cloud Key Management Service key, that operation is billed at the rate of Cloud KMS key operations (see Cloud Key Management Service pricing).
Vertex AI enables you to request human labeling for a collection of data that you plan to use to train a custom machine learning model. Prices for the service are computed based on the type of labeling task.
- For regular labeling tasks, the prices are determined by the number of billable units, as follows:
- For an image classification task, units are determined by the number of images and the number of human labelers. For example, an image with 3 human labelers counts as 1 * 3 = 3 units. The price for single-label and multi-label classification is the same.
- For an image bounding box task, units are determined by the number of bounding boxes identified in the images and the number of human labelers. For example, an image with 2 bounding boxes and 3 human labelers counts as 2 * 3 = 6 units. Images without bounding boxes are not charged.
- For an image segmentation, rotated box, polyline, or polygon task, units are determined in the same way as an image bounding box task.
- For a video classification task, units are determined by the video length (every 5 seconds is a price unit) and the number of human labelers. For example, a 25-second video with 3 human labelers counts as 25 / 5 * 3 = 15 units. The price for single-label and multi-label classification is the same.
- For a video object tracking task, units are determined by the number of objects identified in the video and the number of human labelers. For example, a video with 2 objects and 3 human labelers counts as 2 * 3 = 6 units. Videos without objects are not charged.
- For a video action recognition task, units are determined in the same way as a video object tracking task.
- For a text classification task, units are determined by text length (every 50 words is a price unit) and the number of human labelers. For example, a piece of text with 100 words and 3 human labelers counts as 100 / 50 * 3 = 6 units. The price for single-label and multi-label classification is the same.
- For a text sentiment task, units are determined in the same way as a text classification task.
- For a text entity extraction task, units are determined by text length (every 50 words is a price unit), the number of entities identified, and the number of human labelers. For example, a piece of text with 100 words, 2 identified entities, and 3 human labelers counts as 100 / 50 * 2 * 3 = 12 units. Text without entities is not charged.
For image/video/text classification and text sentiment tasks, human labelers may lose track of classes if the label set is too large. As a result, we send at most 20 classes to the human labelers at a time. For example, if the label set size of a labeling task is 40, each data item is sent for human review 40 / 20 = 2 times, and we charge twice the price calculated above accordingly.
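The unit-counting rules above can be sketched as a few illustrative helpers (function names are not part of any API):

```python
import math

def image_classification_units(num_images: int, labelers: int,
                               label_set_size: int = 1) -> int:
    # Label sets larger than 20 classes require multiple review rounds per item.
    review_rounds = math.ceil(label_set_size / 20)
    return num_images * labelers * review_rounds

def video_classification_units(video_seconds: int, labelers: int) -> int:
    return (video_seconds // 5) * labelers  # every 5 seconds is one price unit

def text_entity_units(words: int, entities: int, labelers: int) -> int:
    return math.ceil(words / 50) * entities * labelers  # every 50 words is one price unit
```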
For a labeling task that enables the custom labeler feature, each data item is counted as 1 custom labeler unit.
For an active learning labeling task for data items with annotations that are generated by models (without a human labeler's help), each data item is counted as 1 active learning unit.
For an active learning labeling task for data items with annotations that are generated by human labelers, each data item is counted as a regular labeling task as described above.
The table below provides the price per 1,000 units per human labeler, based on the unit listed for each objective. Tier 1 pricing applies to the first 50,000 units per month in each Google Cloud project; Tier 2 pricing applies to the next 950,000 units per month in the project, up to 1,000,000 units. Contact us for pricing above 1,000,000 units per month.
| Data type | Objective | Unit | Tier 1 | Tier 2 |
| --- | --- | --- | --- | --- |
| Image | Bounding box | Bounding box | $63 | $49 |
| Image | Rotated box | Bounding box | $86 | $60 |
| Video | Object tracking | Bounding box | $86 | $60 |
| Video | Action recognition | Event in 30-second video | $214 | $150 |
| All | Active Learning | Data item | $80 | $56 |
| All | Custom Labeler | Data item | $80 | $56 |
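Applying the tier boundaries to a monthly unit count (an illustrative calculation; prices are per 1,000 units per human labeler):

```python
def labeling_cost(units: int, tier1_price: float, tier2_price: float) -> float:
    """Tier 1 covers the first 50,000 units per month, Tier 2 the next 950,000.
    Above 1,000,000 units per month, contact sales for pricing."""
    tier1_units = min(units, 50_000)
    tier2_units = min(max(units - 50_000, 0), 950_000)
    return tier1_units / 1000 * tier1_price + tier2_units / 1000 * tier2_price

# 60,000 bounding-box units at $63 (Tier 1) / $49 (Tier 2):
cost = labeling_cost(60_000, 63, 49)  # 50 * $63 + 10 * $49 = $3640.00
```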
Required use of Cloud Storage
In addition to the costs described in this document, you are required to store data and program files in Cloud Storage buckets during the Vertex AI lifecycle. This storage is subject to the Cloud Storage pricing policy.
Required use of Cloud Storage includes:
- Staging your training application package for custom-trained models.
- Storing your training input data.
- Storing the output of your training jobs. Vertex AI does not require long-term storage of these items; you can remove the files as soon as the operation is complete.
Free operations for managing your resources
The resource management operations provided by AI Platform are available free of charge. The AI Platform quota policy does limit some of these operations.
| Resource | Free operations |
| --- | --- |
| models | create, get, list, delete |
| versions | create, get, list, delete, setDefault |
| jobs | get, list, cancel |
| operations | get, list, cancel, delete |
Google Cloud costs
If you store images to be analyzed in Cloud Storage or use other Google Cloud resources in tandem with Vertex AI, then you will also be billed for the use of those services.
To view your current billing status in the Cloud console, including usage and your current bill, see the Billing page. For more details about managing your account, see the Cloud Billing Documentation or Billing and Payments Support.