Vertex AI pricing

The costs for Vertex AI remain the same as they are for the existing products that Vertex AI supersedes. For example, the cost of training an AutoML image classification model is the same whether you train it with Vertex AI or with AutoML Vision.

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Vertex AI pricing compared to Legacy AI Platform pricing

Pricing for each Vertex AI operation is the same as for the equivalent "legacy" operation. For example, training a model with AI Platform Training costs the same as training it with Vertex AI Training.

If you are using legacy AI Platform products, then your billing might be expressed in terms of "ML units".

Vertex AutoML models

For Vertex AutoML models, you pay for three main activities:

  • Training the model
  • Deploying the model to an endpoint
  • Using the model to make predictions

Vertex AI uses predefined machine configurations for Vertex AutoML models, and the hourly rate for these activities reflects the resource usage.

The time required to train your model depends on the size and complexity of your training data. Models must be deployed before they can provide online predictions or online explanations.

You pay for each model deployed to an endpoint, even if no prediction is made. You must undeploy your model to stop incurring further charges. Models that are not deployed or have failed to deploy are not charged.

You pay only for compute hours used. If training fails for any reason other than a user-initiated cancellation, you are not billed for that time; if you cancel the operation, you are charged for the training time up to the cancellation.

Select a model type below for pricing information.

Image data

Operation | Price per node hour (classification) | Price per node hour (object detection)
Training | $3.465 | $3.465
Training (Edge on-device model) | $18.00 | $18.00
Deployment and online prediction | $1.375 | $2.002
Batch prediction | $2.222 | $2.222

Video data

Operation | Price per node hour (classification, object tracking) | Price per node hour (action recognition)
Training | $3.234 | $3.300
Training (Edge on-device model) | $10.78 | $11.00
Predictions | $0.462 | $0.550

Tabular data

Operation | Price per node hour (classification/regression) | Price (forecasting)
Training | $21.252 | $21.252
Prediction | Same price as predictions for custom-trained models | $1.00 per 1,000 forecasts (batch only)

Text data

Operation | Price
Legacy data upload (PDF only) | First 1,000 pages free each month; $1.50 per 1,000 pages; $0.60 per 1,000 pages over 5,000,000
Training | $3.30 per hour
Deployment | $0.05 per hour
Prediction | $5.00 per 1,000 text records; $25.00 per 1,000 document pages, such as PDF files (legacy only)

Prices for Vertex AutoML text prediction requests are computed based on the number of text records you send for analysis. A text record is plain text of up to 1,000 Unicode characters (including whitespace and any markup such as HTML or XML tags).

If the text provided in a prediction request contains more than 1,000 characters, it counts as one text record for each 1,000 characters. For example, if you send three requests that contain 800, 1,500, and 600 characters respectively, you would be charged for four text records: one for the first request (800), two for the second request (1,500), and one for the third request (600).
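The per-record counting rule above can be sketched as a ceiling division (the function name is illustrative, not part of any API):

```python
import math

def text_records(num_chars: int) -> int:
    """Count billable text records: each 1,000 Unicode characters
    (or part thereof) in a prediction request counts as one record."""
    return math.ceil(num_chars / 1000)

# Requests of 800, 1,500, and 600 characters bill as 1 + 2 + 1 = 4 records.
total = sum(text_records(n) for n in (800, 1500, 600))
```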

Prediction charges for Vertex Explainable AI

Compute associated with Vertex Explainable AI is charged at the same rate as prediction. However, explanations take longer to process than normal predictions, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase prediction charges.

Custom-trained models

Training

The tables below provide the approximate price per hour of various training configurations. You can choose a custom configuration of selected machine types. To calculate pricing, sum the costs of the virtual machines you use.

If you use Compute Engine machine types and attach accelerators, the cost of the accelerators is separate. To calculate this cost, multiply the prices in the table of accelerators below by how many machine hours of each type of accelerator you use.

Machine types

Americas

Europe

Asia Pacific

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

Accelerators

Americas

Europe

Asia Pacific

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

* The price for training using a Cloud TPU Pod is based on the number of cores in the Pod. The number of cores in a pod is always a multiple of 32. To determine the price of training on a Pod that has more than 32 cores, take the price for a 32-core Pod, and multiply it by the number of cores, divided by 32. For example, for a 128-core Pod, the price is (32-core Pod price) * (128/32). For information on which Cloud TPU Pods are available for a specific region, see System Architecture in the Cloud TPU documentation.
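As a sketch, that rule scales linearly with core count (the 32-core price below is a placeholder, not a published rate):

```python
def tpu_pod_price(price_32_core: float, num_cores: int) -> float:
    """Training price for a Cloud TPU Pod: the 32-core Pod price
    multiplied by (number of cores / 32)."""
    if num_cores % 32 != 0:
        raise ValueError("Pod core counts are always a multiple of 32")
    return price_32_core * (num_cores / 32)

# A 128-core Pod costs 4x the 32-core Pod price.
example = tpu_pod_price(100.0, 128)  # placeholder $100/hr for 32 cores
```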

Disks

Americas

Europe

Asia Pacific

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

You are charged for training your models from the moment when resources are provisioned for a job until the job finishes.

Scale tiers for predefined configurations (AI Platform Training)

You can control the type of processing cluster to use when training your model. The simplest way is to choose from one of the predefined configurations called scale tiers. Read more about scale tiers.

Machine types for custom configurations

If you use Vertex AI or select CUSTOM as your scale tier for AI Platform Training, you have control over the number and type of virtual machines to use for the cluster's master, worker and parameter servers. Read more about machine types for Vertex AI and machine types for AI Platform Training.

The cost of training with a custom processing cluster is the sum of all the machines you specify. You are charged for the total time of the job, not for the active processing time of individual machines.

Calculate training cost using "Consumed ML units"

The Consumed ML units (Consumed Machine Learning units) shown on your Job details page are equivalent to training units with the duration of the job factored in. When using Consumed ML units in your calculations, use the following formula:

(Consumed ML units) * (Machine type cost)

Example:

  • A data scientist runs a training job on an e2-standard-4 machine instance in the us-west1 (Oregon) region. The field Consumed ML units on their Job details page shows 55.75. The calculation is as follows:

    55.75 consumed ML units * $0.154114 per hour

    For a total of $8.59 for the job.
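The formula above, applied to this example (the hourly rate is the one quoted in the example, not an authoritative price):

```python
def training_cost(consumed_ml_units: float, machine_hourly_rate: float) -> float:
    """Training cost = (Consumed ML units) * (machine type hourly cost)."""
    return consumed_ml_units * machine_hourly_rate

# e2-standard-4 in us-west1 at $0.154114/hour, 55.75 Consumed ML units:
cost = round(training_cost(55.75, 0.154114), 2)  # 8.59
```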

To find your Job details page, go to the Jobs list and click the link for a specific job.

Prediction and explanation

This table provides the prices of batch prediction, online prediction, and online explanation per node hour. A node hour represents the time a virtual machine spends running your prediction job or waiting in a ready state to handle prediction or explanation requests.

Americas

Predictions and explanations
Machine types - price per node hour
n1-standard-2 Approximations:
us-east4 $0.123
northamerica-northeast1 $0.1203
Other Americas regions $0.1093
n1-standard-4 Approximations:
us-east4 $0.2461
northamerica-northeast1 $0.2405
Other Americas regions $0.2186
n1-standard-8 Approximations:
us-east4 $0.4922
northamerica-northeast1 $0.4811
Other Americas regions $0.4372
n1-standard-16 Approximations:
us-east4 $0.9843
northamerica-northeast1 $0.9622
Other Americas regions $0.8744
n1-standard-32 Approximations:
us-east4 $1.9687
northamerica-northeast1 $1.9243
Other Americas regions $1.7488
n1-highmem-2 Approximations:
us-east4 $0.1532
northamerica-northeast1 $0.1498
Other Americas regions $0.1361
n1-highmem-4 Approximations:
us-east4 $0.3064
northamerica-northeast1 $0.2995
Other Americas regions $0.2723
n1-highmem-8 Approximations:
us-east4 $0.6129
northamerica-northeast1 $0.5991
Other Americas regions $0.5445
n1-highmem-16 Approximations:
us-east4 $1.2257
northamerica-northeast1 $1.1982
Other Americas regions $1.089
n1-highmem-32 Approximations:
us-east4 $2.4515
northamerica-northeast1 $2.3963
Other Americas regions $2.178
n1-highcpu-2 Approximations:
us-east4 $0.0918
northamerica-northeast1 $0.0897
Other Americas regions $0.0815
n1-highcpu-4 Approximations:
us-east4 $0.1835
northamerica-northeast1 $0.1794
Other Americas regions $0.163
n1-highcpu-8 Approximations:
us-east4 $0.3671
northamerica-northeast1 $0.3588
Other Americas regions $0.326
n1-highcpu-16 Approximations:
us-east4 $0.7341
northamerica-northeast1 $0.7176
Other Americas regions $0.6519
n1-highcpu-32 Approximations:
us-east4 $1.4683
northamerica-northeast1 $1.4352
Other Americas regions $1.3039

Europe

Predictions and explanations
Machine types - price per node hour
n1-standard-2 Approximations:
europe-west2 $0.1408
Other Europe regions $0.1265
n1-standard-4 Approximations:
europe-west2 $0.2815
Other Europe regions $0.2531
n1-standard-8 Approximations:
europe-west2 $0.563
Other Europe regions $0.5061
n1-standard-16 Approximations:
europe-west2 $1.126
Other Europe regions $1.0123
n1-standard-32 Approximations:
europe-west2 $2.2521
Other Europe regions $2.0245
n1-highmem-2 Approximations:
europe-west2 $0.1753
Other Europe regions $0.1575
n1-highmem-4 Approximations:
europe-west2 $0.3506
Other Europe regions $0.3151
n1-highmem-8 Approximations:
europe-west2 $0.7011
Other Europe regions $0.6302
n1-highmem-16 Approximations:
europe-west2 $1.4022
Other Europe regions $1.2603
n1-highmem-32 Approximations:
europe-west2 $2.8044
Other Europe regions $2.5206
n1-highcpu-2 Approximations:
europe-west2 $0.105
Other Europe regions $0.0944
n1-highcpu-4 Approximations:
europe-west2 $0.21
Other Europe regions $0.1888
n1-highcpu-8 Approximations:
europe-west2 $0.4199
Other Europe regions $0.3776
n1-highcpu-16 Approximations:
europe-west2 $0.8398
Other Europe regions $0.7552
n1-highcpu-32 Approximations:
europe-west2 $1.6796
Other Europe regions $1.5104

Asia Pacific

Predictions and explanations
Machine types - price per node hour
n1-standard-2 Approximations:
asia-northeast1 $0.1402
asia-southeast1 $0.1348
australia-southeast1 $0.155
Other Asia Pacific regions $0.1265
n1-standard-4 Approximations:
asia-northeast1 $0.2803
asia-southeast1 $0.2695
australia-southeast1 $0.31
Other Asia Pacific regions $0.2531
n1-standard-8 Approximations:
asia-northeast1 $0.5606
asia-southeast1 $0.5391
australia-southeast1 $0.6201
Other Asia Pacific regions $0.5061
n1-standard-16 Approximations:
asia-northeast1 $1.1213
asia-southeast1 $1.0782
australia-southeast1 $1.2401
Other Asia Pacific regions $1.0123
n1-standard-32 Approximations:
asia-northeast1 $2.2426
asia-southeast1 $2.1564
australia-southeast1 $2.4802
Other Asia Pacific regions $2.0245
n1-highmem-2 Approximations:
asia-northeast1 $0.1744
asia-southeast1 $0.1678
australia-southeast1 $0.193
Other Asia Pacific regions $0.1575
n1-highmem-4 Approximations:
asia-northeast1 $0.3489
asia-southeast1 $0.3357
australia-southeast1 $0.3861
Other Asia Pacific regions $0.3151
n1-highmem-8 Approximations:
asia-northeast1 $0.6977
asia-southeast1 $0.6713
australia-southeast1 $0.7721
Other Asia Pacific regions $0.6302
n1-highmem-16 Approximations:
asia-northeast1 $1.3955
asia-southeast1 $1.3426
australia-southeast1 $1.5443
Other Asia Pacific regions $1.2603
n1-highmem-32 Approximations:
asia-northeast1 $2.791
asia-southeast1 $2.6852
australia-southeast1 $3.0885
Other Asia Pacific regions $2.5206
n1-highcpu-2 Approximations:
asia-northeast1 $0.1046
asia-southeast1 $0.1005
australia-southeast1 $0.1156
Other Asia Pacific regions $0.0944
n1-highcpu-4 Approximations:
asia-northeast1 $0.2093
asia-southeast1 $0.201
australia-southeast1 $0.2312
Other Asia Pacific regions $0.1888
n1-highcpu-8 Approximations:
asia-northeast1 $0.4186
asia-southeast1 $0.4021
australia-southeast1 $0.4624
Other Asia Pacific regions $0.3776
n1-highcpu-16 Approximations:
asia-northeast1 $0.8371
asia-southeast1 $0.8041
australia-southeast1 $0.9249
Other Asia Pacific regions $0.7552
n1-highcpu-32 Approximations:
asia-northeast1 $1.6742
asia-southeast1 $1.6082
australia-southeast1 $1.8498
Other Asia Pacific regions $1.5104

Each machine type is charged as two separate SKUs on your Google Cloud bill:

  • vCPU cost, measured in vCPU hours
  • RAM cost, measured in GB hours

The prices for machine types in the previous table approximate the total hourly cost for each prediction node of a model version using that machine type. For example, since an n1-highcpu-32 machine type includes 32 vCPUs and 28.8 GB of RAM, the hourly pricing per node is equal to 32 vCPU hours + 28.8 GB hours.
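To illustrate, a node-hour price can be rebuilt from the two SKUs. The rates here are the "Other Americas regions" vCPU and RAM rates from the SKU table in this document:

```python
def node_hour_cost(vcpus: int, ram_gb: float,
                   vcpu_rate: float, ram_rate: float) -> float:
    """Hourly cost of one prediction node: vCPU hours plus RAM GB hours."""
    return vcpus * vcpu_rate + ram_gb * ram_rate

# n1-highcpu-32 (32 vCPUs, 28.8 GB RAM) in "Other Americas regions":
cost = node_hour_cost(32, 28.8, 0.03635495, 0.0048783)  # ~= $1.3039/hour
```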

The prices in the previous table are provided to help you estimate prediction costs. The following table shows the vCPU and RAM pricing for prediction machine types, which more precisely reflects the SKUs that you are charged for:

Americas

Prediction machine type SKUs
vCPU
N. Virginia (us-east4) $0.04094575 per vCPU hour
Montréal (northamerica-northeast1) $0.0400223 per vCPU hour
Other Americas regions $0.03635495 per vCPU hour
RAM
N. Virginia (us-east4) $0.00548665 per GB hour
Montréal (northamerica-northeast1) $0.0053636 per GB hour
Other Americas regions $0.0048783 per GB hour

Europe

Prediction machine type SKUs
vCPU
London (europe-west2) $0.0468395 per vCPU hour
Other Europe regions $0.0421268 per vCPU hour
RAM
London (europe-west2) $0.0062767 per GB hour
Other Europe regions $0.0056373 per GB hour

Asia Pacific

Prediction machine type SKUs
vCPU
Tokyo (asia-northeast1) $0.0467107 per vCPU hour
Singapore (asia-southeast1) $0.04484885 per vCPU hour
Sydney (australia-southeast1) $0.0515844 per vCPU hour
Other Asia Pacific regions $0.0421268 per vCPU hour
RAM
Tokyo (asia-northeast1) $0.00623185 per GB hour
Singapore (asia-southeast1) $0.0060099 per GB hour
Sydney (australia-southeast1) $0.00691265 per GB hour
Other Asia Pacific regions $0.0056373 per GB hour

You can optionally use GPU accelerators for prediction. GPUs incur an additional charge, separate from those described in the previous table. The following table describes the pricing for each type of GPU:

Americas

Accelerators - price per hour
NVIDIA_TESLA_K80
Iowa (us-central1) $0.5175
South Carolina (us-east1) $0.5175
NVIDIA_TESLA_P4
Iowa (us-central1) $0.6900
N. Virginia (us-east4) $0.6900
Montréal (northamerica-northeast1) $0.7475
NVIDIA_TESLA_P100
Oregon (us-west1) $1.6790
Iowa (us-central1) $1.6790
South Carolina (us-east1) $1.6790
NVIDIA_TESLA_T4
Oregon (us-west1) $0.4025
Iowa (us-central1) $0.4025
South Carolina (us-east1) $0.4025
NVIDIA_TESLA_V100
Oregon (us-west1) $2.8520
Iowa (us-central1) $2.8520

Europe

Accelerators - price per hour
NVIDIA_TESLA_K80
Belgium (europe-west1) $0.5635
NVIDIA_TESLA_P4
Netherlands (europe-west4) $0.7475
NVIDIA_TESLA_P100
Belgium (europe-west1) $1.8400
NVIDIA_TESLA_T4
London (europe-west2) $0.4715
Netherlands (europe-west4) $0.4370
NVIDIA_TESLA_V100
Netherlands (europe-west4) $2.9325

Asia Pacific

Accelerators - price per hour
NVIDIA_TESLA_K80
Taiwan (asia-east1) $0.5635
NVIDIA_TESLA_P4
Singapore (asia-southeast1) $0.7475
Sydney (australia-southeast1) $0.7475
NVIDIA_TESLA_P100
Taiwan (asia-east1) $1.8400
NVIDIA_TESLA_T4
Tokyo (asia-northeast1) $0.4255
Singapore (asia-southeast1) $0.4255
Seoul (asia-northeast3) $0.4485
NVIDIA_TESLA_V100 Not available

Pricing is per GPU, so if you use multiple GPUs per prediction node (or if your version scales to use multiple nodes), then costs scale accordingly.

AI Platform Prediction serves predictions from your model by running a number of virtual machines ("nodes"). By default, Vertex AI automatically scales the number of nodes running at any time. For online prediction, the number of nodes scales to meet demand. Each node can respond to multiple prediction requests. For batch prediction, the number of nodes scales to reduce the total time it takes to run a job. You can customize how prediction nodes scale.

You are charged for the time that each node runs for your model, including:

  • When the node is processing a batch prediction job.
  • When the node is processing an online prediction request.
  • When the node is in a ready state for serving online predictions.

The cost of one node running for one hour is a node hour. The table of prediction prices describes the price of a node hour, which varies across regions and between online prediction and batch prediction.

You can consume node hours in fractional increments. For example, one node running for 30 minutes costs 0.5 node hours.

Cost calculations for legacy (MLS1) machine types and batch prediction

  • The running time of a node is measured in one-minute increments, rounded up to the nearest minute. For example, if a node runs for 20.1 minutes, calculate its cost as if it ran for 21 minutes.
  • The running time for nodes that run for less than 10 minutes is rounded up to 10 minutes. For example, if a node runs for only 3 minutes, calculate its cost as if it ran for 10 minutes.

Cost calculations for Compute Engine (N1) machine types

  • The running time of a node is billed in 30-second increments. This means that every 30 seconds, your project is billed for 30 seconds' worth of whatever vCPU, RAM, and GPU resources your node is using at that moment.
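A sketch of the two rounding rules (function names are illustrative; rounding the N1 increments up is an assumption):

```python
import math

def billed_minutes_mls1(run_minutes: float) -> int:
    """Legacy (MLS1): round up to the next whole minute, 10-minute minimum."""
    return max(10, math.ceil(run_minutes))

def billed_seconds_n1(run_seconds: float) -> int:
    """Compute Engine (N1): billed in 30-second increments."""
    return math.ceil(run_seconds / 30) * 30

billed_minutes_mls1(20.1)  # a 20.1-minute run bills as 21 minutes
billed_minutes_mls1(3)     # a 3-minute run bills as 10 minutes
```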

More about automatic scaling of prediction nodes

Online prediction: scaling prioritizes reducing the latency of individual requests, and the service keeps your model in a ready state for a few idle minutes after servicing a request. Scaling affects your total charges each month: the more numerous and frequent your requests, the more nodes are used.

Batch prediction: scaling prioritizes reducing the total elapsed time of the job. Scaling should have little effect on the price of your job, though there is some overhead involved in bringing up a new node.

You can choose to let the service scale in response to traffic (automatic scaling) or you can specify a number of nodes to run constantly to avoid latency (manual scaling).

  • If you choose automatic scaling, the number of nodes scales automatically. For AI Platform Prediction legacy (MLS1) machine type deployments, the number of nodes can scale down to zero for no-traffic durations. Vertex AI deployments and other types of AI Platform Prediction deployments cannot scale down to zero nodes.
  • If you choose manual scaling, you specify a number of nodes to keep running all the time. You are charged for all of the time that these nodes are running, starting at the time of deployment and persisting until you delete the model version.
You can affect scaling by setting a maximum number of nodes to use for a batch prediction job, and by setting the number of nodes to keep running for a model when you deploy it.

Minimum 10 minute charge

Recall that if a node runs for less than 10 minutes, you are charged as if it ran for 10 minutes. For example, suppose you use automatic scaling. During a period with no traffic, if you are using a legacy (MLS1) machine type in AI Platform Prediction, zero nodes are in use. (If you use other machine types in AI Platform Prediction or if you use Vertex AI, then at least one node is always in use.) If you receive a single online prediction request, one node scales up to handle it. After handling the request, the node continues running for a few minutes in a ready state, then stops. Even if the node ran for less than 10 minutes, you are charged for 10 node minutes (0.17 node hour) for this node's work.

Alternatively, if a single node scales up and handles many online prediction requests within a 10-minute period before shutting down, you are also charged for 10 node minutes.

You can use manual scaling to control exactly how many nodes run for a certain amount of time. However, if a node runs for less than 10 minutes you are still charged as if it ran for 10 minutes.

Learn more about node allocation and scaling.

Batch prediction jobs are charged after job completion

Batch prediction jobs are charged after job completion, not incrementally during the job. Any Cloud Billing budget alerts that you have configured aren't triggered while a job is running. Before starting a large job, consider running some cost benchmark jobs with small input data first.

Example of a prediction calculation

A real-estate company in an Americas region runs a weekly prediction of housing values in areas it serves. In one month, it runs predictions for four weeks in batches of 3920, 4277, 3849, and 3961 instances. Jobs are limited to one node, and each instance takes an average of 0.72 seconds of processing.

First calculate the length of time that each job ran:

3920 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.04 minutes
4277 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 51.324 minutes
3849 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 46.188 minutes
3961 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.532 minutes

Each job ran for more than ten minutes, so it is charged per minute of processing, rounded up to the next whole minute:

($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 52 minutes * 1 node = $0.0685711
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 47 minutes * 1 node = $0.061977725
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964

The total charge for the month is $0.26.

This example assumed jobs ran on a single node and took a consistent amount of time per input instance. In real usage, make sure to account for multiple nodes and use the actual amount of time each node spends running for your calculations.
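The whole calculation can be reproduced as follows. The node-hour rate is the one implied by the per-job figures in the worked example, not a quoted price:

```python
import math

batch_sizes = [3920, 4277, 3849, 3961]   # instances per weekly job
seconds_per_instance = 0.72
rate_per_node_hour = 0.0791205           # rate implied by the per-job figures

# Single-node job durations, rounded up to the next billable minute.
billed_minutes = [math.ceil(n * seconds_per_instance / 60) for n in batch_sizes]
# billed_minutes == [48, 52, 47, 48]

monthly_total = round(sum(rate_per_node_hour / 60 * m for m in billed_minutes), 2)
# monthly_total == 0.26
```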

Charges for Vertex Explainable AI

Vertex Explainable AI is available at no charge beyond the prediction prices. However, explanations take longer to process than normal predictions, so heavy usage of Vertex Explainable AI along with auto-scaling could result in more nodes being started, which would increase prediction charges.

Vertex AI Pipelines

Vertex AI Pipelines charges a run execution fee of $0.03 per Pipeline Run. You are not charged the execution fee during the Preview release. You also pay for Google Cloud resources you use with Vertex AI Pipelines, such as Compute Engine resources consumed by pipeline components (charged at the same rate as for Vertex AI training). Finally, you are responsible for the cost of any services (such as Dataflow) called by your pipeline.

Vertex AI Feature Store

Prices for Vertex AI Feature Store are based on the amount of feature data in online and offline storage as well as the availability of online serving. A node hour represents the time a virtual machine spends serving feature data or waiting in a ready state to handle feature data requests.

Operation Price
Online storage $0.25 per GB-month
Offline Storage $0.023 per GB-month
Online Serving $0.94 per node per hour
Batch Export $0.005 per GB

When you enable feature value monitoring, billing includes the applicable charges above in addition to the following charges:

  • $3.50 per GB for all data analyzed. With snapshot analysis enabled, snapshots taken for data in Vertex AI Feature Store are included. With import feature analysis enabled, batches of ingested data are included.
  • Additional charges for other Vertex AI Feature Store operations used with feature value monitoring include the following:
    • The snapshot analysis feature periodically takes a snapshot of the feature values based on your configuration for the monitoring interval.
    • The charge for a snapshot export is the same as a regular batch export operation.

Snapshot Analysis Example

A data scientist enables feature value monitoring for their Vertex AI Feature Store and turns on monitoring for a daily snapshot analysis. A pipeline runs daily for the monitored entity types. The pipeline scans 2 GB of data in Vertex AI Feature Store and exports a snapshot containing 0.1 GB of data. The total charge for one day's analysis is:

(0.1 GB * $3.50) + (2 GB * $0.005) = $0.36

Ingestion Analysis Example

A data scientist enables feature value monitoring for their Vertex AI Feature Store and turns on monitoring for ingestion operations. An ingestion operation imports 1 GB of data into Vertex AI Feature Store. The total charge for feature value monitoring is:

(1 GB * $3.50) = $3.50
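Both worked examples follow from the same two rates (a sketch mirroring the arithmetic above; the rates come from the monitoring pricing in this section):

```python
ANALYSIS_RATE = 3.50  # $ per GB analyzed
EXPORT_RATE = 0.005   # $ per GB, same as a regular batch export

# Snapshot analysis example: 0.1 GB analyzed plus 2 GB at the export rate.
snapshot_day_cost = round(0.1 * ANALYSIS_RATE + 2 * EXPORT_RATE, 2)   # 0.36

# Ingestion analysis example: 1 GB of imported data analyzed.
ingestion_cost = round(1 * ANALYSIS_RATE, 2)                          # 3.50
```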

Vertex ML Metadata

Metadata storage is measured in binary gigabytes (GiB), where 1 GiB is 1,073,741,824 bytes. This unit of measurement is also known as a gibibyte.

Vertex ML Metadata charges $10 per gibibyte (GiB) per month for metadata storage.

Vertex AI TensorBoard

To use Vertex AI TensorBoard, request that the IAM administrator of the project assign you to the role "Vertex AI TensorBoard Web App User". The Vertex AI Administrator role also has access.

Vertex AI TensorBoard charges a monthly fee of $300 per unique active user. Active users are measured through the Vertex AI TensorBoard UI. You also pay for Google Cloud resources you use with Vertex AI TensorBoard, such as TensorBoard logs stored in Cloud Storage.

Vertex AI Vizier

Vertex AI Vizier is a black-box optimization service inside Vertex AI. The Vertex AI Vizier pricing model consists of the following:

  • There is no charge for trials that use RANDOM_SEARCH and GRID_SEARCH. Learn more about the search algorithms.
  • The first 100 Vertex AI Vizier trials per calendar month are available at no charge (trials using RANDOM_SEARCH and GRID_SEARCH do not count against this total).
  • After 100 Vertex AI Vizier trials, subsequent trials during the same calendar month are charged at $1 per trial (trials that use RANDOM_SEARCH or GRID_SEARCH incur no charges).

Vertex AI Matching Engine

Pricing for Vertex AI Matching Engine Approximate Nearest Neighbor service consists of:

  • Per node hour pricing for each VM used to host a deployed index.
  • A cost for building new indexes and updating existing indexes.

Data processed during building and updating indexes is measured in binary gigabytes (GiB), where 1 GiB is 1,073,741,824 bytes. This unit of measurement is also known as a gibibyte.

Vertex AI Matching Engine charges $3.00 per gibibyte (GiB) of data processed in all regions.

The following tables summarize the pricing of index serving in each region where Matching Engine is available.

Americas

Machine Type - Region - Price per node hour
n1-standard-16
us-central1 $1.0640
us-east1 $1.0640
us-east4 $1.1984
us-west1 $1.0640
n1-standard-32
us-central1 $2.1280
us-east1 $2.1280
us-east4 $2.3968
us-west1 $2.1280

Europe

Machine Type - Region - Price per node hour
n1-standard-16
europe-west1 $1.1715
n1-standard-32
europe-west1 $2.3430

Asia Pacific

Machine Type - Region - Price per node hour
n1-standard-16
asia-southeast1 $1.3126
n1-standard-32
asia-southeast1 $2.6252

Matching Engine pricing examples

Vertex AI Matching Engine pricing is determined by the size of your data, the number of queries per second (QPS) you want to run, and the number of nodes you use. To estimate your serving cost, first calculate your total data size: the number of embeddings/vectors, multiplied by the number of dimensions, multiplied by 4 bytes per dimension. Once you have the size of your data, you can calculate the serving cost and the building cost. The serving cost plus the building cost equals your monthly total cost.

  • Serving cost: # replicas/shard * # shards (~data size/20GB) * $1.064/hr * 24 hrs/day * 30 days/month
  • Building cost: data size(in GB) * $3/GB * # of updates/month

The monthly index building cost is the size of the data in GB * $3.00 per GB. The update frequency does not affect the serving cost, just the building cost.
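A sketch of the cost model above, using the n1-standard-16 rate in us-central1 and a shard capacity of ~20 GB as stated in the serving formula:

```python
import math

def data_size_gb(num_vectors: int, dims: int) -> float:
    """Total vector data size: vectors * dimensions * 4 bytes per dimension."""
    return num_vectors * dims * 4 / 1e9

def monthly_serving_cost(size_gb: float, node_hour_rate: float = 1.064,
                         replicas_per_shard: int = 1) -> float:
    shards = math.ceil(size_gb / 20)   # ~20 GB of vectors per shard
    return replicas_per_shard * shards * node_hour_rate * 24 * 30

def monthly_building_cost(size_gb: float, updates_per_month: int = 1) -> float:
    return size_gb * 3.0 * updates_per_month

# 20 million vectors x 128 dimensions -> 10.24 GB, one shard/node.
size = data_size_gb(20_000_000, 128)   # 10.24
serving = monthly_serving_cost(size)   # 766.08, i.e. ~$766/month
```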

Embeddings/vectors | Dimensions | QPS | Update frequency | Est. monthly building cost | Nodes | Est. monthly serving cost
20 million | 128 | 1,000 | Monthly | $30 | 1 | $766
100 million | 256 | 3,000 | Weekly | $1,200 | 15 | $11,491
500 million | 128 | 20,000 | Weekly | $3,000 | 260 | $199,160
1 billion | 512 | 5,000 | Monthly | $6,000 | 500 | $383,000

All examples are based on n1-standard-16 in us-central1. The cost you incur will vary with recall rate and latency requirements. The estimated monthly serving cost is directly related to the number of nodes used in the console. To learn more about configuration parameters that affect cost, see Configuration parameters which impact recall and latency.

If you have high queries per second (QPS), batching these queries can reduce total costs up to 30%-40%.

Vertex AI Model Monitoring

Vertex AI enables you to monitor the continued effectiveness of your model after you deploy it to production. For more information, see Introduction to Vertex AI Model Monitoring.

When you use Vertex AI Model Monitoring, you are billed for the following:

  • $3.50 per GB for all data analyzed, including the training data provided and prediction data logged in a BigQuery table.
  • Charges for other Google Cloud products that you use with Model Monitoring, such as BigQuery storage or Batch Explain when attribution monitoring is enabled.

Vertex AI Model Monitoring is supported in the following regions: us-central1, europe-west4, asia-east1, and asia-southeast1. Prices are the same for all regions.

Data sizes are measured after they are converted to TFRecord format.

Training datasets incur a one-time charge when you set up a Vertex AI Model Monitoring job.

Prediction datasets consist of logs collected from the online prediction service. As prediction requests arrive during different time windows, the data for each window is collected, and the sum of the data analyzed across windows is used to calculate the charge.

Example: A data scientist runs model monitoring on the prediction traffic belonging to their model.

  • The model is trained from a BigQuery dataset. The data size after converting to TFRecord is 1.5 GB.
  • Prediction data logged between 1:00 - 2:00 p.m. is 0.1 GB, between 3:00 - 4:00 p.m. is 0.2 GB.
  • The total price for setting up the model monitoring job is:

    (1.5 GB * $3.50) + ((0.1 GB + 0.2 GB) * $3.50) = $6.30
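The example's arithmetic, as a sketch (the $3.50/GB rate and data sizes come from the example above):

```python
RATE_PER_GB = 3.50   # $ per GB analyzed

training_gb = 1.5                  # one-time charge at job setup
prediction_window_gb = [0.1, 0.2]  # logged prediction data per time window

total = round(training_gb * RATE_PER_GB
              + sum(prediction_window_gb) * RATE_PER_GB, 2)   # 6.3
```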

Vertex AI Workbench, Deep Learning Containers, Deep Learning VM, and AI Platform Pipelines

For Deep Learning Containers, Deep Learning VM Images, and AI Platform Pipelines, pricing is calculated based on the compute resources that you use, charged at the same rates as Compute Engine and Cloud Storage.

For Vertex AI Workbench, there is a management fee in addition to your infrastructure usage, captured in the tables below.

Select either managed notebooks or user-managed notebooks for pricing information.

Managed notebooks

SKU Management fee per hour
vCPU $0.05 per vCore
T4, K80, and P4 (Standard GPU) $0.35 per instance
P100, V100, and A100 GPU (Premium GPU) $2.48 per instance

User-managed notebooks

SKU Management fee per core hour
vCPU $0.005
T4, K80, and P4 (Standard GPU) $0.035
P100, V100, and A100 GPU (Premium GPU) $0.25

In addition to the compute costs, you also pay for any Google Cloud resources you use. For example:

  • Data analysis services: You incur BigQuery costs when you issue SQL queries within a notebook (see BigQuery pricing).

  • Customer-managed encryption keys: You incur costs when you use customer-managed encryption keys. Each time your managed notebooks or user-managed notebooks instance uses a Cloud Key Management Service key, that operation is billed at the rate of Cloud KMS key operations (see Cloud Key Management Service pricing).

Data labeling

Vertex AI enables you to request human labeling for a collection of data that you plan to use to train a custom machine learning model. Prices for the service are computed based on the type of labeling task.

  • For regular labeling tasks, the prices are determined by the number of annotation units.
    • For an image classification task, units are determined by the number of images and the number of human labelers. For example, an image with 3 human labelers counts for 1 * 3 = 3 units. The price for single-label and multi-label classification is the same.
    • For an image bounding box task, units are determined by the number of bounding boxes identified in the images and the number of human labelers. For example, an image with 2 bounding boxes and 3 human labelers counts for 2 * 3 = 6 units. Images without bounding boxes are not charged.
    • For an image segmentation/rotated box/polyline/polygon task, units are determined in the same way as an image bounding box task.
    • For a video classification task, units are determined by the video length (every 5 seconds is a price unit) and the number of human labelers. For example, a 25-second video with 3 human labelers counts for 25 / 5 * 3 = 15 units. The price for single-label and multi-label classification is the same.
    • For a video object tracking task, units are determined by the number of objects identified in the video and the number of human labelers. For example, a video with 2 objects and 3 human labelers counts for 2 * 3 = 6 units. Videos without objects are not charged.
    • For a video action recognition task, units are determined in the same way as a video object tracking task.
    • For a text classification task, units are determined by text length (every 50 words is a price unit) and the number of human labelers. For example, one piece of text with 100 words and 3 human labelers counts for 100 / 50 * 3 = 6 units. The price for single-label and multi-label classification is the same.
    • For a text sentiment task, units are determined in the same way as a text classification task.
    • For a text entity extraction task, units are determined by text length (every 50 words is a price unit), the number of entities identified, and the number of human labelers. For example, a piece of text with 100 words, 2 entities identified, and 3 human labelers counts for 100 / 50 * 2 * 3 = 12 units. Text without entities will not be charged.
  • For image/video/text classification and text sentiment tasks, human labelers may lose track of classes if the label set is too large. As a result, we send at most 20 classes to the human labelers at a time. For example, if the label set size of a labeling task is 40, each data item is sent for human review 40 / 20 = 2 times, and we charge 2 times the price (calculated above) accordingly.

  • For a labeling task that enables the custom labeler feature, each data item is counted as 1 custom labeler unit.

  • For an active learning labeling task for data items with annotations that are generated by models (without a human labeler's help), each data item is counted as 1 active learning unit.

  • For an active learning labeling task for data items with annotations that are generated by human labelers, each data item is counted as a regular labeling task as described above.
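The unit-counting rules above can be sketched as follows. These function names and signatures are illustrative only, not a Google API; the sketch also assumes whole-unit (floor) division for video length and word count, which the examples above are consistent with but do not state for partial units.

```python
import math

# Hedged sketch of the unit-counting rules above (not a Google API).

def image_classification_units(images: int, labelers: int,
                               label_set_size: int = 1) -> int:
    # At most 20 classes are sent to labelers at a time, so larger label
    # sets multiply the number of paid reviews per data item.
    reviews = math.ceil(label_set_size / 20)
    return images * labelers * reviews

def video_classification_units(seconds: int, labelers: int) -> int:
    # Every 5 seconds of video is one price unit (floor assumed).
    return (seconds // 5) * labelers

def text_entity_units(words: int, entities: int, labelers: int) -> int:
    # Every 50 words is one price unit (floor assumed).
    return (words // 50) * entities * labelers

print(image_classification_units(1, 3))   # 1 image, 3 labelers → 3 units
print(video_classification_units(25, 3))  # 25 s, 3 labelers → 15 units
print(text_entity_units(100, 2, 3))       # 100 words, 2 entities → 12 units
```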

The table below provides the price per 1,000 units per human labeler, based on the unit listed for each objective. Tier 1 pricing applies to the first 50,000 units per month in each Google Cloud project; Tier 2 pricing applies to the next 950,000 units per month in the project, up to 1,000,000 units. Contact us for pricing above 1,000,000 units per month.

Data type Objective Unit Tier 1 Tier 2
Image Classification Image $35 $25
Image Bounding box Bounding box $63 $49
Image Segmentation Segment $870 $850
Image Rotated box Bounding box $86 $60
Image Polygon/polyline Polygon/Polyline $257 $180
Video Classification 5-second video $86 $60
Video Object tracking Bounding box $86 $60
Video Action recognition Event in 30-second video $214 $150
Text Classification 50 words $129 $90
Text Sentiment 50 words $200 $140
Text Entity extraction Entity $86 $60
Active Learning All Data item $80 $56
Custom Labeler All Data item $80 $56
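The two-tier schedule above can be expressed as a short function: Tier 1 covers the first 50,000 units per project per month, Tier 2 the next 950,000, and prices are quoted per 1,000 units. This is an illustrative sketch, not a billing API.

```python
# Illustrative sketch of the tiered schedule above (not a billing API).
TIER1_CAP = 50_000     # Tier 1: first 50,000 units per project per month
TIER2_CAP = 950_000    # Tier 2: next 950,000 units, up to 1,000,000 total

def labeling_price(units: int, tier1_per_1k: float, tier2_per_1k: float) -> float:
    if units > TIER1_CAP + TIER2_CAP:
        raise ValueError("Contact sales for pricing above 1,000,000 units/month")
    tier1_units = min(units, TIER1_CAP)
    tier2_units = units - tier1_units
    return tier1_units / 1000 * tier1_per_1k + tier2_units / 1000 * tier2_per_1k

# e.g. 60,000 image-classification units ($35 Tier 1, $25 Tier 2):
print(f"${labeling_price(60_000, 35, 25):,.2f}")  # → $2,000.00
```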

Required use of Cloud Storage

In addition to the costs described in this document, you are required to store data and program files in Cloud Storage buckets during the Vertex AI lifecycle. This storage is subject to the Cloud Storage pricing policy.

Required use of Cloud Storage includes:

  • Staging your training application package for custom-trained models.

  • Storing your training input data.

  • Storing the output of your training jobs. Vertex AI does not require long-term storage of these items. You can remove the files as soon as the operation is complete.

Free operations for managing your resources

The resource management operations provided by AI Platform are available free of charge. The AI Platform quota policy does limit some of these operations.

Resource Free operations
models create, get, list, delete
versions create, get, list, delete, setDefault
jobs get, list, cancel
operations get, list, cancel, delete

Google Cloud costs

If you store images to be analyzed in Cloud Storage or use other Google Cloud resources in tandem with Vertex AI, then you will also be billed for the use of those services.

To view your current billing status in the Cloud console, including usage and your current bill, see the Billing page. For more details about managing your account, see the Cloud Billing Documentation or Billing and Payments Support.

What's next