Pricing

AI Platform Prediction offers scalable, flexible pricing options to fit your project and budget. AI Platform Prediction charges you for getting predictions, but managing your machine learning resources in the cloud is free of charge.

Pricing overview

The following tables summarize the pricing in each region where AI Platform Prediction is available.

View pricing for AI Platform Training.

Prediction prices

This table provides the prices of batch prediction and online prediction per node hour. A node hour represents the time a virtual machine spends running your prediction job or waiting in a ready state to handle prediction requests. Read more about calculating prediction costs.

Americas

Prediction
Batch prediction $0.0791205 per node hour
Online prediction
Machine types - price per node hour
mls1-c1-m2 (default) $0.045147
mls1-c4-m2 (Beta) $0.151962

n1-standard-2 Approximations:
us-east4 $0.107
northamerica-northeast1 $0.1046
Other Americas regions $0.095
n1-standard-4 Approximations:
us-east4 $0.214
northamerica-northeast1 $0.2092
Other Americas regions $0.1901
n1-standard-8 Approximations:
us-east4 $0.428
northamerica-northeast1 $0.4183
Other Americas regions $0.3802
n1-standard-16 Approximations:
us-east4 $0.8559
northamerica-northeast1 $0.8367
Other Americas regions $0.7603
n1-standard-32 Approximations:
us-east4 $1.7119
northamerica-northeast1 $1.6733
Other Americas regions $1.5207
n1-highmem-2 Approximations:
us-east4 $0.1332
northamerica-northeast1 $0.1302
Other Americas regions $0.1184
n1-highmem-4 Approximations:
us-east4 $0.2665
northamerica-northeast1 $0.2605
Other Americas regions $0.2367
n1-highmem-8 Approximations:
us-east4 $0.5329
northamerica-northeast1 $0.5209
Other Americas regions $0.4735
n1-highmem-16 Approximations:
us-east4 $1.0659
northamerica-northeast1 $1.0419
Other Americas regions $0.947
n1-highmem-32 Approximations:
us-east4 $2.1317
northamerica-northeast1 $2.0838
Other Americas regions $1.894
n1-highcpu-2 Approximations:
us-east4 $0.0798
northamerica-northeast1 $0.078
Other Americas regions $0.0709
n1-highcpu-4 Approximations:
us-east4 $0.1596
northamerica-northeast1 $0.156
Other Americas regions $0.1417
n1-highcpu-8 Approximations:
us-east4 $0.3192
northamerica-northeast1 $0.312
Other Americas regions $0.2834
n1-highcpu-16 Approximations:
us-east4 $0.6384
northamerica-northeast1 $0.624
Other Americas regions $0.5669
n1-highcpu-32 Approximations:
us-east4 $1.2768
northamerica-northeast1 $1.248
Other Americas regions $1.1338

Europe

Prediction
Batch prediction $0.086118 per node hour
Online prediction
Machine types - price per node hour
mls1-c1-m2 (default) $0.044095
mls1-c4-m2 (Beta) $0.148414

n1-standard-2 Approximations:
europe-west2 $0.1224
europe-west3 $0.1224
Other Europe regions $0.11
n1-standard-4 Approximations:
europe-west2 $0.2448
europe-west3 $0.2448
Other Europe regions $0.2201
n1-standard-8 Approximations:
europe-west2 $0.4896
europe-west3 $0.4896
Other Europe regions $0.4401
n1-standard-16 Approximations:
europe-west2 $0.9792
europe-west3 $0.9792
Other Europe regions $0.8802
n1-standard-32 Approximations:
europe-west2 $1.9583
europe-west3 $1.9583
Other Europe regions $1.7605
n1-highmem-2 Approximations:
europe-west2 $0.1524
europe-west3 $0.1524
Other Europe regions $0.137
n1-highmem-4 Approximations:
europe-west2 $0.3048
europe-west3 $0.3048
Other Europe regions $0.274
n1-highmem-8 Approximations:
europe-west2 $0.6097
europe-west3 $0.6097
Other Europe regions $0.548
n1-highmem-16 Approximations:
europe-west2 $1.2193
europe-west3 $1.2193
Other Europe regions $1.0959
n1-highmem-32 Approximations:
europe-west2 $2.4386
europe-west3 $2.4386
Other Europe regions $2.1918
n1-highcpu-2 Approximations:
europe-west2 $0.0913
europe-west3 $0.0913
Other Europe regions $0.0821
n1-highcpu-4 Approximations:
europe-west2 $0.1826
europe-west3 $0.1826
Other Europe regions $0.1642
n1-highcpu-8 Approximations:
europe-west2 $0.3651
europe-west3 $0.3651
Other Europe regions $0.3284
n1-highcpu-16 Approximations:
europe-west2 $0.7303
europe-west3 $0.7303
Other Europe regions $0.6567
n1-highcpu-32 Approximations:
europe-west2 $1.4606
europe-west3 $1.4606
Other Europe regions $1.3134

Asia Pacific

Prediction
Batch prediction $0.086118 per node hour
Online prediction
Machine types - price per node hour
mls1-c1-m2 (default) $0.051456
mls1-c4-m2 (Beta) $0.17331

n1-standard-2 Approximations:
asia-east1 $0.11
asia-northeast1 $0.1219
asia-southeast1 $0.1172
australia-southeast1 $0.1348
n1-standard-4 Approximations:
asia-east1 $0.2201
asia-northeast1 $0.2438
asia-southeast1 $0.2344
australia-southeast1 $0.2696
n1-standard-8 Approximations:
asia-east1 $0.4401
asia-northeast1 $0.4875
asia-southeast1 $0.4688
australia-southeast1 $0.5392
n1-standard-16 Approximations:
asia-east1 $0.8802
asia-northeast1 $0.975
asia-southeast1 $0.9375
australia-southeast1 $1.0784
n1-standard-32 Approximations:
asia-east1 $1.7605
asia-northeast1 $1.9501
asia-southeast1 $1.8751
australia-southeast1 $2.1567
n1-highmem-2 Approximations:
asia-east1 $0.137
asia-northeast1 $0.1517
asia-southeast1 $0.1459
australia-southeast1 $0.1679
n1-highmem-4 Approximations:
asia-east1 $0.274
asia-northeast1 $0.3034
asia-southeast1 $0.2919
australia-southeast1 $0.3357
n1-highmem-8 Approximations:
asia-east1 $0.548
asia-northeast1 $0.6067
asia-southeast1 $0.5837
australia-southeast1 $0.6714
n1-highmem-16 Approximations:
asia-east1 $1.0959
asia-northeast1 $1.2135
asia-southeast1 $1.1675
australia-southeast1 $1.3428
n1-highmem-32 Approximations:
asia-east1 $2.1918
asia-northeast1 $2.4269
asia-southeast1 $2.335
australia-southeast1 $2.6857
n1-highcpu-2 Approximations:
asia-east1 $0.0821
asia-northeast1 $0.091
asia-southeast1 $0.0874
australia-southeast1 $0.1005
n1-highcpu-4 Approximations:
asia-east1 $0.1642
asia-northeast1 $0.182
asia-southeast1 $0.1748
australia-southeast1 $0.2011
n1-highcpu-8 Approximations:
asia-east1 $0.3284
asia-northeast1 $0.364
asia-southeast1 $0.3496
australia-southeast1 $0.4021
n1-highcpu-16 Approximations:
asia-east1 $0.6567
asia-northeast1 $0.7279
asia-southeast1 $0.6992
australia-southeast1 $0.8043
n1-highcpu-32 Approximations:
asia-east1 $1.3134
asia-northeast1 $1.4558
asia-southeast1 $1.3985
australia-southeast1 $1.6085

Compute Engine (N1) machine types for online prediction are only available on regional endpoints, and they are billed as two separate SKUs on your Google Cloud bill:

  • vCPU cost, measured in vCPU hours
  • RAM cost, measured in GB hours

The prices for Compute Engine (N1) machine types in the previous table approximate the total hourly cost for each prediction node of a model version using that machine type. For example, since an n1-highcpu-32 machine type includes 32 vCPUs and 28.8 GB of RAM, the hourly price per node is equal to the price of 32 vCPU hours plus the price of 28.8 GB hours.

The prices in the previous table are provided to help you estimate online prediction costs. The following table shows the vCPU and RAM pricing for Compute Engine (N1) machine types, which more precisely reflect the SKUs that you will be charged for:

Americas

Compute Engine (N1) machine type SKUs
vCPU
N. Virginia (us-east4) $0.035605 per vCPU hour
Montreal (northamerica-northeast1) $0.034802 per vCPU hour
Other Americas regions $0.031613 per vCPU hour
RAM
N. Virginia (us-east4) $0.004771 per GB hour
Montreal (northamerica-northeast1) $0.004664 per GB hour
Other Americas regions $0.004242 per GB hour

Europe

Compute Engine (N1) machine type SKUs
vCPU
London (europe-west2) $0.04073 per vCPU hour
Frankfurt (europe-west3) $0.04073 per vCPU hour
Other Europe regions $0.036632 per vCPU hour
RAM
London (europe-west2) $0.005458 per GB hour
Frankfurt (europe-west3) $0.005458 per GB hour
Other Europe regions $0.004902 per GB hour

Asia Pacific

Compute Engine (N1) machine type SKUs
vCPU
Taiwan (asia-east1) $0.036632 per vCPU hour
Tokyo (asia-northeast1) $0.040618 per vCPU hour
Singapore (asia-southeast1) $0.038999 per vCPU hour
Sydney (australia-southeast1) $0.044856 per vCPU hour
RAM
Taiwan (asia-east1) $0.004902 per GB hour
Tokyo (asia-northeast1) $0.005419 per GB hour
Singapore (asia-southeast1) $0.005226 per GB hour
Sydney (australia-southeast1) $0.006011 per GB hour
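
As a rough check of how the approximate node prices relate to the SKUs, the following sketch recomputes the n1-highcpu-32 price for "Other Americas regions" from the vCPU and RAM prices listed above. The function name `n1_node_hourly_cost` is illustrative only and is not part of any Google Cloud API.

```python
def n1_node_hourly_cost(vcpus, ram_gb, vcpu_price, ram_price):
    """Approximate hourly cost of one N1 prediction node (GPU charges excluded)."""
    return vcpus * vcpu_price + ram_gb * ram_price


# n1-highcpu-32 (32 vCPUs, 28.8 GB of RAM) at the "Other Americas regions" SKU prices.
cost = n1_node_hourly_cost(32, 28.8, vcpu_price=0.031613, ram_price=0.004242)
print(f"${cost:.4f} per node hour")
```

The result rounds to $1.1338, which matches the approximate price listed for n1-highcpu-32 in other Americas regions.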

You can optionally use GPU accelerators for online prediction with Compute Engine (N1) machine types. GPUs incur an additional charge, separate from those described in the previous table. The following table describes the pricing for each type of GPU:

Americas

Accelerators - price per hour
NVIDIA_TESLA_K80 $0.4500
NVIDIA_TESLA_P4
Iowa (us-central1) $0.6000
N. Virginia (us-east4) $0.6000
Montréal (northamerica-northeast1) $0.6500
NVIDIA_TESLA_P100 $1.4600
NVIDIA_TESLA_T4 $0.3500
NVIDIA_TESLA_V100 $2.4800

Europe

Accelerators - price per hour
NVIDIA_TESLA_K80 $0.4900
NVIDIA_TESLA_P4 $0.6500
NVIDIA_TESLA_P100 $1.6000
NVIDIA_TESLA_T4
London (europe-west2) $0.4100
Netherlands (europe-west4) $0.3800
NVIDIA_TESLA_V100 $2.5500

Asia Pacific

Accelerators - price per hour
NVIDIA_TESLA_K80 $0.4900
NVIDIA_TESLA_P4
Singapore (asia-southeast1) $0.6500
Sydney (australia-southeast1) $0.6500
NVIDIA_TESLA_P100 $1.6000
NVIDIA_TESLA_T4
Tokyo (asia-northeast1) $0.3700
Singapore (asia-southeast1) $0.3700
NVIDIA_TESLA_V100 Not available

Note that the pricing is per GPU, so if you use multiple GPUs per prediction node (or if your version scales to use multiple nodes), then costs scale accordingly.
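
To illustrate how the per-GPU price scales, here is a minimal sketch. The deployment shape (2 GPUs per node, 3 nodes) is a hypothetical example, and `gpu_hourly_cost` is an illustrative helper, not a Google Cloud API.

```python
def gpu_hourly_cost(price_per_gpu, gpus_per_node, nodes):
    """Hourly accelerator charge: the per-GPU price times every GPU in use."""
    return price_per_gpu * gpus_per_node * nodes


# Hypothetical deployment: 2 NVIDIA_TESLA_T4 GPUs per node on 3 nodes,
# using the Americas price of $0.35 per GPU hour.
t4_cost = gpu_hourly_cost(0.35, gpus_per_node=2, nodes=3)
print(f"${t4_cost:.2f} per hour for GPUs")
```

This amount is charged in addition to the vCPU and RAM cost of each node.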

Notes:

  1. All use is subject to the AI Platform Prediction quota policy.
  2. You are required to store your data and program files in Google Cloud Storage buckets during the AI Platform Prediction lifecycle. See more about Cloud Storage usage.
  3. For volume-based discounts, contact the Sales team.
  4. If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

The pricing calculator

Use the pricing calculator to estimate your training and prediction costs.

More about prediction costs

AI Platform Prediction serves predictions from your model by running a number of virtual machines ("nodes"). By default, AI Platform Prediction automatically scales the number of nodes running at any time. For online prediction, the number of nodes scales to meet demand. Each node can respond to multiple prediction requests. For batch prediction, the number of nodes scales to reduce the total time it takes to run a job. You can customize how prediction nodes scale.

You are charged for the time that each node runs for your model, including:

  • When the node is processing a batch prediction job.
  • When the node is processing an online prediction request.
  • When the node is in a ready state for serving online predictions.

The cost of one node running for one hour is a node hour. The table of prediction prices describes the price of a node hour, which varies across regions and between online prediction and batch prediction.

You can consume node hours in fractional increments. For example, one node running for 30 minutes costs 0.5 node hours. However, several rules govern cost calculations:

Cost calculations for legacy (MLS1) machine types and batch prediction

  • The running time of a node is measured in one-minute increments, rounded up to the nearest minute. For example, if a node runs for 20.1 minutes, calculate its cost as if it ran for 21 minutes.
  • The running time for nodes that run for less than 10 minutes is rounded up to 10 minutes. For example, if a node runs for only 3 minutes, calculate its cost as if it ran for 10 minutes.
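
The two rules above can be sketched as follows. The helper names are illustrative, and the sketch assumes `run_minutes` is the positive running time of a node that actually started.

```python
import math

MIN_BILLED_MINUTES = 10  # minimum charge per node, as stated above


def billed_minutes(run_minutes):
    """Round running time up to a whole minute, then apply the 10-minute minimum."""
    return max(math.ceil(run_minutes), MIN_BILLED_MINUTES)


def node_cost(run_minutes, price_per_node_hour):
    """Billed minutes converted to node hours, times the node-hour price."""
    return billed_minutes(run_minutes) / 60 * price_per_node_hour


print(billed_minutes(20.1))  # 21
print(billed_minutes(3))     # 10
```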

Cost calculations for Compute Engine (N1) machine types

  • The running time of a node is billed in 30-second increments. This means that every 30 seconds, your project is billed for 30 seconds' worth of the vCPU, RAM, and GPU resources that your node is using at that moment.
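
One way to picture this rule is to add up a charge for each 30-second increment. The sketch below is an illustration under assumptions, not the actual billing implementation: the `samples` structure (one tuple of resources per increment) and the function name are hypothetical.

```python
INCREMENT_SECONDS = 30


def n1_running_cost(samples, vcpu_price, ram_price, gpu_price=0.0):
    """Sum the charge for each 30-second increment.

    `samples` holds one (vcpus, ram_gb, gpus) tuple per increment,
    describing the resources the node was using at that moment.
    """
    hours_per_increment = INCREMENT_SECONDS / 3600
    total = 0.0
    for vcpus, ram_gb, gpus in samples:
        hourly = vcpus * vcpu_price + ram_gb * ram_price + gpus * gpu_price
        total += hourly * hours_per_increment
    return total


# Ten minutes (20 increments) of an n1-highcpu-32 node without GPUs,
# priced with the "Other Americas regions" SKUs.
samples = [(32, 28.8, 0)] * 20
ten_minute_cost = n1_running_cost(samples, vcpu_price=0.031613, ram_price=0.004242)
print(f"${ten_minute_cost:.4f}")
```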

More about automatic scaling of prediction nodes

Online prediction:

  • The priority of the scaling is to reduce the latency of individual requests. The service keeps your model in a ready state for a few idle minutes after servicing a request.
  • Scaling affects your total charges each month: the more numerous and frequent your requests, the more nodes will be used.

Batch prediction:

  • The priority of the scaling is to reduce the total elapsed time of the job.
  • Scaling should have little effect on the price of your job, though there is some overhead involved in bringing up a new node.

You can choose to let the service scale in response to traffic (automatic scaling) or you can specify a number of nodes to run constantly to avoid latency (manual scaling).

  • If you choose automatic scaling, the number of nodes scales automatically, and can scale down to zero for no-traffic durations.
  • If you choose manual scaling, you specify a number of nodes to keep running all the time. You are charged for all of the time that these nodes are running, starting at the time of deployment and persisting until you delete the model version.

You can affect scaling by setting a maximum number of nodes to use for a batch prediction job, and by setting the number of nodes to keep running for a model when you deploy it.
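
Because manually scaled nodes are charged for the whole time the version is deployed, the cost can be estimated with a simple product. The sketch below assumes a hypothetical deployment of 2 mls1-c1-m2 nodes in an Americas region kept running for a 30-day month. The function name is illustrative.

```python
def manual_scaling_cost(nodes, hours_deployed, price_per_node_hour):
    """Cost of nodes that run constantly from deployment until deletion."""
    return nodes * hours_deployed * price_per_node_hour


# Hypothetical: 2 mls1-c1-m2 nodes (Americas, $0.045147 per node hour)
# deployed for 30 days.
hours = 30 * 24
monthly_cost = manual_scaling_cost(2, hours, 0.045147)
print(f"${monthly_cost:.2f}")
```

Under these assumptions the estimate is about $65.01, regardless of how many prediction requests the version receives.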

Minimum 10-minute charge

Recall that if a node runs for less than 10 minutes, you are charged as if it ran for 10 minutes. For example, suppose you use automatic scaling. During a period with no traffic, zero nodes are in use. If you receive a single online prediction request, then one node scales up to handle the request. After it handles the request, the node continues running for a few minutes in a ready state. Then it stops running. Even if the node ran for less than 10 minutes, you are charged for 10 node minutes (approximately 0.17 node hours) for this node's work.

Alternatively, if a single node scales up and handles many online prediction requests within a 10-minute period before shutting down, you are also charged for 10 node minutes.

You can use manual scaling to control exactly how many nodes run for a certain amount of time. However, if a node runs for less than 10 minutes you are still charged as if it ran for 10 minutes.

Learn more about node allocation and scaling.

Example of a prediction calculation

A real-estate company in an Americas region runs a weekly prediction of housing values in areas it serves. In one month, it runs predictions for four weeks in batches of 3920, 4277, 3849, and 3961 instances. Jobs are limited to one node and each instance takes an average of 0.72 seconds of processing.

First calculate the length of time that each job ran:

3920 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.04 minutes
4277 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 51.324 minutes
3849 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 46.188 minutes
3961 instances * (0.72 seconds / 1 instance) * (1 minute / 60 seconds) = 47.532 minutes

Each job ran for more than ten minutes, so it is charged for each minute of processing, with the running time rounded up to the next whole minute:

($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 52 minutes * 1 node = $0.0685711
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 47 minutes * 1 node = $0.061977725
($0.0791205 / 1 node hour) * (1 hour / 60 minutes) * 48 minutes * 1 node = $0.0632964

The total charge for the month is $0.26.
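
The same calculation can be reproduced with a short script. This is a sketch that reuses the numbers from the example. The variable names are illustrative.

```python
import math

BATCH_PRICE_PER_NODE_HOUR = 0.0791205  # Americas batch prediction price
SECONDS_PER_INSTANCE = 0.72
weekly_instances = [3920, 4277, 3849, 3961]

total = 0.0
for instances in weekly_instances:
    run_minutes = instances * SECONDS_PER_INSTANCE / 60
    # Round up to a whole minute and apply the 10-minute minimum.
    billed = max(math.ceil(run_minutes), 10)
    cost = BATCH_PRICE_PER_NODE_HOUR / 60 * billed
    total += cost
    print(f"{instances} instances: {run_minutes:.3f} min, {billed} billed min, ${cost:.7f}")

print(f"Total for the month: ${total:.2f}")
```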

This example assumed jobs ran on a single node and took a consistent amount of time per input instance. In real usage, make sure to account for multiple nodes and use the actual amount of time each node spends running for your calculations.

Note about AI Platform Prediction charges for AI Explanations

AI Explanations incurs no charge beyond AI Platform Prediction prices. However, explanations take longer to process than normal predictions, so heavy usage of AI Explanations along with auto-scaling could result in more nodes being started, which would increase AI Platform Prediction charges.

Required use of Cloud Storage

In addition to the costs described in this document, you are required to store data and program files in Cloud Storage buckets during the AI Platform Prediction lifecycle. This storage is subject to the Cloud Storage pricing policy.

Required use of Cloud Storage includes:

  • Staging your model files when you are ready to deploy a model version.

  • Storing your input data for batch prediction.

  • Storing the output of your batch prediction jobs.

AI Platform Prediction does not require long-term storage of these items. You can remove the files as soon as the operation is complete.

Free operations for managing your resources

The resource management operations provided by AI Platform Prediction are available free of charge. The AI Platform Prediction quota policy does limit some of these operations.

Resource     Free operations
models       create, get, list, delete
versions     create, get, list, delete, setDefault
jobs         get, list, cancel
operations   get, list, cancel, delete