Cloud Machine Learning Engine offers scalable, flexible pricing options to help fit your project and budget. Cloud ML Engine charges for training models and getting predictions, but managing your machine learning resources in the cloud is free of charge.
The following table summarizes Cloud ML Engine pricing. All use is also subject to the Cloud ML Engine quota policy.
|Action|US Cost|Europe/Asia Cost|
|---|---|---|
|Model training|$0.49 per hour, per ML training unit|$0.54 per hour, per ML training unit|
|Prediction|$0.10 per thousand predictions, plus $0.40 per hour|$0.11 per thousand predictions, plus $0.44 per hour|
Required use of Google Cloud Storage
In addition to the costs described in this document, you are required to store data and program files in Google Cloud Storage buckets during the Cloud ML Engine lifecycle. This storage is subject to the Cloud Storage pricing policy.
Required Cloud Storage use includes:
- Staging your training application package when training your model.
- Storing your training input data.
- Staging your model files when you are ready to deploy a version.
- Storing your input data for batch prediction.
- Storing your batch prediction output files.
- Storing the output of your training jobs.

Cloud ML Engine does not require long-term storage of these items; you can remove the files as soon as the operation is complete. Refer to the Cloud Storage pricing policy for details of the costs associated with storing data.
The resource management operations provided by Cloud ML Engine are available free of charge. The Cloud ML Engine quota policy does limit some of these operations.
|Resource|Free operations|
|---|---|
|models|create, get, list, delete|
|versions|create, get, list, delete, setDefault|
|jobs|get, list, cancel|
|operations|get, list, cancel, delete|
Pricing for training your models in the cloud is defined in terms of ML training units, which are an abstract measurement of the processing power involved. 1 ML training unit represents a standard machine configuration used by the training service. The number of ML training units you will be charged per hour of processing is determined by the processing configuration you choose when you start your training job.
You are charged for training:
- In one-minute increments.
- Based on an hourly rate, shown in the following table.
- A minimum of 10 minutes per training job.
- From the moment when resources are provisioned for a job until the job completes.
|Region|Cost|
|---|---|
|US|$0.49 per hour, per ML training unit|
|Europe and Asia|$0.54 per hour, per ML training unit|
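These billing rules can be captured in a minimal sketch. Rounding partial minutes up to the next whole minute is an assumption drawn from the per-minute billing described above; the rates come from the table.

```python
import math

# Hourly rate per ML training unit, by region (from the rate table above).
HOURLY_RATES = {"us": 0.49, "europe": 0.54, "asia": 0.54}

def training_charge(ml_training_units, duration_minutes, region="us"):
    """Estimate a training job's charge: billed per minute with a
    ten-minute minimum, at the regional hourly rate per ML training unit.
    Rounding partial minutes up is an assumption for illustration."""
    billed_minutes = max(math.ceil(duration_minutes), 10)
    per_minute = ml_training_units * HOURLY_RATES[region] / 60
    return billed_minutes * per_minute
```

For example, a 4-minute job is billed the same as a 10-minute job because of the minimum charge.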
ML training units by scale tier
You can control the type of processing cluster to use when training your model. The simplest way is to choose from one of the predefined configurations called scale tiers. Each scale tier has a cost in ML training units:
|Scale Tier|ML training units|
|---|---|
|CUSTOM|Sum of machine types|
Machine types for custom cluster configurations
If you select CUSTOM as your scale tier, you have control over the number and type of virtual machines to use for the cluster's master, workers, and parameter servers. Each machine type in a custom configuration has its own cost in ML training units per hour:
|Machine type|ML training units per instance|
|---|---|
|complex_model_s|2|
|large_model|3|
The cost of training with a custom processing cluster is the sum of all the machines you specify. You are charged for the total time of the job, not for the active processing time of individual machines.
Training cost examples
Finding your training cost requires some calculation:
(ML training units * cost per unit / 60) * job duration in minutes
As illustrated in the following examples:
A data scientist in the US region runs a training job and selects the STANDARD_1 scale tier. Her job takes 15 minutes:
(10 ML training units * $0.49 per hour / 60) * 15 minutes
For a total of $1.23 for the job.
A computer science professor in the US region runs a training job using the CUSTOM scale tier. She has a very large model, so she wants to take advantage of the large-model VMs for her parameter servers. She configures her processing cluster like this:
- A complex_model_s machine for her master (2 ML training units).
- 5 parameter servers on large_model VMs (5 @ 3 = 15 ML training units).
- 8 workers on complex_model_s VMs (8 @ 2 = 16 ML training units).
Her job runs for 2 hours and 26 minutes:
(33 ML training units * $0.49 per hour / 60) * 146 minutes
For a total of $39.35 for the job.
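Both examples apply the same formula; a short sketch that reproduces them follows. The machine-type unit costs are taken from the figures quoted in the CUSTOM example above.

```python
US_RATE = 0.49  # dollars per hour, per ML training unit (US region)

def job_cost(ml_training_units, minutes, hourly_rate=US_RATE):
    # (ML training units * cost per unit / 60) * job duration in minutes
    return (ml_training_units * hourly_rate / 60) * minutes

# Example 1: STANDARD_1 tier (10 ML training units), 15 minutes.
standard_cost = job_cost(10, 15)           # about $1.23 after rounding

# Example 2: CUSTOM tier -- units are summed across all machines:
# 1 master @ 2, 5 parameter servers @ 3, 8 workers @ 2 = 33 units.
custom_units = 1 * 2 + 5 * 3 + 8 * 2
custom_cost = job_cost(custom_units, 2 * 60 + 26)   # 146 minutes, about $39.35
```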
Prediction pricing applies to requests made to trained model versions hosted by Cloud ML Engine. You are charged a flat rate per prediction processed each month. You are also charged for the time used on each node in the processing cluster that performs the predictions.
You are charged for predictions processed according to the following table:
|Region|Standard rate|High volume rate|
|---|---|---|
|US|$0.10 per 1000 predictions, plus $0.40 per node hour|$0.05 per 1000 predictions, plus $0.40 per node hour|
|Europe and Asia|$0.11 per 1000 predictions, plus $0.44 per node hour|$0.05 per 1000 predictions, plus $0.44 per node hour|

Notes:
- Predictions are accumulated per month and the cost is rounded to the nearest penny.
- Node time is charged per minute separately for each batch prediction job, with a minimum charge of ten minutes.
- High volume pricing applies to requests after the 100 millionth in a calendar month.
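The table says high volume pricing applies to requests after the 100 millionth in a month. A sketch of how the per-prediction portion of the bill might then be computed, assuming the standard rate covers the first 100 million requests and the high-volume rate covers the remainder (node-hour charges are billed separately and not included here):

```python
HIGH_VOLUME_THRESHOLD = 100_000_000  # predictions per calendar month

# Per-1000-prediction rates from the table above.
RATES = {
    "us":     {"standard": 0.10, "high_volume": 0.05},
    "europe": {"standard": 0.11, "high_volume": 0.05},
}

def per_prediction_charge(monthly_predictions, region="us"):
    """Per-prediction portion of a month's bill. Splitting the count at
    the 100-millionth request is an assumption drawn from the table notes;
    node-hour charges are not included."""
    standard = min(monthly_predictions, HIGH_VOLUME_THRESHOLD)
    high = max(monthly_predictions - HIGH_VOLUME_THRESHOLD, 0)
    rates = RATES[region]
    return (standard / 1000) * rates["standard"] + (high / 1000) * rates["high_volume"]
```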
Understanding node hours
The online processing resources that Cloud ML Engine uses to run your model for prediction are called nodes. You can think of a node as a virtual machine. You are charged for the time that your model is running on a node, including:
- When processing a batch prediction job.
- When processing an online prediction request.
- When your model is in a ready state for serving online predictions.
Cloud ML Engine scales the number of nodes it uses to accommodate both online and batch prediction work. This includes the online prediction service keeping your model in a ready state for a few idle minutes after it services a request. The priority of this scaling is:
- For online prediction, reducing the latency of individual requests.
- For batch prediction, reducing the total elapsed time of the job.
Scaling during batch prediction should have little effect on the price of your job, though there is some overhead involved in bringing up a new node.
Scaling during online prediction will affect your total charges each month: the more numerous and frequent your requests, the more nodes will be used.
You can affect scaling by:
- Setting a maximum number of nodes to use for a batch prediction job.
- Setting the number of nodes to keep running for a model when you deploy it.
You can learn more about node allocation and scaling in the prediction guide.
Prediction cost examples
Finding your prediction cost for a month requires some calculation:
(Number of predictions / 1000) * cost per 1K + (hours of processing * cost per hour)
As illustrated in the following example:
A real-estate company in the US runs a weekly prediction of housing values in areas they serve. In one month they run predictions for four weeks, in batches of 3920, 4277, 3849, and 3961 predictions. Each prediction takes an average of 0.72 node seconds of processing to infer.
The cost per prediction is cumulative:
3920 + 4277 + 3849 + 3961 = 16007
They are charged for 16,007 predictions: (16007 / 1000) * $0.10 = $1.60
The node time is computed per job (this example uses the average, but real costs use exact values for each job):

3920 * 0.72 = 2822.4 seconds = 47.04 minutes
4277 * 0.72 = 3079.44 seconds = 51.32 minutes
3849 * 0.72 = 2771.28 seconds = 46.19 minutes
3961 * 0.72 = 2851.92 seconds = 47.53 minutes
Each job runs longer than ten minutes, so it is charged per minute of processing, rounded up to the next whole minute:

(48 * $0.40) / 60 = $0.32
(52 * $0.40) / 60 = $0.35
(47 * $0.40) / 60 = $0.31
(48 * $0.40) / 60 = $0.32

For a total node-time charge of $1.30 and a total charge for the month of $2.90.
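The month's arithmetic can be reproduced with a short sketch. Rounding each job's node time up to the next whole minute is an assumption based on the per-minute charging noted in the rate table.

```python
import math

PER_1K = 0.10        # US standard rate per 1000 predictions
NODE_HOUR = 0.40     # US rate per node hour
MIN_MINUTES = 10     # minimum node-time charge per batch job

batches = [3920, 3849, 4277, 3961]      # predictions per weekly job
seconds_per_prediction = 0.72           # average node seconds to infer

# Per-prediction charges accumulate over the month.
prediction_charge = sum(batches) / 1000 * PER_1K

# Node time is charged per minute for each job, with a ten-minute minimum.
node_charge = 0.0
for n in batches:
    minutes = max(math.ceil(n * seconds_per_prediction / 60), MIN_MINUTES)
    node_charge += minutes * NODE_HOUR / 60

total = prediction_charge + node_charge   # about $2.90 for the month
```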