Quota Policy

Cloud Machine Learning Engine limits resource allocation and use and enforces appropriate quotas on a per-project basis. Specific policies vary depending on resource availability, user profile, service usage history, and other factors, and are subject to change without notice.

The sections below outline the current quota limits of the system.

Limits on service requests

You may only make a limited number of individual API requests per 100-second interval. Each limit applies to a particular API or group of APIs as described in the following sections.
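
A client can stay under a per-100-second limit by throttling itself before it sends requests. The following is a minimal sketch, not part of Cloud ML Engine: the class name `SlidingWindowLimiter` and its methods are hypothetical. The page does not say how the service measures its 100-second interval, so the sketch assumes a sliding window, which should be the more conservative choice.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_requests` calls per `period` seconds."""

    def __init__(self, max_requests, period=100.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self._sent = deque()    # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.period:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def wait_time(self):
        """Seconds until the next request would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.period - (now - self._sent[0])
```

For example, `SlidingWindowLimiter(10)` would match the default limit for job requests listed later on this page. A caller would check `try_acquire()` before each request and sleep for `wait_time()` seconds when it returns False.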

You can see your project's request quotas in the API Manager for Cloud ML Engine on Google Cloud Platform Console. To increase a quota up to its maximum value, click the edit icon next to the current limit that you want to increase. To apply for a quota beyond the maximum, click the edit icon next to the quota limit and then click Apply for higher quota.

Job requests

The following limits apply to projects.jobs.create requests (training and batch prediction jobs combined):

Period        Default limit    Maximum limit
100 seconds   10               100

Online prediction requests

The following limits apply to projects.predict requests:

Period        Default limit    Maximum limit
100 seconds   1,000            10,000

Resource management requests

The following limits apply to the combined total of all the supported requests in this list:

Period        Default limit    Maximum limit
100 seconds   50               500

In addition, all of the delete requests listed above and all version create requests are limited to a combined total of 10 concurrent requests.

Resource quotas

In addition to the limits on requests over time, your resource use is limited as shown in the following list:

  • Maximum number of models: 100
  • Maximum number of versions: 200. The version limit is for the total number of versions in your project, which can be distributed among your active models however you want.
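
Because the version limit is a project-wide total rather than a per-model cap, a pre-flight check has to sum versions across all models. The following is a small sketch under that reading. The `versions_per_model` dictionary is a hypothetical stand-in for whatever inventory of models and versions you keep, and the function treats an unknown model name as a new model that would also count against the model limit.

```python
MAX_MODELS = 100     # from the list above
MAX_VERSIONS = 200   # project-wide total, shared across all models


def can_add_version(versions_per_model, model_name):
    """Return True if one more version for `model_name` stays within both quotas.

    `versions_per_model` maps model name -> current number of versions.
    """
    is_new_model = model_name not in versions_per_model
    if is_new_model and len(versions_per_model) >= MAX_MODELS:
        return False
    # The version quota counts every version in the project, whichever model owns it.
    return sum(versions_per_model.values()) + 1 <= MAX_VERSIONS
```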

Limits on concurrent ML training units and prediction nodes

Google Cloud Platform processing resources allocated to Cloud ML Engine are measured as ML training units for training jobs, and as prediction nodes for online and batch prediction.

You can learn about ML training units and prediction nodes and their effect on costs on the pricing policy page.

A typical project, when first using Cloud ML Engine, is limited to the following number of concurrent processing resources:

  • Concurrent number of ML training units: 15.
  • Concurrent number of prediction nodes: 24.

Limits on concurrent GPU usage

A typical project is limited to the following number of concurrent GPUs:

  • Concurrent number of GPUs: 10.

The GPUs that you use when training a model are not counted as GPUs for Google Compute Engine, and the quota for Cloud ML Engine does not give you access to any Compute Engine VMs using GPUs. If you want to spin up a Compute Engine VM using a GPU, you must request Compute Engine GPU quota, as described in the Compute Engine documentation.

For more information about GPUs, see how to use GPUs to train models in the cloud.
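
Before submitting new work, it can help to check the planned usage against the concurrency limits for ML training units, prediction nodes, and GPUs. The sketch below assumes you track current usage yourself. `Usage`, `LIMITS`, and `exceeded_limits` are hypothetical names, and the numbers are the typical starting values quoted on this page, which may not match a given project.

```python
from dataclasses import dataclass

# Typical starting limits quoted on this page; a given project may differ.
LIMITS = {"training_units": 15, "prediction_nodes": 24, "gpus": 10}


@dataclass
class Usage:
    training_units: float = 0
    prediction_nodes: int = 0
    gpus: int = 0


def exceeded_limits(current, requested, limits=LIMITS):
    """Return the names of the limits that current + requested usage would exceed."""
    return [
        name
        for name, limit in limits.items()
        if getattr(current, name) + getattr(requested, name) > limit
    ]
```

An empty list means the request fits. A non-empty list names the quotas to free up or to raise through a quota increase request.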

Requesting a quota increase

The quotas listed on this page are allocated per project, and may increase over time with use. If you need more processing capability, you can apply for a quota increase.

  • Use the Google Cloud Platform Console to request increases for quotas that are listed in the API Manager for Cloud ML Engine:

    1. Find the section of the quota that you want to increase.

    2. Click the pencil icon next to the quota value at the bottom of the usage chart for that quota.

    3. Enter your requested increase:

      • If your desired quota value is within the range displayed on the quota limit dialog, enter your new value and click Save.

      • If you want to increase the quota beyond the maximum displayed, click Apply for higher quota and follow the instructions for the custom request form, described in the next list item.

  • Use the custom request form if the quota isn't listed on the Google Cloud Platform Console, if you want a quota that is larger than the listed maximum, or if you need to deploy a model that exceeds the default limit of 250 MB:

    1. Go to the Cloud Machine Learning Engine Quota Request form. (You can also follow the Apply for higher quota link in one of the quota increase dialog boxes.)

    2. Fill in the required fields, including a description of your scenario and why it needs increased quotas.

    3. Click Submit. You will get an email response about your request.
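
To find out whether a model needs the size exception mentioned above, you can measure the exported model directory before deploying it. This is a rough sketch with hypothetical helper names. It assumes that "250 MB" means 250 × 1024 × 1024 bytes and that the limit applies to the total size of the files in the model directory, neither of which this page states.

```python
import os

# Assumption: "250 MB" is read as 250 * 1024 * 1024 bytes.
DEFAULT_MODEL_SIZE_LIMIT = 250 * 1024 * 1024


def directory_size(path):
    """Total size in bytes of all files under `path`, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def needs_size_increase(model_dir, limit=DEFAULT_MODEL_SIZE_LIMIT):
    """Return True if the model directory is larger than the default limit."""
    return directory_size(model_dir) > limit
```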
