Quota Policy

Cloud Machine Learning Engine limits resource allocation and use, and enforces appropriate quotas on a per-project basis. Specific policies vary depending on resource availability, user profile, service usage history, and other factors, and are subject to change without notice.

The sections below outline the current quota limits of the system.

Limits on service requests

You can make only a limited number of individual API requests in any 60-second interval. Each limit applies to a particular API or group of APIs, as described in the following sections.

You can see your project's request quotas in the API Manager for Cloud ML Engine on the Google Cloud Platform Console. You can apply for a quota increase by clicking the edit icon next to the quota limit and then clicking Apply for higher quota.

Job requests

The following limits apply to projects.jobs.create requests (training and batch prediction jobs combined):

  Period        Limit
  60 seconds    60
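
Training and batch prediction jobs share this budget, so a pipeline that submits jobs in bursts should expect occasional quota errors and retry. Below is a minimal sketch of submitting a job with exponential backoff, assuming the google-api-python-client library; the helper name, project ID, and job body are hypothetical:

    import random
    import time

    from googleapiclient import discovery, errors

    def create_job_with_backoff(project_id, job_spec, max_attempts=5):
        """Submit a projects.jobs.create request, backing off on quota errors."""
        ml = discovery.build('ml', 'v1')
        parent = 'projects/{}'.format(project_id)
        for attempt in range(max_attempts):
            try:
                return ml.projects().jobs().create(
                    parent=parent, body=job_spec).execute()
            except errors.HttpError as err:
                # HTTP 429 signals that the per-minute request quota is exhausted.
                if err.resp.status != 429:
                    raise
                # Exponential backoff with jitter before retrying.
                time.sleep((2 ** attempt) + random.random())
        raise RuntimeError('jobs.create still throttled after %d attempts'
                           % max_attempts)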

Online prediction requests

The following limits apply to projects.predict requests:

  Period        Limit
  60 seconds    6000
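
At 6000 requests per 60 seconds, a client averages at most 100 requests per second. A simple client-side token bucket keeps you under that rate rather than relying on server-side rejections. This is an illustrative sketch only; the limiter class and the ml client handle are assumptions, not part of the service:

    import threading
    import time

    class TokenBucket:
        """Allows at most `rate` calls per `period` seconds."""

        def __init__(self, rate, period):
            self.capacity = float(rate)
            self.tokens = float(rate)
            self.fill_rate = rate / float(period)
            self.timestamp = time.monotonic()
            self.lock = threading.Lock()

        def acquire(self):
            # Holding the lock while sleeping serializes callers, which is
            # exactly what a strict rate limit wants.
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens +
                                  (now - self.timestamp) * self.fill_rate)
                self.timestamp = now
                if self.tokens < 1:
                    time.sleep((1 - self.tokens) / self.fill_rate)
                    self.timestamp = time.monotonic()
                    self.tokens = 0.0
                else:
                    self.tokens -= 1

    # Stay just under the 6000-requests-per-60-seconds quota.
    limiter = TokenBucket(rate=6000, period=60)

    def throttled_predict(ml, model_name, instances):
        limiter.acquire()
        return ml.projects().predict(
            name=model_name, body={'instances': instances}).execute()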

Resource management requests

The following limits apply to the combined total of all supported resource management requests:

  Period        Limit
  60 seconds    300

In addition, all delete requests and all version create requests are limited to a combined total of 10 concurrent requests.
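
A straightforward way to respect that concurrency cap from client code is a semaphore that gates the calls. A minimal sketch, assuming the google-api-python-client library; the project, model, and version names are hypothetical:

    import threading
    from concurrent.futures import ThreadPoolExecutor

    from googleapiclient import discovery

    # Allow at most 10 delete / version-create requests in flight at once.
    gate = threading.Semaphore(10)

    def delete_version(ml, version_name):
        with gate:
            return ml.projects().models().versions().delete(
                name=version_name).execute()

    ml = discovery.build('ml', 'v1')
    versions = ['projects/my-project/models/my-model/versions/v%d' % i
                for i in range(1, 25)]
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(lambda v: delete_version(ml, v), versions))

Note that versions.delete returns a long-running operation; the semaphore bounds only the in-flight requests, which is the quantity the quota counts.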

Resource quotas

In addition to the limits on requests over time, your use of resources is limited as shown in the following list:

  • Maximum number of models: 100.
  • Maximum number of versions: 200. The version limit is for the total number of versions in your project, which can be distributed among your active models however you want.
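
Because models and versions are hard-capped, long-lived projects benefit from checking headroom before deploying. Below is a sketch of counting deployed models with the paginated projects.models.list call, assuming the google-api-python-client library; the project ID is a placeholder:

    from googleapiclient import discovery

    def count_models(project_id):
        """Count deployed models to check headroom under the 100-model quota."""
        ml = discovery.build('ml', 'v1')
        total = 0
        request = ml.projects().models().list(
            parent='projects/{}'.format(project_id))
        while request is not None:
            response = request.execute()
            total += len(response.get('models', []))
            # list_next follows the page token until the listing is exhausted.
            request = ml.projects().models().list_next(request, response)
        return total

    if count_models('my-project') >= 100:
        print('Model quota reached; delete unused models or request an increase.')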

Requesting a quota increase

The quotas listed on this page are allocated per project, and may increase over time with use. If you need more processing capability, you can apply for a quota increase.

  • Use the Google Cloud Platform Console to request increases for quotas that are listed in the API Manager for Cloud ML Engine:

    1. Find the section of the quota that you want to increase.

    2. Click the pencil icon next to the quota value at the bottom of the usage chart for that quota.

    3. Enter your requested increase:

      • If your desired quota value is within the range displayed on the quota limit dialog, enter your new value and click Save.

      • If you want to increase the quota beyond the maximum displayed, click Apply for higher quota and then follow the instructions for the custom request form, described next.

  • Use the custom request form for quotas that aren't listed on the Google Cloud Platform Console, for increases beyond the listed maximum, or to deploy a model that exceeds the default limit of 250 MB:

    1. Go to the Cloud ML Engine Quota Request form. (You can also follow the Apply for higher quota link in one of the quota increase dialog boxes.)

    2. Fill in the required fields, including a description of your scenario and why it needs increased quotas.

    3. Click Submit. You will get an email response about your request.

Limits on concurrent usage of virtual machines

Your project's usage of GCP processing resources is measured by the number of virtual machines used for training and the number of nodes used for batch prediction. This section describes the limits on concurrent usage of these resources across your project.

Limits on concurrent nodes for batch prediction

A typical project, when first using Cloud ML Engine, is limited in the number of concurrent nodes used for batch prediction:

  • Concurrent number of prediction nodes: 72.
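
The node count for a batch prediction job is driven by the job's predictionInput; the service scales nodes for you up to the project limit. The hypothetical job body below also caps its own worker count explicitly. Field names follow the projects.jobs.create API, but treat the values as an illustrative sketch:

    # Hypothetical batch prediction job body for projects.jobs.create.
    batch_prediction_job = {
        'jobId': 'census_batch_predict_001',
        'predictionInput': {
            'modelName': 'projects/my-project/models/my-model',
            'dataFormat': 'TEXT',  # newline-delimited JSON instances
            'inputPaths': ['gs://my-bucket/instances/*'],
            'outputPath': 'gs://my-bucket/predictions/',
            'region': 'us-central1',
            # Optional: keep this job well under the 72-node project limit.
            'maxWorkerCount': 20,
        },
    }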

Node usage for online prediction

Cloud ML Engine does not apply quotas to node usage for online prediction. See more about prediction nodes and resource allocation.

Limits on concurrent CPU usage for training

The limit on concurrent virtual CPUs for a typical project scales based on the project's usage history.

  • Total concurrent number of CPUs: starts at 20 CPUs and scales to a typical value of 450 CPUs. This limit is the maximum number of CPUs in concurrent use across all machine types combined.

The CPUs that you use when training a model are not counted as CPUs for Compute Engine, and the quota for Cloud ML Engine does not give you access to any Compute Engine VMs for other computing requirements. If you want to spin up a Compute Engine VM, you must request Compute Engine quota, as described in the Compute Engine documentation.
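
For reference, the CPUs a training job consumes are determined by its scale tier (or custom machine types). Below is a hypothetical training job body sketched against the projects.jobs.create API; the package path and names are placeholders:

    # Hypothetical training job body; the machines behind the scale tier
    # count against the Cloud ML Engine CPU quota, not Compute Engine quota.
    training_job = {
        'jobId': 'census_training_001',
        'trainingInput': {
            'scaleTier': 'STANDARD_1',  # a predefined cluster of CPU machines
            'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
            'pythonModule': 'trainer.task',
            'region': 'us-central1',
        },
    }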

Limits on concurrent GPU usage for training

A typical project, when first using Cloud ML Engine, is limited to the following number of concurrent GPUs used in training ML models:

  • Total concurrent number of GPUs: This is the maximum number of GPUs in concurrent use, split per type as follows:

    • Concurrent number of Tesla K80 GPUs: 30.
    • Concurrent number of Tesla P100 GPUs: 30.

The GPUs that you use when training a model are not counted as GPUs for Compute Engine, and the quota for Cloud ML Engine does not give you access to any Compute Engine VMs using GPUs. If you want to spin up a Compute Engine VM using a GPU, you must request Compute Engine GPU quota, as described in the Compute Engine documentation.

For more information about GPUs, see how to use GPUs to train models in the cloud.
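
To see how a job's configuration maps onto the GPU quota, consider a custom-tier training job: each GPU-backed machine contributes its attached GPUs to your concurrent total. The machine type names below follow the Cloud ML Engine machine types, but treat the job body as an illustrative sketch with placeholder paths:

    # Hypothetical GPU training input. Each standard_p100 machine carries one
    # Tesla P100, so this job uses 1 (master) + 4 (workers) = 5 of the
    # 30 concurrent Tesla P100 GPUs allowed.
    gpu_training_input = {
        'scaleTier': 'CUSTOM',
        'masterType': 'standard_p100',
        'workerType': 'standard_p100',
        'workerCount': 4,
        'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
        'pythonModule': 'trainer.task',
        'region': 'us-central1',
    }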

Limits on concurrent TPU usage for training (Beta)

All Google Cloud Platform projects are allocated quota for at least one Cloud TPU by default.

If you need additional Cloud TPU quota, complete the TPU quota request form. Quota is allocated in units of 8 TPU cores per Cloud TPU.

You will receive a notification when the quota is approved. The next step is to configure your Google Cloud Platform project to use the TPU. See the guide to using TPUs.
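
Since quota is granted in 8-core units, a single-TPU job maps onto one cloud_tpu worker. Below is a hypothetical training input sketched against the Cloud ML Engine job API; the runtime version and paths are placeholders:

    # Hypothetical TPU training input; the cloud_tpu worker attaches one
    # Cloud TPU (8 TPU cores) and consumes one unit of TPU quota.
    tpu_training_input = {
        'scaleTier': 'CUSTOM',
        'masterType': 'standard',   # CPU master that drives the TPU worker
        'workerType': 'cloud_tpu',
        'workerCount': 1,
        'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
        'pythonModule': 'trainer.task',
        'region': 'us-central1',
        'runtimeVersion': '1.9',
    }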

Please note that, due to high demand, we may not be able to grant every quota request.
