Quota policy

AI Platform Training limits resource allocation and use, and enforces appropriate quotas on a per-project basis. Specific policies vary depending on resource availability, user profile, service usage history, and other factors, and are subject to change without notice.

The sections below outline the current quota limits of the system.

Limits on service requests

You can make only a limited number of individual API requests in each 60-second interval. Each limit applies to a particular API or group of APIs, as described in the following sections.

You can see your project's request quotas on the API Manager page for AI Platform Training in the Google Cloud console. To apply for a quota increase, click the edit icon next to the quota limit and then click Apply for higher quota.

Job requests

The following limits apply to projects.jobs.create requests (training and batch prediction jobs combined):

Period        Limit
60 seconds    60
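
For illustration, the sketch below submits a training job with projects.jobs.create and backs off when the service returns HTTP 429, the status typically returned when a rate quota is exhausted. It is a minimal sketch, assuming the google-api-python-client library and Application Default Credentials; the project ID, job ID, bucket, trainer module, and runtime version are placeholders, not values from this page.

    # Minimal sketch: submit a training job and retry on quota errors.
    import time

    from googleapiclient import discovery, errors

    PROJECT_ID = "my-project"          # hypothetical project ID
    ml = discovery.build("ml", "v1")   # AI Platform Training and Prediction API

    job_body = {
        "jobId": "census_training_001",   # hypothetical job name
        "trainingInput": {
            "scaleTier": "BASIC",
            "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "runtimeVersion": "2.1",
            "pythonVersion": "3.7",
        },
    }

    # Respect the 60-requests-per-60-seconds limit: back off and retry when
    # the service answers with HTTP 429 (quota exceeded).
    for attempt in range(5):
        try:
            response = (
                ml.projects()
                .jobs()
                .create(parent=f"projects/{PROJECT_ID}", body=job_body)
                .execute()
            )
            print("Created job:", response["jobId"])
            break
        except errors.HttpError as err:
            if err.resp.status == 429:
                time.sleep(2 ** attempt)   # exponential backoff before retrying
            else:
                raise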

Online prediction requests

The following limits apply to projects.predict requests:

Period        Limit
60 seconds    600,000
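
For illustration, a single projects.predict call looks like the sketch below, and each such call counts against the limit above. It is a minimal sketch, assuming the google-api-python-client library and an already-deployed model; the project ID, model name, and instance format are placeholders.

    # Minimal sketch: one online prediction request.
    from googleapiclient import discovery

    PROJECT_ID = "my-project"   # hypothetical project ID
    MODEL_NAME = "my_model"     # hypothetical deployed model

    ml = discovery.build("ml", "v1")
    name = f"projects/{PROJECT_ID}/models/{MODEL_NAME}"

    # Each call here counts against the 600,000-requests-per-60-seconds limit.
    response = ml.projects().predict(
        name=name,
        body={"instances": [[1.0, 2.0, 3.0]]},
    ).execute()

    if "error" in response:
        raise RuntimeError(response["error"])
    print(response["predictions"])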

Resource management requests

The following limits apply to the combined total of all supported resource management requests:

Period        Limit
60 seconds    300

In addition, all delete requests and all version create requests are limited to a combined total of 10 concurrent requests.
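
As an illustration, the sketch below paces a series of model and version list calls so that a bulk audit or clean-up script stays under the 300-requests-per-60-seconds limit. It is a minimal sketch, assuming the google-api-python-client library; the project ID is a placeholder, and the sleep-based pacing is just one way to throttle on the client side.

    # Minimal sketch: paced resource management calls (list models and versions).
    import time

    from googleapiclient import discovery

    PROJECT_ID = "my-project"   # hypothetical project ID
    ml = discovery.build("ml", "v1")

    MIN_INTERVAL = 60.0 / 300   # at most 300 requests per 60 seconds

    models = ml.projects().models().list(
        parent=f"projects/{PROJECT_ID}"
    ).execute()

    for model in models.get("models", []):
        versions = ml.projects().models().versions().list(
            parent=model["name"]
        ).execute()
        print(model["name"], len(versions.get("versions", [])))
        time.sleep(MIN_INTERVAL)   # simple pacing to stay under the quota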

Limits on concurrent usage of virtual machines

Your project's usage of Google Cloud processing resources is measured by the number of virtual machines that it uses. This section describes the limits to concurrent usage of these resources across your project.

Limits on concurrent CPU usage for training

The number of concurrent virtual CPUs that a typical project can use scales based on the project's usage history.

  • Total concurrent number of CPUs: Starting from 20 CPUs, scaling to a typical value of 450 CPUs. These limits represent the combined maximum number of CPUs in concurrent use across all machine types.

Certain regions have additional default quotas. When you use CPUs in these regions, they count toward the regional quota as well as the total quota:

  • asia-northeast2: 20 CPUs
  • asia-northeast3: 20 CPUs
  • europe-north1: 20 CPUs
  • europe-west3: 20 CPUs
  • europe-west6: 20 CPUs
  • us-east4: 20 CPUs
  • us-west2: 20 CPUs
  • us-west3: 20 CPUs

The CPUs that you use when training a model are not counted as CPUs for Compute Engine, and the quota for AI Platform Training does not give you access to any Compute Engine VMs for other computing requirements. If you want to spin up a Compute Engine VM, you must request Compute Engine quota, as described in the Compute Engine documentation.
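
To stay within the CPU quota, it can help to add up the vCPUs that a custom-tier job holds while all of its replicas are running. The rough sketch below does that arithmetic with common n1 machine-type sizes; the quota value and the job shape are assumed examples, not values from your project.

    # Rough sketch: estimate a job's concurrent vCPU footprint before submitting it.
    VCPUS = {
        "n1-standard-4": 4,
        "n1-standard-8": 8,
        "n1-highmem-16": 16,
    }

    CPU_QUOTA = 20   # assumed starting quota for a new project

    def job_vcpus(master_type, worker_type=None, worker_count=0,
                  ps_type=None, ps_count=0):
        """Total vCPUs the job holds while all of its replicas are running."""
        total = VCPUS[master_type]
        if worker_type:
            total += VCPUS[worker_type] * worker_count
        if ps_type:
            total += VCPUS[ps_type] * ps_count
        return total

    needed = job_vcpus("n1-standard-8", "n1-standard-4", worker_count=3)
    print(f"Job needs {needed} concurrent vCPUs (quota: {CPU_QUOTA})")
    if needed > CPU_QUOTA:
        print("Request a quota increase or reduce the worker count.")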

Limits on concurrent GPU usage for training

A typical project, when first using AI Platform Training, is limited to the following number of concurrent GPUs used in training ML models:

  • Total concurrent number of GPUs: This is the maximum number of GPUs in concurrent use, split per type as follows:

    • Concurrent number of A100 GPUs: 8
    • Concurrent number of K80 GPUs: 30
    • Concurrent number of P4 GPUs: 8
    • Concurrent number of P100 GPUs: 30
    • Concurrent number of V100 GPUs: 8
    • Concurrent number of T4 GPUs: 6

Certain regions have additional default quotas. When you use the following GPUs in the listed regions, they count toward the regional quotas as well as the total quota:

  • P4 GPUs in asia-southeast1: 4
  • P4 GPUs in us-east4: 1
  • P4 GPUs in us-west2: 1
  • T4 GPUs in asia-northeast3: 1
  • T4 GPUs in asia-southeast1: 4

Note that a project's quotas depend on multiple factors, so the quotas in a specific project may be lower than the numbers listed above. The GPUs that you use when training a model are not counted as GPUs for Compute Engine, and the quota for AI Platform Training does not give you access to any Compute Engine VMs using GPUs. If you want to spin up a Compute Engine VM using a GPU, you must request Compute Engine GPU quota, as described in the Compute Engine documentation.

If you need more GPUs for AI Platform Training, see the Requesting a quota increase section of this guide.

For more information about GPUs, see how to use GPUs to train models in the cloud.
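
For illustration, the sketch below submits a CUSTOM-tier job that requests GPUs through the masterConfig.acceleratorConfig field, so its usage can be compared against the per-type quotas above; two V100s count against both the V100 quota and the total GPU quota. It is a minimal sketch, assuming the google-api-python-client library; the project ID, bucket, trainer module, and runtime version are placeholders.

    # Minimal sketch: a CUSTOM-tier training job that requests two V100 GPUs.
    from googleapiclient import discovery

    PROJECT_ID = "my-project"   # hypothetical project ID
    ml = discovery.build("ml", "v1")

    job_body = {
        "jobId": "gpu_training_001",   # hypothetical job name
        "trainingInput": {
            "scaleTier": "CUSTOM",
            "masterType": "n1-standard-8",
            "masterConfig": {
                "acceleratorConfig": {"type": "NVIDIA_TESLA_V100", "count": 2},
            },
            "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "runtimeVersion": "2.1",
            "pythonVersion": "3.7",
        },
    }

    ml.projects().jobs().create(
        parent=f"projects/{PROJECT_ID}", body=job_body
    ).execute()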

Limits on concurrent TPU usage for training

As with GPUs, the TPU quota for AI Platform Training is separate from your Cloud TPU quota, which you can use directly with Compute Engine VMs. The TPUs that you use when training a model are not counted as TPUs for Compute Engine, and the quota for AI Platform Training does not give you access to any Compute Engine VMs using TPUs.

The Google Cloud console only displays your Cloud TPU quota for use with Compute Engine. To request Cloud TPU quota for use with Compute Engine, submit a request to the Cloud TPU team.

All Google Cloud projects are allocated a default AI Platform Training quota for at least one Cloud TPU. Quota is allocated in units of 8 TPU cores per Cloud TPU. This quota is not displayed in the Google Cloud console.
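
As an illustration of how a job consumes this TPU quota, the sketch below submits a job with the BASIC_TPU scale tier, which allocates a single Cloud TPU (8 cores). It is a minimal sketch, assuming the google-api-python-client library; the project ID, bucket, trainer module, and runtime version are placeholders.

    # Minimal sketch: a training job that uses the default Cloud TPU quota.
    from googleapiclient import discovery

    PROJECT_ID = "my-project"   # hypothetical project ID
    ml = discovery.build("ml", "v1")

    job_body = {
        "jobId": "tpu_training_001",   # hypothetical job name
        "trainingInput": {
            "scaleTier": "BASIC_TPU",   # one Cloud TPU (8 TPU cores)
            "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "runtimeVersion": "2.1",
            "pythonVersion": "3.7",
        },
    }

    ml.projects().jobs().create(
        parent=f"projects/{PROJECT_ID}", body=job_body
    ).execute()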

Requesting a quota increase

The quotas listed on this page are allocated per project, and may increase over time with use. If you need more processing capability, you can apply for a quota increase in one of the following ways:

  • Use the Google Cloud console to request increases for quotas that are listed in the API Manager for AI Platform Training:

    1. Find the section of the quota that you want to increase.

    2. Click the pencil icon next to the quota value at the bottom of the usage chart for that quota.

    3. Enter your requested increase:

      • If your desired quota value is within the range displayed on the quota limit dialog, enter your new value and click Save.

      • If you want to increase the quota beyond the maximum displayed, click Apply for higher quota and follow the instructions for the second way to request an increase.

  • If you want to increase a quota that isn't listed in the Google Cloud console, such as GPU quotas, use the AI Platform Quota Request form to request a quota increase. These requests are handled on a best-effort basis, which means there are no service-level agreements (SLAs) or service-level objectives (SLOs) involved in the review of these requests.

Limits on concurrent disk usage for training

The concurrent disk capacity available to a typical project scales based on the project's usage history:

  • Total concurrent disk capacity: Starting from 4,000 GB for standard hard disk drives (HDD) and 500 GB for solid-state drives (SSD), scaling to a typical value of 180,000 GB for HDD and 75,000 GB for SSD. These limits represent the combined maximum disk capacity in concurrent use across all machine types.

The disks that you use when training a model are not counted as disks for Compute Engine, and the quota for AI Platform Training does not give you access to any Compute Engine virtual machine instances (VMs) for other computing requirements. If you want to create a Compute Engine VM, then you must request Compute Engine quota, as described in the Compute Engine documentation.
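
To stay within these limits, it can help to add up the boot disk sizes that a job's replicas hold concurrently. The rough sketch below does that arithmetic; the quota values are the starting defaults quoted above, and the job shape is an assumed example.

    # Rough sketch: compare a job's concurrent disk allocation against the
    # default HDD (pd-standard) and SSD (pd-ssd) quotas.
    HDD_QUOTA_GB = 4000   # assumed starting quota for pd-standard
    SSD_QUOTA_GB = 500    # assumed starting quota for pd-ssd

    replicas = [
        # (boot disk type, boot disk size in GB, replica count)
        ("pd-ssd", 200, 1),        # master
        ("pd-standard", 100, 4),   # workers
    ]

    hdd = sum(size * n for kind, size, n in replicas if kind == "pd-standard")
    ssd = sum(size * n for kind, size, n in replicas if kind == "pd-ssd")

    print(f"HDD: {hdd} GB of {HDD_QUOTA_GB} GB, SSD: {ssd} GB of {SSD_QUOTA_GB} GB")
    if hdd > HDD_QUOTA_GB or ssd > SSD_QUOTA_GB:
        print("Reduce boot disk sizes or request a disk quota increase.")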

What's next