Quotas and limits

This document lists the quotas and limits that apply to Compute Engine.

A quota restricts how much of a particular shared Google Cloud resource your Cloud project can use, including hardware, software, and network components.

Quotas are part of a system that does the following:

  • Monitors your use or consumption of Google Cloud products and services.
  • Restricts your consumption of those resources for reasons including ensuring fairness and reducing spikes in usage.
  • Maintains configurations that automatically enforce prescribed restrictions.
  • Provides a means to make or request changes to the quota.

When a quota is exceeded, in most cases, the system immediately blocks access to the relevant Google resource, and the task that you're trying to perform fails. In most cases, quotas apply to each Cloud project and are shared across all applications and IP addresses that use that Cloud project.

Compute Engine enforces quotas on resource usage for various reasons. For example, quotas help to protect the community of Google Cloud users by preventing unforeseen spikes in usage. Google Cloud also offers free trial quotas that provide limited access for projects to help you explore Google Cloud on a free trial basis.

Not all projects have the same quotas. As you increasingly use Google Cloud over time, your quotas might increase accordingly. If you expect a notable upcoming increase in usage, you can proactively request quota adjustments from the Quotas page in the console.

For information about rate limit quotas for the Compute Engine API, see API rate limits.

Permissions for checking and editing quota

To view your quotas, you must have the serviceusage.quotas.get permission.

To change your quotas, you must have the serviceusage.quotas.update permission.

These permissions are included by default in the basic IAM roles of Owner and Editor and in the predefined Quota Administrator role.

Check your quota

Regional quotas are not a subset of project quotas. Virtual machine (VM) instances count against regional quotas.

If you're looking for regional quotas, such as how many VMs you can create in a region, see Check regional quota. To check your project quota, use the Google Cloud console or the Google Cloud CLI.

For information about quota categories, see Understanding quotas.

Check regional quota

Console

In the Google Cloud console, go to the Quotas page.


gcloud

List quotas in a region:

gcloud compute regions describe REGION

Replace REGION with the name of the region for which you want a list of quota information.
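
For scripting, the same command can emit JSON through the gcloud CLI's --format=json flag. The following is a minimal sketch that computes the remaining quota per metric. It assumes the output contains a quotas list whose entries carry metric, limit, and usage fields; the sample document and its numbers are hand-written for illustration and are not real output.

```python
import json

# Hand-written sample shaped like `gcloud compute regions describe REGION --format=json`.
# The field names are assumed from that output; the numbers are illustrative only.
SAMPLE_REGION_JSON = """
{
  "name": "us-central1",
  "quotas": [
    {"metric": "CPUS", "limit": 24.0, "usage": 8.0},
    {"metric": "SSD_TOTAL_GB", "limit": 500.0, "usage": 500.0}
  ]
}
"""


def remaining_by_metric(region_json: str) -> dict:
    """Return {metric: limit - usage} for every quota entry in the region."""
    region = json.loads(region_json)
    return {q["metric"]: q["limit"] - q["usage"] for q in region.get("quotas", [])}


remaining = remaining_by_metric(SAMPLE_REGION_JSON)
for metric, left in sorted(remaining.items()):
    print(f"{metric}: {left:g} remaining")
```

A metric with zero remaining, such as SSD_TOTAL_GB in the sample, is one for which a new request would fail until usage drops or the limit is raised.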

Check project quota

Console

In the Google Cloud console, go to the Quotas page.


gcloud

Check project-wide quotas:

gcloud compute project-info describe --project PROJECT_ID

Replace PROJECT_ID with your project ID.

Request an increase in quota

There is no charge for requesting a quota increase. Your costs increase only if you use more resources.

A request to decrease quota is rejected by default. If you must reduce your quota, reply to the support email with an explanation of your requirements. A support representative from the Compute Engine team will respond to your request within 24 to 48 hours.

Plan and request additional resources at least a few days in advance to ensure that there is enough time to fulfill your request.

For detailed instructions on how to increase quota from the Google Cloud console, see Requesting a higher quota limit.

Quotas and resource availability

A resource quota is the maximum number of resources of a given type that you can create, if those resources are available. Quotas do not guarantee that resources are always available. If a resource is not available, or if the region you choose is out of the resource, you can't create new resources of that type, even if you have remaining quota in your region or project. For example, you might still have quota to create external IP addresses in us-central1, but there might not be available IP addresses in that region.

Similarly, even if you have a regional quota, a resource might not be available in a specific zone. For example, you might have quota to create VM instances in region us-central1, but you might not be able to create VM instances in the zone us-central1-a if the zone is depleted. In such cases, try creating the same resource in another zone, such as us-central1-f. To learn more about your options if zonal resources are depleted, see the documentation for troubleshooting resource availability.
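
The zone fallback described above can be sketched as a loop over candidate zones. This is only an illustration: ZoneDepletedError and the create_in_zone callable are hypothetical stand-ins for whatever error and create call your client library exposes, and the zone list is the one from the example.

```python
class ZoneDepletedError(Exception):
    """Hypothetical error raised when a zone has no capacity for the resource."""


def create_with_zone_fallback(create_in_zone, zones):
    """Try each zone in order and return (zone, result) for the first success."""
    for zone in zones:
        try:
            return zone, create_in_zone(zone)
        except ZoneDepletedError:
            continue  # Quota is regional, so another zone in the region may still work.
    raise ZoneDepletedError(f"no capacity in any of: {', '.join(zones)}")


# Stand-in for a real create call: pretend us-central1-a is depleted.
def fake_create(zone):
    if zone == "us-central1-a":
        raise ZoneDepletedError(zone)
    return f"vm-in-{zone}"


chosen_zone, vm = create_with_zone_fallback(fake_create, ["us-central1-a", "us-central1-f"])
print(chosen_zone, vm)
```

Because all zones in the list belong to the same region, every attempt draws on the same regional quota; only availability differs between attempts.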

Resource quotas

When planning your VM instance needs, you should consider several quotas that affect how many VM instances you can create.

Regional and global quotas

VM quotas are managed at the regional level. VM instance, instance group, disk, and CPU quotas can be consumed by any VM in the region, regardless of zone. For example, CPU quota is a regional quota, so there is a different limit and usage count for each region. To launch an n2-standard-16 instance in any zone in the us-central1 region, you need enough quota for at least 16 CPUs in us-central1.

Networking and load balancing quotas are required to create firewalls, load balancers, networks, and VPNs. These quotas are global quotas that do not depend on a region. Any region can use a global quota. For example, in-use and static external IP addresses assigned to load balancers and HTTP(S) proxies consume global quotas.

VM instances

The VM instances quota is a regional quota that limits the number of VM instances that can exist in a given region. It applies to both running and non-running VMs, and to normal and preemptible instances. This quota is visible in the Google Cloud console on the Quotas page. Compute Engine automatically sets this quota to 10 times your regular CPU quota, so you do not need to request it directly. If you need quota for more VM instances, request more CPUs, because a higher CPU quota raises the VM instances quota.

To view this quota by region in the Google Cloud console, or to request a change, do the following:

  1. In the Google Cloud console, go to the Quotas page.


  2. Click Filter table and select Service.

  3. Choose Compute Engine API.

  4. Choose Limit Name: VM instances.

  5. To see a list of your VM instance quotas by region, click All Quotas. Your region quotas are listed from highest to lowest usage.

  6. Click the checkbox of the region whose quota you want to change.

  7. Click Edit Quotas.

  8. Complete the form.

  9. Click Submit Request.

Instance groups

To use instance groups, you must have available quota for all the resources that the group uses (for example, CPU quota) and available quota for the group resource itself. Depending on the type of group that you create, the following group resource quotas apply:

Service type | Service quota
Regional (multi-zone) managed instance group | Regional instance group managers
Zonal (single-zone) managed instance group | Both of: Instance group managers and Instance groups
Unmanaged (single-zone) instance group | Instance groups
Regional (multi-zone) autoscaler | Regional autoscalers
Zonal (single-zone) autoscaler | Autoscalers

Disk quotas

The following persistent disk and local SSD quotas apply on a per-region basis:

  • Local SSD (GB). This quota is the total combined size of local SSD disk partitions that can be attached to VMs in a region. Local SSD is a fast, ephemeral disk that should be used for scratch, local cache, or processing jobs with high fault tolerance because the disk is not intended to survive VM instance reboots. Local SSD partitions are sold in increments of 375 GB and up to 24 local SSD partitions can be attached to a single VM. In the gcloud CLI and the API, this quota is referred to as LOCAL_SSD_TOTAL_GB.
  • Persistent disk standard (GB). This quota is the total size of standard persistent disks that can be created in a region. As described in Optimizing persistent disk and local SSD performance, standard persistent disks offer lower IOPS and throughput than SSD persistent disks or local SSD. They are cost effective when used as large durable disks for storage, as boot disks, and for serial write processes like logs. Standard persistent disks are durable and are available indefinitely to attach to a VM within the same zone. In the gcloud CLI and the API, this quota is referred to as DISKS_TOTAL_GB. This quota also applies to regional standard persistent disks, but regional disks consume twice the amount of quota per GB due to replication in two zones within a region.
  • Persistent disk SSD (GB). This quota is the total combined size of SSD-backed persistent disks that can be created in a region. SSD-backed persistent disks have multiple replicas and, as described in Block storage performance, offer higher IOPS and throughput than standard persistent disks. SSD-backed persistent disks are available indefinitely to attach to a VM within the same zone. In the gcloud CLI and the API, this quota is referred to as SSD_TOTAL_GB. This quota is separate from local SSD. Regional persistent disks consume twice the amount of quota per GB due to replication in two zones within a region. This quota applies to the following disk types:
    • Zonal and regional SSD persistent disk
    • Zonal and regional balanced persistent disk
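
As a rough illustration of the accounting in the list above, the following sketch returns which quota a persistent disk draws from and how many GB it consumes. The disk type keys (pd-standard, pd-ssd, pd-balanced) are assumed names chosen for the example; the quota names and the doubling for regional disks come from the descriptions above.

```python
# Disk type keys are assumptions for illustration; quota names come from the list above.
QUOTA_BY_DISK_TYPE = {
    "pd-standard": "DISKS_TOTAL_GB",
    "pd-ssd": "SSD_TOTAL_GB",
    "pd-balanced": "SSD_TOTAL_GB",
}


def disk_quota_charge(disk_type: str, size_gb: int, regional: bool = False):
    """Return (quota_name, gb_charged) for one persistent disk."""
    quota_name = QUOTA_BY_DISK_TYPE[disk_type]
    replicas = 2 if regional else 1  # Regional disks replicate to two zones.
    return quota_name, size_gb * replicas


zonal = disk_quota_charge("pd-ssd", 500)
regional = disk_quota_charge("pd-balanced", 500, regional=True)
print(zonal, regional)
```

A 500 GB regional balanced disk therefore needs 1000 GB of SSD_TOTAL_GB headroom in the region, twice what a zonal disk of the same size needs.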

CPU quota

CPU quota is the total number of virtual CPUs across all of your VM instances in a region. CPU quotas apply to running VMs and VM reservations. Both predefined and preemptible VMs consume this quota.

To help protect Compute Engine systems and other users, some new accounts and projects also have a global CPUs (All Regions) quota. That quota applies to all regions and is measured as a sum of all your vCPUs in all regions.

For example, if you have 48 vCPUs remaining in a single region such as us-central1 but only 32 vCPUs remaining for the CPUs (All Regions) quota, you can launch only 32 vCPUs in the us-central1 region, even though there is remaining quota in the region. This is because you reach the CPUs (All Regions) quota first and must delete existing instances before you can launch new ones.
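
The arithmetic in this example amounts to taking the smaller of the two remaining quotas. A minimal sketch, with hypothetical function and parameter names:

```python
from typing import Optional


def launchable_vcpus(regional_remaining: int,
                     all_regions_remaining: Optional[int] = None) -> int:
    """Return how many more vCPUs can be launched in one region."""
    if all_regions_remaining is None:  # Project has no CPUs (All Regions) quota.
        return max(0, regional_remaining)
    # Both quotas must have room, so the tighter one decides.
    return max(0, min(regional_remaining, all_regions_remaining))


headroom = launchable_vcpus(regional_remaining=48, all_regions_remaining=32)
print(headroom)  # 32
```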

E2 and N1 machine types share a CPU quota pool. N2, N2D, T2D, T2A, M1, M2, C2, C2D, and A2 machine types each have a separate CPU quota pool.

If you are using committed use discounts for your VMs, you must have committed use discount quota before you purchase a committed use discount contract.

Machine type | Quota pool | CPU quota name | Committed CPU quota name
E2, N1 | shared pool | CPUS | Committed_CPUS
N2 | separate pool | N2_CPUS | Committed_N2_CPUS
N2D | separate pool | N2D_CPUS | Committed_N2D_CPUS
T2D | separate pool | T2D_CPUS | Committed_T2D_CPUS
T2A (Preview) | separate pool | T2A_CPUS | Not available (N/A) for Committed_T2A_CPUS
M1 | separate pool | M1_CPUS | Committed_MEMORY-OPTIMIZED_CPUS
M2 | separate pool | M2_CPUS | Committed_MEMORY-OPTIMIZED_CPUS
C2 | separate pool | C2_CPUS | Committed_C2_CPUS
C2D | separate pool | C2D_CPUS | Committed_C2D_CPUS
A2 | separate pool | A2_CPUS | Committed_A2_CPUS
Preemptible VMs | shared pool | PREEMPTIBLE_CPUS | Not available (N/A) for preemptible VMs
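
When checking quota from a script, it can help to resolve a machine type to the CPU quota name it consumes. The sketch below encodes the table above as a lookup. It assumes that a machine type name starts with its family followed by a hyphen, as in n2-standard-16; names that don't follow that pattern are rejected rather than guessed.

```python
# Family-to-quota mapping copied from the table above.
CPU_QUOTA_BY_FAMILY = {
    "e2": "CPUS", "n1": "CPUS",  # E2 and N1 share one pool.
    "n2": "N2_CPUS", "n2d": "N2D_CPUS",
    "t2d": "T2D_CPUS", "t2a": "T2A_CPUS",
    "m1": "M1_CPUS", "m2": "M2_CPUS",
    "c2": "C2_CPUS", "c2d": "C2D_CPUS",
    "a2": "A2_CPUS",
}


def cpu_quota_name(machine_type: str) -> str:
    """Return the CPU quota name for a machine type such as 'n2-standard-16'."""
    family = machine_type.split("-", 1)[0].lower()  # Assumes a 'family-...' name.
    try:
        return CPU_QUOTA_BY_FAMILY[family]
    except KeyError:
        raise ValueError(f"no CPU quota mapping for machine type {machine_type!r}") from None


print(cpu_quota_name("n2-standard-16"))
print(cpu_quota_name("e2-medium"))
```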

GPU quota

Similar to virtual CPU quota, GPU quota refers to the total number of virtual GPUs in all VM instances in a region. GPU quotas apply to running VMs and VM reservations. Both predefined and preemptible VMs consume this quota.

Check the Quotas page to ensure that you have enough GPUs available in your project, and to request a quota increase. In addition, new accounts and projects have a global GPU quota that applies to all regions.

When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones. To use preemptible GPUs, request preemptible GPU quota.

NVIDIA GPU | Quota name | Committed GPU quota name | Virtual workstation | Preemptible GPUs | Preemptible GPU virtual workstation
K80 | NVIDIA_K80_GPUS | COMMITTED_NVIDIA_K80_GPUS | N/A | PREEMPTIBLE_NVIDIA_K80_GPUS | N/A
P100 | NVIDIA_P100_GPUS | COMMITTED_NVIDIA_P100_GPUS | NVIDIA_P100_VWS_GPUS | PREEMPTIBLE_NVIDIA_P100_GPUS | PREEMPTIBLE_NVIDIA_P100_VWS_GPUS
A100 | NVIDIA_A100_GPUS | COMMITTED_NVIDIA_A100_GPUS | N/A | PREEMPTIBLE_NVIDIA_A100_GPUS | N/A
P4 | NVIDIA_P4_GPUS | COMMITTED_NVIDIA_P4_GPUS | NVIDIA_P4_VWS_GPUS | PREEMPTIBLE_NVIDIA_P4_GPUS | PREEMPTIBLE_NVIDIA_P4_VWS_GPUS
T4 | NVIDIA_T4_GPUS | COMMITTED_NVIDIA_T4_GPUS | NVIDIA_T4_VWS_GPUS | PREEMPTIBLE_NVIDIA_T4_GPUS | PREEMPTIBLE_NVIDIA_T4_VWS_GPUS
V100 | NVIDIA_V100_GPUS | COMMITTED_NVIDIA_V100_GPUS | N/A | PREEMPTIBLE_NVIDIA_V100_GPUS | N/A

Quotas for preemptible resources

To use preemptible CPUs or GPUs attached to preemptible VM instances, or to use local SSDs attached to preemptible VM instances, you must have available quota in your project for those respective resources.

You can request special preemptible quotas for Preemptible CPUs, Preemptible GPUs, or Preemptible Local SSDs (GB). However, if your project does not have preemptible quota, and you have never requested preemptible quota, you can consume standard quota to launch preemptible resources.

After Compute Engine grants you preemptible quota in a region, all preemptible instances automatically count against preemptible quota. If this quota is depleted, you must request more preemptible quota for those resources.
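
The rule in the two preceding paragraphs can be written as a small decision function. The function name, parameters, and return labels are hypothetical and only restate the behavior described above.

```python
def quota_pool_charged(preemptible: bool, preemptible_quota_granted: bool) -> str:
    """Return which quota pool a resource in a region counts against."""
    if preemptible and preemptible_quota_granted:
        # Once preemptible quota exists in the region, preemptible resources use it.
        return "preemptible"
    # Standard resources, and preemptible ones before any grant, use standard quota.
    return "standard"


print(quota_pool_charged(preemptible=True, preemptible_quota_granted=False))
print(quota_pool_charged(preemptible=True, preemptible_quota_granted=True))
```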

External IP addresses

You must have enough external IP addresses for every VM that needs to be directly reachable from the public internet. Regional IP quota is for assigning IPv4 addresses to VMs in that region. Global IP quota is for assigning IPv4 addresses to global networking resources such as load balancers. Google Cloud offers different types of IP addresses, depending on your needs. For information about costs, refer to External IP address pricing. For information about quota specifics, see Quotas and limits.

  • In-use external IP addresses: Includes both ephemeral and static IP addresses that are currently being used by a resource.

  • Static external IP addresses: External IP addresses reserved for your resources that persist through machine restarts. You can register these addresses with DNS and domain provider services to provide a user-friendly address, for example, www.example-site.com.

  • Static internal IP addresses: Static internal IP addresses let you reserve internal IP addresses from the internal IP range configured in the subnet. You can assign those reserved internal addresses to resources as needed.

API rate limits

API rate limits (also known as API quotas) define the number of requests that can be made to the Compute Engine API. These rate limits apply on a per-project basis. Each rate limit corresponds to all the requests for a group of one or more Compute Engine API methods. When you use gcloud compute or the Google Cloud console, you are also making requests to the API and these requests count towards your rate limits. If you use service accounts to access the API, that also counts towards your rate limit.

API rate limits are enforced and automatically refilled in 60-second (1-minute) intervals. If your project reaches a rate limit's maximum at any time within an interval, you must wait for that quota to refill before making more requests in that group. If your project exceeds a rate limit, you receive a 403 error with the reason rateLimitExceeded. To resolve this error, wait a minute and then retry your request; the quota should be refilled at the start of the next interval.
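
The wait-and-retry advice can be sketched as a small wrapper. This is an illustration under stated assumptions: RateLimitExceeded is a hypothetical exception standing in for however your client surfaces a 403 with reason rateLimitExceeded, and request is any zero-argument callable that issues the API call. The sleep function is injectable so that the demo runs without actually waiting.

```python
import time


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for a 403 response with reason rateLimitExceeded."""


def call_with_refill_wait(request, max_attempts=3, wait_seconds=60.0, sleep=time.sleep):
    """Call request(); on a rate-limit error, wait for the next interval and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimitExceeded:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            sleep(wait_seconds)


# Demo with a fake request that is rejected twice, and a fake sleep that only records.
attempts = {"count": 0}
waits = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitExceeded("rateLimitExceeded")
    return "ok"


result = call_with_refill_wait(flaky_request, sleep=waits.append)
print(result, waits)
```

Waiting a full 60 seconds is the conservative choice, because the client does not know where in the current interval the rejection happened.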

Currently, requests are limited using the following groups. Each group is counted separately, so you can reach the maximum limit in each group simultaneously.

The following rate limit groups apply to all resources unless specified otherwise:

Limit group Description Default limit
Queries
  • Default limit for mutation methods.
  • Metric: compute.googleapis.com/default
Rate per project (defaultPerMinutePerProject): 1500 requests/minute
Read requests
  • Limit for *.get methods.
  • Metric: compute.googleapis.com/read_requests
Rate per project (ReadRequestsPerMinutePerProject): 1500 requests/minute
List requests
  • Limit for *.list methods.
  • Metric: compute.googleapis.com/list_requests
Rate per project (ListRequestsPerMinutePerProject): 1500 requests/minute
Operation read requests
  • Limit for globalOperations.get, regionOperations.get, and zoneOperations.get methods.
  • Metric: compute.googleapis.com/operation_read_requests
Rate per project (OperationReadRequestsPerMinutePerProject): 1500 requests/minute
Global resource mutation requests
  • Limit for disks.createSnapshot, images.delete, images.deprecate, images.insert, images.setLabels, snapshots.delete, snapshots.insert, snapshots.setLabels, machineImages.insert, and machineImages.delete methods.
  • Metric: compute.googleapis.com/global_resource_write_requests
Rate per project (GlobalResourceWriteRequestsPerMinutePerProject): 375 requests/minute
Heavy-weight mutation requests
  • Limit for patch, delete, and insert methods for the interconnects and interconnectAttachments resources.
  • Metric: compute.googleapis.com/heavy_weight_write_requests
Rate per project (HeavyWeightWriteRequestsPerMinutePerProject): 750 requests/minute
Heavy-weight read requests
  • Limit for methods such as Operations.wait, *.getEffectiveFirewalls, and *.aggregatedList.
  • Metric: compute.googleapis.com/heavy_weight_read_requests
Rate per project (HeavyWeightReadRequestsPerMinutePerProject): 750 requests/minute

The following rate limit groups apply to APIs with per-method limits:

Limit group Description Default limit
Instance simulate maintenance event requests
  • Limit for instances.simulateMaintenanceEvent method.
  • Metric: compute.googleapis.com/simulate_maintenance_event_requests
Rate per project (SimulateMaintenanceEventRequestsPerDayPerProject): 150 requests/minute
Instance list referrer requests
  • Limit for instances.listReferrers method.
  • Metric: compute.googleapis.com/instance_list_referrers_requests
Rate per project (InstanceListReferrersRequestsPerMinutePerProject): 3000 requests/minute
Instance get serial port output requests
  • Limit for instances.getSerialPortOutput method.
  • Metric: compute.googleapis.com/get_serial_port_output_requests
Rate per project (GetSerialPortOutputRequestsPerMinutePerProject): maximum of 1500 requests/minute
License insert requests
  • Limits for licenses.insert method.
  • Metric: compute.googleapis.com/license_insert_requests
  • Rate per project (LicenseInsertRequestsPerMinutePerProject): 2.5 requests/second (150 requests/minute)
  • Rate per day per project (LicenseInsertRequestsPerDayPerProject): 30 requests/day
Project set common instance metadata requests
  • Limit for projects.setCommonInstanceMetadata method.
  • Metric: compute.googleapis.com/project_set_common_instance_metadata_requests
Rate per project (ProjectSetCommonInstanceMetadataRequestsPerMinutePerProject): 36 requests/minute
Recommend location requests
  • Limit for regionInstances.recommendLocations method.
  • Metric: compute.googleapis.com/recommend_locations_requests
Rate per project (RecommendLocationsRequestsPerMinutePerProject): 20 requests/minute
Network endpoint write requests
  • Limit for *.AttachNetworkEndpoints and *.DetachNetworkEndpoints methods.
  • Metric: compute.googleapis.com/network_endpoint_write_requests
Rate per project (NetworkEndpointWriteRequestsPerMinutePerProject): 1500 requests/minute
Network endpoint list requests
  • Limit for networkEndpointGroups.listNetworkEndpoints method.
  • Metric: compute.googleapis.com/network_endpoint_list_requests
Rate per project (NetworkEndpointListRequestsPerMinutePerProject): 1500 requests/minute

Follow the Compute Engine API best practices for preserving API rate limits to mitigate the effects of API rate limits.
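
One way to stay under a per-minute limit is to keep a client-side budget for each limit group and skip or delay calls once it is spent. The sketch below is an approximation, not the service's own accounting: the class name is hypothetical, the client's 60-second window is not aligned with the server's, and the limit of 2 is chosen only to keep the demo short.

```python
import time


class MinuteBudget:
    """Client-side counter that allows at most `limit` requests per 60-second window."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self) -> bool:
        """Return True and count the request if the current window has room."""
        now = self.clock()
        if now - self.window_start >= 60:
            self.window_start = now  # Start a fresh window with a full budget.
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


# Demo with a fake clock so the example runs instantly.
fake_now = [0.0]
budget = MinuteBudget(limit=2, clock=lambda: fake_now[0])
first_window = [budget.try_acquire() for _ in range(3)]
fake_now[0] = 61.0
after_refill = budget.try_acquire()
print(first_window, after_refill)
```

Because each limit group is counted separately, a caller would keep one budget per group, sized somewhat below the default limit in the tables above to leave room for requests from the console and gcloud.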

If you need a higher rate limit for API requests, you can review the current use and request an increase in the API quota. For instructions on how to increase quota from the Google Cloud console, see Requesting a higher quota limit.

Concurrent operation limits

Concurrent operation limits define the number of in-flight (concurrent) operations that can exist at any point in time. Any API request that creates, modifies, or deletes a Compute Engine resource is checked against a concurrent operation limit to determine whether a new operation can be created at that moment.

If your project exceeds the concurrent operation limit for any in-flight operation, you receive a 403 error with the reason rateLimitExceeded.

Operation groups and limits

This section describes the limits for various Compute Engine in-flight or concurrent operations.

Global operations and limits

Concurrent global operations consume a global limit that is specified for a project. The following table lists the global limits for in-flight operations:

Operation | Description | Limit
All global methods | Limits the total number of concurrent global operations for a project. | 8000 in-flight operations per project
routes.insert | Limits the number of concurrent route creations in a project. | 200 in-flight route creations per project
routes.delete | Limits the number of concurrent route delete operations in a project. | 400 in-flight delete route operations per project
firewalls.insert | Limits the number of concurrent firewall creations in a project. | 400 in-flight create firewall operations per project
firewalls.delete | Limits the number of concurrent firewall deletions in a project. | 400 in-flight delete firewall operations per project
snapshots.insert | Limits the number of concurrent snapshot creations in a project. | 8000 in-flight create snapshot operations per project
snapshots.delete | Limits the number of concurrent snapshot deletions in a project. | 4000 in-flight delete snapshot operations per project

Regional and zonal operation limits

The following limits apply to the specified operations for a project in a region and its zones:

Operation | Description | Limit
All regional methods | Limits the total number of concurrent operations for a project in a region and its zones. | 8000 in-flight operations per project per region
instances.insert | Limits the number of concurrent instance creation operations for a project in a region. | 1200 in-flight instance insert operations per project per region
instances.delete | Limits the number of concurrent instance delete operations for a project in a region. | 1200 in-flight instance delete operations per project per region
instances.bulkInsert | Limits the number of concurrent bulk creations of instances for a project in a region. | 20 in-flight bulk instance insert operations per project per region
disks.insert | Limits the number of concurrent disk creations for a project in a region. | 1500 in-flight create disk operations per project per region
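
A client can keep its own in-flight count below these limits by gating each operation on a semaphore. The following is a minimal sketch with hypothetical names. It assumes that the callable passed to run blocks until the operation has finished (for example, by waiting on the returned operation), because an operation stays in flight after the initial request returns. The limit of 3 and the fake operation are for demonstration only.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightLimiter:
    """Allow at most `max_in_flight` operations to run at the same time."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, operation):
        # Block until a slot is free, then hold it for the whole operation.
        with self._slots:
            return operation()


limiter = InFlightLimiter(max_in_flight=3)
stats = {"current": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_operation():
    """Stand-in for an API call that waits for its operation to complete."""
    with stats_lock:
        stats["current"] += 1
        stats["peak"] = max(stats["peak"], stats["current"])
    time.sleep(0.01)  # Simulate the time the operation spends in flight.
    with stats_lock:
        stats["current"] -= 1
    return "done"


with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: limiter.run(fake_operation), range(20)))

print(len(results), "operations completed; peak in flight:", stats["peak"])
```

Even with eight worker threads, the recorded peak never exceeds the limiter's three slots. A separate limiter per operation type and region would mirror how the limits in the tables above are scoped.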

Best practices

The following checklist summarizes the best practices for reducing insufficient concurrent operation limit errors:

What's next