Manage capacity and quota

Last reviewed 2023-08-21 UTC

This document in the Google Cloud Architecture Framework shows you how to evaluate and plan your capacity and quota on the cloud.

In conventional data centers, you typically spend cycles each quarter reviewing current resource requirements and forecasting future ones. You must consider physical, logistical, and human-resource-related concerns, such as rack space, cooling, electricity, bandwidth, cabling, procurement times, shipping times, and how many engineers are available to rack and stack new equipment. You also have to actively manage capacity and workload distributions so that resource-intensive jobs, such as Hadoop pipelines, don't interfere with services, such as web servers, that must be highly available.

In contrast, when you use Google Cloud, you cede most capacity planning to Google. Using the cloud means you don't have to provision and maintain idle resources when they aren't needed. For example, you can create, scale up, and scale down VM instances as needed. Because you pay only for what you use, you can optimize your spending and avoid paying continuously for excess capacity that you need only at peak traffic times. To help you save, Compute Engine provides machine type recommendations if it detects that you have underutilized VM instances that can be resized or deleted.

Evaluate your cloud capacity requirements

To manage your capacity effectively, you need to know your organization's capacity requirements.

To evaluate your capacity requirements, start by identifying your top cloud workloads. Evaluate the average and peak utilizations of these workloads, and their current and future capacity needs.

Identify the teams who use these top workloads. Work with them to establish an internal demand-planning process. Use this process to understand their current and forecasted cloud resource needs.

Analyze load patterns and call distribution. Use factors like the peak over the last 30 days, the hourly peak, and the per-minute peak in your analysis.
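As an illustration of the peak factors above, the following Python sketch counts requests per minute, per hour, and per day over a 30-day lookback window. It assumes you can export request timestamps from your own logs; the function name `peak_rates` and the sample data are hypothetical and don't come from any Google Cloud API.

```python
from collections import Counter
from datetime import datetime, timedelta


def peak_rates(timestamps, now, window_days=30):
    """Return the busiest minute, hour, and day within the lookback window."""
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in timestamps if cutoff <= t <= now]
    # Truncate each timestamp to its minute, hour, or day and count per bucket.
    per_minute = Counter(t.replace(second=0, microsecond=0) for t in recent)
    per_hour = Counter(t.replace(minute=0, second=0, microsecond=0) for t in recent)
    per_day = Counter(t.date() for t in recent)
    return {
        "peak_per_minute": max(per_minute.values(), default=0),
        "peak_per_hour": max(per_hour.values(), default=0),
        "peak_per_day": max(per_day.values(), default=0),
    }


# Illustrative data only: 12 requests spread over two minutes, a burst of 5
# three hours later, and one old request that falls outside the window.
start = datetime(2023, 8, 1, 9, 0, 0)
timestamps = [start + timedelta(seconds=10 * i) for i in range(12)]
timestamps += [start + timedelta(hours=3, seconds=i) for i in range(5)]
timestamps.append(datetime(2023, 6, 1, 12, 0, 0))

peaks = peak_rates(timestamps, now=datetime(2023, 8, 21))
print(peaks)
```

Comparing the per-minute peak with the hourly peak shows how bursty the traffic is, which affects how much short-term headroom a workload needs.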

Consider using Cloud Monitoring to get visibility into the performance, uptime, and overall health of your applications and infrastructure.

View your infrastructure utilization metrics

To make capacity planning easier, gather and store historical data about your organization's use of cloud resources.

Ensure you have visibility into infrastructure utilization metrics. For example, for top workloads, evaluate the following:

  • Average and peak utilization
  • Spikes in usage patterns
  • Seasonal spikes based on business requirements, such as holiday periods for retailers
  • How much over-provisioning is needed to prepare for peak events and rapidly handle potential traffic spikes
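The metrics in this list can be summarized from historical samples. The following Python sketch is a minimal example under stated assumptions: `samples` holds periodic readings of cores in use, `capacity` is the provisioned core count, and the 25% headroom is an arbitrary illustrative value rather than a recommendation.

```python
from statistics import mean


def utilization_summary(samples, capacity):
    """Summarize average and peak utilization as fractions of capacity."""
    average = mean(samples)
    peak = max(samples)
    return {
        "average_utilization": average / capacity,
        "peak_utilization": peak / capacity,
        # A high peak-to-average ratio indicates spiky usage.
        "peak_to_average": peak / average,
    }


def capacity_with_headroom(peak_demand, headroom_fraction):
    """Capacity to provision so that peak demand leaves the given headroom."""
    return peak_demand * (1 + headroom_fraction)


# Hypothetical readings of cores in use, against 100 provisioned cores.
samples = [20, 30, 40, 80, 30]
summary = utilization_summary(samples, capacity=100)
target = capacity_with_headroom(max(samples), headroom_fraction=0.25)
print(summary, target)
```

In this sketch the headroom fraction stands in for the over-provisioning decision; in practice you would derive it from how quickly the workload can scale and how sharp its spikes are.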

Ensure your organization has set up alerts that automatically notify you when you get close to quota and capacity limitations.
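The logic behind such an alert can be sketched in plain Python. This example doesn't call any Google Cloud API: the quota names, the usage numbers, and the 80% threshold are all hypothetical, and in practice you would typically express the same condition as an alerting policy in your monitoring tool.

```python
def quotas_near_limit(usage, limits, threshold=0.8):
    """Return (name, usage ratio) pairs at or above the threshold, highest first."""
    flagged = []
    for name, limit in limits.items():
        if limit <= 0:
            continue  # skip unset or unlimited quotas in this sketch
        ratio = usage.get(name, 0) / limit
        if ratio >= threshold:
            flagged.append((name, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical quota limits and current usage for one project.
limits = {"cpus_us_central1": 100, "in_use_addresses": 20, "load_balancers": 10}
usage = {"cpus_us_central1": 90, "in_use_addresses": 5, "load_balancers": 10}

alerts = quotas_near_limit(usage, limits)
for name, ratio in alerts:
    print(f"{name}: {ratio:.0%} of quota used")
```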

Use Google's monitoring tools to get insights on application usage and capacity. For example, you can define custom metrics with Monitoring. Use these custom metrics to define alerting trends. Monitoring also provides flexible dashboards and rich visualization tools to help identify emergent issues.

Create a process for capacity planning

Establish a process for capacity planning and document this plan.

As you create this plan, do the following:

  1. Run load tests to determine how much load the system can handle while meeting its latency targets, given a fixed amount of resources. Load tests should use a mix of request types that matches production traffic profiles from live users. Don't use a uniform or random mix of operations. Include spikes in usage in your traffic profile.
  2. Create a capacity model. A capacity model is a set of formulas for calculating incremental resources needed per unit increase in service load, as determined from load testing.
  3. Forecast future traffic and account for growth. See the article Measure Future Load for a summary of how Google builds traffic forecasts.
  4. Apply the capacity model to the forecast to determine future resource needs.
  5. Estimate the cost of resources your organization needs. Then, get budget approval from your Finance organization. This step is essential because the business can choose to make cost versus risk tradeoffs across a range of products. Those tradeoffs can mean acquiring capacity that's lower or higher than the predicted need for a given product based on business priorities.
  6. Work with your cloud provider to get the correct amount of resources at the correct time with quotas and reservations. Involve infrastructure teams in capacity planning and have operations create capacity plans with confidence intervals.
  7. Repeat the previous steps every quarter or two.
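Steps 2 through 4 can be sketched as a small calculation. The following Python example is a simplified illustration, not a definitive model: it assumes that load testing produced a single sustainable throughput per instance (`qps_per_instance`), that traffic grows at a constant compound rate per quarter, and that resources scale linearly with load. All names and numbers are hypothetical.

```python
import math


def forecast_load(current_qps, quarterly_growth, quarters):
    """Project load forward, assuming constant compound growth per quarter."""
    return current_qps * (1 + quarterly_growth) ** quarters


def instances_needed(forecast_qps, qps_per_instance, headroom=0.0):
    """Apply a linear capacity model: instances required for the forecast load."""
    return math.ceil(forecast_qps * (1 + headroom) / qps_per_instance)


# Hypothetical inputs: load tests showed 250 QPS per instance within the
# latency target; current peak is 1000 QPS, growing 25% per quarter.
future_qps = forecast_load(current_qps=1000, quarterly_growth=0.25, quarters=2)
baseline = instances_needed(future_qps, qps_per_instance=250)
with_headroom = instances_needed(future_qps, qps_per_instance=250, headroom=0.25)
print(future_qps, baseline, with_headroom)
```

Real capacity models are rarely this linear; repeating the load tests as the system changes keeps the per-instance figure honest.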

For more detailed guidance on the process of planning capacity while also optimizing resource usage, see Capacity Planning.

Ensure your quotas match your capacity requirements

Google Cloud uses quotas to restrict how much of a particular shared Google Cloud resource you can use. Each quota represents a specific countable resource, such as API calls to a particular service, the number of load balancers used concurrently by your project, or the number of projects that you can create. For example, quotas ensure that a few customers or projects can't monopolize CPU cores in a particular region or zone.

As you review your quota, consider these details:

  • Plan the capacity requirements of your projects in advance to prevent unexpected limiting of your resource consumption.
  • Set up your quota and capacity to handle full region failure.
  • Use quotas to cap the consumption of a particular resource. For example, you can set a maximum query usage per day quota over the BigQuery API to ensure that a project doesn't overspend on BigQuery.
  • Plan for spikes in usage and include these spikes as part of your quota planning. Spikes in usage can be expected fluctuations throughout the day, unexpected peak traffic events, or known peak traffic and launch events. For details about how to plan for peak traffic and launch events, read the next section in Operational Excellence: Plan for peak traffic and launch events.
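To make the full-region-failure guidance above concrete, the following Python sketch computes how much capacity, and therefore quota, each region needs so that the remaining regions can absorb peak load when one region is lost. It assumes load is spread evenly across regions and redistributes evenly after a failure; the function name and the numbers are hypothetical.

```python
import math


def per_region_capacity(total_peak, regions, tolerated_failures=1):
    """Capacity each region needs so the survivors can carry the total peak."""
    surviving = regions - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one region must survive")
    return math.ceil(total_peak / surviving)


# Hypothetical workload: 900 cores at peak, spread across 3 regions.
even_split = per_region_capacity(900, regions=3, tolerated_failures=0)
with_failover = per_region_capacity(900, regions=3, tolerated_failures=1)
print(even_split, with_failover)
```

In this example, each region needs quota for 450 cores rather than 300, which is the gap to plan for before a failure occurs.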

If your current quotas aren't sufficient, you can manage your quota using the Google Cloud console. If you require a large capacity, contact your Google Cloud sales team. However, many services also have limits that are unrelated to the quota system. For more information, see Working with quotas.

Regularly review your quotas. Submit quota requests before they're needed. Read Working with quotas for details about how quota requests are evaluated and how requests are approved or denied.

There are several ways to view and manage your Google Cloud quota:

What's next