Load Balancing and Scaling

Google Cloud Platform (GCP) offers load balancing and autoscaling for groups of instances.

Load Balancing

GCP offers server-side load balancing so you can distribute incoming traffic across multiple virtual machine instances. Load balancing provides the following benefits:

  • Scale your application
  • Support heavy traffic
  • Detect and automatically remove unhealthy virtual machine instances using health checks. Instances that become healthy again are automatically re-added.
  • Route traffic to the closest virtual machine

GCP load balancing uses forwarding rule resources to match certain types of traffic and forward it to a load balancer. For example, a forwarding rule can match TCP traffic destined to port 80 on a given IP address, then forward it to a load balancer, which then directs it to healthy virtual machine instances.
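The matching step described above can be pictured with a small sketch. This is an illustrative model only, not the GCP API: the `ForwardingRule` and `Packet` classes, the `match_rule` and `healthy_backends` helpers, the load balancer name `web-lb`, and the placeholder address `203.0.113.10` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ForwardingRule:
    """Hypothetical model of a forwarding rule: what it matches and where it sends traffic."""
    protocol: str
    ip_address: str
    port: int
    target: str  # name of the load balancer that receives matching traffic


@dataclass(frozen=True)
class Packet:
    """Hypothetical model of an incoming packet's addressing fields."""
    protocol: str
    dest_ip: str
    dest_port: int


def match_rule(rules: List[ForwardingRule], packet: Packet) -> Optional[ForwardingRule]:
    """Return the first rule whose protocol, IP address, and port all match the packet."""
    for rule in rules:
        if (rule.protocol == packet.protocol
                and rule.ip_address == packet.dest_ip
                and rule.port == packet.dest_port):
            return rule
    return None


def healthy_backends(health: Dict[str, bool]) -> List[str]:
    """Keep only the instances whose most recent health check passed."""
    return sorted(name for name, is_healthy in health.items() if is_healthy)


# Placeholder values for illustration (203.0.113.0/24 is a documentation range).
rules = [ForwardingRule("TCP", "203.0.113.10", 80, "web-lb")]
rule = match_rule(rules, Packet("TCP", "203.0.113.10", 80))
candidates = healthy_backends({"vm-a": True, "vm-b": False, "vm-c": True})
```

In this sketch, traffic that matches no rule is simply not forwarded, and an instance that fails its health check drops out of the candidate list until it passes again.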

GCP load balancing is a managed service, which means its components are redundant and highly available. If a load balancing component fails, it is restarted or replaced automatically and immediately.

GCP offers several different types of load balancing that differ in capabilities, usage scenarios, and how you configure them. See Load balancing for descriptions.


Autoscaling

Compute Engine offers autoscaling to automatically add or remove virtual machines from an instance group based on increases or decreases in load. This allows your applications to handle increases in traffic gracefully and reduces cost when the need for resources is lower. You define the autoscaling policy, and the autoscaler scales the group automatically based on the measured load.


Autoscaling policies

An autoscaler can use several kinds of policies to scale your virtual machines. When you create an autoscaler, you must specify at least one policy. If you use multiple policies, the autoscaler scales the instance group based on the policy that calls for the largest number of virtual machines in the group.
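The rule for combining multiple policies can be sketched in a few lines. The function name `choose_group_size` and the policy labels are hypothetical; the only behavior taken from the text is that the policy calling for the most virtual machines wins.

```python
from typing import Dict


def choose_group_size(recommendations: Dict[str, int]) -> int:
    """Pick the group size from per-policy recommendations.

    Each value is the number of virtual machines one policy calls for.
    The largest recommendation wins, so no policy is left under-provisioned.
    """
    if not recommendations:
        raise ValueError("an autoscaler needs at least one policy")
    return max(recommendations.values())


# Hypothetical recommendations from three policies.
size = choose_group_size({"cpu": 4, "load_balancing": 7, "custom_metric": 5})
```

Taking the maximum means the group is sized for whichever signal is currently most demanding.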

The following sections discuss the autoscaling policies in general; for more information about how to set up a specific autoscaling policy, see the respective policy documentation.

CPU utilization

Scaling on CPU utilization is the most basic form of autoscaling. This policy tells the autoscaler to watch the average CPU utilization of a group of virtual machines and add or remove virtual machines from the group to maintain your desired utilization. This is useful for workloads that are CPU-intensive but whose CPU usage fluctuates.
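One way to reason about maintaining a target utilization is sketched below. It is a simplified model, not the autoscaler's actual algorithm, which this page does not describe: it assumes total CPU demand stays constant while the group is resized, and the function name `recommended_size` is hypothetical. Utilization is given in whole percentage points so the arithmetic stays exact.

```python
from typing import Sequence


def recommended_size(cpu_percent: Sequence[int], target_percent: int) -> int:
    """Estimate how many instances keep average CPU at or below the target.

    cpu_percent holds the current utilization of each instance (0-100).
    Assumes the same total demand is spread evenly over the new group.
    """
    if not cpu_percent:
        raise ValueError("the group must contain at least one instance")
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in the range 1-100")
    total_demand = sum(cpu_percent)
    needed = -(-total_demand // target_percent)  # integer ceiling division
    return max(1, needed)  # this sketch never scales below one instance


# Four instances at 90% with a 60% target: 360 / 60 = 6 instances.
scale_out = recommended_size([90, 90, 90, 90], 60)
# Four instances at 30% with a 60% target: 120 / 60 = 2 instances.
scale_in = recommended_size([30, 30, 30, 30], 60)
```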

For more information, see Scaling Based on CPU utilization.

Load balancing serving capacity

When you set up an autoscaler to scale based on load balancing serving capacity, it watches the serving capacity of an instance group and scales the group when the virtual machines are over or under capacity.

The serving capacity of an instance can be defined in the load balancer's backend service and can be based on either utilization or requests per second.
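For the requests-per-second case, the over/under capacity check might look like the following sketch. The function names and the numbers are hypothetical, and the model assumes requests are spread evenly across instances; it is not the backend service's actual configuration format.

```python
def instances_for_load(total_rps: int, max_rps_per_instance: int) -> int:
    """Number of instances needed so that none exceeds its per-instance RPS capacity."""
    if max_rps_per_instance <= 0:
        raise ValueError("max_rps_per_instance must be positive")
    needed = -(-total_rps // max_rps_per_instance)  # integer ceiling division
    return max(1, needed)


def capacity_status(total_rps: int, max_rps_per_instance: int, current_size: int) -> str:
    """Classify the group as over, under, or at its serving capacity."""
    needed = instances_for_load(total_rps, max_rps_per_instance)
    if needed > current_size:
        return "over capacity"   # more instances are needed: scale out
    if needed < current_size:
        return "under capacity"  # fewer instances would suffice: scale in
    return "at capacity"


# Hypothetical group of 8 instances, each rated for 100 requests per second.
busy = capacity_status(950, 100, 8)
quiet = capacity_status(300, 100, 8)
```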

For more information, see Scaling Based on HTTP(S) load balancing.

Stackdriver Monitoring metrics

If you export or use Stackdriver Monitoring metrics, you can set up autoscaling to collect data for a specific metric and scale based on your desired utilization level. You can scale on the standard metrics provided by Stackdriver Monitoring or on any custom metrics you create.

For more information, see Scaling Based on Stackdriver Monitoring Metrics.
