Load balancing and scaling

Google Cloud offers load balancing and autoscaling for groups of instances.

Load balancing

Google Cloud offers server-side load balancing so you can distribute incoming traffic across multiple virtual machine (VM) instances. Load balancing provides the following benefits:

  • Scale your app
  • Support heavy traffic
  • Detect and automatically remove unhealthy VM instances using health checks. Instances that become healthy again are automatically re-added.
  • Route traffic to the closest virtual machine

Google Cloud load balancing uses forwarding rule resources to match certain types of traffic and forward it to a load balancer. For example, a forwarding rule can match TCP traffic destined for port 80 on a specific IP address and forward it to a load balancer, which then directs it to healthy VM instances.
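The flow described above can be illustrated with a small, self-contained Python sketch. It is a toy model and not the Google Cloud API: the ForwardingRule and LoadBalancer classes, the route function, the instance names, the placeholder address 203.0.113.10, and the round-robin selection are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForwardingRule:
    """Hypothetical stand-in for a forwarding rule: protocol, IP address, port."""
    protocol: str
    ip_address: str
    port: int

    def matches(self, protocol: str, dst_ip: str, dst_port: int) -> bool:
        # Traffic matches only if all three fields are equal.
        return (protocol, dst_ip, dst_port) == (self.protocol, self.ip_address, self.port)


class LoadBalancer:
    """Toy balancer that rotates over the instances currently marked healthy."""

    def __init__(self, instances):
        self.health = {name: True for name in instances}
        self._next = 0

    def set_health(self, name: str, healthy: bool) -> None:
        # Models the result of a health check: unhealthy instances are skipped,
        # and instances that recover are eligible again.
        self.health[name] = healthy

    def pick(self) -> str:
        healthy = [name for name, ok in self.health.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice


def route(rule: ForwardingRule, lb: LoadBalancer, protocol: str, dst_ip: str, dst_port: int):
    """Return the chosen instance name, or None if the rule does not match."""
    if not rule.matches(protocol, dst_ip, dst_port):
        return None
    return lb.pick()


if __name__ == "__main__":
    rule = ForwardingRule("TCP", "203.0.113.10", 80)  # placeholder address
    lb = LoadBalancer(["vm-a", "vm-b", "vm-c"])
    lb.set_health("vm-b", False)  # vm-b fails its health check
    print([route(rule, lb, "TCP", "203.0.113.10", 80) for _ in range(4)])
```

In this sketch, traffic that does not match the rule is never handed to the balancer, and an instance marked unhealthy receives no traffic until it is marked healthy again.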

Google Cloud load balancing is a managed service, which means its components are redundant and highly available. If a load balancing component fails, it is restarted or replaced automatically and immediately.

Google Cloud offers several types of load balancing that differ in capabilities, usage scenarios, and how you configure them. For a description of each type, see the Google Cloud load balancing documentation.


Autoscaling

Compute Engine offers autoscaling to automatically add or remove VM instances from a managed instance group (MIG) based on increases or decreases in load. Autoscaling lets your apps gracefully handle increases in traffic, and it reduces cost when the need for resources is lower. You can autoscale a MIG based on its CPU utilization, Cloud Monitoring metrics, schedules, or load balancing serving capacity.
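As a rough illustration of scaling on CPU utilization, the following Python sketch sizes a group so that its average utilization moves toward a target. It is a simplified proportional model under stated assumptions, not the actual Compute Engine algorithm: the function name, its parameters, and the use of whole-number percentages are hypothetical, and a real autoscaler applies additional logic that is not shown here.

```python
def recommended_size(current_size: int, avg_cpu_percent: int,
                     target_cpu_percent: int, min_size: int, max_size: int) -> int:
    """Return a group size that would bring average CPU near the target.

    Hypothetical helper: utilization values are whole-number percentages.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in the range 1..100")
    # Total load expressed in "instance-percent" units.
    total_load = current_size * avg_cpu_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = (total_load + target_cpu_percent - 1) // target_cpu_percent
    # Keep the result within the configured bounds.
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # 4 instances at 90% average CPU with a 60% target: scale out.
    print(recommended_size(4, 90, 60, min_size=1, max_size=10))
    # 4 instances at 30% average CPU with a 60% target: scale in.
    print(recommended_size(4, 30, 60, min_size=1, max_size=10))
```

The minimum and maximum bounds reflect the two goals named above: the group grows to absorb traffic but shrinks when demand drops, which is where the cost reduction comes from.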

When you set up an autoscaler to scale based on load balancing serving capacity, the autoscaler watches the serving capacity of an instance group and scales when the VM instances are over or under capacity. The serving capacity of an instance can be defined in the load balancer's backend service and can be based on either utilization or requests per second. For more information, see Scaling based on load balancing serving capacity.
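To make the idea of serving capacity concrete, the following Python sketch estimates how many instances a group needs when capacity is expressed in requests per second. It is an assumption-laden illustration and not the autoscaler's real logic: the function name, the per-instance rate limit, and the target percentage are hypothetical parameters chosen for the example.

```python
def size_for_serving_capacity(total_rps: int, max_rps_per_instance: int,
                              target_percent: int, min_size: int, max_size: int) -> int:
    """Return how many instances keep each one at or below its target load.

    Hypothetical helper: each instance is assumed to serve at most
    max_rps_per_instance, and the group aims to run at target_percent of that.
    """
    if max_rps_per_instance <= 0 or not 0 < target_percent <= 100:
        raise ValueError("capacity and target must be positive")
    # Usable capacity per instance is max_rps_per_instance * target_percent / 100.
    # Work in integers: needed = ceil(total_rps * 100 / (max_rps * target_percent)).
    numerator = total_rps * 100
    denominator = max_rps_per_instance * target_percent
    needed = (numerator + denominator - 1) // denominator
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    current_size = 3
    wanted = size_for_serving_capacity(400, 100, 80, min_size=1, max_size=10)
    if wanted > current_size:
        print(f"over capacity: grow from {current_size} to {wanted}")
    elif wanted < current_size:
        print(f"under capacity: shrink from {current_size} to {wanted}")
    else:
        print("at target capacity")
```

With a target below 100 percent, each instance keeps some headroom, so short bursts can be absorbed while new instances are still starting.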

To learn more about autoscaling, see Autoscaling groups of instances.

What's next