Compute Engine offers load balancing and autoscaling for groups of instances.
Google Compute Engine offers server-side load balancing so you can distribute incoming network traffic across multiple virtual machine instances. Both network load balancing and HTTP(S) load balancing provide the following benefits:
- Scale your application
- Support heavy traffic
- Detect and automatically remove unhealthy virtual machine instances. Instances that become healthy again are automatically re-added.
- Route traffic to the closest virtual machine
In addition, HTTP(S) load balancing also supports the following:
- Balance loads across regions
- Direct traffic to specific backends based on URLs
Google Compute Engine load balancing uses forwarding rule resources to match certain types of traffic and forward it to a load balancer. For example, a forwarding rule can match TCP traffic destined for port 80 on IP address 192.0.2.1 and forward it to a load balancer, which then directs it to healthy virtual machine instances.
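The matching step described above can be sketched in a few lines of Python. The rule fields and helper names below are illustrative only, not the Compute Engine API:

```python
# Illustrative sketch of forwarding-rule matching (not the Compute Engine API).
from dataclasses import dataclass

@dataclass
class ForwardingRule:
    protocol: str      # e.g. "TCP"
    ip_address: str    # external IP address the rule listens on
    port: int          # destination port to match
    target: str        # name of the load balancer to forward to

def match_rule(rules, protocol, ip_address, port):
    """Return the target load balancer for the traffic, or None if no rule matches."""
    for rule in rules:
        if (rule.protocol == protocol
                and rule.ip_address == ip_address
                and rule.port == port):
            return rule.target
    return None

# The example from the text: TCP traffic to port 80 on 192.0.2.1.
rules = [ForwardingRule("TCP", "192.0.2.1", 80, "web-lb")]
print(match_rule(rules, "TCP", "192.0.2.1", 80))  # -> web-lb
```

Traffic that matches no forwarding rule is simply not handled by the load balancer, which the `None` return models here.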
Compute Engine load balancing is a managed service, which means its components are redundant and highly available. If a load balancing component fails, it is restarted or replaced automatically and immediately.
Google offers two types of load balancing that differ in capabilities, usage scenarios, and how you configure them. HTTP(S) load balancing is appropriate for HTTP and HTTPS traffic and can forward traffic across regions, to specific backends based on URL, or both. Network load balancing is appropriate for other types of traffic, but it is restricted to a single region.
HTTP(S) load balancing
Cross-region load balancing
You can use a global IP address that can intelligently route users based on proximity. For example, if you set up instances in North America, Europe, and Asia, users around the world will be automatically sent to the backends closest to them, assuming those instances have enough capacity. If the closest instances do not have enough capacity, cross-region load balancing automatically forwards users to the next closest region.
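The proximity-plus-capacity decision can be sketched as follows; the region names, distance values, and capacity flags are made-up assumptions for illustration:

```python
# Illustrative sketch of cross-region routing: prefer the closest region,
# but fall back to the next closest one if the nearest is out of capacity.
def pick_region(regions):
    """Pick the closest region that still has capacity.

    `regions` maps region name -> dict with 'distance' (from the user,
    assumed precomputed) and 'has_capacity'.
    """
    by_distance = sorted(regions.items(), key=lambda kv: kv[1]["distance"])
    for name, info in by_distance:
        if info["has_capacity"]:
            return name
    return None  # no region can serve the request

regions = {
    "us-central1":  {"distance": 1, "has_capacity": False},  # closest, but full
    "europe-west1": {"distance": 2, "has_capacity": True},
    "asia-east1":   {"distance": 3, "has_capacity": True},
}
print(pick_region(regions))  # falls back to europe-west1
```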
Content-based load balancing
Content-based or content-aware load balancing uses HTTP(S) load balancing to
distribute traffic to different instances based on the incoming URL. For
example, you can set up some instances to handle your video
content and another set to handle everything else. You can configure your load
balancer to direct traffic for
example.com/video to the video servers and
example.com/ to the default servers.
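The URL-based routing above can be sketched as a longest-prefix match; the backend names are illustrative assumptions:

```python
# Illustrative sketch of content-based (URL) routing; backend names are made up.
def route(path, path_rules, default_backend):
    """Return the backend for a request path; the longest matching prefix wins."""
    best = default_backend
    best_len = -1
    for prefix, backend in path_rules.items():
        if path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = backend, len(prefix)
    return best

# /video traffic goes to the video servers; everything else to the default set.
path_rules = {"/video": "video-backend"}
print(route("/video/cats.mp4", path_rules, "default-backend"))  # video-backend
print(route("/index.html", path_rules, "default-backend"))      # default-backend
```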
Content-based and cross-region load balancing can work together by using multiple backend services and multiple regions. You can build on the scenarios above to create a load balancing configuration that meets your needs.
Network load balancing
Assume that you are running a non-HTTP(S) service and you are starting to get a high enough level of traffic that you need to add additional instances to help respond to this load. You can add additional Google Compute Engine instances and configure load balancing to spread the load between these instances. In this situation, you would serve the same content from each of the instances. As your site becomes more popular, you would continue increasing the number of instances that are available to respond to requests.
Unlike HTTP(S) load balancing, network load balancing cannot automatically forward traffic to a different region. If you want your service to run in multiple regions, you must set up a separate IP address, load balancer, and set of instances for each region.
Autoscaling
Compute Engine offers autoscaling to automatically add or remove virtual machines from an instance group based on increases or decreases in load. Autoscaling lets your applications gracefully handle increases in traffic and reduces cost when the need for resources is lower. You define the autoscaling policy, and the autoscaler performs automatic scaling based on the measured load.
Choose from a variety of policies that an autoscaler can use to scale your virtual machines. When you create an autoscaler, you must always specify a single policy for it; you cannot define more than one policy per autoscaler.
The following sections discuss each of these autoscaling policies in general; for more information about how to set up the specific autoscaling policy, see the respective policy documentation.
CPU utilization
Scaling based on CPU utilization is the most basic autoscaling policy. This policy tells the autoscaler to watch the average CPU utilization of a group of virtual machines and to add or remove virtual machines from the group to maintain your desired utilization. This is useful for configurations that are CPU-intensive but whose CPU usage might fluctuate.
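A minimal sketch of this idea: size the group so that average CPU returns to the target. The formula below (observed over target utilization, scaled by current size and rounded up) is an assumption for illustration and is not guaranteed to match the autoscaler's exact algorithm:

```python
# Sketch of target-based scaling on average CPU utilization.
# The sizing formula is an illustrative assumption, not the autoscaler's
# documented internals.
import math

def recommended_size(current_size, observed_cpu, target_cpu):
    """Recommend an instance-group size to bring average CPU back to target."""
    if observed_cpu <= 0:
        return current_size
    return max(1, math.ceil(current_size * observed_cpu / target_cpu))

# 4 instances averaging 90% CPU against a 60% target -> grow to 6 instances.
print(recommended_size(4, 0.90, 0.60))  # 6
```

When load falls, the same formula recommends a smaller group, which is how scaling in reduces cost.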
For more information, see Scaling Based on CPU Utilization.
HTTP(S) load balancing serving capacity
If you set up an autoscaler to scale based on HTTP(S) load balancing serving capacity, the autoscaler watches the serving capacity of an instance group and scales if the virtual machines are over or under capacity.
The serving capacity of an instance can be defined in the load balancer's backend service and can be based on either utilization or requests per second.
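For the requests-per-second case, the sizing logic can be sketched as below; the 80% target fraction and the per-instance rate are made-up assumptions:

```python
# Sketch of scaling on requests-per-second serving capacity; the target
# fraction and rates below are illustrative assumptions.
import math

def size_for_rps(total_rps, max_rps_per_instance, target_fraction=0.8):
    """Size the group so each instance serves target_fraction of its max rate."""
    usable_rps = max_rps_per_instance * target_fraction
    return max(1, math.ceil(total_rps / usable_rps))

# 1000 req/s total, instances rated at 100 req/s, kept at 80% -> 13 instances.
print(size_for_rps(1000, 100))  # 13
```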
For more information, see Scaling Based on HTTP(S) load balancing.
Stackdriver Monitoring metrics
If you export or use Stackdriver Monitoring metrics, you can set up an autoscaler to collect data for a specific metric and scale based on your desired utilization level. You can scale based on standard metrics provided by Stackdriver Monitoring or on custom metrics that you create.
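The same target-based sizing applies to a custom metric. In this sketch the metric (queued tasks) and the per-instance target are made-up assumptions, not Stackdriver Monitoring metric names:

```python
# Sketch of scaling on a custom metric, e.g. a work-queue depth exported to
# Stackdriver Monitoring. The metric and target values are illustrative.
import math

def size_for_metric(metric_total, target_per_instance):
    """Size the group so each instance handles ~target_per_instance of the metric."""
    return max(1, math.ceil(metric_total / target_per_instance))

# 250 queued tasks with a target of 20 tasks per instance -> 13 instances.
print(size_for_metric(250, 20))  # 13
```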
For more information, see Scaling Based on Stackdriver Monitoring Metrics.