Google Cloud Platform

Easy, No-Fuss, Cost-Effective: Google Compute Engine Load Balancing In Action

How You Can Use Compute Engine Load Balancing to Unleash Your App’s Potential

Introduction

If you use Google products such as Search, AdWords, or Gmail, you will know that Google is committed to performance, scale, and robustness. Compute Engine Load Balancing is built on top of the same load balancing technology that is central to these products.

Compute Engine Load Balancing has the following advantages:

  • Easy to set up and maintain – Configure the load balancer via the command line interface or a programmatic RESTful API, then take load balancing off your mind, and let Google manage it
  • Load-balanced within a zone or across zones within a region – Make your app robust by exposing one IP for VMs running in more than one zone
  • Stable performance from the start – No pre-warming of the load balancing service is required
  • Able to route around unhealthy instances – Configure your health checks and let Compute Engine Load Balancing take care of the rest
  • Enabled with “lame-duck mode” – Easily upgrade your app without cutting off existing user connections, and be ready for maintenance windows: unhealthy backends will not receive new traffic, and existing connections are not interrupted
  • Affordable – Save yourself the headache of maintaining load balancing and cut your costs at the same time

Load Balancing on Compute Engine

Load balancing is an essential component when scaling any service. There are various hardware and software load balancing options available, and if you are running a service on Google Compute Engine, several of them are open to you. The simplest approach, which requires no software or hardware setup, is round-robin Domain Name System (DNS), a technique by which a single domain name resolves to multiple IP addresses. One significant drawback, however, is that clients, such as web browsers, may cache the DNS entry and continue to send traffic to unhealthy servers.
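
As a minimal illustration (www.example.com and the 203.0.113.x addresses are placeholders), round-robin DNS amounts to publishing several A records under one name; the DNS server then rotates the order in which it returns them:

# BIND-style zone records: three servers behind a single name.
#   www  IN  A  203.0.113.10
#   www  IN  A  203.0.113.11
#   www  IN  A  203.0.113.12
# Successive lookups return the addresses in rotated order:
dig +short www.example.com

Nothing in this scheme verifies that the servers behind those addresses are healthy, which is exactly the drawback described above.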

Another approach is to use open source software such as HAProxy or Nginx. These open source projects may offer some additional features, but they require you to set up and manage both the software and the instances it runs on. This can increase installation, configuration, and upgrade costs, and it introduces a single point of failure.

Compute Engine Load Balancing helps eliminate these drawbacks and requirements by offering load balancing as a distributed service that is seamlessly integrated with Google Cloud Platform. Compute Engine Load Balancing runs as a service that is fully managed by Google. There is no need to run a load balancer on Compute Engine instances, and there is no need to worry about keeping load balancing up and running or routing around unhealthy backends. Compute Engine Load Balancing takes care of all of this.

Figure 1 illustrates the load balancing concepts in the context of Google’s Compute Engine Load Balancing. Users send requests to the app running on Compute Engine via Compute Engine Load Balancing, which steers them away from unhealthy instances.

Figure 1: Compute Engine Load Balancing concepts. User requests are sent through Compute Engine Load Balancing and are spread among healthy backends.

Benefits of Using Compute Engine Load Balancing

1. Easy to set up and maintain

Configuring your app to use Compute Engine Load Balancing is easy and can be done either via command line or via the Cloud Console UI. Either way, you start by having a number of instances running application logic on Compute Engine. As an example, we set up four instances with Apache web server backends (“www-0,” “www-1,” “www-2,” and “www-3”). Load balancing was enabled on these four instances with the following three commands (where $ZONE0 and $ZONE1 are zones in your $REGION):

# Create an HTTP health check that the load balancer uses to probe backends.
gcutil --project=<your-project-id> addhttphealthcheck basic-check

# Create a target pool in $REGION containing the four instances, spanning two
# zones, and attach the health check to it.
gcutil --project=<your-project-id> addtargetpool www-pool --region=$REGION \
  --instances=$ZONE0/instances/www-0,$ZONE0/instances/www-1,\
  $ZONE1/instances/www-2,$ZONE1/instances/www-3 --health_checks="basic-check"

# Create a forwarding rule that exposes an external IP address and forwards
# traffic on port 80 to the pool.
gcutil --project=<your-project-id> addforwardingrule www-rule \
  --region=$REGION --port_range=80 --target=www-pool

Each forwarding rule will expose an external IP address, which you can find by executing the following command:

# The output includes the external IP address assigned to www-rule.
gcutil --project=<your-project-id> listforwardingrules

That’s it! Clients access the application through the externally facing IP, and their requests flow through Compute Engine Load Balancing. Setting up Compute Engine Load Balancing is as simple as that, and there is nothing to maintain.
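
As a quick sanity check, you can send a request to the external IP reported by the previous command, for example with curl (the address below is a placeholder):

curl http://<external-ip-of-www-rule>/

If each backend serves a page that identifies itself, repeating the request several times should show responses coming from different instances.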

If you prefer a web interface, you can set up target pools, forwarding rules, and health checks with Google’s Cloud Console as shown in figure 2.

Figure 2: Setting up load balancing in the Cloud Console.

Both the command line and the UI allow users to set up more advanced features such as backup pools and session stickiness. For more information on setup options, please refer to the full documentation.

2. Load-balanced within a zone or across zones within a region

Compute Engine Load Balancing spreads load roughly evenly across all instances that it balances. Figure 3 shows how evenly Compute Engine Load Balancing spread 500,000 requests to the externally facing IP exposed by the load balancer. For any real-world system, even load spreading becomes increasingly important as traffic grows.

Figure 3: Load spread for a total of 500,000 requests sent in 100 parallel threads.[1]

3. Stable performance from the start

We now compare the average response time for querying backends directly to the response time for querying backends through Compute Engine Load Balancing.

Number of requests: 1,000,000

  Configuration                                             Average response time (ms)
  Directly to one backend                                   100.9 (median: 100.0)
  Through load balancing, one backend                       101.0 (median: 100.0)
  Through load balancing, four backends                     98.89 (median: 98.00)
  Through load balancing, four backends across two zones    100.7 (median: 98.0)

Table 1: Response times for addressing a backend directly (100 parallel requests to a single instance) vs. through Compute Engine Load Balancing (100 parallel requests to one backend and to four backends).[2]

As table 1 shows, Compute Engine Load Balancing introduces no perceptible latency over addressing a backend directly, and it scales to more than one backend without adding latency. Note that we took great care to isolate the results from outside factors that can influence response time: congestion on the originating system and network latency between the originating network and Google’s network could otherwise significantly affect the measurements.

Compute Engine Load Balancing allows target pools to define a set of instances that span zones within a region, which makes it easy to build robust systems. When one of the zones goes down, the load balancer simply routes around the instances in that zone, which at that point are considered unhealthy. In this section, we look at the performance of the load balancer when instances are spread across zones. In our example, we have four instances, two in “us-central1-a” and two in “us-central1-b,” all configured in the same target pool behind a single forwarding rule. The rightmost column in table 1 shows that the average response time remains stable even when the load balancer spreads responses among instances in two different zones.

Note that this stability across zones is also a very important feature for backup pools. Backup pools are designed to make an app more robust in the case of a partial or complete failure of a primary target pool, and they can be in a different zone from the primary pool. The stable response time shown above also implies that the user experience will not suffer when traffic needs to be sent to a backup pool of VMs.

Compute Engine Load Balancing shows stable performance right from the start. In this test, we again used 100 threads to send a total of 1,000,000 requests. Figure 4 shows how the response time stays stable under this load. The y-axis shows the response time, and each dot on the graph represents a single request. Note how the response time is stable right from the beginning: unlike many other solutions, Compute Engine Load Balancing requires no pre-warming of the load balancer.

Figure 4: Response time for 1,000,000 requests (100 parallel threads) to four backends through Compute Engine Load Balancing.[3]
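
The measurements above were produced in a controlled environment, and the exact load generation tool is not material here. As one hedged way to generate a comparable load pattern yourself, ApacheBench can issue a fixed number of requests at a given concurrency (the address is a placeholder):

# 1,000,000 requests in total, 100 concurrent, against the load-balanced IP.
ab -n 1000000 -c 100 http://<external-ip-of-www-rule>/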

4. Able to route around unhealthy instances

Compute Engine Load Balancing seamlessly routes around unhealthy instances. Using the health checks you configure, the load balancer detects unhealthy backends and routes traffic to the remaining ones. Figure 5 shows the load spread across instances before and after an instance goes down: roughly 500,000 requests were sent before and after backend “www-0” was taken out of rotation. Each data point in the graph represents the number of requests handled by one specific backend during a single minute. If the instance goes into “lame-duck mode” (described in more detail below), existing connections to that backend are still handled until the backend is ready to be shut down, and no requests time out.

While you focus on bringing your backend instances back to a healthy state, Google focuses on ensuring that the quality of your users’ experience is maintained.

Figure 5: Load spread when Compute Engine Load Balancing is handling an instance that becomes unhealthy.[4]
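
How quickly a failing instance is taken out of rotation depends on how its health check is configured. The following sketch is illustrative only: the flag names mirror the HTTP health check API fields of the time (checkIntervalSec, unhealthyThreshold, and so on), so verify them with gcutil help addhttphealthcheck before relying on them:

# A more aggressive health check: probe /healthz every 5 seconds and declare an
# instance unhealthy after 2 consecutive failures (roughly 10 seconds).
gcutil --project=<your-project-id> addhttphealthcheck aggressive-check \
  --request_path=/healthz --check_interval_sec=5 --check_timeout_sec=3 \
  --unhealthy_threshold=2 --healthy_threshold=2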

5. Enabled with “lame-duck mode”

Compute Engine Load Balancing supports “lame-duck mode,” which allows system administrators to take a VM out of rotation without interrupting the service. To take a VM out of rotation, configure it to respond negatively to health checks. New traffic will not be routed to that VM, and existing connections will not be interrupted. For the best user experience, the VM can then be monitored for remaining connections before being shut down.
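
One simple way to make a VM respond negatively is to serve the health check path from a static file and move that file aside when you want to drain the VM. The sketch below assumes the health check was created with --request_path=/healthz and that Apache serves documents from /var/www; both details are assumptions made for illustration:

# On the VM to be drained: make the health check start failing.
sudo mv /var/www/healthz /var/www/healthz.disabled

# Watch established connections on port 80 drain before shutting down.
watch 'netstat -tn | grep ":80 " | grep -c ESTABLISHED'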

This feature is useful in various scenarios. One scenario where this feature is particularly useful is with application updates. When a new version of the application becomes available, VMs are often upgraded in a rolling fashion – one VM is taken out of rotation at a time, updated, and brought back into rotation. Many management tools such as Chef and Puppet can be configured to work this way.
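
A minimal rolling upgrade driven from a script might look like the following sketch. It builds on the draining technique above and assumes you can run remote commands with gcutil ssh; upgrade.sh is a placeholder for your own update step, and zone flags are omitted for brevity:

for vm in www-0 www-1 www-2 www-3; do
  # Take the VM out of rotation by failing its health check.
  gcutil --project=<your-project-id> ssh $vm \
    'sudo mv /var/www/healthz /var/www/healthz.disabled'
  sleep 60  # allow existing connections to drain
  # Apply the update, then bring the VM back into rotation.
  gcutil --project=<your-project-id> ssh $vm 'sudo ./upgrade.sh'
  gcutil --project=<your-project-id> ssh $vm \
    'sudo mv /var/www/healthz.disabled /var/www/healthz'
done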

“Lame-duck mode” is also useful for dealing with maintenance windows. As a maintenance window approaches, more VMs can be created and added to the target pool in a zone that will not go into maintenance. The VMs that are about to be shut down can then be removed gracefully: once they respond to health checks with “unhealthy” signals, existing connections will not be interrupted, but no new traffic will be sent to them. The same concept applies when the application fails over to a backup target pool (because a sufficient portion of the VMs in the primary pool have become unhealthy). In that case, existing connections are kept alive to VMs in the primary pool, while all new connections are sent to the backup target pool.
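
As a hedged sketch of how a backup pool might be set up (the --backup_pool and --failover_ratio flag names are assumptions based on the target pool API fields of the era, and “www-4” and “www-5” are hypothetical standby instances; consult the documentation for the authoritative syntax):

# Create a standby pool in a zone that will stay up.
gcutil --project=<your-project-id> addtargetpool www-backup-pool \
  --region=$REGION --instances=$ZONE1/instances/www-4,$ZONE1/instances/www-5 \
  --health_checks="basic-check"

# Attach it as the backup of a primary pool at creation time; traffic fails
# over once fewer than half of the primary instances are healthy.
gcutil --project=<your-project-id> addtargetpool www-pool --region=$REGION \
  --instances=$ZONE0/instances/www-0,$ZONE0/instances/www-1 \
  --health_checks="basic-check" \
  --backup_pool=www-backup-pool --failover_ratio=0.5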

The documentation contains additional details about the “lame-duck mode”. For any system that sets up Compute Engine Load Balancing, enabling the “lame-duck mode” is highly recommended in order to optimize the user experience.

6. Affordable

Compute Engine Load Balancing is very affordable, which can translate into substantial cost savings compared to running a set of VM instances dedicated to load balancing. For instance, if you run load balancing software on two n1-standard-2 instances (mid-tier general purpose instances, with one available for failover), you would currently pay $4029.60 per year for the instances alone (equivalent to two instances at $0.23 per hour for 8,760 hours), not counting network traffic costs. Compute Engine Load Balancing is significantly more affordable: at current pricing, it would cost as little as $219 per year. This is about a 95% savings! Even if you choose to run smaller instances, your savings can be substantial (see table 2).

  Load balancing option                                              Cost per year
  Two f1-micro instances dedicated to load balancing software        $332.88
  Two g1-small instances dedicated to load balancing software        $946.08
  Two n1-standard-2 instances dedicated to load balancing software   $4029.60
  Compute Engine Load Balancing                                      $219

Table 2: Potential cost savings for using Compute Engine Load Balancing in a US region.[5]

Is Your App a Good Fit for Compute Engine Load Balancing?

Any application that requires scaling to more than one instance can benefit from Compute Engine Load Balancing. We describe two scenarios here to guide you through your application architecture design:

1. Web Application on Compute Engine with Logged-in Users

The first common scenario is a web application with logged-in users. One example is an app where users can log on, author content, and share this content with other users. Other users may simply log on and consume the content that has been produced by others. Figure 6 shows a sketch of an architecture for a typical use case.

Figure 6: Web application using Compute Engine Load Balancing.

Most web applications require that session information be stored. For our example, a user’s current session information may be any authored content that is not yet saved. A commonly used and robust architecture is one that stores session information server-side in the following ways:

  • Persistently in a database
  • Transiently in a high-speed cache for improved responsiveness

In the default setting, requests sent through Compute Engine Load Balancing from the same client IP are not sticky to any backend and are, in fact, spread over all instances. Compute Engine Load Balancing can be configured to be sticky[6] to a particular backend for any given client IP. This can further improve performance, as session information will not have to be retrieved as frequently. The “lame-duck mode” of Compute Engine Load Balancing allows for seamless, rolling upgrades of the application software by taking instances out of rotation. Future versions of Compute Engine Load Balancing will strengthen this use case further as we continue to expand its capabilities with features such as SSL termination at the load balancer and Layer-7 load balancing.
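
Stickiness is selected when the target pool is created. As a hedged sketch (the --session_affinity flag name mirrors the API’s sessionAffinity field; verify it with gcutil help addtargetpool):

# CLIENT_IP affinity: all connections from a given client IP are sent to the
# same backend for as long as that backend stays healthy.
gcutil --project=<your-project-id> addtargetpool sticky-pool --region=$REGION \
  --instances=$ZONE0/instances/www-0,$ZONE0/instances/www-1 \
  --health_checks="basic-check" --session_affinity=CLIENT_IP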

2. Intensive Processing on Compute Engine

The second scenario is an application that performs long-running or intensive processing in response to user requests (see figure 7). In this scenario, each user request requires some processing on one or more Compute Engine instances. The processing could involve Optical Character Recognition (OCR), video transcoding, a MapReduce job, and more.

Figure 7: Backend processing application using Compute Engine Load Balancing.

Compute Engine Load Balancing lends itself particularly well to this example if the requests are of similar cost and duration. The backend VMs do not carry session state, but instead perform a one-off task. The load balancer can pick any backend VM to perform each of these tasks, and load balancing can effectively spread the workload among the backends.

Conclusion

In this paper we explained the advantages that Compute Engine Load Balancing can bring to applications: it is easy to set up and configure, and it delivers consistent performance from the start. We also explained how it reduces maintenance effort and costs compared with third-party load balancers.

The first version of Compute Engine Load Balancing is a good fit for many use cases, such as the web applications and intensive processing described earlier. But this is only the beginning. In future releases, we plan to add features such as SSL termination and Layer-7 load balancing, bringing additional benefits such as greater flexibility in balancing techniques. You can then create a highly scalable, load-balanced web site with very little administrative overhead.




[1] The results reported here are based on an experiment averaging over 500,000 requests. The experiment was done in a controlled testing environment set up within the Google network and is intended only to be an illustrative example.

[2] The latencies reported here are based on an experiment averaging over 1,000,000 requests. The experiment was done in a controlled testing environment set up within the Google network and is intended only to be an illustrative example.

[3] The latencies reported here are based on an experiment of 1,000,000 requests. The experiment was done in a controlled testing environment set up within the Google network and is intended only to be an illustrative example.

[4] The results reported here are based on an experiment of 1,000,000 requests. The experiment was done in a controlled testing environment set up within the Google network and is intended only to be an illustrative example.

[5] Please refer to the Compute Engine Pricing page for the most up-to-date information both for Compute Engine Load Balancing and for Compute Engine instance charges. The estimate for Compute Engine Load Balancing includes hourly charges at the US rate as of November 2013. Bandwidth charges are additional. During the promotional pricing period until January 2014, use of Compute Engine Load Balancing will be free of charge.

[6] Please see the documentation for further details.
