Autoscaled Internal Load Balancing using HAProxy and Consul on Compute Engine

This solution shows how to use Consul by HashiCorp, a DNS-based service-discovery product, as part of an HAProxy-based, internal load-balancing system on Google Cloud Platform.

An internal load balancer distributes network traffic to servers on a private network. Neither the internal load balancer nor the servers it distributes traffic to are exposed to the Internet. The Internal Load Balancing using HAProxy on Google Compute Engine tutorial provides a good explanation of the basics of internal load balancing and walks you through configuring an internal load balancer using HAProxy and Google Compute Engine.

This solution extends the previous configuration to support autoscaling on both the HAProxy load balancing tier and the backend server tier. This modification allows servers to be gracefully added to or removed from their respective tiers, letting the infrastructure respond to changes in load or recover from failure. Consul provides the service discovery and distributed configuration mechanisms to enable this load-balancing architecture:

The software stack

  • Consul: Provides service discovery. Servers register with Consul and can be discovered by other servers in the cluster.
  • Consul Template: Populates values from Consul into the filesystem by using the consul-template daemon.
  • Dnsmasq: Forwards Domain Name System (DNS) queries from an instance to Consul. It allows applications that rely on DNS to easily discover services in Consul.
  • HAProxy: A high-performance TCP/HTTP load balancer.
  • Backend server: A simple Go application that outputs a Compute Engine instance's local metadata as JSON.
  • Frontend server: A simple Go application that consumes JSON from the backend server API and renders it as HTML.
  • Packer: A tool for creating pre-configured Compute Engine VM images. Packer is used to install and configure Consul and HAProxy into a bootable image.
  • Managed instance groups and autoscaler: A managed instance group is a pool of homogeneous instances created from a common instance template. An autoscaler adds or removes instances from a managed instance group.

What you will learn

Each of the following sections discusses a specific aspect of the architecture diagram and includes hands-on instructions for provisioning that section on Compute Engine. By the end of the document, you will have learned about each section of the architecture in detail, and have it running and usable in your environment.

Before you begin

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Projects page

  3. Enable billing for your project.

    Enable billing

  4. Install and initialize the Cloud SDK.

Be sure to run sudo gcloud components update if you are prompted to do so.

The Consul cluster

Consul provides service registration and discovery in this architecture. When the Compute Engine instances running HAProxy start up, they register with a Consul service named haproxy-internal, and the frontend server can discover all of the HAProxy servers with a DNS query to Consul. Similarly, instances running the backend application register with a Consul service named backend and can be discovered by the HAProxy servers.

To support service registration and discovery, you must run at least one Consul server. The makers of Consul strongly recommend running 3 to 5 Consul servers per datacenter. The examples in this document use 3 Consul servers.
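
The Packer image that you build in the following hands-on section configures and starts the Consul server agent for you, so you don't run this command yourself. Purely for orientation, a Consul server that waits for a 3-node quorum is typically started with flags along these lines (a sketch, not the exact command baked into the image):

# Illustrative only: start a Consul server agent that expects a 3-node quorum.
# The actual startup configuration is installed into the image by Packer.
consul agent \
  -server \
  -bootstrap-expect=3 \
  -data-dir=/var/consul \
  -retry-join=consul-1 \
  -retry-join=consul-2 \
  -retry-join=consul-3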

Hands-on: Launch Consul servers

  1. Create a Compute Engine instance named tool that has git and packer pre-installed:

    gcloud compute instances create tool \
      --scopes=cloud-platform \
      --zone=us-central1-f \
      --image=debian-8 \
      --metadata "startup-script=apt-get update -y && \
        sudo apt-get install -y git unzip && \
        curl -o /tmp/packer.zip https://releases.hashicorp.com/packer/0.8.6/packer_0.8.6_linux_amd64.zip && \
        curl -o /tmp/consul.zip https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip && \
        sudo unzip /tmp/packer.zip -d /usr/local/bin/ && \
        sudo unzip /tmp/consul.zip -d /usr/local/bin/"
    
  2. Wait a few minutes to give the scripts time to run before you SSH to the instance.

  3. Connect to the new tool instance:

    gcloud compute ssh tool --zone=us-central1-f
    
  4. Clone the source code repository to the tool instance:

    cd ~
    git clone https://github.com/GoogleCloudPlatform/compute-internal-loadbalancer.git
    
  5. Set an environment variable containing your project ID. Use the command as-is; don't replace PROJECT_ID or project-id, because the value is resolved from the instance metadata for you:

    export PROJECT_ID=$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/project-id)
    
  6. Change to the directory containing the Consul image files:

    cd ~/compute-internal-loadbalancer/images/consul
    
  7. Use packer to build the Google Compute Engine VM image for the Consul servers. Don't replace PROJECT_ID; you already set it as an environment variable in a previous step:

    packer build -var project_id=${PROJECT_ID} packer.json
    
  8. Copy the ID of the image created. You can see the ID in the Builds finished message. For example, in the following message, the ID is consul-1450847630.

    ==> Builds finished. The artifacts of successful builds are:
    --> googlecompute: A disk image was created: consul-1450847630
    

    Make a note of this ID. You'll need it in the next step and later in this solution.

  9. Launch 3 Consul servers, being sure to replace the value for the --image flag with the image ID you copied in the previous step:

    gcloud compute instances create consul-1 consul-2 consul-3 \
      --metadata="^|^consul_servers=consul-1,consul-2,consul-3" \
      --zone=us-central1-f \
      --no-address \
      --image=YOUR_CONSUL_IMAGE_ID
    
  10. Join your tool instance to the cluster:

    consul agent \
      -data-dir=/tmp/consul \
      -retry-join=consul-1 \
      -retry-join=consul-2 \
      -retry-join=consul-3 &
    
  11. After you see the status messages indicating that the instance joined the cluster successfully, enter clear to clear the console. You won't see a prompt at this point, just a blinking block cursor.

  12. View cluster members and verify that consul-1, consul-2, and consul-3 are joined as type server:

    consul members
    

    You'll see output like the following:

      Node          Address           Status  Type    Build  Protocol  DC
      consul-1      10.240.0.5:8301   alive   server  0.6.0  2         dc1
      consul-2      10.240.0.3:8301   alive   server  0.6.0  2         dc1
      consul-3      10.240.0.4:8301   alive   server  0.6.0  2         dc1
      tool          10.240.0.2:8301   alive   client  0.6.0  2         dc1
    

The backend application

The backend application in this example is a simple microservice that returns the instance metadata of the Compute Engine instance it is running on. The application returns the instance metadata as a JSON string in response to an HTTP GET request. Instances running the backend application have a private IP address but no public address. You can retrieve the source code for the sample application from GitHub.

Bootstrapping the backend

When a VM running the backend application comes online, it must join an existing Consul cluster, register itself as a service with the cluster, and then start the application process to respond to HTTP requests. Two systemd units—consul_servers.service and backend.service—are responsible for this bootstrapping.

  • consul_servers.service: This unit invokes the consul_servers.sh shell script, which retrieves the names of the Consul servers from the instance's metadata store and writes them to the file /etc/consul-servers. Consul servers must already be running, and the backend servers must be launched with a metadata attribute named consul_servers, whose value is a comma-delimited list of Consul server names. A hypothetical sketch of this script appears after the unit files below. Here is the unit file:

    [Unit]
    Description=consul_servers
    
    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c "/usr/bin/consul_servers.sh > /etc/consul-servers"
    
    [Install]
    WantedBy=multi-user.target
    
  • backend.service: This unit runs after consul_servers.service. It reads Consul server names from /etc/consul-servers, and then runs the backend-start.sh script. This script creates a Consul service file, then joins the cluster and registers the service, and finally starts the backend application.

    Here is the backend.service unit file:

    [Unit]
    Description=backend
    After=consul_servers.service
    Requires=consul_servers.service
    
    [Service]
    EnvironmentFile=/etc/consul-servers
    ExecStart=/usr/bin/backend-start.sh
    LimitNOFILE=9999999
    
    [Install]
    WantedBy=multi-user.target
    

    Here is the backend-start.sh script:

    #! /bin/bash
    export zone=$(curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/zone" | grep -o [[:alnum:]-]*$)
    
    # Set zone in Consul and HAProxy files
    envsubst < "/etc/consul.d/backend.json.tpl" > "/etc/consul.d/backend.json"
    
    # Start consul
    consul agent -data-dir /tmp/consul -config-dir /etc/consul.d $CONSUL_SERVERS &
    
    # Start the microservice
    /opt/www/gceme -port=8080
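
For reference, here is a hypothetical sketch of what the consul_servers.sh script could look like, inferred from the unit files above; the actual script is installed by Packer and may differ. It assumes that /etc/consul-servers must define the CONSUL_SERVERS variable that backend-start.sh passes to consul agent:

#! /bin/bash
# Hypothetical sketch; the real script ships in the Packer-built image.
# Read the comma-delimited consul_servers metadata attribute.
SERVERS=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/consul_servers")

# Emit a CONSUL_SERVERS variable containing one -retry-join flag per server.
FLAGS=""
for server in $(echo "${SERVERS}" | tr ',' ' '); do
  FLAGS="${FLAGS} -retry-join=${server}"
done

echo "CONSUL_SERVERS=${FLAGS}"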
    

The backend Consul service

Here is the /etc/consul.d/backend.json service file generated when the backend.service starts:

{
  "service": {
    "name": "backend",
    "tags": "us-central1-f",
    "port": 8080,
    "check": {
      "id": "backend",
      "name": "HTTP on port 8080",
      "http": "http://localhost:8080/healthz",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}

When the Consul agent starts on a backend server, that server is registered with the Consul cluster as a backend service, and the availability zone it is running in is used as a tag. Members of the service can be discovered by resolving the DNS name backend.service.consul.. To find service members in a specific availability zone, prepend the tag to the name, such as us-central1-f.backend.service.consul..
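
After you launch backend instances in the next section, you can test these lookups from any member of the cluster, such as the tool instance, by querying the local Consul agent's DNS interface directly. This example assumes dig is installed (for example, via apt-get install dnsutils) and uses Consul's default DNS port, 8600:

# Resolve all healthy members of the backend service.
dig @127.0.0.1 -p 8600 backend.service.consul +short

# Resolve only the members tagged with the us-central1-f availability zone.
dig @127.0.0.1 -p 8600 us-central1-f.backend.service.consul +short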

Hands-on: Launch the backend service

In this hands-on section, you use Packer to build the VM image for the backend service, and then use an instance group to create and manage a cluster of backend servers.

  1. On your tool instance, change to the directory that contains the backend image files:

    cd ~/compute-internal-loadbalancer/images/backend
    
  2. Use packer to build the Compute Engine VM image for the backend servers. Again, use the command as-is; it relies on the PROJECT_ID variable you set earlier:

    packer build -var project_id=${PROJECT_ID} packer.json
    
  3. Copy the ID of the image created, such as:

    ==> Builds finished. The artifacts of successful builds are:
    --> googlecompute: A disk image was created: backend-1450847630
    

    Make a note of this ID. You'll need it in the next step and later in this solution.

  4. Create an instance template that describes the configuration of the backend servers, being sure to replace YOUR_BACKEND_IMAGE_ID with the output of the previous step:

    gcloud compute instance-templates create backend \
      --no-address \
      --metadata="^|^consul_servers=consul-1,consul-2,consul-3" \
      --image=YOUR_BACKEND_IMAGE_ID
    
  5. Create an instance group that launches 2 backend servers using the backend template:

    gcloud compute instance-groups managed create backend \
      --base-instance-name=backend \
      --template=backend \
      --size=2 \
      --zone=us-central1-f
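
As an optional check, you can confirm from the tool instance that the new backend servers joined the cluster and registered the backend service. This assumes the Consul agent you started on the tool instance earlier is still running:

# The two backend instances should appear as clients.
consul members | grep backend

# List the healthy instances of the backend service through the local agent's HTTP API.
curl -s "http://localhost:8500/v1/health/service/backend?passing"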
    

The HAProxy load balancer tier

HAProxy is used to load-balance requests to the backend servers. When a VM running HAProxy comes online, it must join an existing Consul cluster, register itself as a service with the cluster, discover the servers in the backend service, and then start HAProxy. Like the backend servers, the HAProxy servers have private IP addresses and are not accessible through the public Internet.

The HAProxy Consul service

The service registered with Consul by each HAProxy server is named haproxy-internal and is defined as follows:

{
  "service": {
    "name": "haproxy-internal",
    "tags": "us-central1-f",
    "port": 8080,
    "check": {
      "id": "haproxy",
      "name": "HTTP on port 8080",
      "http": "http://localhost:8080",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}

Like the earlier backend service, the haproxy-internal service is tagged with the availability zone the instances are running in. This lets the frontend application connect to any load balancer by using the haproxy-internal.service.consul name, and also enables zone-specific lookups by prepending a particular zone to the lookup name. For example, us-central1-f.haproxy-internal.service.consul returns only service members in the us-central1-f availability zone.

Discovering servers in the backend service with consul-template

To load-balance requests, you will create a group of HAProxy servers that need to know the IP addresses of all healthy backend servers. This solution uses consul-template to update HAProxy's configuration file, /etc/haproxy/haproxy.cfg, and to reload the HAProxy service every time the membership of the backend service changes. The following snippet of the haproxy.cfg template iterates over the zone-specific us-central1-f.backend service and writes a server directive for each available backend:

listen http-in
        bind *:8080{{range service "us-central1-f.backend"}}
        server {{.Node}} {{.Address}}:{{.Port}}{{end}}

consul-template is executed by systemd as follows:

consul-template -template "/etc/haproxy/haproxy.ctmpl:/etc/haproxy/haproxy.cfg:service haproxy restart" -retry 30s -max-stale 5s -wait 5s

See the consul_template.service unit file for more details.
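
If you want to preview the rendered configuration without touching the live file, consul-template can render a template once to standard output. The following is a hedged example that you could run on one of the HAProxy instances; it assumes the template path shown above:

# Render the HAProxy template once to stdout instead of rewriting haproxy.cfg.
consul-template \
  -template "/etc/haproxy/haproxy.ctmpl:/etc/haproxy/haproxy.cfg" \
  -dry -once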

Hands-on: Launch HAProxy load balancers

In this hands-on section, you use Packer to build the VM image for the HAProxy load balancer servers, and then use an instance group to create and manage a cluster of servers:

  1. On your tool instance, change to the directory that contains the HAProxy image files:

    cd ~/compute-internal-loadbalancer/images/haproxy
    
  2. Use packer to build the Compute Engine VM image. Use the command verbatim:

    packer build -var project_id=${PROJECT_ID} packer.json
    
  3. Copy the ID of the image created, such as:

    ==> Builds finished. The artifacts of successful builds are:
    --> googlecompute: A disk image was created: haproxy-1450847630
    

    Make a note of this ID. You'll need it in the next step and later in this solution.

  4. Create an instance template that describes the configuration of the HAProxy servers, being sure to replace YOUR_HAPROXY_IMAGE_ID with the output of the previous step:

    gcloud compute instance-templates create haproxy \
      --no-address \
      --metadata="^|^consul_servers=consul-1,consul-2,consul-3" \
      --image=YOUR_HAPROXY_IMAGE_ID
    
  5. Create an instance group that launches 2 HAProxy servers:

    gcloud compute instance-groups managed create haproxy \
      --base-instance-name=haproxy \
      --template=haproxy \
      --size=2 \
      --zone=us-central1-f
    

The frontend application

The frontend application in this example consumes the JSON output from the backend, through the HAProxy load balancers, and renders it as HTML.

Instances running the frontend application each have a public and private IP address. They can receive requests from the Internet and make requests to the HAProxy servers through private IP addresses. The source code for the sample application is available on GitHub.

Connecting the frontend to the backend

The frontend application accepts the location of the backend as a runtime flag:

gceme -frontend=true -backend-service=http://BACKEND_SERVICE_ADDRESS:PORT

The frontend servers are also members of the Consul cluster, so they can easily discover the HAProxy service by using DNS, and provide that service name as the value to the backend-service flag when the frontend process starts:

gceme -frontend=true -backend-service=http://us-central1-f.haproxy-internal.service.consul:8080
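
Because this name resolves on the frontend itself, you can also verify the path from a frontend instance with a plain HTTP request. A minimal check, assuming curl is available on the instance:

# From a frontend instance: fetch backend metadata JSON through the HAProxy tier.
curl http://us-central1-f.haproxy-internal.service.consul:8080/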

Consul service discovery with DNS and Dnsmasq

The Packer script installs dnsmasq on the frontend servers and modifies /etc/resolv.conf to include 127.0.0.1 as a name server, which enables dnsmasq to resolve queries for the .consul TLD. consul-template then renders a second hosts file, named /etc/hosts.consul, that contains the hostnames and addresses of load balancers in the HAProxy service. The consul-template file to generate /etc/hosts.consul is:

{{range service "$zone.haproxy-internal"}}
{{.Address}} $zone.{{.Name}} $zone.{{.Name}}.service.consul{{end}}

consul-template renders this file and restarts the dnsmasq service whenever HAProxy servers are added or removed from their instance group. A rendered /etc/hosts.consul looks like this:

10.240.0.8 us-central1-f.haproxy-internal us-central1-f.haproxy-internal.service.consul
10.240.0.9 us-central1-f.haproxy-internal us-central1-f.haproxy-internal.service.consul

Finally, dnsmasq is configured to use this additional hosts file and can answer resolution requests for haproxy-internal.service.consul from the frontend application. Complete details of the instance configuration are available in the images/frontend directory.
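
As an illustrative sketch only (file names and paths here are assumptions, not the exact files used by the image), the dnsmasq side needs two settings: forward the .consul domain to the local Consul agent's DNS port, and load the additional hosts file rendered by consul-template:

# Illustrative sketch; see images/frontend for the real configuration.
cat <<'EOF' | sudo tee /etc/dnsmasq.d/10-consul
# Forward *.consul queries to the local Consul agent's DNS interface.
server=/consul/127.0.0.1#8600
# Serve the HAProxy addresses rendered by consul-template.
addn-hosts=/etc/hosts.consul
EOF
sudo systemctl restart dnsmasq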

Hands-on: Launch the frontend application

In this hands-on section, you use Packer to build the VM image for the frontend servers, then launch a single frontend instance with a public IP address.

  1. On your tool instance, change to the directory that contains the frontend image files:

    cd ~/compute-internal-loadbalancer/images/frontend
    
  2. Run the following command to create a firewall rule to allow HTTP access to the frontend server:

    gcloud compute firewall-rules create frontend-http \
        --source-ranges=0.0.0.0/0 \
        --target-tags=frontend-http \
        --allow=TCP:80
    
  3. Use packer to build the Compute Engine instance image. Use the command verbatim:

    packer build -var project_id=${PROJECT_ID} packer.json
    
  4. Copy the ID of the image created, such as:

    ==> Builds finished. The artifacts of successful builds are:
    --> googlecompute: A disk image was created: frontend-1450847630
    

    Make a note of this ID. You'll need it in the next step and later in this solution.

  5. Create a frontend instance with a public IP address and the frontend-http tag, which opens port 80 through the firewall rule you created earlier. Be sure to replace YOUR_FRONTEND_IMAGE_ID with the output of the previous step:

    gcloud compute instances create frontend \
        --metadata="^|^consul_servers=consul-1,consul-2,consul-3" \
        --zone=us-central1-f \
        --tags=frontend-http \
        --image=YOUR_FRONTEND_IMAGE_ID
    
  6. The details of the instance will be output when the create operation succeeds. The output will look similar to the following example:

    NAME     ZONE          MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP EXTERNAL_IP   STATUS
    frontend us-central1-f n1-standard-1             10.240.0.10 104.197.14.97 RUNNING
    
  7. From your terminal window, copy the value for EXTERNAL_IP and open it in your browser to view the frontend application.

  8. Refresh the page several times and notice that different backends are serving the request.

Autoscale and handle failure

When you launch an instance using the HAProxy image you made, the instance automatically registers itself with Consul and is discoverable by frontend instances through DNS. Similarly, if an HAProxy instance fails, Consul detects the failure, consul-template updates the hosts file and restarts dnsmasq on the frontend servers, and requests are no longer routed through the failed instance. Because these instances bootstrap themselves, are discoverable, and fail gracefully, it is a simple task to add a Compute Engine autoscaler to automatically add or remove load-balancing capacity based on utilization.

Hands-on: Simulate an HAProxy load balancer failure

The HAProxy instance group you created previously has 2 instances. You can resize the group to 1 instance to simulate the failure of one of the instances, as follows:

gcloud compute instance-groups managed resize haproxy --size=1 --zone=us-central1-f

The instance group manager will choose a random instance to shut down. A notification should appear from Consul in your tool terminal indicating a failure:

2015/12/27 20:00:43 [INFO] memberlist: Suspect haproxy-gg94 has failed, no acks received
2015/12/27 20:00:44 [INFO] serf: EventMemberFailed: haproxy-gg94 10.240.0.8

The failed instance is no longer a member of the service, and the frontend servers are updated to exclude it.
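
You can confirm this from the tool instance by asking Consul for only the healthy members of the haproxy-internal service; the terminated instance should no longer be listed:

# Only passing (healthy) HAProxy instances are returned.
curl -s "http://localhost:8500/v1/health/service/haproxy-internal?passing"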

You can run consul members to view the HAProxy instance that was terminated. You should see output similar to this:

$ consul members
Node          Address           Status  Type    Build  Protocol  DC
backend-lfwe  10.240.0.6:8301   alive   client  0.6.0  2         dc1
backend-mq5c  10.240.0.7:8301   alive   client  0.6.0  2         dc1
consul-1      10.240.0.5:8301   alive   server  0.6.0  2         dc1
consul-2      10.240.0.3:8301   alive   server  0.6.0  2         dc1
consul-3      10.240.0.4:8301   alive   server  0.6.0  2         dc1
frontend      10.240.0.10:8301  alive   client  0.6.0  2         dc1
haproxy-gg94  10.240.0.8:8301   failed  client  0.6.0  2         dc1
haproxy-tm64  10.240.0.9:8301   alive   client  0.6.0  2         dc1
tool          10.240.0.2:8301   alive   client  0.6.0  2         dc1

Hands-on: Enable HAProxy autoscaling

You can enable autoscaling for an existing instance group. The autoscaler supports scaling based on CPU utilization, custom metrics, or both. In this example, you configure the autoscaler to add capacity when either the aggregate CPU utilization of the HAProxy servers exceeds 70% or the rate of incoming network traffic exceeds 1,000,000,000 bytes (roughly 1 GB) per second.

Enable autoscaling for the HAProxy server group by using the following command:

gcloud compute instance-groups managed set-autoscaling haproxy \
  --zone=us-central1-f \
  --min-num-replicas=2 \
  --max-num-replicas=8 \
  --scale-based-on-cpu \
  --target-cpu-utilization=.7 \
  --custom-metric-utilization="metric=compute.googleapis.com/instance/network/received_bytes_count,utilization-target=1000000000,utilization-target-type=DELTA_PER_SECOND"
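
To check the group's configuration, or to watch instances being added and removed as load changes, you can inspect the managed instance group from the tool instance (or any machine with the Cloud SDK):

# Describe the managed instance group.
gcloud compute instance-groups managed describe haproxy --zone=us-central1-f

# List the instances currently in the group.
gcloud compute instance-groups managed list-instances haproxy --zone=us-central1-f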

Multi-zone deployment with Cloud Deployment Manager

Google Cloud Deployment Manager is an infrastructure management service that automates the creation and management of your Cloud Platform resources, freeing you up to focus on developing services and applications for your users. You can use Cloud Deployment Manager to run the entire solution in multiple availability zones in just a few steps:

  1. From your tool instance, edit the file ~/compute-internal-loadbalancer/dm/config.yaml. Insert the image IDs for the images you created with Packer in the previous steps:

    ...
    haproxy_image: REPLACE_WITH_YOUR_IMAGE_ID
    backend_image: REPLACE_WITH_YOUR_IMAGE_ID
    frontend_image: REPLACE_WITH_YOUR_IMAGE_ID
    consul_image: REPLACE_WITH_YOUR_IMAGE_ID
    ...
    

    If you don't have the list of images handy, from your terminal window that's connected to your tool instance, run gcloud compute images list.

  2. Deploy the entire architecture by using gcloud:

    gcloud deployment-manager deployments create internal-lb-demo \
        --config=$HOME/compute-internal-loadbalancer/dm/config.yaml
    
  3. As the deployment is processing, view the Cloud Deployment Manager console page to track the progress.

  4. When the deployment completes, open the HTTP load balancer console page and then click the IP address in the Incoming Traffic column to access a frontend and verify the application is working. It can take several minutes after the deployment completes before the load balancer begins to handle requests.

Cleaning up

After you've finished the HAProxy with Consul tutorial, you can clean up the resources you created on Google Cloud Platform so you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Deleting the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

To delete the project:

  1. In the Cloud Platform Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the checkbox next to the project you want to delete, and then click Delete project.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting instance groups

To delete a Compute Engine instance group:

  1. In the Cloud Platform Console, go to the Instance groups page.

    Go to the Instance groups page

  2. Click the checkbox next to the instance group you want to delete.
  3. Click the Delete button at the top of the page to delete the instance group.

Deleting instances

To delete a Compute Engine instance:

  1. In the Cloud Platform Console, go to the VM Instances page.

    Go to the VM Instances page

  2. Click the checkbox next to the instance you want to delete.
  3. Click the Delete button at the top of the page to delete the instance.

Deleting disks

To delete a Compute Engine disk:

  1. In the Cloud Platform Console, go to the Disks page.

    Go to the Disks page

  2. Click the checkbox next to the disk you want to delete.
  3. Click the Delete button at the top of the page to delete the disk.

Deleting images

To delete a Compute Engine disk image:

  1. In the Cloud Platform Console, go to the Images page.

    Go to the Images page

  2. Click the checkbox next to the image you want to delete.
  3. Click the Delete button at the top of the page to delete the image.

Delete the Cloud Deployment Manager deployment

To delete the deployment, run the following command:

gcloud deployment-manager deployments delete internal-lb-demo

Next Steps

You've now seen how to create a scalable, resilient, internal load-balancing solution that runs in a private network, by using HAProxy and Consul on Compute Engine instances. You've also seen how service discovery enables registration and discovery of both the load-balancing tier and its clients.

You can extend or modify this solution to support different backend applications, such as database servers or key-value stores.

For information about how to load test your internal load balancer, see Distributed Load Testing Using Kubernetes.

Read about other load balancing solutions available on Google Cloud Platform.
