Distributed Load Testing Using Kubernetes

Load testing is key to the development of any backend infrastructure because load tests demonstrate how well the system functions when faced with real-world demands. An important aspect of load testing is the proper simulation of user and device behavior to identify and understand any possible system bottlenecks, well in advance of deploying applications to production.

However, dedicated test infrastructure can be expensive and difficult to maintain because it is not needed on a continuous basis. Moreover, dedicated test infrastructure is often a one-time capital expense with a fixed capacity, which makes it difficult to scale load testing beyond the initial investment and can limit experimentation. This can lead to slowdowns in productivity for development teams and lead to applications that are not properly tested before production deployments.

Solution overview

Distributed load testing using cloud computing is an attractive option for a variety of test scenarios. Cloud platforms provide a high degree of infrastructure elasticity, making it easy to test applications and services with large numbers of simulated clients, each generating traffic patterned after users or devices. Additionally, the pricing model of cloud computing fits very well with the very elastic nature of load testing.

Containers, which offer a lightweight alternative to running full virtual machine instances for applications, are well-suited for rapid scaling of simulated clients. Containers are an excellent abstraction for running test clients because they are lightweight, simple to deploy, immediately available, and well-suited to singular tasks.

Google Cloud Platform is an excellent environment for distributed load testing using containers. The platform supports containers as first-class citizens via Google Container Engine, which is powered by the open source container-cluster manager, Kubernetes. Container Engine provides the ability to quickly provision container infrastructure and tools to manage deployed applications and resources.

This solution demonstrates how to use Container Engine to deploy a distributed load testing framework. The framework uses multiple containers to create load testing traffic for a simple REST-based API. Although this solution tests a simple web application, the same pattern can be used to create more complex load testing scenarios such as gaming or Internet-of-Things (IoT) applications. This solution discusses the general architecture of a container-based load testing framework. For a tutorial with step-by-step instructions on setting up a sample framework, see the Tutorial section at the end of this document.

This solution focuses on the use of Container Engine to create load testing traffic. The system under test is a simple web application that uses a REST API. The solution leverages an existing load testing framework to model the API interactions that is discussed in more detail in the following sections. After deploying the system under test, the solution uses Container Engine to deploy the distributed load testing tasks.

System under test

In software testing terminology, the system under test is the system that your tests are designed to evaluate. In this solution, the system under test is a small web application deployed to Google App Engine. The application exposes basic REST-style endpoints to capture incoming HTTP POST requests (incoming data is not persisted). In a real-world scenario, web applications can be complex and include a multitude of additional components and services such as caching, messaging, and persistence. These complexities are outside the scope of this solution. For more information on building scalable web applications on Google Cloud Platform, please refer to the Building Scalable and Resilient Web Applications solution.

The source code for the sample application is available as part of the tutorial at the end of this document.

Example workloads

The example application is modeled after the backend service component found in many Internet-of-Things (IoT) deployments—devices first register with the service and then begin reporting metrics or sensor readings, while also periodically re-registering with the service.

The following diagram shows a common backend service component interaction.

A diagram showing a common backend service component interaction.

To model this interaction, you can use Locust, a distributed, Python-based load testing tool that is capable of distributing requests across multiple target paths. For example, Locust can distribute requests to the /login and /metrics target paths. There are many load generation software packages available, including JMeter, Gatling, and Tsung—any one of which might better suit your project's needs.

The workload is based on the interaction described above and is modeled as a set of Tasks in Locust. To approximate real-world clients, each Locust task is weighted. For example, registration happens once per thousand total client requests.

Container-based computing

From an architectural perspective, there are two main components involved in deploying this distributed load testing solution: the Locust container image and the container orchestration and management mechanism.

The Locust container image is a Docker image that contains the Locust software. The Dockerfile can be found in the associated Github repository (refer to the tutorial below). The Dockerfile uses a base Python image and includes scripts to start the Locust service and execute the tasks.

This solution uses Google Container Engine as the container orchestration and management mechanism. Container Engine, which is based on the open source framework Kubernetes, is the product of years of experience running, orchestrating, and managing container deployments all across Google. Container-based computing allows developers to focus on their applications, instead of on deployments and integrations into hosting environments. Containers also facilitate load testing portability so that containerized applications can run across multiple cloud environments. Container Engine and Kubernetes introduce several concepts specific to container orchestration and management.

Container clusters

A container cluster is a group of Compute Engine instances that provides the foundation for your entire application. The Container Engine and Kubernetes documentation refer to these instances as nodes. A cluster comprises a single master node and one or more worker nodes. The master and workers all run on Kubernetes, which is why container clusters are sometimes called Kubernetes clusters. For more information about clusters, see the Container Engine documentation.


A pod is a tightly-coupled group of containers that should be deployed together. Some pods contain only a single container. For example, in this solution, each of the Locust containers runs in its own pod. Often, however, pods contain multiple containers that work together in some way. For example, in this solution, Kubernetes uses a pod with three containers to provide DNS services. In one container, SkyDNS provides DNS server functionality. SkyDNS relies on a key-value store, named etcd, that resides in another container. In the pod's third container, kube2sky acts as a bridge between Kubernetes and SkyDNS.

Replication controllers

A replication controller ensures that a specified number of pod "replicas" are running at any one time. If there are too many, the replication controller kills some pods. If there are too few, it starts more. This solution has three replication controllers: one ensures the existence of a single DNS server pod; another maintains a single Locust master pod; and a third keeps exactly 10 Locust worker pods running.


A particular pod can disappear for a variety of reasons, including node failure or intentional node disruption for updates or maintenance. This means that the IP address of a pod does not provide a reliable interface for that pod. A more reliable approach would use an abstract representation of that interface that never changes, even if the underlying pod disappears and is replaced by a new pod with a different IP address. A Container Engine service provides this type of abstract interface by defining a logical set of pods and a policy for accessing them. In this solution, there are several services that represent pods or sets of pods. For example, there is a service for the DNS server pod, another service for the Locust master pod, and a service that represents all 10 Locust worker pods.

The following diagram shows the contents of the master and worker nodes.

A diagram showing contents of the master and worker nodes.

Deploying system under test

The solution uses Google App Engine to run the system under test. To deploy the system under test, you need an active Google Cloud Platform account so that you can install and run the Google Cloud Platform SDK. With the SDK in place, you can deploy the sample web application with a single command. The source code for the web application is available as part of the tutorial at the end of this document.

Deploying load testing tasks

To deploy the load testing tasks, you first deploy a load testing master and then deploy a group of ten load testing workers. With this many load testing workers, you can create a substantial amount of traffic for testing purposes. Keep in mind, however, that generating excessive amounts of traffic to external systems can resemble a denial-of-service attack. Be sure to review the Google Cloud Platform Terms of Service and the Google Cloud Platform Acceptable Use Policy.

The load testing master

The first component of the deployment is the Locust master, which is the entry point for executing the load testing tasks described above. The Locust master is deployed as a replication controller with a single replica because we need only one master. A replication controller is useful even when deploying a single pod because it ensures high availability.

The configuration for the replication controller specifies several elements, including the name of the controller (locust-master), labels for organization (name: locust, role: master), and the ports that need to be exposed by the container (8089 for web interface, 5557 and 5558 for communicating with workers). This information is later used to configure the Locust workers controller. The following snippet contains the configuration for the ports:

  - name: locust-master-web
    containerPort: 8089
    protocol: TCP
  - name: locust-master-port-1
    containerPort: 5557
    protocol: TCP
  - name: locust-master-port-2
    containerPort: 5558
    protocol: TCP

Next, we deploy a Service to ensure that the exposed ports are accessible to other pods via hostname:port within the cluster, and referenceable via a descriptive port name. The use of a service allows the Locust workers to easily discover and reliably communicate with the master, even if the master fails and is replaced with a new pod by the replication controller. The Locust master service also includes a directive to create an external forwarding rule at the cluster level, which provides the ability for external traffic to access the cluster resources. Note that you must still create firewall rules to provide complete access to target instances.

After you deploy the Locust master, you can access the web interface using the public IP address of the external forwarding rule. After you deploy the Locust workers, you can start the simulation and look at aggregate statistics through the Locust web interface.

The load testing workers

The next component of the deployment includes the Locust workers, which execute the load testing tasks described above. The Locust workers are deployed by a single replication controller that creates ten pods. The pods are spread out across the Kubernetes cluster. Each pod uses environment variables to control important configuration information such as the hostname of the system under test and the hostname of the Locust master. The configuration of the worker’s replication controller can be found in the tutorial below. The configuration contains the name of the controller, locust-worker, labels for organization, name: locust, role: worker, and the previously described environment variables. The following snippet contains the configuration for the name, labels, and number of replicas:

kind: ReplicationController
  apiVersion: v1
    name: locust-worker
      name: locust
      role: worker
    replicas: 10
      name: locust
      role: worker

For the Locust workers, no additional service needs to be deployed because the worker pods themselves do not need to support any inbound communication—they connect directly to the Locust master pod.

The following diagram shows the relationship between the Locust master and the Locust workers.

A diagram showing kubernetes pods with Locust master and worker nodes.

After the replication controller deploys the Locust workers, you can return to the Locust master web interface and see that the number of slaves corresponds to the number of deployed workers.

Executing load testing tasks

Starting the load testing

The Locust master web interface enables you to execute the load testing tasks against the system under test, as shown in the following image:

A screenshot of the initial Locust web interface.

To begin, specify the total number of users to simulate and a rate at which each user should be spawned. Next, click Start swarming to begin the simulation. As time progress and users are spawned, you will see statistics begin to aggregate for simulation metrics, such as the number of requests and requests per second, as shown in the following image:

A screenshot of the Locust web interface while swarming is in progress.

To stop the simulation, click Stop and the test will terminate. The complete results can be downloaded into a spreadsheet.

Scaling clients

Scaling up the number of simulated users will require an increase in the number of Locust worker pods. As specified in the Locust worker controller, the replication controller deploys 10 Locust worker pods. To increase the number of pods deployed by the replication controller, Kubernetes offers the ability to resize controllers without redeploying them. For example, you can change the number of worker pods by using the kubectl command line tool. The following command scales the pool of Locust worker pods to 20:

$ kubectl scale --replicas=20 replicationcontrollers locust-worker

After you issue the scale command, wait a few minutes for all of the pods to be deployed and started. After all the pods have started, return to the Locust master web interface and restart the load testing.

Resources and cost

This solution uses four Container Engine nodes, each one backed by a Compute Engine VM standard instance of type n1-standard-1. You can use the Google Cloud Platform Pricing Calculator to get an estimate of the monthly cost of running this container cluster. As discussed previously, you can customize the size of the container cluster to scale it to your needs. The pricing calculator lets you customize the cluster characteristics to get an estimate of what the cost would be to scale up or down.

Next steps

You've now seen how to use Container Engine to create a load testing framework for a simple web application. Container Engine allows you to specify the number of container nodes that provide the foundation for your load testing framework. Container Engine also allows you to organize your load testing workers into pods, and to declare how many pods that you want Container Engine to keep running.

You can use this same pattern to create load testing frameworks for a variety of different scenarios and applications. For example, you can use this pattern to create load testing frameworks for messaging systems, data stream management systems, and database systems. You can create new Locust tasks or even switch to a different load testing framework.

Another way to extend the framework presented in this solution is to customize the metrics that are collected. For example, you might want to measure the requests per second, or monitor the response latency as load increases, or check the response failure rates and types of errors. There are several monitoring options available, including Google Cloud Monitoring.


The complete contents of the tutorial, including instructions and source code, are available on GitHub at https://github.com/GoogleCloudPlatform/distributed-load-testing-using-kubernetes.

Send feedback about...