Scaling an Application

When you deploy an application in Kubernetes Engine, you can define how many replicas of the application you'd like to run. When you scale an application, you increase or decrease the number of replicas.

Each replica of your application represents a Kubernetes Pod that encapsulates your application's container(s).

Inspecting an application

First, run kubectl get [CONTROLLER] to see your deployed applications, replacing [CONTROLLER] with deployments, statefulsets, or another controller object type.

For example, if you run kubectl get deployments and you have created only one Deployment, the command's output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                1         1         1            1           10m

The output of this command is broadly similar for all object types, though some columns differ. For Deployments, the output has six columns:

  • NAME lists the names of the Deployments in the cluster.
  • DESIRED displays the desired number of replicas of the application, which you define when you create the Deployment. This is the desired state.
  • CURRENT displays how many replicas are currently running.
  • UP-TO-DATE displays the number of replicas that have been updated to achieve the desired state.
  • AVAILABLE displays how many replicas of the application are available to your users.
  • AGE displays the amount of time that the application has been running.

In this example, there is only one Deployment, my-app. my-app has only one replica because its desired state is one replica. You define the desired state at the time of creation, and you can change it at any time.

Inspecting StatefulSets

Before scaling a StatefulSet, you should run kubectl describe statefulset my-app to get more information about it.

In the output of this command, check the Pods Status field. If the Failed value is greater than 0, scaling might fail.

If a StatefulSet appears to be unhealthy, run kubectl get pods to see which replicas are unhealthy. Then, run kubectl delete pod [POD], where [POD] is the name of the unhealthy Pod.

Attempting to scale a StatefulSet while it is unhealthy may cause it to become unavailable.
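The check above can be scripted. The following sketch parses a sample Pods Status line; in practice you would pipe the output of kubectl describe statefulset my-app into the same pattern (the sample counts here are hypothetical):

```shell
# 'Pods Status' line as it appears in `kubectl describe statefulset my-app`
# output (sample text used here so the snippet runs without a cluster):
status_line="Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 1 Failed"

# Extract the Failed count; a value greater than 0 means scaling might fail:
failed=$(echo "$status_line" | sed 's/.* \([0-9][0-9]*\) Failed.*/\1/')
if [ "$failed" -gt 0 ]; then
  echo "WARNING: $failed failed Pod(s); scaling might fail"
fi
```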

Scaling

The following sections describe each method you can use to scale an application. The kubectl scale method is the fastest way to scale. However, in some situations you may prefer another method, such as when you manage resources with configuration files or when you want to make in-place modifications.

Console

To scale a workload, perform the following steps:

  1. Visit the Kubernetes Engine Workloads menu in GCP Console.

  2. Select the desired workload from the menu.

  3. Click Actions, then Scale.
  4. From the Replicas field, enter the desired number of replicas.
  5. Click Scale.

kubectl scale

kubectl scale lets you instantly change the number of replicas running your application.

To use kubectl scale, you specify the new number of replicas by setting the --replicas flag. For example, to scale my-app to four replicas, run the following command, replacing [CONTROLLER] with deployment, statefulset, or another controller object type:

kubectl scale [CONTROLLER] my-app --replicas=4

If successful, this command's output should be similar to deployment "my-app" scaled.

Next, run kubectl get [CONTROLLER]/my-app. The output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                4         4         4            4           15m

kubectl apply

You can use kubectl apply to apply a new configuration file to an existing controller object. kubectl apply is useful for making multiple changes to a resource, and suits users who prefer to manage their resources in configuration files.

To scale using kubectl apply, the configuration file you supply should specify the new number of replicas in the replicas field of the object's spec.

The following is an updated version of the configuration file for the example my-app object:

apiVersion: apps/v1beta2
kind: [CONTROLLER]
metadata:
  name: my-app
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: ...
        image: ...
        ports:
        - containerPort: 80

In this file, the value of the replicas field is 4. [CONTROLLER] could be Deployment, StatefulSet, or another controller object type. When this configuration file is applied, the object my-app scales to four replicas.

Run the following command:

kubectl apply -f config.yaml

Next, run kubectl get [CONTROLLER] my-app. The output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                4         4         4            4           15m

kubectl edit

kubectl edit allows you to edit an object in place, directly from your shell or terminal window. It opens the editor defined by your KUBE_EDITOR or EDITOR environment variable, or, if neither is set, falls back to vi on Linux or Notepad on Windows. The command accepts filenames as well as command-line arguments, although the files you point to must be previously saved versions of resources.
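The editor lookup order can be simulated in the shell (the commented invocation at the end requires a cluster; my-app is the example Deployment used on this page):

```shell
# kubectl edit chooses an editor in this order: KUBE_EDITOR, then EDITOR,
# then the platform default (vi on Linux, notepad on Windows).
unset KUBE_EDITOR EDITOR
echo "${KUBE_EDITOR:-${EDITOR:-vi}}"    # prints: vi

# To use a specific editor for a single invocation:
# KUBE_EDITOR="nano" kubectl edit deployment my-app
```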

Run the following command:

kubectl edit [CONTROLLER] my-app

This command causes the defined or default editor to open the object's live configuration as it appears in the cluster.

Edit the value of the file's replicas field, then save the file.

To see its rollout status, run the following command:

kubectl rollout status [CONTROLLER] my-app

kubectl patch

kubectl patch performs in-place updates to the object's live configuration, using a JSON-formatted patch supplied as a command-line argument.

First, to see the object's configuration, run the following command:

kubectl get [CONTROLLER] my-app -o yaml

The output looks similar to this:

apiVersion: apps/v1beta2
kind: [CONTROLLER]
metadata:
  name: my-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: ...
        image: ...
        ports:
        - containerPort: 80

The replicas field needs to be updated.

Run the following command:

kubectl patch [CONTROLLER] my-app -p '{"spec":{"replicas":4}}'

This command applies the patch to the live object, replacing the current replicas value with the value provided.

You can also supply the same changes from a file:

kubectl patch [CONTROLLER] my-app --patch "$(cat changes.json)"
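A changes.json file carrying the same patch might look like the following sketch, which also sanity-checks that the file is valid JSON before you apply it (assuming python3 is available; changes.json is the hypothetical filename from the command above):

```shell
# Write the patch to changes.json:
cat > changes.json <<'EOF'
{"spec": {"replicas": 4}}
EOF

# Verify the file parses as JSON before passing it to kubectl patch:
python3 -m json.tool changes.json
```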

Autoscaling Deployments

You can autoscale Deployments using kubectl autoscale or from the Kubernetes Engine Workloads menu in GCP Console.

Console

To autoscale a Deployment, perform the following steps:

  1. Visit the Kubernetes Engine Workloads menu in GCP Console.

  2. Select the desired workload from the menu.

  3. Click Actions, then Autoscale.
  4. Fill the Maximum number of pods field with the desired maximum number of Pods.
  5. Optionally, fill the Minimum number of pods and Target CPU utilization in percent fields with the desired values.
  6. Click Autoscale.

kubectl autoscale

kubectl autoscale creates a HorizontalPodAutoscaler (or HPA) object that targets a specified resource (called the scale target) and scales it as needed. The HPA periodically adjusts the number of replicas of the scale target to match the average CPU utilization that you specify.

You can use kubectl autoscale with the following controller object types:

  • Deployments
  • ReplicaSets
  • ReplicationControllers

When you use kubectl autoscale, you can specify a maximum and minimum number of replicas for your application, as well as a CPU utilization target. For example, to set the maximum number of replicas to six and the minimum to four, with a target CPU utilization of 50%, run the following command:

kubectl autoscale deployment my-app --max=6 --min=4 --cpu-percent=50

In this command, the --max flag is required. The --cpu-percent flag sets the target average CPU utilization across all of the application's Pods. This command does not immediately scale the Deployment to six replicas unless demand already requires it.

After you run kubectl autoscale, the HorizontalPodAutoscaler object is created and targets the application. When there is a change in load, the object increases or decreases the number of the application's replicas.
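The core scaling calculation can be sketched as follows. This is a simplification (the real controller also applies tolerances and stabilization behavior); the numbers reuse the example flags --min=4 --max=6 --cpu-percent=50, with a hypothetical observed average CPU utilization of 80%:

```shell
# desired = ceil(currentReplicas * currentCPU / targetCPU), clamped to [min, max]
current=4; cpu=80; target=50; min=4; max=6

desired=$(( (current * cpu + target - 1) / target ))   # integer ceiling: ceil(6.4) = 7
if [ "$desired" -lt "$min" ]; then desired=$min; fi
if [ "$desired" -gt "$max" ]; then desired=$max; fi

echo "$desired"   # prints: 6 (7 clamped to --max)
```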

To see a specific HorizontalPodAutoscaler object in your cluster, run:

kubectl get hpa [HPA_NAME]

To see the HorizontalPodAutoscaler configuration:

kubectl get hpa [HPA_NAME] -o yaml

The output of this command is similar to the following:

apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    creationTimestamp: ...
    name: [HPA_NAME]
    namespace: default
    resourceVersion: "664"
    selfLink: ...
    uid: ...
  spec:
    maxReplicas: 10
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1beta2
      kind: Deployment
      name: [HPA_NAME]
    targetCPUUtilizationPercentage: 50
  status:
    currentReplicas: 0
    desiredReplicas: 0
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""

In this example, the targetCPUUtilizationPercentage field holds the 50 percent target set by the --cpu-percent flag in the kubectl autoscale example.

To see a detailed description of a specific HorizontalPodAutoscaler object in the cluster:

kubectl describe hpa [HPA_NAME]

You can modify the HorizontalPodAutoscaler by applying a new configuration file with kubectl apply, using kubectl edit, or using kubectl patch.

To delete a HorizontalPodAutoscaler object:

kubectl delete hpa [HPA_NAME]
