This page explains how to scale a deployed application in Google Kubernetes Engine.
Overview
When you deploy an application in GKE, you define how many replicas of the application you'd like to run. When you scale an application, you increase or decrease the number of replicas.
Each replica of your application represents a Kubernetes Pod that encapsulates your application's container(s).
Before you begin
Before you start, make sure you have performed the following tasks:
- Ensure that you have enabled the Google Kubernetes Engine API.
- Ensure that you have installed the Cloud SDK.
Set up default gcloud settings using one of the following methods:
- Using gcloud init, if you want to be walked through setting defaults.
- Using gcloud config, to individually set your project ID, zone, and region.
Using gcloud init
If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.
- Run gcloud init and follow the directions:
gcloud init
If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:
gcloud init --console-only
- Follow the instructions to authorize gcloud to use your Google Cloud account.
- Create a new configuration or select an existing one.
- Choose a Google Cloud project.
- Choose a default Compute Engine zone for zonal clusters or a region for regional or Autopilot clusters.
Using gcloud config
- Set your default project ID:
gcloud config set project PROJECT_ID
- If you are working with zonal clusters, set your default compute zone:
gcloud config set compute/zone COMPUTE_ZONE
- If you are working with Autopilot or regional clusters, set your default compute region:
gcloud config set compute/region COMPUTE_REGION
- Update gcloud to the latest version:
gcloud components update
Inspecting an application
Before scaling your application, you should inspect the application and ensure that it is healthy.
To see all applications deployed to your cluster, run the following command:
kubectl get controller
Substitute controller with deployments, statefulsets, or another controller object type.
For example, if you run kubectl get deployments and you have created only one Deployment, the command's output should look similar to the following:
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
my-app 1 1 1 1 10m
The output of this command is similar for all objects, but may appear slightly different. For Deployments, the output has six columns:
- NAME lists the names of the Deployments in the cluster.
- DESIRED displays the desired number of replicas, or the desired state, of the application, which you define when you create the Deployment.
- CURRENT displays how many replicas are currently running.
- UP-TO-DATE displays the number of replicas that have been updated to achieve the desired state.
- AVAILABLE displays how many replicas of the application are available to your users.
- AGE displays the amount of time that the application has been running in the cluster.
In this example, there is only one Deployment, my-app, which has only one replica because its desired state is one replica. You define the desired state at the time of creation, and you can change it at any time by scaling the application.
Inspecting StatefulSets
Before scaling a StatefulSet, you should inspect it by running the following command:
kubectl describe statefulset my-app
In the output of this command, check the Pods Status field. If the Failed value is greater than 0, scaling might fail.
If a StatefulSet appears to be unhealthy, perform the following:
Get a list of pods, and see which pods are unhealthy:
kubectl get pods
Remove the unhealthy pod:
kubectl delete pod pod-name
Attempting to scale a StatefulSet while it is unhealthy may cause it to become unavailable.
Scaling an application
The following sections describe each method you can use to scale an application.
The kubectl scale method is the fastest way to scale. However, you may prefer another method in some situations, like when updating configuration files or when performing in-place modifications.
kubectl scale
The kubectl scale command lets you instantly change the number of replicas running your application.
To use kubectl scale, you specify the new number of replicas by setting the --replicas flag. For example, to scale my-app to four replicas, run the following command, substituting controller with deployment, statefulset, or another controller object type:
kubectl scale controller my-app --replicas 4
If successful, this command's output should be similar to deployment "my-app" scaled.
Next, run:
kubectl get controller my-app
The output should look similar to the following:
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
my-app 4 4 4 4 15m
kubectl apply
You can use kubectl apply to apply a new configuration file to an existing controller object. kubectl apply is useful for making multiple changes to a resource, and may be useful for users who prefer to manage their resources in configuration files.
To scale using kubectl apply, the configuration file you supply should include a new number of replicas in the replicas field of the object's specification.
The following is an updated version of the configuration file for the example my-app object. The example shows a Deployment, so if you use another type of controller, such as a StatefulSet, change the kind accordingly. This example works best on a cluster with at least three Nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: my-container
        image: gcr.io/google-samples/hello-app:2.0
In this file, the value of the replicas field is 3. When this configuration file is applied, the object my-app scales to three replicas.
To apply an updated configuration file, run the following command:
kubectl apply -f config.yaml
Next, run:
kubectl get controller my-app
The output should look similar to the following:
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
my-app 3 3 3 3 15m
Console
To scale a workload in Google Cloud Console, perform the following steps:
Visit the Google Kubernetes Engine Workloads menu in Cloud Console.
In the workloads list, click the name of the workload you want to scale.
Click Scale, or click Actions > Scale.
Enter the new number of Replicas for the workload.
Click Scale.
Autoscaling Deployments
You can autoscale Deployments based on CPU utilization of Pods using kubectl autoscale or from the GKE Workloads menu in Cloud Console.
kubectl autoscale
kubectl autoscale creates a HorizontalPodAutoscaler (or HPA) object that targets a specified resource (called the scale target) and scales it as needed. The HPA periodically adjusts the number of replicas of the scale target to match the average CPU utilization that you specify.
When you use kubectl autoscale, you specify a maximum and minimum number of replicas for your application, as well as a CPU utilization target. For example, to set the maximum number of replicas to six and the minimum to four, with a CPU utilization target of 50%, run the following command:
kubectl autoscale deployment my-app --max 6 --min 4 --cpu-percent 50
In this command, the --max flag is required. The --cpu-percent flag is the target CPU utilization over all the Pods. This command does not immediately scale the Deployment to six replicas, unless there is already a systemic demand.
After running kubectl autoscale, the HorizontalPodAutoscaler object is created and targets the application. When there is a change in load, the object increases or decreases the application's replicas.
To get a list of the HorizontalPodAutoscaler objects in your cluster, run:
kubectl get hpa
To see a specific HorizontalPodAutoscaler object in your cluster, run:
kubectl get hpa hpa-name
where hpa-name is the name of your HorizontalPodAutoscaler object.
To see the HorizontalPodAutoscaler configuration:
kubectl get hpa hpa-name -o yaml
The output of this command is similar to the following:
apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    creationTimestamp: ...
    name: hpa-name
    namespace: default
    resourceVersion: "664"
    selfLink: ...
    uid: ...
  spec:
    maxReplicas: 10
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: hpa-name
    targetCPUUtilizationPercentage: 50
  status:
    currentReplicas: 0
    desiredReplicas: 0
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
In this example output, the targetCPUUtilizationPercentage field holds the 50 percent target passed in from the kubectl autoscale example.
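The same autoscaler can also be expressed declaratively and created with kubectl apply. The following is a sketch of a configuration file equivalent to the kubectl autoscale example above, using the autoscaling/v1 API shown in the example output; the object name my-app-hpa is a hypothetical choice:

```yaml
# Sketch of an HPA equivalent to:
#   kubectl autoscale deployment my-app --max 6 --min 4 --cpu-percent 50
# The name my-app-hpa is illustrative; choose any valid object name.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 4
  maxReplicas: 6
  targetCPUUtilizationPercentage: 50
```

Saving this to a file and running kubectl apply -f on it creates the same kind of HorizontalPodAutoscaler as the imperative command, with the advantage that the file can be version-controlled.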
To see a detailed description of a specific HorizontalPodAutoscaler object in the cluster:
kubectl describe hpa hpa-name
You can modify the HorizontalPodAutoscaler by applying a new configuration file with kubectl apply, using kubectl edit, or using kubectl patch.
To delete a HorizontalPodAutoscaler object:
kubectl delete hpa hpa-name
Console
To autoscale a Deployment, perform the following steps:
Visit the Google Kubernetes Engine Workloads menu in Cloud Console.
In the workloads list, click the name of the Deployment you want to autoscale.
Click Actions > Autoscale.
Enter the Maximum number of replicas and, optionally, the Minimum number of replicas for the Deployment.
Under Autoscaling metrics, select and configure metrics as desired.
Click Autoscale.
Autoscaling with Custom Metrics
You can scale your Deployments based on custom metrics exported from Kubernetes Engine Monitoring.
To learn how to use custom metrics to autoscale deployments, refer to the Autoscaling Deployments with Custom Metrics tutorial.
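As an illustration only (the tutorial above is the authoritative guide), a custom-metric autoscaler is written against the autoscaling/v2 API, which supports metric sources beyond CPU. In this sketch, the object name, the metric name packets_per_second, and its target value are all hypothetical, and your cluster version must support the autoscaling/v2 API:

```yaml
# Sketch: autoscale my-app on a hypothetical per-Pod custom metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-custom-hpa   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: packets_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"       # hypothetical per-Pod target
```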