This page explains how to use Horizontal Pod Autoscaler (HPA) to autoscale a Deployment using different types of metrics. You can use the same guidelines to configure an HPA for any scalable deployment object.
Before you begin
Before you start, make sure you have performed the following tasks:
- Ensure that you have enabled the Google Kubernetes Engine API.
- Ensure that you have installed the Cloud SDK.

Set up default `gcloud` settings using one of the following methods:

- Using `gcloud init`, if you want to be walked through setting defaults.
- Using `gcloud config`, to individually set your project ID, zone, and region.
Using gcloud init
If you receive the error `One of [--zone, --region] must be supplied: Please specify location`, complete this section.
- Run `gcloud init` and follow the directions:

  gcloud init

  If you are using SSH on a remote server, use the `--console-only` flag to prevent the command from launching a browser:

  gcloud init --console-only

- Follow the instructions to authorize `gcloud` to use your Google Cloud account.
- Create a new configuration or select an existing one.
- Choose a Google Cloud project.
- Choose a default Compute Engine zone.
Using gcloud config
- Set your default project ID:
gcloud config set project PROJECT_ID
- If you are working with zonal clusters, set your default compute zone:
gcloud config set compute/zone COMPUTE_ZONE
- If you are working with regional clusters, set your default compute region:
gcloud config set compute/region COMPUTE_REGION
- Update `gcloud` to the latest version:
  gcloud components update
API versions for HPA objects
When you use the Google Cloud Console, HPA objects are created using the `autoscaling/v2beta2` API.
When you use `kubectl` to create or view information about an HPA, you can specify either the `autoscaling/v1` API or the `autoscaling/v2beta2` API.

`apiVersion: autoscaling/v1` is the default, and allows you to autoscale based only on CPU utilization. To autoscale based on other metrics, using `apiVersion: autoscaling/v2beta2` is recommended. The example in Configuring a Deployment uses `apiVersion: autoscaling/v1`.

`apiVersion: autoscaling/v2beta2` is recommended for creating new HPA objects. It allows you to autoscale based on multiple metrics, including custom or external metrics. All other examples in this topic use `apiVersion: autoscaling/v2beta2`.

To check which API versions are supported, use the `kubectl api-versions` command.

You can specify which API to use when viewing details about an HPA that uses `apiVersion: autoscaling/v2beta2`.
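As a quick check, you can filter the `kubectl api-versions` listing for the autoscaling group. The sketch below runs the filter against sample output; on a live cluster you would pipe the command's output directly:

```shell
# Filter an API-versions listing for the autoscaling group.
# The printf line stands in for sample output; on a live cluster, run:
#   kubectl api-versions | grep '^autoscaling/'
printf 'apps/v1\nautoscaling/v1\nautoscaling/v2beta2\nbatch/v1\n' | grep '^autoscaling/'
```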
Create the example Deployment
Before you can create an HPA, you must create the workload it monitors. The examples in this topic apply different HPA configurations to the following `nginx` Deployment. Separate examples show an HPA based on resource utilization, based on a custom or external metric, and based on multiple metrics.

Save the following to a file named `nginx.yaml`:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"
This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you do not specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
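To make the percentage concrete: utilization is measured relative to the request, so with the 250m request above, 125m of actual CPU use reports as 50%. A minimal sketch of that arithmetic (the usage figure is an assumed sample, not a measured value):

```shell
# CPU utilization percentage = observed usage / requested CPU.
# request_m matches the manifest above; usage_m is an assumed sample value.
awk 'BEGIN {
  request_m = 250   # millicores requested by the container
  usage_m   = 125   # assumed observed usage in millicores
  printf "%d%%\n", usage_m * 100 / request_m
}'
```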
To create the Deployment, apply the `nginx.yaml` manifest:

kubectl apply -f nginx.yaml

The Deployment has `spec.replicas` set to 3, so three Pods are deployed. You can verify this using the `kubectl get deployment nginx` command.
Each of the examples in this topic applies a different HPA to an example nginx Deployment.
Autoscaling based on resource utilization
This example creates an HPA object to autoscale the `nginx` Deployment when CPU utilization surpasses 50%, and ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.

You can create an HPA that targets CPU using the Cloud Console, the `kubectl apply` command, or, for average CPU only, the `kubectl autoscale` command.
Console
- Visit the GKE Workloads menu in Cloud Console.
- Click the name of the `nginx` Deployment.
- Expand the Actions menu and select Autoscale.
- Specify the following values:
  - Minimum number of Pods: 1
  - Maximum number of Pods: 10
  - Target CPU utilization in percent: 50
- Click Autoscale.
kubectl apply
Save the following YAML manifest as a file named `nginx-hpa.yaml`:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
To create the HPA, apply the manifest using the following command:
kubectl apply -f nginx-hpa.yaml
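The same autoscaler can also be written against the `autoscaling/v2beta2` API, which uses a `metrics` list instead of `targetCPUUtilizationPercentage`. A sketch of an equivalent manifest, shown only for comparison (not required for this example):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # v2beta2 equivalent of targetCPUUtilizationPercentage
        averageUtilization: 50
```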
kubectl autoscale
To create an HPA object that only targets average CPU utilization, you can use the `kubectl autoscale` command:
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
To get a list of HPA objects in the cluster, use the following command:
kubectl get hpa
The output is similar to the following:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 0%/50% 1 10 3 61s
To get details about the HPA, you can use the Cloud Console or the `kubectl` command.
Console
- Visit the GKE Workloads menu in Cloud Console.
- Click the name of the `nginx` Deployment.
- View the HPA's configuration in the Autoscaler section of the page.
- View more details about autoscaling events in the Events tab.
kubectl get
To get details about the HPA, you can use `kubectl get hpa` with the `-o yaml` flag. The `status` field contains information about the current number of replicas and any recent autoscaling events.
kubectl get hpa nginx -o yaml
The output is similar to the following:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
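The `status` fields can also be read in scripts. A minimal sketch that pulls `desiredReplicas` out of saved output like the sample above; on a live cluster you could instead use `kubectl get hpa nginx -o jsonpath='{.status.desiredReplicas}'`:

```shell
# Extract desiredReplicas from a saved copy of `kubectl get hpa nginx -o yaml`.
# The here-document stands in for the real command output.
cat <<'EOF' > /tmp/hpa-status.yaml
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
EOF
awk '/desiredReplicas:/ {print $2}' /tmp/hpa-status.yaml
```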
Before following the remaining examples in this topic, delete the HPA:
kubectl delete hpa nginx
When you delete an HPA, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before an HPA was applied.
You can learn more about deleting an HPA.
Autoscaling based on a custom or external metric
You can follow along with step-by-step tutorials to create HPAs for custom metrics and external metrics.
Autoscaling based on multiple metrics
This example creates an HPA that autoscales based on CPU utilization and a custom metric named `packets_per_second`.

If you followed the previous example and still have an HPA named `nginx`, delete it before following this example.

This example requires `apiVersion: autoscaling/v2beta2`. For more information about the available APIs, see API versions for HPA objects.

Save this YAML manifest as a file named `nginx-multiple.yaml`:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100
Apply the YAML manifest:
kubectl apply -f nginx-multiple.yaml
When created, the HPA monitors the `nginx` Deployment for average CPU utilization, average memory utilization, and (if you uncommented it) the custom `packets_per_second` metric. The HPA autoscales the Deployment based on the metric whose value yields the largest number of replicas.
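Concretely, the autoscaler computes a desired replica count per metric as roughly desiredReplicas = ceil(currentReplicas × currentValue / targetValue), then acts on the highest result. A sketch of that formula with assumed sample values (not taken from a live cluster):

```shell
# HPA scaling formula: desired = ceil(currentReplicas * currentMetric / targetMetric)
awk 'BEGIN {
  current_replicas = 3    # replicas currently running
  current_cpu      = 80   # assumed observed average CPU utilization (%)
  target_cpu       = 50   # target from the HPA spec
  desired = current_replicas * current_cpu / target_cpu
  if (desired > int(desired)) desired = int(desired) + 1   # round up
  print desired           # 3 * 80 / 50 = 4.8, rounded up to 5
}'
```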
Viewing details about an HPA
To view an HPA's configuration and statistics, use the following command:
kubectl describe hpa hpa-name
Replace hpa-name with the name of your HorizontalPodAutoscaler object.
If the HPA uses `apiVersion: autoscaling/v2beta2` and is based on multiple metrics, the `kubectl describe hpa` command only shows the CPU metric. To see all metrics, use the following command instead:
kubectl describe hpa.v2beta2.autoscaling hpa-name
Replace hpa-name with the name of your HorizontalPodAutoscaler object.
Each HPA's current status is shown in the `Conditions` field, and autoscaling events are listed in the `Events` field.
The output is similar to the following:
Name: nginx
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp: Tue, 05 May 2020 20:07:11 +0000
Reference: Deployment/nginx
Metrics: ( current / target )
resource memory on pods: 2220032 / 100Mi
resource cpu on pods (as a percentage of request): 0% (0) / 50%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events: <none>
Deleting an HPA
You can delete an HPA using the Cloud Console or the `kubectl delete` command.
Console
To delete the `nginx` HPA:

- Visit the GKE Workloads menu in Cloud Console.
- Click the name of the `nginx` Deployment.
- Expand the Actions menu and select Autoscale.
- Select Disable Autoscaler.
kubectl delete
To delete the `nginx` HPA, use the following command:
kubectl delete hpa nginx
When you delete an HPA, the Deployment (or other scalable object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the `kubectl scale` command:
kubectl scale deployment nginx --replicas=3
Cleaning up
Delete the HPA, if you have not done so:
kubectl delete hpa nginx
Delete the `nginx` Deployment:

kubectl delete deployment nginx
Optionally, delete the cluster.
What's next
- Learn more about Horizontal Pod Autoscaling.
- Learn more about Vertical Pod Autoscaling.
- Learn more about Multidimensional Pod Autoscaling.
- Learn more about autoscaling Deployments with Custom Metrics.
- Learn how to Assign CPU Resources to Containers and Pods.
- Learn how to Assign Memory Resources to Containers and Pods.