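Autoscale Deployments using GKE workload metrics

This tutorial demonstrates how to autoscale Google Kubernetes Engine (GKE) workloads based on custom metrics that your application emits, such as the number of actively signed-in accounts.

You can use the GKE workload metrics pipeline to collect the metrics that your application emits, send them to Cloud Monitoring, and then use them to drive the Horizontal Pod Autoscaler (HPA).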

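Objectives

This tutorial covers the following tasks:

  1. How to deploy a sample application that emits Prometheus-style metrics.
  2. How to deploy a PodMonitor resource to scrape metrics from your application and publish them to Cloud Monitoring.
  3. How to deploy the Custom Metrics Adapter.
  4. How to query workload metrics using the Kubernetes Custom Metrics API.
  5. How to deploy a Horizontal Pod Autoscaler (HPA) resource that scales your application based on the workload metrics scraped from it.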

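Before you begin

Take the following steps to enable the Kubernetes Engine API:

  1. Visit the Kubernetes Engine page in the Google Cloud Console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

You can follow this tutorial using Cloud Shell, which comes preinstalled with the gcloud and kubectl command-line tools used in this tutorial. If you use Cloud Shell, you don't need to install these command-line tools on your workstation.

To use Cloud Shell:

  1. Go to the Google Cloud Console.
  2. Click the Activate Cloud Shell button at the top of the Cloud Console window.

    A Cloud Shell session opens inside a new frame at the bottom of the Cloud Console and displays a command-line prompt.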


Set up your environment

  1. To create a new cluster with workload metrics enabled, use the following command:

    gcloud beta container clusters create CLUSTER_NAME \
        --project=PROJECT_ID \
        --zone=ZONE \
        --monitoring=SYSTEM,WORKLOAD
    

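    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • PROJECT_ID: the ID of your Google Cloud project.
    • ZONE: the zone for your cluster; choose the one closest to you.

    This operation requires the container.clusters.create permission on the project.

  2. To enable workload metrics on an existing Standard or Autopilot cluster, use the following command to update the cluster: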

    gcloud beta container clusters update CLUSTER_NAME \
        --project=PROJECT_ID \
        --zone=ZONE \
        --monitoring=SYSTEM,WORKLOAD
    

Deploy a sample application that emits Prometheus-style metrics

Download the repository that contains the application code for this tutorial:

  git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
  cd kubernetes-engine-samples/workload-metrics

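The sample application for this tutorial generates two metrics and exposes them through a built-in Prometheus endpoint at localhost:1234/metrics:

  • example_requests_total: a counter of the requests that the application generates by polling itself.
  • example_random_numbers: a histogram of randomly generated numbers.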
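
To make the two metric types more concrete, the following Python sketch parses a small hand-written sample in the Prometheus text exposition format. The sample values and bucket boundaries are illustrative assumptions, not output captured from the sample application, and the hypothetical parse_samples helper handles only simple lines without timestamps.

```python
# Illustrative sample only: the values and bucket boundaries are made up,
# not captured from the tutorial's sample application.
SAMPLE = """\
# TYPE example_requests_total counter
example_requests_total 42
# TYPE example_random_numbers histogram
example_random_numbers_bucket{le="0.5"} 3
example_random_numbers_bucket{le="+Inf"} 7
example_random_numbers_sum 4.2
example_random_numbers_count 7
"""


def parse_samples(text):
    """Return {series: value} for simple exposition lines (no timestamps)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_samples(SAMPLE)
print(samples["example_requests_total"])
print(samples['example_random_numbers_bucket{le="+Inf"}'])
```

A counter is exposed as a single series, whereas a histogram is exposed as several series (cumulative buckets plus a sum and a count).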

The repository contains a Kubernetes manifest to deploy the application to your cluster:

# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: workload-metrics
  name: workload-metrics
  namespace: gke-workload-metrics
spec:
  selector:
    matchLabels:
      app: workload-metrics
  template:
    metadata:
      labels:
        app: workload-metrics
    spec:
      containers:
      - image: us-docker.pkg.dev/google-samples/containers/workload-metrics:1.0
        imagePullPolicy: Always
        name: workload-metrics
        ports:
        - name: metrics-port
          containerPort: 1234
        command:
        - "/workload-metrics"
        - "--process-metrics"
        - "--go-metrics"
Deploy the application on your cluster:

  kubectl create namespace gke-workload-metrics
  kubectl apply -f manifests/workload-metrics-deployment.yaml

After the application is deployed, wait for all Pods to reach the Ready state:

  kubectl -n gke-workload-metrics get pods

Output:

  NAME                                READY   STATUS    RESTARTS   AGE
  workload-metrics-74fb6c56df-9djq7   1/1     Running   0          1m

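Deploy a PodMonitor resource to scrape metrics from the sample application

To collect the metrics that the sample application emits, you need to create a PodMonitor custom resource.

The repository contains a Kubernetes manifest to deploy the PodMonitor to your cluster: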

# Note that this PodMonitor is in the monitoring.gke.io domain,
# rather than the monitoring.coreos.com domain used with the
# Prometheus Operator
apiVersion: monitoring.gke.io/v1alpha1
kind: PodMonitor
metadata:
  name: workload-metrics-podmon
# spec describes how to monitor a set of pods in a cluster.
spec:
  # namespaceSelector determines which namespace is searched for pods. Required
  namespaceSelector:
    matchNames:
    - gke-workload-metrics
  # selector determines which pods are monitored.  Required
  # This example matches pods with the `app: workload-metrics` label
  selector:
    matchLabels:
      app: workload-metrics
  podMetricsEndpoints:
    # port is the name of the port of the container to be scraped.
  - port: metrics-port
    # path is the path of the endpoint to be scraped.
    # Default /metrics
    path: /metrics
    # scheme is the scheme of the endpoint to be scraped.
    # Default http
    scheme: http
    # interval is the time interval at which metrics should
    # be scraped. Default 60s
    interval: 20s
Deploy the PodMonitor on your cluster:

  kubectl apply -f manifests/workload-metrics-podmon.yaml

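Deploy the Custom Metrics Adapter

The Custom Metrics Adapter lets your cluster send and receive metrics with Monitoring.

  1. Grant your user the ability to create the required authorization roles: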

    kubectl create clusterrolebinding cluster-admin-binding \
        --clusterrole cluster-admin --user "$(gcloud config get-value account)"
    
  2. If you use an Autopilot cluster or a cluster with Workload Identity enabled, take the following steps:

    1. Create a namespace for the adapter:

      kubectl create namespace custom-metrics
      
    2. Create a Kubernetes service account for the adapter:

      kubectl create serviceaccount --namespace custom-metrics \
      custom-metrics-stackdriver-adapter
      
    3. Create a Google service account with permission to view Monitoring metrics:

      gcloud iam service-accounts create GSA_NAME
      
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member "serviceAccount:GSA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
          --role "roles/monitoring.viewer"
      

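      Replace the following:

      • PROJECT_ID: your Google Cloud project ID.
      • GSA_NAME: the name of your Google service account.
    4. Allow the Kubernetes service account to impersonate the Google service account by creating an IAM policy binding: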

      gcloud iam service-accounts add-iam-policy-binding \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
        GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
      
    5. Annotate the Kubernetes service account to indicate the binding:

      kubectl annotate serviceaccount \
        --namespace custom-metrics custom-metrics-stackdriver-adapter \
        iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
      
  3. Deploy the new resource model adapter on your cluster:

    kubectl apply -f manifests/adapter_new_resource_model.yaml
    
  4. Check that the Custom Metrics Adapter is deployed and in the Ready state:

    kubectl -n custom-metrics get pods
    

    Output:

    NAME                                                 READY   STATUS    RESTARTS   AGE
    custom-metrics-stackdriver-adapter-6d4fc94699-zqndq  1/1     Running   0          2m
    

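Query workload metrics using the Kubernetes Custom Metrics API

You can use the Kubernetes Custom Metrics API to check whether your workload metrics are visible to GKE.

GKE workload metrics are exported to Monitoring with the prefix workload.googleapis.com. The Kubernetes Custom Metrics API server doesn't support the / character in metric paths, so you need to replace every / character with |. The metric workload.googleapis.com/example_requests_total therefore becomes workload.googleapis.com|example_requests_total.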
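
As a minimal sketch of that renaming rule, the following Python snippet converts a Monitoring metric type into the name expected by the Custom Metrics API and builds the raw query path for all Pods in a namespace. The helper names custom_metric_name and custom_metrics_path are hypothetical and exist only for illustration.

```python
def custom_metric_name(monitoring_metric_type):
    """Replace '/' with '|' because the Custom Metrics API rejects '/'."""
    return monitoring_metric_type.replace("/", "|")


def custom_metrics_path(namespace, metric_name, api_version="v1beta2"):
    """Build the raw API path for a Pods metric across all Pods ('*')."""
    return (
        f"/apis/custom.metrics.k8s.io/{api_version}"
        f"/namespaces/{namespace}/pods/*/{metric_name}"
    )


name = custom_metric_name("workload.googleapis.com/example_requests_total")
print(name)
print(custom_metrics_path("gke-workload-metrics", name))
```

The second printed line is the path that you would pass to kubectl get --raw for this tutorial's namespace and metric.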

After the application's metrics have been sent to Monitoring, run the following command to query the workload.googleapis.com|example_requests_total metric:

   kubectl get --raw  \
   "/apis/custom.metrics.k8s.io/v1beta2/namespaces/gke-workload-metrics/pods/*/workload.googleapis.com|example_requests_total"

Output:

  {"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta2",
  "metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta2/namespaces/
  gke-workload-metrics/pods/%2A/workload.googleapis.com%7Cexample_requests_total"},
  "items":[{"describedObject":{"kind":"Pod","namespace":"gke-workload-metrics",
  "name":"prom-example-74fb6c56df-9djq7","apiVersion":"/__internal"},"metric":
  {"name":"workload.googleapis.com|example_requests_total","selector":null},"timestamp":
  "2021-08-23T10:48:45Z","value":"1199m"}]}

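Deploy a HorizontalPodAutoscaler object

After you see the workload.googleapis.com|example_requests_total metric in the Custom Metrics API response payload from the previous step, you can deploy a Horizontal Pod Autoscaler (HPA) to resize your Deployment based on that metric.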
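
The value field in that response payload is a Kubernetes quantity: 1199m means 1199 thousandths, or about 1.199. The following Python sketch reads a trimmed copy of the response and converts the value to a float. The hypothetical parse_quantity helper is an assumption made for illustration and handles only plain numbers and the m suffix, not the full quantity syntax.

```python
import json

# Trimmed copy of the response from the previous step (selfLink and a few
# other fields are omitted for brevity).
RESPONSE = """
{"kind": "MetricValueList",
 "items": [{"describedObject": {"kind": "Pod",
                                "namespace": "gke-workload-metrics",
                                "name": "prom-example-74fb6c56df-9djq7"},
            "metric": {"name": "workload.googleapis.com|example_requests_total"},
            "timestamp": "2021-08-23T10:48:45Z",
            "value": "1199m"}]}
"""


def parse_quantity(quantity):
    """Convert a quantity to float; handles only plain numbers and 'm'."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # 'm' means thousandths
    return float(quantity)


# Map each Pod name to its metric value.
values = {
    item["describedObject"]["name"]: parse_quantity(item["value"])
    for item in json.loads(RESPONSE)["items"]
}
print(values)
```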

The repository contains a Kubernetes manifest to deploy the HPA to your cluster:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: workload-metrics-hpa
  namespace: gke-workload-metrics
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload-metrics
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: workload.googleapis.com|example_requests_total
      target:
        type: AverageValue
        averageValue: 1
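This HPA sets the minimum number of Pod replicas to 1 and the maximum to 5. It scales your Deployment up or down to try to keep the average value of workload.googleapis.com|example_requests_total across all Pods at the target of 1.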
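
The following Python sketch approximates how an HPA derives a replica count from an average-value target, using the rule described in the Kubernetes documentation: multiply the current replica count by the ratio of the observed average to the target, round up, and clamp to the configured bounds. It is a simplification under stated assumptions: it uses the default 10% tolerance and ignores Pod readiness, missing metrics, and scaling-rate limits. The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_average, target_average,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Simplified HPA rule: ceil(current * ratio), clamped to [min, max]."""
    ratio = current_average / target_average
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 1.199, 1))  # one Pod averaging 1.199 (1199m)
print(desired_replicas(2, 3.0, 1))    # would be 6, clamped to maxReplicas
print(desired_replicas(3, 1.05, 1))   # within tolerance, stays at 3
```

With the manifest's bounds of 1 and 5, the three calls yield 2, 5, and 3 replicas respectively.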

Deploy the HorizontalPodAutoscaler on your cluster:

  kubectl apply -f manifests/workload-metrics-hpa.yaml

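Observe the HorizontalPodAutoscaler scaling up

You can check the number of replicas in your Deployment periodically to see whether it increases to 5 by running the following command: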

  kubectl -n gke-workload-metrics get pods

Output:

  NAME                                READY   STATUS    RESTARTS   AGE
  workload-metrics-74fb6c56df-9djq7   1/1     Running   0          5m
  workload-metrics-74fb6c56df-frzbv   1/1     Running   0          7m
  workload-metrics-74fb6c56df-h26rw   1/1     Running   0          8m
  workload-metrics-74fb6c56df-kwvx9   1/1     Running   0          10m
  workload-metrics-74fb6c56df-vvtnn   1/1     Running   0          11m

You can also check the state and activity of the Horizontal Pod Autoscaler by running the following command:

  kubectl -n gke-workload-metrics describe hpa workload-metrics-hpa

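Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

What's next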