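Optimize Pod autoscaling based on metrics

This tutorial demonstrates how to automatically scale your Google Kubernetes Engine (GKE) workloads based on metrics available in Cloud Monitoring.

In this tutorial, you can set up autoscaling based on one of the following metrics: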

Pub/Sub

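Pub/Sub backlog

Scale based on an external metric that reports the number of unacknowledged messages remaining in a Pub/Sub subscription. This can reduce latency before it becomes a problem, but might use relatively more resources than autoscaling based on CPU utilization.

Custom Metric

Custom Prometheus metric

Scale based on a custom user-defined metric, exported in the Prometheus format through Google Managed Prometheus. Your Prometheus metric must be of type Gauge.

Autoscaling is fundamentally about finding an acceptable balance between cost and latency. You might want to experiment with a combination of these metrics and others to find a policy that works for you.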

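Objectives

This tutorial covers the following tasks:

How to deploy the Custom Metrics Adapter.
How to export metrics from within your application code.
How to view your metrics in the Cloud Monitoring interface.
How to deploy a HorizontalPodAutoscaler (HPA) resource to scale your application based on Cloud Monitoring metrics.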

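Costs

In this document, you use the following billable components of Google Cloud:

GKE
Pub/Sub

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.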

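Before you begin

Take the following steps to enable the Kubernetes Engine API:

Visit the Kubernetes Engine page in the Google Cloud console.
Create or select a project.
Wait for the API and related services to be enabled. This can take several minutes.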
Verify that billing is enabled for your Google Cloud project.

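You can follow this tutorial using Cloud Shell, which comes preinstalled with the gcloud and kubectl command-line tools used in this tutorial. If you use Cloud Shell, you don't need to install these command-line tools on your workstation.

To use Cloud Shell:

Go to the Google Cloud console.
Click the Activate Cloud Shell button at the top of the Google Cloud console window.

A Cloud Shell session opens inside a new frame at the bottom of the Google Cloud console and displays a command-line prompt.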

Setting up your environment

Set the default zone for the Google Cloud CLI:
```
gcloud config set compute/zone zone
```
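Replace the following:
- zone: Choose a zone that's closest to you. For more information, see Regions and Zones.

Set the PROJECT_ID and PROJECT_NUMBER environment variables to your Google Cloud project ID and project number: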

export PROJECT_ID=project-id
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format 'get(projectNumber)')

Set the default project for the Google Cloud CLI:
```
gcloud config set project $PROJECT_ID
```
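Create a GKE cluster

Best practice:
For enhanced security when accessing Google Cloud services, enable Workload Identity Federation for GKE on your cluster. Although this page includes examples that use the legacy method (with Workload Identity Federation for GKE disabled), enabling it improves protection.
Workload Identity
To create a cluster with Workload Identity Federation for GKE enabled, run the following command: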
```
gcloud container clusters create metrics-autoscaling --workload-pool=$PROJECT_ID.svc.id.goog
```
Legacy authentication
To create a cluster with Workload Identity Federation for GKE disabled, run the following command:
```
gcloud container clusters create metrics-autoscaling
```

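Deploying the Custom Metrics Adapter

The Custom Metrics Adapter lets your cluster send and receive metrics with Cloud Monitoring.

Pub/Sub

The procedure to install the Custom Metrics Adapter differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

Grant your user the ability to create the required authorization roles: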

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy authentication

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

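Custom Metric

The procedure to install the Custom Metrics Adapter differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

Grant your user the ability to create the required authorization roles: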

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy authentication

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Deploying an application with metrics

Download the repo containing the application code for this tutorial:

Pub/Sub

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/databases/cloud-pubsub

Custom Metric

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/observability/custom-metrics-autoscaling/google-managed-prometheus

The repo contains code that exports metrics to Cloud Monitoring:

Pub/Sub

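This application polls a Pub/Sub subscription for new messages, acknowledging them as they arrive. Pub/Sub subscription metrics are automatically collected by Cloud Monitoring.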

import datetime
import time

from google import auth
from google.cloud import pubsub_v1


def main():
    """Continuously pull messages from subscription"""

    # read default project ID
    _, project_id = auth.default()
    subscription_id = 'echo-read'

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process received message"""
        print(f"Received message: ID={message.message_id} Data={message.data}")
        print(f"[{datetime.datetime.now()}] Processing: {message.message_id}")
        # simulate 3 seconds of work before acknowledging the message
        time.sleep(3)
        print(f"[{datetime.datetime.now()}] Processed: {message.message_id}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f"Pulling messages from {subscription_path}...")

    with subscriber:
        try:
            streaming_pull_future.result()
        except Exception as e:
            print(e)


if __name__ == '__main__':
    main()

Custom Metric

This application responds to any web request to the /metrics path with a constant-value metric in the Prometheus format.

metric := prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: *metricName,
		Help: "Custom metric",
	},
)
prometheus.MustRegister(metric)
metric.Set(float64(*metricValue))

http.Handle("/metrics", promhttp.Handler())
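log.Printf("Starting to listen on :%d", *port)
// ListenAndServe only returns on failure, so report the error and exit.
err := http.ListenAndServe(fmt.Sprintf(":%d", *port), nil)
log.Fatalf("Failed to start serving metrics: %v", err)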
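
To make the response body concrete, the following is a minimal Python sketch of what a scrape of /metrics might return for a single gauge, written in the Prometheus text exposition format. The helper name render_gauge is hypothetical and is not part of the sample repo; the Go client library produces this output for you, and its exact number formatting may differ.

```python
def render_gauge(name: str, value: float, help_text: str = "Custom metric") -> str:
    """Render one gauge in the Prometheus text exposition format.

    Hypothetical helper for illustration only; the sample application
    relies on the Prometheus Go client instead.
    """
    return (
        f"# HELP {name} {help_text}\n"   # human-readable description
        f"# TYPE {name} gauge\n"         # metric type declaration
        f"{name} {value}\n"              # the sample itself
    )


# The values mirror the --metric-name and --metric-value flags used in this tutorial.
print(render_gauge("custom_prometheus", 40.0), end="")
```

The TYPE line is what marks the metric as a gauge, which is the type this tutorial requires.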

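The repo also contains a Kubernetes manifest to deploy the application to your cluster. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster: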

Pub/Sub

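The manifest differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.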

Workload Identity

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      serviceAccountName: pubsub-sa
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2

Legacy authentication

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      volumes:
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/secrets/google
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json

Custom Metric

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: custom-metrics-gmp
  name: custom-metrics-gmp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metrics-gmp
  template:
    metadata:
      labels:
        run: custom-metrics-gmp
    spec:
      containers:
      # sample container generating custom metrics
      - name: prometheus-dummy-exporter
        image: us-docker.pkg.dev/google-samples/containers/gke/prometheus-dummy-exporter:v0.2.0
        command: ["./prometheus-dummy-exporter"]
        args:
        - --metric-name=custom_prometheus
        - --metric-value=40
        - --port=8080

With the PodMonitoring resource, Google Cloud Managed Service for Prometheus exports the Prometheus metrics to Cloud Monitoring:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: "custom-metrics-exporter"
spec:
  selector:
    matchLabels:
      run: custom-metrics-gmp
  endpoints:
  - port: 8080
    path: /metrics
    interval: 15s

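Starting in GKE Standard version 1.27 or GKE Autopilot version 1.25, Google Cloud Managed Service for Prometheus is enabled. To enable Google Cloud Managed Service for Prometheus in clusters with earlier versions, see Enable managed collection.

Deploy the application to your cluster:

Pub/Sub

The procedure to deploy your application differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

Enable the Pub/Sub API on your project: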

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-workload-identity.yaml

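This application defines a pubsub-sa Kubernetes service account. Assign it the Pub/Sub Subscriber role so that the application can pull messages from the Pub/Sub subscription and acknowledge them.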
```
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/pubsub.subscriber \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/pubsub-sa
```
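The preceding command uses a principal identifier, which lets IAM refer directly to a Kubernetes service account.

Best practice:
Use principal identifiers, but consider the limitation described in the alternative method.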
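
The principal identifiers used in this tutorial follow one visible pattern: project number, workload identity pool, namespace, and Kubernetes service account name. As a rough illustration, the Python sketch below assembles such a string. The function name ksa_principal and the example project values are hypothetical; only the URL layout is taken from the commands in this tutorial.

```python
def ksa_principal(project_number: str, project_id: str,
                  namespace: str, service_account: str) -> str:
    """Build the IAM principal identifier for a Kubernetes service account.

    Hypothetical helper: it only mirrors the string layout used by the
    gcloud commands in this tutorial.
    """
    pool = f"{project_id}.svc.id.goog"  # workload identity pool of the project
    return (
        "principal://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool}/"
        f"subject/ns/{namespace}/sa/{service_account}"
    )


# Placeholder project number and ID; replace them with your own values.
print(ksa_principal("123456789012", "example-project", "default", "pubsub-sa"))
```

Passing "custom-metrics" and "custom-metrics-stackdriver-adapter" as the last two arguments gives the member string used for the adapter's Monitoring Viewer binding.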

Legacy authentication

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Create a service account with access to Pub/Sub:

gcloud iam service-accounts create autoscaling-pubsub-sa
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member "serviceAccount:autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/pubsub.subscriber"

Download the service account key file:

gcloud iam service-accounts keys create key.json \
  --iam-account autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com

Import the service account key to your cluster as a Secret:

kubectl create secret generic pubsub-key --from-file=key.json=./key.json

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-secret.yaml

Custom Metric

kubectl apply -f custom-metrics-gmp.yaml

After waiting a moment for the application to deploy, all Pods reach the Ready state:

Pub/Sub

kubectl get pods

Output:

NAME                     READY   STATUS    RESTARTS   AGE
pubsub-8cd995d7c-bdhqz   1/1     Running   0          58s

Custom Metric

kubectl get pods

Output:

NAME                                  READY   STATUS    RESTARTS   AGE
custom-metrics-gmp-865dffdff9-x2cg9   1/1     Running   0          49s
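
If you prefer to check readiness from a script rather than by eye, the table printed by kubectl get pods can be parsed. The Python sketch below is one possible approach, assuming the default column layout shown above; the function name all_pods_ready is hypothetical, and in practice a command such as kubectl wait may be a better fit.

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every row of `kubectl get pods` output is Running and fully ready.

    Assumes the default table layout with READY and STATUS columns.
    """
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return False  # header only, or no output at all
    header = lines[0].split()
    ready_col = header.index("READY")
    status_col = header.index("STATUS")
    for row in lines[1:]:
        fields = row.split()
        ready, total = fields[ready_col].split("/")  # for example "1/1"
        if ready != total or fields[status_col] != "Running":
            return False
    return True


sample = """
NAME                     READY   STATUS    RESTARTS   AGE
pubsub-8cd995d7c-bdhqz   1/1     Running   0          58s
"""
print(all_pods_ready(sample))
```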

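Viewing metrics on Cloud Monitoring

As your application runs, it writes your metrics to Cloud Monitoring.

To view the metrics for a monitored resource by using the Metrics Explorer, do the following:

In the Google Cloud console, go to the Metrics Explorer page:
Go to Metrics Explorer

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the Metric element, expand the Select a metric menu, and then select a resource type and metric type. For example, to chart the CPU utilization of a virtual machine, do the following:
1. (Optional) To reduce the menu's options, enter part of the metric name in the Filter bar. For this example, enter utilization.
2. In the Active resources menu, select VM instance.
3. In the Active metric categories menu, select Instance.
4. In the Active metrics menu, select CPU utilization and then click Apply.
To filter which time series are displayed, use the Filter element.
To combine time series, use the menus on the Aggregation element. For example, to display the CPU utilization for your VMs based on their zone, set the first menu to Mean and the second menu to zone.

All time series are displayed when the first menu of the Aggregation element is set to Unaggregated. The default settings for the Aggregation element are determined by the metric type you selected.

The resource type and metrics are the following: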

Pub/Sub

Metrics Explorer

Resource type: pubsub_subscription

Metric: pubsub.googleapis.com/subscription/num_undelivered_messages
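
The same resource type and metric can also be expressed as a Cloud Monitoring filter string, which is useful if you query time series outside the console. The Python sketch below only assembles the filter text; the helper name monitoring_filter is hypothetical, and sending the filter to the Monitoring API is left out because it needs a client library and credentials.

```python
def monitoring_filter(metric_type: str, resource_type: str,
                      resource_labels: dict) -> str:
    """Assemble a Cloud Monitoring filter string from its parts.

    Hypothetical helper for illustration; it performs no API calls.
    """
    parts = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Sort the labels so the result is deterministic.
    for key in sorted(resource_labels):
        parts.append(f'resource.labels.{key} = "{resource_labels[key]}"')
    return " AND ".join(parts)


print(monitoring_filter(
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    "pubsub_subscription",
    {"subscription_id": "echo-read"},
))
```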

Custom Metric

Metrics Explorer

Resource type: prometheus_target

Metric: prometheus.googleapis.com/custom_prometheus/gauge

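Depending on the metric, you might not see much activity on the Cloud Monitoring Metrics Explorer yet. Don't be surprised if your metric isn't updating.

Creating a HorizontalPodAutoscaler object

When you see your metric in Cloud Monitoring, you can deploy a HorizontalPodAutoscaler to resize your Deployment based on your metric.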

Pub/Sub

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metric:
       name: pubsub.googleapis.com|subscription|num_undelivered_messages
       selector:
         matchLabels:
           resource.labels.subscription_id: echo-read
      target:
        type: AverageValue
        averageValue: 2
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub

Custom Metric

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-gmp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-gmp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|custom_prometheus|gauge
      target:
        type: AverageValue
        averageValue: 20
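
Comparing the metric names shown in Metrics Explorer with the names in the two manifests above, the only visible difference is that each slash is written as a pipe character. The Python sketch below captures that observation; the function name to_hpa_metric_name is hypothetical, and the rule is inferred from this tutorial's two examples rather than from a complete specification.

```python
def to_hpa_metric_name(monitoring_metric_type: str) -> str:
    """Convert a Cloud Monitoring metric type to the form used in the HPA manifests.

    Inferred from the examples in this tutorial: "/" becomes "|".
    """
    return monitoring_metric_type.replace("/", "|")


for metric_type in (
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    "prometheus.googleapis.com/custom_prometheus/gauge",
):
    print(to_hpa_metric_name(metric_type))
```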

Deploy the HorizontalPodAutoscaler to your cluster:

Pub/Sub

kubectl apply -f deployment/pubsub-hpa.yaml

Custom Metric

kubectl apply -f custom-metrics-gmp-hpa.yaml

Generating load

For some metrics, you might need to generate load to watch the autoscaling:

Pub/Sub

Publish 200 messages to the Pub/Sub topic:

for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done

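Custom Metric

Not applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it attempts to scale up the Deployment automatically.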
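
To see why a constant value of 40 against a target of 20 keeps adding replicas, it helps to apply the scaling rule documented for the Kubernetes HorizontalPodAutoscaler: desired replicas = ceil(current replicas × current value / target value). The Python sketch below is a simplified model under stated assumptions. It ignores stabilization windows, scaling policies, and Pods that are not yet ready, and the function name and the 0.1 tolerance default are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_average: float,
                     target_average: float, min_replicas: int = 1,
                     max_replicas: int = 5, tolerance: float = 0.1) -> int:
    """Simplified HPA rule for a per-Pod average metric."""
    ratio = current_average / target_average
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds set by minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Every Pod always reports 40, so the ratio stays at 2 on each evaluation.
replicas = 1
history = [replicas]
for _ in range(4):
    replicas = desired_replicas(replicas, current_average=40, target_average=20)
    history.append(replicas)
print(history)  # [1, 2, 4, 5, 5]
```

Because each new Pod reports the same value, adding replicas never lowers the average, so in this model the Deployment grows until it reaches maxReplicas.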

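You might need to wait a couple of minutes for the HorizontalPodAutoscaler to respond to the metric changes.

Observing HorizontalPodAutoscaler scaling up

You can check the current number of replicas of your Deployment by running: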

kubectl get deployments

After giving some time for the metric to propagate, the Deployment creates five Pods to handle the backlog.
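
For the Pub/Sub variant, the figure of five Pods follows from the manifest values. With an external metric and an AverageValue target, the total metric value is divided across the replicas, so the autoscaler aims for roughly ceil(backlog / averageValue) replicas and then clamps the result to the configured bounds. The Python sketch below models only that arithmetic; the function name is hypothetical, and real behavior also depends on metric delay and on how fast the Pods drain the subscription.

```python
import math


def replicas_for_backlog(undelivered_messages: int, target_per_pod: int = 2,
                         min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Estimate the replica count for an external metric with an AverageValue target."""
    wanted = math.ceil(undelivered_messages / target_per_pod)
    # Clamp to the bounds set by minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


for backlog in (200, 7, 0):
    print(backlog, replicas_for_backlog(backlog))
```

With 200 undelivered messages and a target of 2 per Pod, the unclamped result is 100, which is why the Deployment settles at the maxReplicas value of 5 until the backlog shrinks.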

You can also inspect the state and recent activity of the HorizontalPodAutoscaler by running:

kubectl describe hpa

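Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Pub/Sub

Clean up the Pub/Sub subscription and topic: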

gcloud pubsub subscriptions delete echo-read
gcloud pubsub topics delete echo

Delete your GKE cluster:

gcloud container clusters delete metrics-autoscaling

Custom Metric

Delete your GKE cluster:

gcloud container clusters delete metrics-autoscaling

What's next

Learn more about custom and external metrics for scaling workloads.
Explore other Kubernetes Engine tutorials.
