
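Optimize Pod autoscaling based on metrics

This tutorial demonstrates how to automatically scale your Google Kubernetes Engine (GKE) workloads based on metrics available in Cloud Monitoring.

In this tutorial, you can set up autoscaling based on one of the following metrics: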

Pub/Sub

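Pub/Sub backlog

Scale based on an external metric that reports the number of unacknowledged messages remaining in a Pub/Sub subscription. This metric can be effective at reducing latency before it becomes a problem, but it might use relatively more resources than autoscaling based on CPU utilization.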

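Custom metric

Custom Prometheus metric

Scale based on a custom user-defined metric, exported in the Prometheus format through Google Cloud Managed Service for Prometheus. Your Prometheus metric must be of type Gauge.

Autoscaling is fundamentally about finding an acceptable balance between cost and latency. You might want to experiment with a combination of these metrics and others to find a policy that works for you.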

Deploy the Custom Metrics Adapter

The Custom Metrics Adapter lets your cluster send and receive metrics with Cloud Monitoring.

Pub/Sub

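The procedure to install the Custom Metrics Adapter differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option that matches the setup you chose when you created your cluster.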

Workload Identity

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning it the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy authentication

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

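Custom metric

The procedure to install the Custom Metrics Adapter differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option that matches the setup you chose when you created your cluster.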

Workload Identity

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning it the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy authentication

Grant your user the ability to create the required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the Custom Metrics Adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Deploy an application with metrics

Download the repository that contains the application code for this tutorial:

Pub/Sub

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/databases/cloud-pubsub

Custom metric

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/observability/custom-metrics-autoscaling/google-managed-prometheus

The repository contains code that exports metrics to Cloud Monitoring:

Pub/Sub

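This application polls a Pub/Sub subscription for new messages and acknowledges them as they arrive. Pub/Sub subscription metrics are collected automatically by Cloud Monitoring.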

import datetime
import time

from google import auth
from google.cloud import pubsub_v1


def main():
    """Continuously pull messages from subscription"""

    # read default project ID
    _, project_id = auth.default()
    subscription_id = 'echo-read'

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process received message"""
        print(f"Received message: ID={message.message_id} Data={message.data}")
        print(f"[{datetime.datetime.now()}] Processing: {message.message_id}")
        time.sleep(3)
        print(f"[{datetime.datetime.now()}] Processed: {message.message_id}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f"Pulling messages from {subscription_path}...")

    with subscriber:
        try:
            streaming_pull_future.result()
        except Exception as e:
            print(e)


if __name__ == '__main__':
    main()

Custom metric

This application responds to any web request to the /metrics path with a constant-value metric in the Prometheus format.

metric := prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: *metricName,
		Help: "Custom metric",
	},
)
prometheus.MustRegister(metric)
metric.Set(float64(*metricValue))

http.Handle("/metrics", promhttp.Handler())
log.Printf("Starting to listen on :%d", *port)
err := http.ListenAndServe(fmt.Sprintf(":%d", *port), nil)
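
For readers less familiar with the Prometheus text exposition format, the following Python sketch shows roughly what the /metrics response body contains for a single gauge. The function name render_gauge is hypothetical, and the sample values mirror the arguments in the sample Deployment (custom_prometheus, 40). The real exporter may also emit additional default metrics from its client library, which are left out here.

```python
def render_gauge(name: str, help_text: str, value: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",  # human-readable description
        f"# TYPE {name} gauge",        # metric type declaration
        f"{name} {value:g}",           # sample line: metric name and value
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed values, taken from the sample Deployment's arguments.
    print(render_gauge("custom_prometheus", "Custom metric", 40), end="")
```

A gauge is a value that can go up or down, which is why it suits a per-Pod target average in a HorizontalPodAutoscaler.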

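The repository also contains a Kubernetes manifest to deploy the application to your cluster. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.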

Pub/Sub

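The manifest differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option that matches the setup you chose when you created your cluster.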

Workload Identity

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      serviceAccountName: pubsub-sa
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2

Legacy authentication

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      volumes:
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/secrets/google
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json

Custom metric

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: custom-metrics-gmp
  name: custom-metrics-gmp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metrics-gmp
  template:
    metadata:
      labels:
        run: custom-metrics-gmp
    spec:
      containers:
      # sample container generating custom metrics
      - name: prometheus-dummy-exporter
        image: us-docker.pkg.dev/google-samples/containers/gke/prometheus-dummy-exporter:v0.2.0
        command: ["./prometheus-dummy-exporter"]
        args:
        - --metric-name=custom_prometheus
        - --metric-value=40
        - --port=8080

With the PodMonitoring resource, Google Cloud Managed Service for Prometheus exports the Prometheus metrics to Cloud Monitoring:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: "custom-metrics-exporter"
spec:
  selector:
    matchLabels:
      run: custom-metrics-gmp
  endpoints:
  - port: 8080
    path: /metrics
    interval: 15s
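
The selector in the PodMonitoring resource picks Pods by label. As a rough illustration of matchLabels semantics (every listed key must be present on the Pod with an equal value), here is a small Python sketch. The function name matches_labels is hypothetical, and this is not how the managed collector is implemented.

```python
def matches_labels(pod_labels: dict, match_labels: dict) -> bool:
    """Return True if the Pod carries every key/value pair in match_labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    selector = {"run": "custom-metrics-gmp"}
    # Labels from the sample custom metrics Deployment's Pod template.
    print(matches_labels({"run": "custom-metrics-gmp"}, selector))
    # Labels from the Pub/Sub sample; these Pods would not be scraped.
    print(matches_labels({"app": "pubsub"}, selector))
```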

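Starting in GKE Standard version 1.27 or GKE Autopilot version 1.25, Google Cloud Managed Service for Prometheus is enabled. To enable Google Cloud Managed Service for Prometheus in clusters with earlier versions, see Enable managed collection.

Deploy the application to your cluster: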

Pub/Sub

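The procedure to deploy your application differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option that matches the setup you chose when you created your cluster.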

Workload Identity

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-workload-identity.yaml

This application defines a pubsub-sa Kubernetes service account. Assign it the Pub/Sub Subscriber role so that the application can pull messages from the Pub/Sub subscription.
```
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
  --role=roles/pubsub.subscriber \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/pubsub-sa
```
The preceding command uses a principal identifier, which lets IAM refer directly to a Kubernetes service account.

Best practice:
Use principal identifiers, but consider the limitation described for the alternative method.
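
To make the structure of the principal identifier easier to see, the following Python sketch assembles the same string that the gcloud command passes to --member. The function name and the example project number and project ID are hypothetical placeholders; substitute your own values.

```python
def k8s_service_account_principal(
    project_number: str, project_id: str, namespace: str, service_account: str
) -> str:
    """Build the IAM principal identifier for a Kubernetes service account."""
    # The workload identity pool of a GKE cluster is named after the project ID.
    pool = f"{project_id}.svc.id.goog"
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool}"
        f"/subject/ns/{namespace}/sa/{service_account}"
    )


if __name__ == "__main__":
    # "123456789012" and "my-project" are made-up example values.
    print(k8s_service_account_principal("123456789012", "my-project", "default", "pubsub-sa"))
```

Note that the identifier uses both the project number and the project ID, which is why the commands in this tutorial reference $PROJECT_NUMBER and $PROJECT_ID separately.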

Legacy authentication

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Create a service account with access to Pub/Sub:

gcloud iam service-accounts create autoscaling-pubsub-sa
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member "serviceAccount:autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/pubsub.subscriber"

Download the service account key file:

gcloud iam service-accounts keys create key.json \
  --iam-account autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com

Import the service account key to your cluster as a Secret:

kubectl create secret generic pubsub-key --from-file=key.json=./key.json

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-secret.yaml

Custom metric

kubectl apply -f custom-metrics-gmp.yaml

After waiting a moment for the application to deploy, check that all Pods have reached the Ready state:

Pub/Sub

kubectl get pods

Output:

NAME                     READY   STATUS    RESTARTS   AGE
pubsub-8cd995d7c-bdhqz   1/1     Running   0          58s

Custom metric

kubectl get pods

Output:

NAME                                  READY   STATUS    RESTARTS   AGE
custom-metrics-gmp-865dffdff9-x2cg9   1/1     Running   0          49s
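
If you want to check readiness in a script rather than by eye, a minimal Python sketch like the following could parse the table that kubectl get pods prints. The function name all_pods_ready is hypothetical, and the parsing assumes the default column layout shown above (no namespace column), with READY as the second column.

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every Pod row reports READY as n/n (for example 1/1)."""
    rows = kubectl_output.strip().splitlines()[1:]  # skip the header row
    if not rows:
        return False  # no Pods listed yet
    for row in rows:
        ready_column = row.split()[1]
        ready_count, total_count = ready_column.split("/")
        if ready_count != total_count:
            return False
    return True


if __name__ == "__main__":
    sample = (
        "NAME                                  READY   STATUS    RESTARTS   AGE\n"
        "custom-metrics-gmp-865dffdff9-x2cg9   1/1     Running   0          49s\n"
    )
    print(all_pods_ready(sample))
```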

View metrics in Cloud Monitoring

As your application runs, it writes your metrics to Cloud Monitoring.

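To view the metrics for a monitored resource by using the Metrics Explorer, do the following:

1. In the Google Cloud console, go to the Metrics Explorer page. If you use the search bar to find this page, select the result whose subheading is Monitoring.
2. In the Metric element, expand the Select a metric menu, and then select a resource type and metric type. For example, to chart the CPU utilization of a virtual machine, do the following:
   1. (Optional) To reduce the menu's options, enter part of the metric name in the Filter bar. For this example, enter utilization.
   2. In the Active resources menu, select VM instance.
   3. In the Active metric categories menu, select Instance.
   4. In the Active metrics menu, select CPU utilization, and then click Apply.
3. To filter which time series are displayed, use the Filter element.
4. To combine time series, use the menus on the Aggregation element. For example, to display the CPU utilization for your VMs based on their zone, set the first menu to Mean and the second menu to Zone.

All time series are displayed when the first menu of the Aggregation element is set to Unaggregated. The default settings for the Aggregation element are determined by the metric type that you selected.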

The resource type and metrics are the following:

Pub/Sub

Metrics Explorer

Resource type: pubsub_subscription

Metric: pubsub.googleapis.com/subscription/num_undelivered_messages

Custom metric

Metrics Explorer

Resource type: prometheus_target

Metric: prometheus.googleapis.com/custom_prometheus/gauge

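Depending on the metric, you might not see much activity in the Cloud Monitoring Metrics Explorer yet. Don't be surprised if your metric isn't updating.

Create a HorizontalPodAutoscaler object

After you see your metric in Cloud Monitoring, you can deploy a HorizontalPodAutoscaler to resize your Deployment based on the metric.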

Pub/Sub

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: echo-read
      target:
        type: AverageValue
        averageValue: 2
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub
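
To get a feel for what an AverageValue target of 2 means for an external metric, the following Python sketch approximates the replica count the autoscaler aims for: the total metric value divided by the target average, rounded up and clamped to the configured bounds. The function name is hypothetical, and the sketch deliberately ignores the autoscaler's tolerance band, stabilization windows, and scaling policies, so treat it as an approximation rather than the controller's exact behavior.

```python
import math


def desired_replicas_external(
    total_metric: float, target_average: float, min_replicas: int, max_replicas: int
) -> int:
    """Approximate the replica count for an External metric with an AverageValue target."""
    # With AverageValue, the total is spread across Pods and compared to the target.
    raw = math.ceil(total_metric / target_average)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # A backlog of 200 unacknowledged messages with the manifest's settings.
    print(desired_replicas_external(200, 2, min_replicas=1, max_replicas=5))
    # A small backlog of 6 messages stays below the maximum.
    print(desired_replicas_external(6, 2, min_replicas=1, max_replicas=5))
```

With these settings, any backlog of more than eight messages already asks for the maximum of five replicas, which illustrates why this metric can use relatively more resources.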

Custom metric

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-gmp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-gmp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|custom_prometheus|gauge
      target:
        type: AverageValue
        averageValue: 20
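
Comparing the metric names shown in Metrics Explorer with the names used in the two manifests, the only difference is that each forward slash is written as a pipe character. A tiny Python sketch of that mapping, with a hypothetical function name, might look like this:

```python
def hpa_metric_name(monitoring_metric_type: str) -> str:
    """Convert a Cloud Monitoring metric type to the form used in the HPA manifests."""
    # Slashes in the metric type are written as pipes in the manifest.
    return monitoring_metric_type.replace("/", "|")


if __name__ == "__main__":
    print(hpa_metric_name("pubsub.googleapis.com/subscription/num_undelivered_messages"))
    print(hpa_metric_name("prometheus.googleapis.com/custom_prometheus/gauge"))
```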

Deploy the HorizontalPodAutoscaler to your cluster:

Pub/Sub

kubectl apply -f deployment/pubsub-hpa.yaml

Custom metric

kubectl apply -f custom-metrics-gmp-hpa.yaml

Generate load

For some metrics, you might need to generate load to watch the autoscaling:

Pub/Sub

Publish 200 messages to the Pub/Sub topic:

for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done

Custom metric

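Not applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it attempts to scale up the Deployment automatically.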
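
Because every Pod reports the same constant value, adding replicas never lowers the per-Pod average, so the autoscaler keeps scaling until it reaches maxReplicas. The Python sketch below walks through that with the standard HorizontalPodAutoscaler ratio, ceil(currentReplicas * currentValue / targetValue). The function name is hypothetical, and the simulation ignores metric propagation delays, the tolerance band, and scaling policies.

```python
import math


def simulate_constant_metric(
    value_per_pod: float,
    target_average: float,
    min_replicas: int,
    max_replicas: int,
    max_steps: int = 10,
) -> list:
    """Return the sequence of replica counts when each Pod reports a constant value."""
    replicas = min_replicas
    history = [replicas]
    for _ in range(max_steps):
        # The per-Pod average equals value_per_pod no matter how many Pods run.
        desired = math.ceil(replicas * value_per_pod / target_average)
        desired = max(min_replicas, min(max_replicas, desired))
        if desired == replicas:
            break  # no further change is requested
        replicas = desired
        history.append(replicas)
    return history


if __name__ == "__main__":
    # Metric value 40, target 20, bounds from the sample manifest.
    print(simulate_constant_metric(40, 20, min_replicas=1, max_replicas=5))
```

Under these assumptions the replica count doubles at each evaluation until it is capped at five.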

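You might need to wait a couple of minutes for the HorizontalPodAutoscaler to respond to the metric changes.

Observe the HorizontalPodAutoscaler scaling up

You can check the current number of replicas of your Deployment by running the following command: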

kubectl get deployments

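After the metric has had some time to propagate, the Deployment creates five Pods to handle the backlog.

You can also inspect the state and recent activity of the HorizontalPodAutoscaler by running the following command: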

kubectl describe hpa