This document shows how to configure user-defined metrics for horizontal Pod autoscaling (HPA) in Google Distributed Cloud.
This page is for Admins, Architects, and Operators who optimize systems architecture and resources to ensure the lowest total cost of ownership for their company or business unit, and who plan capacity and infrastructure needs. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.
Deploy Prometheus and the metrics adapter
In this section, you deploy Prometheus to scrape user-defined metrics, and you deploy prometheus-adapter to serve the Kubernetes Custom Metrics API with Prometheus as its backend.
Save the following manifest to a file named custom-metrics-adapter.yaml.
Manifest file contents for Prometheus and the metrics adapter
# Copyright 2018 Google Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: stackdriver-prometheus
subjects:
- kind: ServiceAccount
  name: stackdriver-prometheus
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
  labels:
    app: stackdriver-prometheus-app
spec:
  clusterIP: "None"
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  sessionAffinity: ClientIP
  selector:
    app: stackdriver-prometheus-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
  labels:
    app: stackdriver-prometheus-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stackdriver-prometheus-app
  template:
    metadata:
      labels:
        app: stackdriver-prometheus-app
    spec:
      serviceAccount: stackdriver-prometheus
      containers:
      - name: prometheus-server
        image: prom/prometheus:v2.45.0
        args:
        - "--config.file=/etc/prometheus/config/prometheus.yaml"
        - "--storage.tsdb.path=/data"
        - "--storage.tsdb.retention.time=2h"
        ports:
        - name: prometheus
          containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          periodSeconds: 5
          timeoutSeconds: 3
          # Allow up to 10m on startup for data recovery
          failureThreshold: 120
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 6
        resources:
          requests:
            cpu: 250m
            memory: 500Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus/config
        - name: stackdriver-prometheus-app-data
          mountPath: /data
      volumes:
      - name: config-volume
        configMap:
          name: stackdriver-prometheus-app
      - name: stackdriver-prometheus-app-data
        emptyDir: {}
      terminationGracePeriodSeconds: 300
      nodeSelector:
        kubernetes.io/os: linux
---
apiVersion: v1
data:
  prometheus.yaml: |
    global:
      scrape_interval: 1m
    rule_files:
    - /etc/config/rules.yaml
    - /etc/config/alerts.yaml
    scrape_configs:
    - job_name: prometheus-io-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: keep
        regex: (.+)
        source_labels:
        - __meta_kubernetes_endpoint_port_name
    - job_name: prometheus-io-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: replace
        source_labels:
        - __address__
        target_label: __param_target
      - action: replace
        replacement: blackbox
        target_label: __address__
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
    - job_name: prometheus-io-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
kind: ConfigMap
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
---
# The main section of custom metrics adapter.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
- apiGroups:
  - custom.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - pods
  - services
  verbs:
  - get
  - watch
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: kube-system
data:
  config.yaml: |
    rules:
      default: false
    # filter all metrics
    - seriesQuery: '{pod=~".+"}'
      seriesFilters: []
      resources:
        # resource name is mapped as it is. ex. namespace -> namespace
        template: <<.Resource>>
      name:
        matches: ^(.*)$
        as: ""
      # Aggregate metric on resource level
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: custom-metrics-apiserver
      containers:
      - name: custom-metrics-apiserver
        resources:
          requests:
            cpu: 15m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 150Mi
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.0
        args:
        - /adapter
        - --cert-dir=/var/run/serving-cert
        - --secure-port=6443
        - --prometheus-url=http://stackdriver-prometheus-app.kube-system.svc:9090/
        - --metrics-relist-interval=1m
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443
        volumeMounts:
        - name: serving-cert
          mountPath: /var/run/serving-cert
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - name: serving-cert
        emptyDir:
          medium: Memory
      - name: config
        configMap:
          name: adapter-config
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta2.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system
  group: custom.metrics.k8s.io
  version: v1beta2
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
Create the Deployment and Service:
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f custom-metrics-adapter.yaml
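Optionally, verify that the adapter has registered with the Kubernetes aggregation layer before moving on. The following checks are a minimal sketch; the resource names match the manifest above, and the timeout value is an arbitrary choice:

# Wait for the adapter Deployment to become available.
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG -n kube-system wait deployment/custom-metrics-apiserver --for=condition=Available --timeout=5m

# The APIService should report Available=True once the adapter is serving.
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get apiservice v1beta1.custom.metrics.k8s.io

# Listing the discovery document confirms that the custom.metrics.k8s.io group responds.
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get --raw "/apis/custom.metrics.k8s.io/v1beta1"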
The next step is to annotate your user application so that its metrics are collected.
Annotate a user application for metrics collection
To annotate a user application so that it is scraped and its data is sent to Cloud Monitoring, you must add the corresponding annotations to the metadata of the Service, Pod, and Endpoints.
metadata:
  name: "example-monitoring"
  namespace: "default"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: ""   # Overrides the metrics path (default "/metrics")
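The scrape configuration in the manifest above also honors prometheus.io/port and prometheus.io/scheme annotations. As an illustration only (the port and path values here are hypothetical), a Service that exposes metrics on a non-default port and path could be annotated like this:

metadata:
  name: "example-monitoring"
  namespace: "default"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"        # hypothetical: scrape this port instead of the discovered one
    prometheus.io/path: "/my-metrics" # hypothetical: scrape this path instead of /metrics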
Deploy an example user application
In this section, you deploy a sample application that emits logs and Prometheus-compatible metrics.
Save the following Service and Deployment manifests to a file named my-app.yaml. Notice that the Service has the prometheus.io/scrape: "true" annotation:

kind: Service
apiVersion: v1
metadata:
  name: "example-monitoring"
  namespace: "default"
  annotations:
    prometheus.io/scrape: "true"
spec:
  selector:
    app: "example-monitoring"
  ports:
  - name: http
    port: 9090
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "example-monitoring"
  namespace: "default"
  labels:
    app: "example-monitoring"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "example-monitoring"
  template:
    metadata:
      labels:
        app: "example-monitoring"
    spec:
      containers:
      - image: gcr.io/google-samples/prometheus-dummy-exporter:v0.2.0
        name: prometheus-example-exporter
        command:
        - ./prometheus-dummy-exporter
        args:
        - --metric-name=example_monitoring_up
        - --metric-value=1
        - --port=9090
        resources:
          requests:
            cpu: 100m
Create the Deployment and Service:
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f my-app.yaml
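Optionally, before wiring the metric into an HPA, you can confirm that the adapter exposes example_monitoring_up through the Custom Metrics API. The metric can take a minute or two to appear, because the adapter relists Prometheus series on an interval (--metrics-relist-interval=1m in the manifest above):

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/example_monitoring_up"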
Use custom metrics in HPA
Deploy an HPA object that uses the metric exposed in the previous step. For more information about the different types of custom metrics, see Autoscaling on multiple metrics and custom metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-monitoring-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-monitoring
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: example_monitoring_up
      target:
        type: AverageValue
        averageValue: 20
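For example, if you save the manifest above to a file named example-monitoring-hpa.yaml (the file name is only a suggestion), apply it the same way as the earlier manifests:

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f example-monitoring-hpa.yaml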
Pods-type metrics use a default metric selector based on the labels of the target Pods; this is how kube-controller-manager behaves. In this example, the example_monitoring_up metric is queried with the selector {matchLabels: {app: example-monitoring}}, because the metric is available in the target Pods. Any other selector that you specify is added to that list. To avoid the default selector, you can either remove all labels from the target Pods or use an Object-type metric.
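If you want to narrow which series are matched rather than rely only on the default behavior, the Pods metric in autoscaling/v2 also accepts an explicit selector under metric. The following is a minimal sketch based on the HPA above; the app: "example-monitoring" label is the one already carried by the target Pods:

  metrics:
  - type: Pods
    pods:
      metric:
        name: example_monitoring_up
        selector:
          matchLabels:
            app: "example-monitoring"
      target:
        type: AverageValue
        averageValue: 20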
Verify that HPA uses the user-defined application metrics
Verify that HPA uses the user-defined application metrics:
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG describe hpa example-monitoring-hpa
The output is similar to the following:
Name:                example-monitoring-hpa
Namespace:           default
Labels:              <none>
Annotations:         autoscaling.alpha.kubernetes.io/conditions:
                       [{"type":"AbleToScale","status":"True","lastTransitionTime":"2023-08-23T22:07:24Z","reason":"ReadyForNewScale","message":"recommended size...
                     autoscaling.alpha.kubernetes.io/current-metrics:
                       [{"type":"Pods","pods":{"metricName":"example_monitoring_up","currentAverageValue":"1"}}]
                     autoscaling.alpha.kubernetes.io/metrics:
                       [{"type":"Pods","pods":{"metricName":"example_monitoring_up","targetAverageValue":"20"}}]
CreationTimestamp:   Wed, 23 Aug 2023 22:07:09 +0000
Reference:           Deployment/example-monitoring
Min replicas:        1
Max replicas:        5
Deployment pods:     1 current / 1 desired
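You can also watch the reported metric value and the replica count change over time, for example:

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get hpa example-monitoring-hpa --watch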
Costs
Using custom metrics for HPA doesn't incur additional Cloud Monitoring charges. Pods that have custom metrics enabled consume additional CPU and memory, depending on the number of metrics being scraped.
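If you want to gauge that overhead, one option is to inspect the live resource usage of the monitoring components deployed on this page. This is only a sketch and assumes the Kubernetes Metrics API (for example, metrics-server) is available in the user cluster:

# CPU and memory usage of the Prometheus and adapter Pods deployed above.
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG top pods -n kube-system -l app=stackdriver-prometheus-app
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG top pods -n kube-system -l app=custom-metrics-apiserver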