Set up GKE Dataplane V2 observability


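This page shows you how to configure Google Kubernetes Engine (GKE) clusters with GKE Dataplane V2 observability, which is available in GKE version 1.28 and later. To learn more about the benefits and requirements of GKE Dataplane V2 observability, see About GKE Dataplane V2 observability.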

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

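Configure GKE Dataplane V2 metrics

To collect metrics, you must configure GKE Dataplane V2 metrics. You can configure them when you create a cluster or when you update a cluster that runs GKE Dataplane V2. You can enable or disable GKE Dataplane V2 metrics with the gcloud CLI.

We recommend that you enable both GKE Dataplane V2 metrics and Google Cloud Managed Service for Prometheus on your GKE cluster. When both are enabled, GKE Dataplane V2 metrics are sent to Google Cloud Managed Service for Prometheus.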

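Create an Autopilot cluster with GKE Dataplane V2 metrics enabled

When you create a new GKE Autopilot cluster, GKE enables GKE Dataplane V2 metrics on the cluster by default, without requiring a specific flag.

To use GKE Dataplane V2 metrics from a GKE Autopilot cluster with Google Cloud Managed Service for Prometheus, configure a ClusterPodMonitoring resource that scrapes the metrics and sends them to Google Cloud Managed Service for Prometheus.

  1. Create the ClusterPodMonitoring manifest and save it as ClusterPodMonitoring.yaml: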

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    apiVersion: monitoring.googleapis.com/v1
    kind: ClusterPodMonitoring
    metadata:
      name: advanced-datapath-observability-metrics
    spec:
      selector:
        matchLabels:
          k8s-app: cilium
      endpoints:
      - port: flowmetrics
        interval: 60s
        metricRelabeling:
        # only keep denormalized pod flow metrics
        - sourceLabels: [__name__]
          regex: 'pod_flow_(ingress|egress)_flows_count'
          action: keep
        # extract pod name
        - sourceLabels: [__name__, destination]
          regex: 'pod_flow_ingress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${2}'
          targetLabel: pod_name
          action: replace
        - sourceLabels: [__name__, source]
          regex: 'pod_flow_egress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${2}'
          targetLabel: pod_name
          action: replace
        # extract workload name by removing 2 last "-XXX" parts
        - sourceLabels: [pod_name]
          regex: '([a-zA-Z0-9-\.]+)((-[a-zA-Z0-9\.]+){2})'
          replacement: '${1}'
          targetLabel: workload_name
          action: replace
        # extract workload name by removing one "-XXX" part when pod name has only 2 parts (eg. daemonset)
        - sourceLabels: [pod_name]
          regex: '([a-zA-Z0-9\.]+)((-[a-zA-Z0-9\.]+){1})'
          replacement: '${1}'
          targetLabel: workload_name
          action: replace
        # extract pod namespace
        - sourceLabels: [__name__, destination]
          regex: 'pod_flow_ingress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${1}'
          targetLabel: namespace_name
          action: replace
        - sourceLabels: [__name__, source]
          regex: 'pod_flow_egress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${1}'
          targetLabel: namespace_name
          action: replace
        # extract remote workload name
        - sourceLabels: [__name__, source]
          regex: 'pod_flow_ingress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${2}'
          targetLabel: remote_workload
          action: replace
        - sourceLabels: [__name__, destination]
          regex: 'pod_flow_egress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${2}'
          targetLabel: remote_workload
          action: replace
        # extract remote workload namespace
        - sourceLabels: [__name__, source]
          regex: 'pod_flow_ingress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${1}'
          targetLabel: remote_namespace
          action: replace
        - sourceLabels: [__name__, destination]
          regex: 'pod_flow_egress_flows_count;([a-zA-Z0-9-\.]+)/([a-zA-Z0-9-\.]+)'
          replacement: '${1}'
          targetLabel: remote_namespace
          action: replace
        # default remote workload class to "pod"
        - replacement: 'pod'
          targetLabel: remote_class
          action: replace
        # extract remote workload class from reserved identity
        - sourceLabels: [__name__, source]
          regex: 'pod_flow_ingress_flows_count;reserved:([^/]*)'
          replacement: '${1}'
          targetLabel: remote_class
          action: replace
        - sourceLabels: [__name__, destination]
          regex: 'pod_flow_egress_flows_count;reserved:([^/]*)'
          replacement: '${1}'
          targetLabel: remote_class
          action: replace
      targetLabels:
        metadata: []
    
  2. Apply the ClusterPodMonitoring manifest:

    kubectl apply -f ClusterPodMonitoring.yaml
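To make the relabeling rules in the manifest easier to follow, the rough shell sketch below mimics three of them on a single label value. It is only an illustration: the function names and the sample value default/frontend-6d8f7c9b5-x2k4q are made up for this example, and sed stands in for the relabeling engine, which matches each regex against the whole joined value.

```shell
# Illustration only: sed stands in for the relabeling engine.
# Function names and sample values are hypothetical.

# "<namespace>/<pod>" -> namespace (capture group 1 in the manifest's regex)
namespace_of() { printf '%s\n' "${1%%/*}"; }

# "<namespace>/<pod>" -> pod name (capture group 2)
pod_of() { printf '%s\n' "${1#*/}"; }

# Pod name -> workload name.
# Names with three or more parts lose their last two "-XXX" parts
# (Deployment-style); names with exactly two parts lose one (DaemonSet-style).
# Prints an empty line when neither rule applies, because the rules would
# leave the label unset in that case.
workload_of() {
  case "$1" in
    *-*-*) printf '%s\n' "$1" |
             sed -n -E 's/^([a-zA-Z0-9.-]+)((-[a-zA-Z0-9.]+){2})$/\1/p' ;;
    *-*)   printf '%s\n' "$1" |
             sed -n -E 's/^([a-zA-Z0-9.]+)(-[a-zA-Z0-9.]+)$/\1/p' ;;
    *)     printf '\n' ;;
  esac
}

flow_label="default/frontend-6d8f7c9b5-x2k4q"
pod="$(pod_of "$flow_label")"
echo "namespace_name=$(namespace_of "$flow_label")"
echo "pod_name=$pod"
echo "workload_name=$(workload_of "$pod")"
```

For the sample value, the sketch prints namespace_name=default, pod_name=frontend-6d8f7c9b5-x2k4q, and workload_name=frontend. A two-part name such as anetd-abc12 (also hypothetical) would yield the workload name anetd.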
    

Create a Standard cluster with GKE Dataplane V2 metrics enabled

To enable GKE Dataplane V2 metrics, use the --enable-dataplane-v2-metrics flag when you create the cluster:

gcloud container clusters create CLUSTER_NAME \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --enable-managed-prometheus \
    --enable-dataplane-v2-metrics

Replace the following:

  • CLUSTER_NAME: the name of your cluster.

The --enable-managed-prometheus flag instructs GKE to use the metrics with Google Cloud Managed Service for Prometheus.

Enable GKE Dataplane V2 metrics on an existing cluster

To enable GKE Dataplane V2 metrics on an existing cluster, run the following command:

gcloud container clusters update CLUSTER_NAME \
    --enable-dataplane-v2-metrics

Replace CLUSTER_NAME with the name of your cluster.

Disable GKE Dataplane V2 metrics

To disable GKE Dataplane V2 metrics, run the following command:

gcloud container clusters update CLUSTER_NAME \
    --disable-dataplane-v2-metrics

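Replace CLUSTER_NAME with the name of your cluster.

Configure GKE Dataplane V2 observability tools

You can use a private endpoint to access the GKE Dataplane V2 observability troubleshooting tools. To enable the GKE Dataplane V2 observability tools, you must have a cluster that is configured with GKE Dataplane V2. You can enable the tools on either a new cluster or an existing cluster.

Create an Autopilot cluster with observability enabled

To create a GKE Autopilot cluster with GKE Dataplane V2 observability enabled, run the following command: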

gcloud container clusters create-auto CLUSTER_NAME \
    --enable-dataplane-v2-flow-observability

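Replace CLUSTER_NAME with the name of your cluster.

Create a Standard cluster with observability enabled

To create a GKE Standard cluster with GKE Dataplane V2 observability enabled, run the following command: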

gcloud container clusters create CLUSTER_NAME \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --enable-dataplane-v2-flow-observability

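Replace CLUSTER_NAME with the name of your cluster.

Enable GKE Dataplane V2 observability tools on an existing cluster

To enable GKE Dataplane V2 observability on an existing cluster, run the following command: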

gcloud container clusters update CLUSTER_NAME \
    --enable-dataplane-v2-flow-observability

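Replace CLUSTER_NAME with the name of your cluster.

Disable GKE Dataplane V2 observability tools

To disable the GKE Dataplane V2 observability tools on an existing cluster, run the following command: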

gcloud container clusters update CLUSTER_NAME \
    --disable-dataplane-v2-flow-observability

Replace CLUSTER_NAME with the name of your cluster.
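The four update commands on this page share one shape: only the enable or disable prefix and the feature suffix change. The sketch below captures that pattern in a small helper that prints the command instead of running it, so you can review it first. The function name, its argument order, and the cluster name my-cluster are hypothetical; the flags are the ones shown on this page.

```shell
# Sketch: print the gcloud update command for one feature and one state.
# The function and argument names are hypothetical.
dpv2_update_cmd() {
  cluster="$1"   # cluster name
  feature="$2"   # "metrics" or "flow-observability"
  state="$3"     # "enable" or "disable"

  case "$feature" in
    metrics|flow-observability) ;;
    *) echo "unknown feature: $feature" >&2; return 2 ;;
  esac
  case "$state" in
    enable|disable) ;;
    *) echo "unknown state: $state" >&2; return 2 ;;
  esac

  printf 'gcloud container clusters update %s --%s-dataplane-v2-%s\n' \
    "$cluster" "$state" "$feature"
}

dpv2_update_cmd my-cluster metrics enable
dpv2_update_cmd my-cluster flow-observability disable
```

Printing the command rather than executing it is a deliberate choice for this sketch: it keeps the helper free of side effects while you decide which feature to toggle.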

How to use the Hubble CLI

After you enable GKE Dataplane V2 observability, you can use the Hubble CLI tool on your cluster.

  1. Define an alias for the hubble-cli binary:

    alias hubble="kubectl exec -it -n gke-managed-dpv2-observability deployment/hubble-relay -c hubble-cli -- hubble"
    
  2. To check the Hubble status, run the following command on any cluster that has GKE Dataplane V2 observability enabled, including Autopilot clusters:

    hubble status
    
  3. To view current traffic, use the Hubble CLI as follows:

    hubble observe
    

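How to deploy the Hubble UI binary distribution

After you enable GKE Dataplane V2 observability, you can deploy the open source Hubble UI.

  1. Enable observability in a GKE cluster:

    1. Create a GKE cluster with observability enabled: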

      gcloud container clusters create-auto hubble-rc-auto \
          --location COMPUTE_LOCATION \
          --cluster-version VERSION \
          --enable-dataplane-v2-flow-observability
      

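      Replace the following:

        • COMPUTE_LOCATION: the Compute Engine location for the new cluster.
        • VERSION: the GKE version for the cluster, 1.28 or later.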

    2. Alternatively, enable observability in an existing cluster:

      gcloud container clusters update CLUSTER_NAME \
          --location COMPUTE_LOCATION \
          --enable-dataplane-v2-flow-observability
      

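      Replace the following:

        • CLUSTER_NAME: the name of your cluster.
        • COMPUTE_LOCATION: the Compute Engine location of the cluster.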

  2. Configure kubectl to connect to the cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --location COMPUTE_LOCATION
    

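    Replace CLUSTER_NAME with the name of your cluster and COMPUTE_LOCATION with the Compute Engine location of the cluster.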

  3. Deploy the Hubble UI. Save the following manifest as hubble-ui-128.yaml:

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: hubble-ui
      namespace: gke-managed-dpv2-observability
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: hubble-ui
      labels:
        app.kubernetes.io/part-of: cilium
    rules:
      - apiGroups:
          - networking.k8s.io
        resources:
          - networkpolicies
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - componentstatuses
          - endpoints
          - namespaces
          - nodes
          - pods
          - services
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - apiextensions.k8s.io
        resources:
          - customresourcedefinitions
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - cilium.io
        resources:
          - "*"
        verbs:
          - get
          - list
          - watch
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: hubble-ui
      labels:
        app.kubernetes.io/part-of: cilium
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: hubble-ui
    subjects:
      - kind: ServiceAccount
        name: hubble-ui
        namespace: gke-managed-dpv2-observability
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: hubble-ui-nginx
      namespace: gke-managed-dpv2-observability
    data:
      nginx.conf: |
        server {
            listen       8081;
            # uncomment for IPv6
            # listen       [::]:8081;
            server_name  localhost;
            root /app;
            index index.html;
            client_max_body_size 1G;
            location / {
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                # CORS
                add_header Access-Control-Allow-Methods "GET, POST, PUT, HEAD, DELETE, OPTIONS";
                add_header Access-Control-Allow-Origin *;
                add_header Access-Control-Max-Age 1728000;
                add_header Access-Control-Expose-Headers content-length,grpc-status,grpc-message;
                add_header Access-Control-Allow-Headers range,keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout;
                if ($request_method = OPTIONS) {
                    return 204;
                }
                # /CORS
                location /api {
                    proxy_http_version 1.1;
                    proxy_pass_request_headers on;
                    proxy_hide_header Access-Control-Allow-Origin;
                    proxy_pass http://127.0.0.1:8090;
                }
                location / {
                    # double `/index.html` is required here
                    try_files $uri $uri/ /index.html /index.html;
                }
            }
        }
    ---
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: hubble-ui
      namespace: gke-managed-dpv2-observability
      labels:
        k8s-app: hubble-ui
        app.kubernetes.io/name: hubble-ui
        app.kubernetes.io/part-of: cilium
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: hubble-ui
      template:
        metadata:
          labels:
            k8s-app: hubble-ui
            app.kubernetes.io/name: hubble-ui
            app.kubernetes.io/part-of: cilium
        spec:
          securityContext:
            fsGroup: 1000
            seccompProfile:
              type: RuntimeDefault
          serviceAccount: hubble-ui
          serviceAccountName: hubble-ui
          containers:
            - name: frontend
              image: quay.io/cilium/hubble-ui:v0.11.0
              ports:
                - name: http
                  containerPort: 8081
              volumeMounts:
                - name: hubble-ui-nginx-conf
                  mountPath: /etc/nginx/conf.d/default.conf
                  subPath: nginx.conf
                - name: tmp-dir
                  mountPath: /tmp
              terminationMessagePolicy: FallbackToLogsOnError
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                runAsUser: 1000
                runAsGroup: 1000
                capabilities:
                  drop:
                    - all
            - name: backend
              image: quay.io/cilium/hubble-ui-backend:v0.11.0
              env:
                - name: EVENTS_SERVER_PORT
                  value: "8090"
                - name: FLOWS_API_ADDR
                  value: "hubble-relay.gke-managed-dpv2-observability.svc:443"
                - name: TLS_TO_RELAY_ENABLED
                  value: "true"
                - name: TLS_RELAY_SERVER_NAME
                  value: relay.gke-managed-dpv2-observability.svc.cluster.local
                - name: TLS_RELAY_CA_CERT_FILES
                  value: /var/lib/hubble-ui/certs/hubble-relay-ca.crt
                - name: TLS_RELAY_CLIENT_CERT_FILE
                  value: /var/lib/hubble-ui/certs/client.crt
                - name: TLS_RELAY_CLIENT_KEY_FILE
                  value: /var/lib/hubble-ui/certs/client.key
              ports:
                - name: grpc
                  containerPort: 8090
              volumeMounts:
                - name: hubble-ui-client-certs
                  mountPath: /var/lib/hubble-ui/certs
                  readOnly: true
              terminationMessagePolicy: FallbackToLogsOnError
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                runAsUser: 1000
                runAsGroup: 1000
                capabilities:
                  drop:
                    - all
          volumes:
            - configMap:
                defaultMode: 420
                name: hubble-ui-nginx
              name: hubble-ui-nginx-conf
            - emptyDir: {}
              name: tmp-dir
            - name: hubble-ui-client-certs
              projected:
                # note: the leading zero means this number is in octal representation: do not remove it
                defaultMode: 0400
                sources:
                  - secret:
                      name: hubble-relay-client-certs
                      items:
                        - key: ca.crt
                          path: hubble-relay-ca.crt
                        - key: tls.crt
                          path: client.crt
                        - key: tls.key
                          path: client.key
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: hubble-ui
      namespace: gke-managed-dpv2-observability
      labels:
        k8s-app: hubble-ui
        app.kubernetes.io/name: hubble-ui
        app.kubernetes.io/part-of: cilium
    spec:
      type: ClusterIP
      selector:
        k8s-app: hubble-ui
      ports:
        - name: http
          port: 80
          targetPort: 8081
    
  4. Apply the hubble-ui-128.yaml manifest:

    kubectl apply -f hubble-ui-128.yaml
    
  5. Expose the Service through port forwarding:

    kubectl -n gke-managed-dpv2-observability port-forward service/hubble-ui 16100:80 --address='0.0.0.0'
    
  6. Open the Hubble UI in your web browser:

    http://localhost:16100/
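If the page does not load, the hubble-ui Deployment may not be ready yet. The sketch below shows one way to wait for the rollout before forwarding the port, together with a port-forward variant that omits --address, so kubectl listens on localhost only instead of on every interface. The function names, the KUBECTL override variable, and the default timeout of 120s are assumptions made for this sketch.

```shell
# KUBECTL is a hypothetical override hook; it defaults to kubectl.

# Wait until the hubble-ui Deployment reports ready, or give up after a
# timeout (first argument, 120s by default in this sketch).
wait_for_hubble_ui() {
  "${KUBECTL:-kubectl}" rollout status deployment/hubble-ui \
    -n gke-managed-dpv2-observability --timeout="${1:-120s}"
}

# Forward a local port (first argument, 16100 by default) to the Service.
# Without --address, kubectl binds to localhost only.
forward_hubble_ui() {
  "${KUBECTL:-kubectl}" port-forward service/hubble-ui "${1:-16100}:80" \
    -n gke-managed-dpv2-observability
}

# Example (needs a cluster with the manifest applied):
#   wait_for_hubble_ui 60s && forward_hubble_ui
```

Binding to localhost is usually enough when the browser runs on the same machine as kubectl; keep --address='0.0.0.0' only if other hosts need to reach the forwarded port.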

What's next