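Monitor applications on GKE with Prometheus


This tutorial shows how to use open source Prometheus to set up liveness probes for application microservices deployed to Google Kubernetes Engine (GKE).

This tutorial uses open source Prometheus. However, every GKE Autopilot cluster automatically deploys Managed Service for Prometheus, Google Cloud's fully managed, multi-cloud, cross-project solution for Prometheus metrics. Managed Service for Prometheus lets you monitor and alert on your deployments around the world by using Prometheus, without having to manage and operate Prometheus at scale.

You can also use open source tools such as Grafana to visualize the metrics that Prometheus collects.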

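Objectives

  • Create a cluster.
  • Deploy Prometheus.
  • Deploy the sample application, Bank of Anthos.
  • Configure Prometheus liveness probes.
  • Configure Prometheus alerts.
  • Configure Alertmanager to send notifications to a Slack channel.
  • Simulate an outage to test Prometheus.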

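Costs

In this document, you use billable components of Google Cloud, including GKE.

To generate a cost estimate based on your projected usage, use the Pricing Calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin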

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the GKE API.

    Enable the API

  5. Install the Helm CLI.
  6. Prepare the environment

    In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud.

    1. Set the default project and region for the gcloud CLI:

      gcloud config set project PROJECT_ID
      gcloud config set compute/region COMPUTE_REGION
      

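      Replace the following:

      • PROJECT_ID: your Google Cloud project ID.
      • COMPUTE_REGION: the Compute Engine region for the cluster. This tutorial uses us-central1. Typically, you want a region that is close to you.
    2. Clone the sample repository used in this tutorial: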

      git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
      cd bank-of-anthos/
      
    3. Create a cluster:

      gcloud container clusters create-auto CLUSTER_NAME \
          --release-channel=CHANNEL_NAME \
          --region=COMPUTE_REGION
      

      Replace the following:

      • CLUSTER_NAME: a name for the new cluster.
      • CHANNEL_NAME: the name of a release channel.

    Deploy Prometheus

    Install Prometheus by using the sample Helm chart:

    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm install tutorial bitnami/kube-prometheus \
        --version 8.2.2 \
        --values extras/prometheus/oss/values.yaml \
        --wait
    

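    This command installs Prometheus with the following components:

    • Prometheus Operator: a popular way to deploy and configure open source Prometheus.
    • Alertmanager: handles alerts sent by the Prometheus server and routes them to applications, such as Slack.
    • Blackbox exporter: lets Prometheus probe endpoints by using HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.

    Deploy Bank of Anthos

    Deploy the Bank of Anthos sample application: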

    kubectl apply -f extras/jwt/jwt-secret.yaml
    kubectl apply -f kubernetes-manifests
    

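    Slack notifications

    To set up Slack notifications, you must create a Slack application, activate Incoming Webhooks for the application, and install the application to a Slack workspace.

    Create the Slack application

    1. Join a Slack workspace, either by registering with your email or by using an invitation sent by a workspace admin.

    2. Sign in to Slack by using your workspace name and your Slack account credentials.

    3. Create a new Slack app:

      1. In the "Create an app" dialog, click "From scratch".
      2. Specify an "App Name" and choose your Slack workspace.
      3. Click "Create App".
      4. Under "Add features and functionality", click "Incoming Webhooks".
      5. Click the "Activate Incoming Webhooks" toggle.
      6. In the "Webhook URLs for Your Workspace" section, click "Add New Webhook to Workspace".
      7. On the authorization page that opens, select the channel that should receive notifications.
      8. Click "Allow".
      9. A webhook for the Slack application appears in the "Webhook URLs for Your Workspace" section. Save the URL for later.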

    Configure Alertmanager

    Create a Kubernetes Secret to store the webhook URL, and then apply the Alertmanager configuration:

    kubectl create secret generic alertmanager-slack-webhook --from-literal webhookURL=SLACK_WEBHOOK_URL
    kubectl apply -f extras/prometheus/oss/alertmanagerconfig.yaml
    

    Replace SLACK_WEBHOOK_URL with the webhook URL from the previous section.
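
Before relying on Alertmanager, it can help to confirm that the webhook URL itself accepts messages. The following Python sketch is not part of the tutorial repository. The function names build_slack_payload and post_to_slack are hypothetical, and the sketch assumes that the Slack incoming webhook accepts a JSON body with a single text field.

```python
import json
import urllib.request


def build_slack_payload(text: str) -> bytes:
    """Encode a minimal Slack incoming-webhook message as JSON bytes."""
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url: str, text: str) -> int:
    """POST the message to the webhook and return the HTTP status code.

    webhook_url is the URL saved in the previous section
    (the SLACK_WEBHOOK_URL placeholder).
    """
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling post_to_slack with your webhook URL and a short test message should make the message appear in the channel that you selected. If it does not, the problem is likely in the Slack setup and not in Alertmanager.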

    Configure Prometheus

    1. Review the following manifest:

      # Copyright 2023 Google LLC
      #
      # Licensed under the Apache License, Version 2.0 (the "License");
      # you may not use this file except in compliance with the License.
      # You may obtain a copy of the License at
      #
      #      http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: frontend-probe
      spec:
        jobName: frontend
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - frontend:80
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: userservice-probe
      spec:
        jobName: userservice
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - userservice:8080/ready
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: balancereader-probe
      spec:
        jobName: balancereader
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - balancereader:8080/ready
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: contacts-probe
      spec:
        jobName: contacts
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - contacts:8080/ready
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: ledgerwriter-probe
      spec:
        jobName: ledgerwriter
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - ledgerwriter:8080/ready
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: Probe
      metadata:
        name: transactionhistory-probe
      spec:
        jobName: transactionhistory
        prober:
          url: tutorial-kube-prometheus-blackbox-exporter:19115
          path: /probe
        module: http_2xx
        interval: 60s
        scrapeTimeout: 30s
        targets:
          staticConfig:
            labels:
              app: bank-of-anthos
            static:
              - transactionhistory:8080/ready
      

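      This manifest describes Prometheus liveness probes and includes the following fields:

      • spec.jobName: the job name assigned to scraped metrics.
      • spec.prober.url: the service URL of the Blackbox exporter. It includes the default port of the Blackbox exporter that is defined in the Helm chart.
      • spec.prober.path: the metrics collection path.
      • spec.targets.staticConfig.labels: the labels assigned to all metrics scraped from the targets.
      • spec.targets.staticConfig.static: the list of hosts to probe.
    2. Apply the manifest to your cluster: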

      kubectl apply -f extras/prometheus/oss/probes.yaml
      
    3. Review the following manifest:

      # Copyright 2023 Google LLC
      #
      # Licensed under the Apache License, Version 2.0 (the "License");
      # you may not use this file except in compliance with the License.
      # You may obtain a copy of the License at
      #
      #      http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: uptime-rule
      spec:
        groups:
        - name: Micro services uptime
          interval: 60s
          rules:
          - alert: BalancereaderUnavaiable
            expr: probe_success{app="bank-of-anthos",job="balancereader"} == 0
            for: 1m
            annotations:
              summary: Balance Reader Service is unavailable
              description: Check Balance Reader pods and it's logs
            labels:
              severity: 'critical'
          - alert: ContactsUnavaiable
            expr: probe_success{app="bank-of-anthos",job="contacts"} == 0
            for: 1m
            annotations:
              summary: Contacs Service is unavailable
              description: Check Contacs pods and it's logs
            labels:
              severity: 'warning'
          - alert: FrontendUnavaiable
            expr: probe_success{app="bank-of-anthos",job="frontend"} == 0
            for: 1m
            annotations:
              summary: Frontend Service is unavailable
              description: Check Frontend pods and it's logs
            labels:
              severity: 'critical'
          - alert: LedgerwriterUnavaiable
            expr: probe_success{app="bank-of-anthos",job="ledgerwriter"} == 0
            for: 1m
            annotations:
              summary: Ledger Writer Service is unavailable
              description: Check Ledger Writer pods and it's logs
            labels:
              severity: 'critical'
          - alert: TransactionhistoryUnavaiable
            expr: probe_success{app="bank-of-anthos",job="transactionhistory"} == 0
            for: 1m
            annotations:
              summary: Transaction History Service is unavailable
              description: Check Transaction History pods and it's logs
            labels:
              severity: 'critical'
          - alert: UserserviceUnavaiable
            expr: probe_success{app="bank-of-anthos",job="userservice"} == 0
            for: 1m
            annotations:
              summary: User Service is unavailable
              description: Check User Service pods and it's logs
            labels:
              severity: 'critical'
      

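      This manifest describes a PrometheusRule and includes the following fields:

      • spec.groups.[*].name: the name of the rule group.
      • spec.groups.[*].interval: how often the rules in the group are evaluated.
      • spec.groups.[*].rules[*].alert: the name of the alert.
      • spec.groups.[*].rules[*].expr: the PromQL expression to evaluate.
      • spec.groups.[*].rules[*].for: how long the expression must stay true before the alert is considered firing.
      • spec.groups.[*].rules[*].annotations: a list of annotations to add to each alert. This setting applies only to alerting rules.
      • spec.groups.[*].rules[*].labels: the labels to add or overwrite.
    4. Apply the manifest to your cluster: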

      kubectl apply -f extras/prometheus/oss/rules.yaml
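
To see how the Probe fields fit together, it can help to picture the request that Prometheus sends to the Blackbox exporter for each target. The following Python sketch is illustrative only. The function name blackbox_probe_url is hypothetical, and the sketch assumes that the prober is reached over plain HTTP and that the module and target are passed as query parameters, which is how the Blackbox exporter's /probe endpoint is normally called.

```python
from urllib.parse import urlencode


def blackbox_probe_url(prober_url: str, path: str, module: str, target: str) -> str:
    """Build the URL that is scraped for one probe target.

    prober_url -> spec.prober.url (host:port of the Blackbox exporter)
    path       -> spec.prober.path
    module     -> spec.module
    target     -> one entry of spec.targets.staticConfig.static
    """
    query = urlencode({"module": module, "target": target})
    # Assumption: the prober is reached over plain HTTP.
    return f"http://{prober_url}{path}?{query}"


url = blackbox_probe_url(
    "tutorial-kube-prometheus-blackbox-exporter:19115",
    "/probe",
    "http_2xx",
    "contacts:8080/ready",
)
print(url)
```

The exporter performs the HTTP check against the target and returns metrics such as probe_success, which the alerting rules compare against 0.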
      

    Simulate an outage

    1. Simulate an outage by scaling the contacts Deployment to zero:

      kubectl scale deployment contacts --replicas 0
      

      You should see a notification message in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.

    2. Restore the contacts Deployment:

      kubectl scale deployment contacts --replicas 1
      

      You should see an alert resolution notification message in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.
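
The notifications do not arrive the moment the Deployment changes, partly because of the for: 1m clause in the alerting rules. The following Python sketch is a simplified model of that behavior, not Prometheus code. The function name alert_states is hypothetical. It assumes one probe_success sample per rule evaluation, and it ignores scrape and evaluation misalignment as well as any grouping delay in Alertmanager.

```python
def alert_states(samples, interval_s=60, for_s=60):
    """Return the alert state after each rule evaluation.

    samples    -- probe_success values (1 = up, 0 = down), one per evaluation
    interval_s -- seconds between evaluations (spec.groups[*].interval)
    for_s      -- seconds the expression must stay true (rules[*].for)
    """
    states = []
    active_since = None  # time at which `probe_success == 0` first matched
    for i, value in enumerate(samples):
        now = i * interval_s
        if value != 0:
            # The expression matches nothing, so the alert is cleared.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert stays pending until the expression has held for `for_s`.
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


timeline = alert_states([1, 0, 0, 0, 1])
print(timeline)
```

In this model, the first failed evaluation only marks the alert as pending. The alert fires one evaluation later, and it clears on the first successful probe after the Deployment is restored.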

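    Clean up

    To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

    Delete the project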

    1. In the Google Cloud console, go to the Manage resources page.

      Go to Manage resources

    2. In the project list, select the project that you want to delete, and then click Delete.
    3. In the dialog, type the project ID, and then click Shut down to delete the project.

    Delete individual resources

    1. Delete the Kubernetes resources:

      kubectl delete -f kubernetes-manifests
      
    2. Uninstall Prometheus:

      helm uninstall tutorial
      
    3. Delete the GKE cluster:

      gcloud container clusters delete CLUSTER_NAME --quiet
      

    What's next