Observe your applications on GKE with Prometheus


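This tutorial shows how to set up liveness probes for application microservices deployed to Google Kubernetes Engine (GKE) by using open source Prometheus.

This tutorial uses open source Prometheus. However, each GKE Autopilot cluster automatically deploys Managed Service for Prometheus, Google Cloud's fully managed, multi-cloud, cross-project solution for Prometheus metrics. Managed Service for Prometheus lets you monitor and alert on your deployments worldwide by using Prometheus, without having to manage and operate Prometheus at scale.

You can also use open source tools such as Grafana to visualize the metrics that Prometheus collects.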

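Objectives

  • Create a cluster.
  • Deploy Prometheus.
  • Deploy the sample application, Bank of Anthos.
  • Configure Prometheus liveness probes.
  • Configure Prometheus alerts.
  • Configure Alertmanager to send notifications to a Slack channel.
  • Simulate an outage to test Prometheus.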

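Costs

In this document, you use billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the Pricing Calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see the Clean up section.

Before you begin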

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the GKE API.

    Enable the API

  5. Install Helm.

Prepare the environment

In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud.

  1. Set the default environment variables:

    gcloud config set project PROJECT_ID
    gcloud config set compute/region COMPUTE_REGION
    

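    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.
    • COMPUTE_REGION: the Compute Engine region for the cluster. This tutorial uses us-central1. Typically, you want a region that is close to you.
  2. Clone the sample repository used in this tutorial: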

    git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
    cd bank-of-anthos/
    
  3. Create a cluster:

    gcloud container clusters create-auto CLUSTER_NAME \
        --release-channel=CHANNEL_NAME \
        --region=COMPUTE_REGION
    

    Replace the following:

    • CLUSTER_NAME: a name for the new cluster.
    • CHANNEL_NAME: the name of a release channel.

Deploy Prometheus

Install Prometheus by using the sample Helm chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install tutorial bitnami/kube-prometheus \
    --version 8.2.2 \
    --values extras/prometheus/oss/values.yaml \
    --wait

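This command installs Prometheus with the following components:

  • Prometheus Operator: a popular way to deploy and configure open source Prometheus.
  • Alertmanager: handles alerts sent by the Prometheus server and routes them to applications such as Slack.
  • Blackbox exporter: lets Prometheus probe endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.

Deploy Bank of Anthos

Deploy the Bank of Anthos sample application: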

kubectl apply -f extras/jwt/jwt-secret.yaml
kubectl apply -f kubernetes-manifests

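Slack notifications

To set up Slack notifications, you must create a Slack application, activate Incoming Webhooks for the application, and install the application to a Slack workspace.

Create the Slack application

  1. Join a Slack workspace, either by registering with your email or by using an invitation sent by a workspace admin.

  2. Sign in to Slack by using your workspace name and your Slack account credentials.

  3. Create a new Slack app:

    1. In the "Create an app" dialog, click "From scratch".
    2. Specify an "App Name" and choose your Slack workspace.
    3. Click "Create App".
    4. Under "Add features and functionality", click "Incoming Webhooks".
    5. Click the "Activate Incoming Webhooks" toggle.
    6. In the "Webhook URLs for Your Workspace" section, click "Add New Webhook to Workspace".
    7. On the authorization page that opens, select the channel that should receive notifications.
    8. Click "Allow".
    9. A webhook for your Slack application is displayed in the "Webhook URLs for Your Workspace" section. Save this URL for later.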

Configure Alertmanager

Create a Kubernetes Secret to store the webhook URL, and then apply the Alertmanager configuration:

kubectl create secret generic alertmanager-slack-webhook --from-literal webhookURL=SLACK_WEBHOOK_URL
kubectl apply -f extras/prometheus/oss/alertmanagerconfig.yaml

Replace SLACK_WEBHOOK_URL with the webhook URL from the previous section.
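
Before you create the Secret, it can help to confirm that the value you copied looks like a Slack incoming webhook URL. The following is a minimal sketch and is not part of the tutorial repository. The function name is_slack_webhook_url is hypothetical, and the check assumes that incoming webhook URLs start with https://hooks.slack.com/services/, which is the form Slack typically issues.

```shell
# Hypothetical helper (not from the bank-of-anthos repository).
# Returns 0 when the argument starts with the prefix assumed for Slack
# incoming webhook URLs and has at least one character after it;
# returns 1 otherwise.
is_slack_webhook_url() {
  case "$1" in
    https://hooks.slack.com/services/?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

The check inspects only the prefix and does not contact Slack. Assuming you stored the URL in a shell variable of your own choosing, you could run the function on that variable and create the Secret only when it succeeds.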

Configure Prometheus

  1. Review the following manifest:

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: frontend-probe
    spec:
      jobName: frontend
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - frontend:80
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: userservice-probe
    spec:
      jobName: userservice
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - userservice:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: balancereader-probe
    spec:
      jobName: balancereader
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - balancereader:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: contacts-probe
    spec:
      jobName: contacts
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - contacts:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: ledgerwriter-probe
    spec:
      jobName: ledgerwriter
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - ledgerwriter:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: transactionhistory-probe
    spec:
      jobName: transactionhistory
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - transactionhistory:8080/ready
    

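    This manifest describes Prometheus liveness probes and includes the following fields:

    • spec.jobName: the job name assigned to scraped metrics.
    • spec.prober.url: the service URL of the blackbox exporter. This includes the default port of the blackbox exporter, as defined in the Helm chart.
    • spec.prober.path: the metrics collection path.
    • spec.targets.staticConfig.labels: the labels assigned to all metrics scraped from the targets.
    • spec.targets.staticConfig.static: the list of hosts to probe.
  2. Apply the manifest to your cluster: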

    kubectl apply -f extras/prometheus/oss/probes.yaml
    
  3. Review the following manifest:

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: uptime-rule
    spec:
      groups:
      - name: Micro services uptime
        interval: 60s
        rules:
        - alert: BalancereaderUnavaiable
          expr: probe_success{app="bank-of-anthos",job="balancereader"} == 0
          for: 1m
          annotations:
            summary: Balance Reader Service is unavailable
            description: Check Balance Reader pods and it's logs
          labels:
            severity: 'critical'
        - alert: ContactsUnavaiable
          expr: probe_success{app="bank-of-anthos",job="contacts"} == 0
          for: 1m
          annotations:
            summary: Contacs Service is unavailable
            description: Check Contacs pods and it's logs
          labels:
            severity: 'warning'
        - alert: FrontendUnavaiable
          expr: probe_success{app="bank-of-anthos",job="frontend"} == 0
          for: 1m
          annotations:
            summary: Frontend Service is unavailable
            description: Check Frontend pods and it's logs
          labels:
            severity: 'critical'
        - alert: LedgerwriterUnavaiable
          expr: probe_success{app="bank-of-anthos",job="ledgerwriter"} == 0
          for: 1m
          annotations:
            summary: Ledger Writer Service is unavailable
            description: Check Ledger Writer pods and it's logs
          labels:
            severity: 'critical'
        - alert: TransactionhistoryUnavaiable
          expr: probe_success{app="bank-of-anthos",job="transactionhistory"} == 0
          for: 1m
          annotations:
            summary: Transaction History Service is unavailable
            description: Check Transaction History pods and it's logs
          labels:
            severity: 'critical'
        - alert: UserserviceUnavaiable
          expr: probe_success{app="bank-of-anthos",job="userservice"} == 0
          for: 1m
          annotations:
            summary: User Service is unavailable
            description: Check User Service pods and it's logs
          labels:
            severity: 'critical'
    

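    This manifest describes a PrometheusRule and includes the following fields:

    • spec.groups.[*].name: the name of the rule group.
    • spec.groups.[*].interval: how often the rules in the group are evaluated.
    • spec.groups.[*].rules[*].alert: the name of the alert.
    • spec.groups.[*].rules[*].expr: the PromQL expression to evaluate.
    • spec.groups.[*].rules[*].for: how long the alert condition must hold before the alert is considered firing.
    • spec.groups.[*].rules[*].annotations: a list of annotations to add to each alert. This setting applies only to alerting rules.
    • spec.groups.[*].rules[*].labels: the labels to add or overwrite.
  4. Apply the manifest to your cluster: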

    kubectl apply -f extras/prometheus/oss/rules.yaml
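
Each Probe resource in probes.yaml tells Prometheus to scrape the blackbox exporter rather than the service itself, passing the module and the target as query parameters. The following sketch rebuilds that request URL from the spec.prober and spec.targets fields so that you can see what is being probed. The function name probe_url is hypothetical, the http scheme is an assumption because the manifest does not set one, and the target is left unencoded for readability.

```shell
# Hypothetical helper: print the blackbox exporter URL that corresponds to
# one Probe target. Arguments: prober URL, prober path, module, target.
# Assumes plain http, since the Probe manifest does not specify a scheme.
probe_url() {
  printf 'http://%s%s?module=%s&target=%s\n' "$1" "$2" "$3" "$4"
}

# Values copied from the frontend-probe resource in probes.yaml.
probe_url tutorial-kube-prometheus-blackbox-exporter:19115 /probe http_2xx frontend:80
```

Assuming that the exporter is exposed by a Service named after the host in spec.prober.url, you could forward port 19115 to Cloud Shell with kubectl port-forward and request the same path on localhost. The response should include the probe_success metric that the alerting rules evaluate.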
    

Simulate an outage

  1. Simulate an outage by scaling the contacts Deployment to zero:

    kubectl scale deployment contacts --replicas 0
    

    A notification message should appear in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.

  2. Restore the contacts Deployment:

    kubectl scale deployment contacts --replicas 1
    

    You should see an alert resolution notification in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.
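
How quickly a notification arrives depends partly on the alerting rule. With interval: 60s and for: 1m, an alert is first pending and fires only after the failing condition has held for the full for duration. The following sketch simulates that state machine over a series of rule evaluations. It is a simplification under stated assumptions: the function name alert_states is hypothetical, each value argument stands for the probe_success result seen at one evaluation, and delays added by the probe interval and by Alertmanager grouping are ignored.

```shell
# Hypothetical helper: print the alert state after each rule evaluation.
# Arguments: evaluation interval in seconds, "for" duration in seconds,
# then one probe_success value (0 or 1) per evaluation.
alert_states() {
  interval=$1
  hold=$2
  shift 2
  active_for=-1   # -1 means the alert condition is not currently active
  for value in "$@"; do
    if [ "$value" -eq 0 ]; then
      # Condition is true: start the timer or advance it by one interval.
      if [ "$active_for" -lt 0 ]; then
        active_for=0
      else
        active_for=$((active_for + interval))
      fi
      if [ "$active_for" -ge "$hold" ]; then
        echo firing
      else
        echo pending
      fi
    else
      # Condition is false: the alert resets.
      active_for=-1
      echo inactive
    fi
  done
}

# One healthy evaluation, two failing ones, then recovery.
alert_states 60 60 1 0 0 1
```

With the values from rules.yaml, the call prints inactive, pending, firing, and inactive, one per line. In this simplified model, the alert fires at the second consecutive failing evaluation.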

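Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project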

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

Delete individual resources

  1. Delete the Kubernetes resources:

    kubectl delete -f kubernetes-manifests
    
  2. Uninstall Prometheus:

    helm uninstall tutorial
    
  3. Delete the GKE cluster:

    gcloud container clusters delete CLUSTER_NAME --quiet
    

What's next