Application observability with Prometheus on GKE

Autopilot

This tutorial shows how to set up liveness probes for the microservices of an application deployed to Google Kubernetes Engine (GKE) by using open source Prometheus.

This tutorial uses open source Prometheus. However, each GKE Autopilot cluster automatically deploys Managed Service for Prometheus, Google Cloud's fully managed, multi-cloud, cross-project solution for Prometheus metrics. With Managed Service for Prometheus, you can monitor your workloads and send alerts by using Prometheus without having to manually manage and operate Prometheus at scale.

You can also use open source tools such as Grafana to visualize the metrics collected by Prometheus.

Objectives

Create a cluster.
Deploy Prometheus.
Deploy the sample application, Bank of Anthos.
Configure Prometheus liveness probes.
Configure Prometheus alerts.
Configure Alertmanager to send notifications to a Slack channel.
Simulate an outage to test Prometheus.

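Costs

This tutorial uses billable components of Google Cloud, including GKE.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin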

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, click [Create project] to begin creating a new Google Cloud project.

Go to project selector

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE API.

Enable the API

Install Helm.

Prepare the environment

In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud.

Set the default project and compute region for the gcloud CLI:
```
gcloud config set project PROJECT_ID
gcloud config set compute/region COMPUTE_REGION
```
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- COMPUTE_REGION: the Compute Engine region for the cluster. This tutorial uses us-central1. Typically, you want a region that is close to you.
Clone the sample repository used in this tutorial:
```
git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
cd bank-of-anthos/
```
Create a cluster:
```
gcloud container clusters create-auto CLUSTER_NAME \
    --release-channel=CHANNEL_NAME \
    --region=COMPUTE_REGION
```
Replace the following:
- CLUSTER_NAME: a name for the new cluster.
- CHANNEL_NAME: the name of a release channel.

Deploy Prometheus

Install Prometheus by using the sample Helm chart:
```
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install tutorial bitnami/kube-prometheus \
    --version 8.2.2 \
    --values extras/prometheus/oss/values.yaml \
    --wait
```

This command installs Prometheus with the following components:

Prometheus Operator: deploys and configures open source Prometheus.
Alertmanager: handles alerts sent by the Prometheus server and routes them to applications such as Slack.
Blackbox exporter: lets Prometheus probe endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.

Deploy Bank of Anthos

Deploy the Bank of Anthos sample application:
```
kubectl apply -f extras/jwt/jwt-secret.yaml
kubectl apply -f kubernetes-manifests
```

Slack notifications

To set up Slack notifications, you must create a Slack application, activate Incoming Webhooks for the application, and install the application in a Slack workspace.

Create the Slack application

Join a Slack workspace, either by registering with your email address or by using an invitation sent by a workspace admin.
Note: If you are not a Slack workspace admin, a workspace admin must approve the app before it can be deployed to the workspace.
Sign in to Slack using your workspace name and your Slack account credentials.
Create a new Slack app:
1. In the [Create an app] dialog, click [From scratch].
2. Specify an app name and choose your Slack workspace.
3. Click [Create App].
4. Under [Add features and functionality], click [Incoming Webhooks].
5. Click the [Activate Incoming Webhooks] toggle.
6. In the [Webhook URLs for Your Workspace] section, click [Add New Webhook to Workspace].
7. On the authorization page that opens, select a channel to receive notifications.
8. Click [Allow].
9. A webhook for your Slack application appears in the [Webhook URLs for Your Workspace] section. Save the URL for later use.

Configure Alertmanager

Create a Kubernetes Secret to store the webhook URL, and then apply the Alertmanager configuration:
```
kubectl create secret generic alertmanager-slack-webhook --from-literal webhookURL=SLACK_WEBHOOK_URL
kubectl apply -f extras/prometheus/oss/alertmanagerconfig.yaml
```

Replace SLACK_WEBHOOK_URL with the webhook URL from the previous section.
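
The contents of extras/prometheus/oss/alertmanagerconfig.yaml are not shown in this tutorial. As a rough illustration of how an AlertmanagerConfig resource can reference the Secret created above, the following shell sketch prints a hypothetical manifest. The resource name `slack-alerts`, the receiver name `slack-receiver`, and the channel are assumptions made for illustration and may differ from the file in the repository. Only the Secret name and key come from the command above.

```shell
# Print a hypothetical AlertmanagerConfig that routes alerts to Slack.
# Assumptions (not taken from the sample repository): the resource name,
# the receiver name, and the default channel. The Secret name and key
# match the `kubectl create secret` command in this section.
slack_alertmanagerconfig() {
  local channel="${1:-#alerts}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: slack-alerts
spec:
  route:
    receiver: slack-receiver
  receivers:
  - name: slack-receiver
    slackConfigs:
    - apiURL:
        name: alertmanager-slack-webhook
        key: webhookURL
      channel: '${channel}'
      sendResolved: true
EOF
}

# Print the manifest for review; nothing is applied to the cluster.
slack_alertmanagerconfig '#bank-of-anthos-alerts'
```

In this sketch, `apiURL` points at the Secret instead of embedding the webhook URL, so the URL never appears in the manifest, and `sendResolved: true` asks Alertmanager to also post a message when an alert clears.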

Configure Prometheus

Review the following manifest:
```
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: frontend-probe
spec:
  jobName: frontend
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - frontend:80
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: userservice-probe
spec:
  jobName: userservice
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - userservice:8080/ready
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: balancereader-probe
spec:
  jobName: balancereader
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - balancereader:8080/ready
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: contacts-probe
spec:
  jobName: contacts
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - contacts:8080/ready
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: ledgerwriter-probe
spec:
  jobName: ledgerwriter
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - ledgerwriter:8080/ready
---
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: transactionhistory-probe
spec:
  jobName: transactionhistory
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - transactionhistory:8080/ready
```

This manifest describes the Prometheus liveness probes. It uses the following fields:

spec.jobName: the job name assigned to the scraped metrics.
spec.prober.url: the Service URL of the blackbox exporter. This includes the default port of the blackbox exporter that is defined in the Helm chart.
spec.prober.path: the metrics collection path.
spec.targets.staticConfig.labels: the labels assigned to all metrics scraped from the targets.
spec.targets.staticConfig.static: the list of hosts to probe.
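
To make the relationship between these fields concrete, the following shell sketch assembles the HTTP request that corresponds to one probe target. It assumes the blackbox exporter's standard /probe endpoint with `module` and `target` query parameters. The `probe_url` helper is hypothetical and is not part of the sample repository.

```shell
# Build the blackbox exporter URL that corresponds to one Probe target.
# Arguments: prober URL (spec.prober.url), path (spec.prober.path),
# module (spec.module), and target (one entry of staticConfig.static).
probe_url() {
  local prober="$1" probe_path="$2" module="$3" target="$4"
  printf 'http://%s%s?module=%s&target=%s\n' \
    "$prober" "$probe_path" "$module" "$target"
}

# Values taken from the contacts-probe resource in the manifest.
probe_url tutorial-kube-prometheus-blackbox-exporter:19115 /probe \
  http_2xx contacts:8080/ready
```

Requesting a URL of this form from a Pod inside the cluster, for example with curl, should return the probe's metrics, including probe_success, which reports 1 when the probe succeeded and 0 when it failed.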

Apply the manifest to your cluster:
```
kubectl apply -f extras/prometheus/oss/probes.yaml
```
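
The six Probe resources in probes.yaml differ only in their name, job name, and target. As a sketch of how a probe for another service could be written, the following hypothetical `probe_manifest` shell helper prints a Probe with the same settings for a given service name and target. The helper is an illustration and is not part of the sample repository.

```shell
# Print a Probe manifest that follows the pattern used in probes.yaml.
# Arguments: service name (used for the resource and job names) and
# the target to probe, for example "contacts:8080/ready".
probe_manifest() {
  local service="$1" target="$2"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: ${service}-probe
spec:
  jobName: ${service}
  prober:
    url: tutorial-kube-prometheus-blackbox-exporter:19115
    path: /probe
  module: http_2xx
  interval: 60s
  scrapeTimeout: 30s
  targets:
    staticConfig:
      labels:
        app: bank-of-anthos
      static:
        - ${target}
EOF
}

# Reproduce the contacts probe from the manifest; nothing is applied.
probe_manifest contacts contacts:8080/ready
```

The output could be piped to `kubectl apply -f -` to create the resource. Keeping the `app` label and using the service name as the job name matters here, because the alerting rules select on both.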

Review the following manifest:
```
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: uptime-rule
spec:
  groups:
  - name: Micro services uptime
    interval: 60s
    rules:
    - alert: BalancereaderUnavaiable
      expr: probe_success{app="bank-of-anthos",job="balancereader"} == 0
      for: 1m
      annotations:
        summary: Balance Reader Service is unavailable
        description: Check Balance Reader pods and it's logs
      labels:
        severity: 'critical'
    - alert: ContactsUnavaiable
      expr: probe_success{app="bank-of-anthos",job="contacts"} == 0
      for: 1m
      annotations:
        summary: Contacs Service is unavailable
        description: Check Contacs pods and it's logs
      labels:
        severity: 'warning'
    - alert: FrontendUnavaiable
      expr: probe_success{app="bank-of-anthos",job="frontend"} == 0
      for: 1m
      annotations:
        summary: Frontend Service is unavailable
        description: Check Frontend pods and it's logs
      labels:
        severity: 'critical'
    - alert: LedgerwriterUnavaiable
      expr: probe_success{app="bank-of-anthos",job="ledgerwriter"} == 0
      for: 1m
      annotations:
        summary: Ledger Writer Service is unavailable
        description: Check Ledger Writer pods and it's logs
      labels:
        severity: 'critical'
    - alert: TransactionhistoryUnavaiable
      expr: probe_success{app="bank-of-anthos",job="transactionhistory"} == 0
      for: 1m
      annotations:
        summary: Transaction History Service is unavailable
        description: Check Transaction History pods and it's logs
      labels:
        severity: 'critical'
    - alert: UserserviceUnavaiable
      expr: probe_success{app="bank-of-anthos",job="userservice"} == 0
      for: 1m
      annotations:
        summary: User Service is unavailable
        description: Check User Service pods and it's logs
      labels:
        severity: 'critical'
```

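This manifest describes a PrometheusRule. It uses the following fields:

spec.groups.[*].name: the name of the rule group.
spec.groups.[*].interval: how often the rules in the group are evaluated.
spec.groups.[*].rules[*].alert: the name of the alert.
spec.groups.[*].rules[*].expr: the PromQL expression to evaluate.
spec.groups.[*].rules[*].for: how long the expression must keep matching before the alert fires.
spec.groups.[*].rules[*].annotations: a list of annotations to add to each alert. This is only valid for alerting rules.
spec.groups.[*].rules[*].labels: the labels to add or overwrite.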
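
The interaction between interval and for can be illustrated with a simplified model. The following shell sketch approximates how an alert moves between the inactive, pending, and firing states. It is not Prometheus's actual implementation, and the `alert_states` helper and its arguments are hypothetical. It assumes one probe_success sample per rule evaluation.

```shell
# Simplified model of alert state transitions for a rule such as
#   expr: probe_success{...} == 0
# Arguments: evaluation interval in seconds, `for` duration in seconds,
# then one probe_success sample (1 or 0) per evaluation.
alert_states() {
  local interval="$1" hold="$2"
  shift 2
  local now=0 active_since="" states="" state sample
  for sample in "$@"; do
    if [ "$sample" -eq 0 ]; then
      # The expression matches: remember when the condition first held.
      if [ -z "$active_since" ]; then
        active_since="$now"
      fi
      if [ $((now - active_since)) -ge "$hold" ]; then
        state="firing"
      else
        state="pending"
      fi
    else
      # The expression no longer matches: the alert resets.
      active_since=""
      state="inactive"
    fi
    states="${states:+$states }$state"
    now=$((now + interval))
  done
  printf '%s\n' "$states"
}

# interval: 60s and for: 1m, as in the manifest.
alert_states 60 60 1 0 0 0 1
```

With these settings the model prints `inactive pending firing firing inactive`: the alert is pending at the first failed evaluation and fires one evaluation later. In a real cluster, the probe interval and Alertmanager's own grouping delays can add to the time before a notification arrives.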

Apply the manifest to your cluster:
```
kubectl apply -f extras/prometheus/oss/rules.yaml
```

Simulate an outage

Simulate an outage by scaling the contacts Deployment to zero:
```
kubectl scale deployment contacts --replicas 0
```
A notification message appears in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.
Restore the contacts Deployment:
```
kubectl scale deployment contacts --replicas 1
```
An alert resolution message appears in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.
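
Because scaling and alert delivery are not immediate, it can help to poll instead of checking once. The following shell sketch defines a generic, hypothetical `retry_until` helper that reruns a command until it succeeds or the attempts run out. The kubectl check shown in the trailing comment is one example of how it might be used and is not part of the tutorial.

```shell
# Run a command repeatedly until it succeeds.
# Arguments: maximum attempts, delay between attempts in seconds,
# then the command to run. Returns 0 on success, 1 if every attempt fails.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Self-contained demonstration with a command that always succeeds.
retry_until 3 0 true && echo "command succeeded"

# Example that requires kubectl access to the cluster: wait up to about
# 5 minutes for the restored contacts Deployment to report one replica.
#   contacts_available() {
#     [ "$(kubectl get deployment contacts \
#         -o jsonpath='{.status.availableReplicas}')" = "1" ]
#   }
#   retry_until 30 10 contacts_available
```

Bounding the number of attempts keeps the check from hanging indefinitely if the Deployment never becomes available.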

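Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work you have done in that project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

Delete the Google Cloud project:
```
gcloud projects delete PROJECT_ID
```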

Delete individual resources

Delete the Kubernetes resources:
```
kubectl delete -f kubernetes-manifests
```
Uninstall Prometheus:
```
helm uninstall tutorial
```

Delete the GKE cluster:
```
gcloud container clusters delete CLUSTER_NAME --quiet
```

What's next

Learn about Google Cloud Managed Service for Prometheus, a fully managed, global metrics solution based on Prometheus that is deployed by default in all Autopilot clusters.
Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at the Cloud Architecture Center.