Version 1.13. This version is no longer supported. For information about how to upgrade to version 1.14, see Upgrading Anthos on bare metal in the 1.14 documentation. For more information about supported and unsupported versions, see the Version history page in the latest documentation.

Creating alert policies

This page describes how to create alert policies for Anthos clusters on bare metal.

Before you begin

To create alert policies, you need the following permissions:

monitoring.alertPolicies.create
monitoring.alertPolicies.delete
monitoring.alertPolicies.update

Any one of the following roles grants these permissions:

monitoring.alertPolicyEditor
monitoring.editor
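Project Editor
Project Owner

To check your roles, go to the [IAM] page in the Google Cloud Console.

Creating a policy: Anthos clusters on bare metal API server unavailable

In this exercise, you create an alert policy for the cluster's Kubernetes API server. With this policy in place, you can arrange to be notified whenever the cluster's API server becomes unavailable.

Download the policy configuration file: apiserver-unavailable.json.
To create the policy: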
```
gcloud alpha monitoring policies create --policy-from-file=POLICY_CONFIG
```
Replace POLICY_CONFIG with the path of the configuration file that you downloaded.
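
As a minimal sketch of the step above, the following shell function refuses to call `gcloud` when the configuration file is missing or empty. The function name `create_policy` and the example path are hypothetical and are not part of the original instructions; only the `gcloud alpha monitoring policies create --policy-from-file` command comes from this page.

```shell
# Hypothetical helper (not part of gcloud): create an alert policy from a
# downloaded configuration file, failing early if the file is missing or empty.
#   $1: path to the policy configuration file
create_policy() {
  if [ ! -s "$1" ]; then
    echo "Policy file is missing or empty: $1" >&2
    return 1
  fi
  gcloud alpha monitoring policies create --policy-from-file="$1"
}

# Example call (the path is an assumption):
#   create_policy ./apiserver-unavailable.json
```

The check only confirms that the file exists and is not empty; it does not validate the JSON, so `gcloud` can still reject a malformed policy.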

To view the alert policy:

Console

In the Google Cloud Console, go to the [Monitoring] page.

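On the left, select [Alerting].
Under [Policies], a list of your alert policies appears.

In that list, select [Anthos on baremetal cluster API server unavailable (critical)] to view the details of the new policy. Under [Conditions], a description of the policy appears. For example: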
```
Policy violates when ANY condition is met
Anthos on baremetal cluster API server uptime is absent
Anthos on baremetal cluster API server uptime is less than 99.99% per minute
```

gcloud

```
gcloud alpha monitoring policies list
```

The output shows detailed information about the policy. For example:

```
combiner: OR
conditions:
- conditionAbsent:
    aggregations:
    - alignmentPeriod: 60s
      crossSeriesReducer: REDUCE_MEAN
      groupByFields:
      - resource.label.project_id
      - resource.label.location
      - resource.label.cluster_name
      - resource.label.namespace_name
      - resource.label.container_name
      - resource.label.pod_name
      perSeriesAligner: ALIGN_MAX
    duration: 300s
    filter: resource.type = "k8s_container" AND resource.labels.namespace_name = "kube-system"
      AND metric.type = "kubernetes.io/anthos/container/uptime" AND resource.label."container_name"=monitoring.regex.full_match("kube-apiserver")
    trigger:
      count: 1
  displayName: Anthos on baremetal cluster API server uptime is absent
  name: projects/…/alertPolicies/12404845535868002666/conditions/12404845535868003603
- conditionThreshold:
    aggregations:
    - alignmentPeriod: 120s
      crossSeriesReducer: REDUCE_MEAN
      groupByFields:
      - resource.label.project_id
      - resource.label.location
      - resource.label.cluster_name
      - resource.label.namespace_name
      - resource.label.container_name
      - resource.label.pod_name
      perSeriesAligner: ALIGN_MAX
    comparison: COMPARISON_LT
    duration: 300s
    filter: resource.type = "k8s_container" AND resource.labels.namespace_name = "kube-system"
      AND metric.type = "kubernetes.io/anthos/container/uptime" AND resource.label."container_name"=monitoring.regex.full_match("kube-apiserver")
    thresholdValue: 119.0
    trigger:
      count: 1
  displayName: Anthos on baremetal cluster API server uptime is less than 99.99% per
    minute
  name: projects/…/alertPolicies/12404845535868002666/conditions/12404845535868004540
creationRecord:
  mutateTime: …
  mutatedBy: …
displayName: Anthos on baremetal cluster API server unavailable (critical)
enabled: true
mutationRecord:
  mutateTime: …
  mutatedBy: …
name: projects/…/alertPolicies/12404845535868002666
```
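
The full listing can be long when a project has many policies. As a rough sketch, the function below narrows the output to the resource name of the policy with a given display name, using the general-purpose `--filter` and `--format` flags of `gcloud`. The function name `policy_name_by_display_name` is hypothetical, and the exact filter expression is an assumption that you may need to adjust.

```shell
# Hypothetical helper: print the resource name of every alert policy whose
# display name equals the given string.
#   $1: display name of the policy
policy_name_by_display_name() {
  gcloud alpha monitoring policies list \
    --filter="displayName=\"$1\"" \
    --format="value(name)"
}

# Example call:
#   policy_name_by_display_name "Anthos on baremetal cluster API server unavailable (critical)"
```

The printed name is the same value that appears in the top-level `name` field of the listing.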

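Creating additional alert policies

This section provides descriptions and configuration files for a set of recommended alert policies.

To create a policy, follow the same steps as in the preceding exercise:

To download a configuration file, click the link in the right column of the tables below.
To create the policy, run gcloud alpha monitoring policies create.

You can use the following script to download and install all of the alert policy samples described in this document.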

```
# 1. Create a directory named alert_samples, switch to it, and list the samples:

mkdir alert_samples && cd alert_samples
declare -a alerts=("apiserver-unavailable.json" "scheduler-unavailable.json" "controller-manager-unavailable.json" "pod-crash-looping.json" "container-memory-usage-high-reaching-limit.json"
"container-cpu-usage-high-reaching-limit.json" "pod-not-ready-1h.json" "persistent-volume-usage-high.json" "node-not-ready-1h.json" "node-cpu-usage-high.json" "node-memory-usage-high.json"
"node-disk-usage-high.json" "api-server-error-ratio-10-percent.json" "api-server-error-ratio-5-percent.json" "etcd-leader-changes-too-frequent.json" "etcd-proposals-failed-too-frequent.json"
"etcd-server-not-in-quorum.json" "etcd-storage-usage-high.json")

# 2. Download all alert samples into the alert_samples/ directory:

for x in "${alerts[@]}"
do
  wget "https://cloud.google.com/anthos/clusters/docs/bare-metal/1.13/samples/${x}"
done

# 3. (optional) Uncomment and provide your project ID to set the default project
# for gcloud commands:

# gcloud config set project <PROJECT_ID>

# 4. Create alert policies for each of the downloaded samples:

for x in "${alerts[@]}"
do
  gcloud alpha monitoring policies create --policy-from-file="${x}"
done
```
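
If a download fails partway through, the last loop in the script still tries to create policies from files that are not there. One way to guard against that, shown here only as a sketch, is to check for missing or empty files between the download and create steps. The function name `missing_samples` is hypothetical and is not part of the original script.

```shell
# Hypothetical helper: print each expected file that is absent or empty in the
# given directory. Returns non-zero if at least one file is missing.
#   $1: directory to check
#   remaining arguments: expected file names
missing_samples() {
  local dir="$1"
  shift
  local status=0
  local f
  for f in "$@"; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "${f}"
      status=1
    fi
  done
  return "${status}"
}
```

Inside the `alert_samples` directory, a call such as `missing_samples . "${alerts[@]}"` would list any sample that still needs to be downloaded before you create the policies.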

Control plane component availability

Alert name	Description	Alert policy definition in Cloud Monitoring
Anthos on baremetal cluster API server unavailable (critical)	The API server is down or its uptime is less than 99.99% per minute	apiserver-unavailable.json
Anthos on baremetal cluster scheduler unavailable (critical)	The scheduler is down or its uptime is less than 99.99% per minute	scheduler-unavailable.json
Anthos on baremetal controller manager unavailable (critical)	The controller manager has disappeared from metrics target discovery	controller-manager-unavailable.json

Kubernetes system

Alert name	Description	Alert policy definition in Cloud Monitoring
Anthos on baremetal Pod is crash looping (critical)	The Pod has been restarting and might be in a crash loop state	pod-crash-looping.json
Anthos on baremetal container memory usage exceeds 85% (warning)	Container memory usage is over 85% of the limit	container-memory-usage-high-reaching-limit.json
Anthos on baremetal container CPU usage exceeds 80% (warning)	Container CPU usage is over 80% of the limit	container-cpu-usage-high-reaching-limit.json
Anthos on baremetal Pod not ready for more than one hour (critical)	The Pod has been in a not-ready state for more than one hour	pod-not-ready-1h.json
Anthos on baremetal persistent volume usage is high (critical)	The claimed persistent volume is expected to fill up	persistent-volume-usage-high.json
Anthos on baremetal node not ready for more than one hour (critical)	The node has been in a not-ready state for more than one hour	node-not-ready-1h.json
Anthos on baremetal node CPU usage exceeds 80% (critical)	Node CPU usage is over 80%	node-cpu-usage-high.json
Anthos on baremetal node memory usage exceeds 80% (critical)	Node memory usage is over 80%	node-memory-usage-high.json
Anthos on baremetal node disk usage exceeds 80% (critical)	Node disk usage is over 80%	node-disk-usage-high.json

Kubernetes performance

Alert name	Description	Alert policy definition in Cloud Monitoring
Anthos on baremetal API server error ratio exceeds 10% (critical)	The API server returns errors for more than 10% of requests	api-server-error-ratio-10-percent.json
Anthos on baremetal API server error ratio exceeds 5% (warning)	The API server returns errors for more than 5% of requests	api-server-error-ratio-5-percent.json
Anthos on baremetal etcd leader changes too frequently (critical)	The `etcd` leader changes too frequently	etcd-leader-changes-too-frequent.json
Anthos on baremetal etcd proposals fail too frequently (critical)	`etcd` proposals fail too frequently	etcd-proposals-failed-too-frequent.json
Anthos on baremetal etcd server is not in quorum (critical)	The `etcd` server is not in quorum	etcd-server-not-in-quorum.json
Anthos on baremetal etcd storage exceeds 90 percent of the limit (critical)	`etcd` storage usage is over 90% of the limit	etcd-storage-usage-high.json

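Getting notified

After you create an alert policy, you can define one or more notification channels for it. Several channel types are available. For example, you can be notified by email, in a Slack channel, or through the mobile app. Choose the channels that suit your needs.

For information about how to configure notification channels, see Managing notification channels.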
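
Once a channel exists, it can also be attached to a policy from the command line. The sketch below assumes that your `gcloud` version provides the `--add-notification-channels` flag on `gcloud alpha monitoring policies update`; confirm this with `gcloud alpha monitoring policies update --help` before relying on it. The function name `attach_channel` and both arguments are hypothetical placeholders.

```shell
# Hypothetical helper: attach one notification channel to one alert policy.
#   $1: alert policy resource name or ID
#   $2: notification channel resource name
attach_channel() {
  gcloud alpha monitoring policies update "$1" \
    --add-notification-channels="$2"
}

# Example call (both values are placeholders to replace with your own):
#   attach_channel POLICY_NAME CHANNEL_NAME
```

Channel resource names can typically be looked up with `gcloud alpha monitoring channels list`, and the policy name is the `name` field shown by `gcloud alpha monitoring policies list`.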