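Version 1.7. This version is no longer supported. For details, see the version support policy. To learn how to upgrade to version 1.8, see Upgrading Anthos on Bare Metal in the 1.8 documentation.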

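Create alerting policies

This page describes how to create alerting policies for Anthos clusters on bare metal.

Before you begin

You must have the following permissions to create alerting policies: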

monitoring.alertPolicies.create
monitoring.alertPolicies.delete
monitoring.alertPolicies.update

You have these permissions if you have any one of the following roles:

monitoring.alertPolicyEditor
monitoring.editor
Project Editor
Project Owner

To check your roles, go to the IAM page in the Google Cloud Console.
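
As an alternative to the console, the roles bound to an account can also be listed from the command line. The following is a minimal sketch rather than part of the documented procedure: the helper name `list_roles` is hypothetical, the project ID and member are placeholders, and the command relies on the general-purpose gcloud `--flatten`, `--filter`, and `--format` flags.

```shell
# Sketch: print the IAM roles bound to one member in a project's IAM policy.
# list_roles is a hypothetical helper name, not a gcloud command.
list_roles() {
  project_id="$1"   # placeholder, for example: my-project
  member="$2"       # placeholder, for example: user:sharon@example.com
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${member}" \
    --format="value(bindings.role)"
}
```

A call such as `list_roles PROJECT_ID user:EMAIL_ADDRESS` prints one role per line, for example `roles/monitoring.alertPolicyEditor`, `roles/monitoring.editor`, `roles/editor`, or `roles/owner`. Roles inherited from a folder or organization are not part of the project's own policy, so they would not appear in this output.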

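Create a policy: cluster API server down

In this exercise, you create an alerting policy for the cluster's Kubernetes API server. With this policy in place, you can arrange to be notified whenever a cluster's API server goes down.

Download the policy configuration file: apiserver-down.json.
Create the policy: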
```
gcloud alpha monitoring policies create --policy-from-file=POLICY_CONFIG
```
Replace POLICY_CONFIG with the path of the configuration file that you just downloaded.

View your alerting policies:

Console

In the Google Cloud Console, go to the Monitoring page.

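On the left, select Alerting.
Under Policies, you can see a list of your alerting policies.

In the list, select Anthos on baremetal API server down (critical) to see details about your new policy. Under Conditions, you can see a description of the policy. For example: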
```
Policy violates when ANY condition is met
Anthos on baremetal API server is up
```

gcloud

```
gcloud alpha monitoring policies list
```

The output shows detailed information about the policy. For example:

```
---
combiner: OR
conditions:
- conditionMonitoringQueryLanguage:
    duration: 0s
    query: |-
      { t_0:
          fetch k8s_container
          | metric 'kubernetes.io/anthos/up'
          | filter (resource.container_name =~ 'kube-apiserver')
          | align mean_aligner()
          | group_by 1m, [value_up_mean: mean(value.up)]
          | every 1m
          | group_by [resource.project_id, resource.location, resource.cluster_name],
              [value_up_mean_aggregate: aggregate(value_up_mean)]
      ; t_1:
          fetch k8s_container::kubernetes.io/anthos/anthos_cluster_info
          | filter (metric.anthos_distribution = 'baremetal')
          | align mean_aligner()
          | group_by [resource.project_id, resource.location, resource.cluster_name],
              [value_anthos_cluster_info_aggregate:
                 aggregate(value.anthos_cluster_info)]
          | every 1m }
      | join
      | value [t_0.value_up_mean_aggregate]
      | window 1m
      | absent_for 300s
    trigger:
      count: 1
  displayName: Anthos on baremetal API server is up
  name: projects/xxxxxx/alertPolicies/8497323605386949154/conditions/8497323605386950375
creationRecord:
  mutateTime: '2021-03-17T23:07:18.618778106Z'
  mutatedBy: sharon@example.com
displayName: Anthos on baremetal API server down (critical)
enabled: true
mutationRecord:
  mutateTime: '2021-03-17T23:07:18.618778106Z'
  mutatedBy: sharon@example.com
name: projects/xxxxxx/alertPolicies/8497323605386949154
```
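
When a project has many policies, the full listing can be long. The following minimal sketch narrows the output to a single policy's resource name. It assumes that the generic gcloud `--filter` and `--format` list flags apply to this command as they do to other list commands; the helper name `find_policy` is hypothetical.

```shell
# Sketch: print the resource name of the policy that has a given display name.
# find_policy is a hypothetical helper name, not a gcloud command.
find_policy() {
  display_name="$1"   # for example: Anthos on baremetal API server down (critical)
  gcloud alpha monitoring policies list \
    --filter="displayName=\"${display_name}\"" \
    --format="value(name)"
}
```

For example, `find_policy "Anthos on baremetal API server down (critical)"` would be expected to print a name of the form projects/xxxxxx/alertPolicies/POLICY_ID, matching the `name` field in the listing.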

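Create additional alerting policies

This section provides descriptions and configuration files for a set of recommended alerting policies.

To create a policy, follow the same steps that you used in the preceding exercise:

Click the link in the right column to download the configuration file.
Run gcloud alpha monitoring policies create to create the policy.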
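
Because every policy is created with the same command, the two steps can be repeated in a loop once the configuration files are downloaded. The following is a minimal sketch; the helper name `create_policies` is hypothetical, and the file paths passed to it are assumed to point at configuration files you have already downloaded.

```shell
# Sketch: create one alerting policy per downloaded configuration file.
# create_policies is a hypothetical helper name; pass it the JSON file paths.
create_policies() {
  for policy_config in "$@"; do
    # Progress message goes to stderr so that stdout carries only gcloud output.
    echo "Creating policy from ${policy_config}" >&2
    gcloud alpha monitoring policies create \
      --policy-from-file="${policy_config}" || return 1   # stop at the first failure
  done
}
```

For example, `create_policies scheduler-down.json controller-manager-down.json` would run the create command once for each file and stop if one of the commands fails.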

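Control plane component availability

Alert name	Description	Alerting policy definition in Cloud Monitoring
Anthos on baremetal API server down (critical)	The API server has disappeared from metrics target discovery	apiserver-down.json
Anthos on baremetal scheduler down (critical)	The scheduler has disappeared from metrics target discovery	scheduler-down.json
Anthos on baremetal controller manager down (critical)	The controller manager has disappeared from metrics target discovery	controller-manager-down.json

Kubernetes system

Alert name	Description	Alerting policy definition in Cloud Monitoring
Anthos on baremetal pod crash looping (critical)	A Pod is in a crash loop	pod-crash-looping.json
Anthos on baremetal pod not ready for more than one hour (critical)	A Pod has been in a not-ready state for more than one hour	pod-not-ready-1h.json
Anthos on baremetal persistent volume usage high (critical)	A claimed persistent volume is expected to fill up	persistent-volume-usage-high.json
Anthos on baremetal node not ready for more than one hour (critical)	A node has been in a not-ready state for more than one hour	node-not-ready-1h.json
Anthos on baremetal node CPU usage exceeds 80% (critical)	Node CPU usage is over 80%	node-cpu-usage-high.json
Anthos on baremetal node memory usage exceeds 80% (critical)	Node memory usage is over 80%	node-memory-usage-high.json
Anthos on baremetal node disk usage exceeds 80% (critical)	Node disk usage is over 80%	node-disk-usage-high.json

Kubernetes performance

Alert name	Description	Alerting policy definition in Cloud Monitoring
Anthos on baremetal API server error count ratio exceeds 10% (critical)	The API server returns errors for more than 10% of requests	api-server-error-ratio-10-percent.json
Anthos on baremetal API server error count ratio exceeds 5% (warning)	The API server returns errors for more than 5% of requests	api-server-error-ratio-5-percent.json
Anthos on baremetal etcd leader changes too frequent (critical)	The etcd leader changes too frequently	etcd-leader-changes-too-frequent.json
Anthos on baremetal etcd proposals failed too frequent (critical)	The etcd proposals fail too frequently	etcd-proposals-failed-too-frequent.json
Anthos on baremetal etcd server not in quorum (critical)	The etcd servers have not reached quorum	etcd-server-not-in-quorum.json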

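Get notified

After you create an alerting policy, you can define one or more notification channels for it. There are several types of notification channels. For example, you can be notified by email, in a Slack channel, or through a mobile app. Choose the channels that suit your needs.

To learn how to configure notification channels, see Managing notification channels.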
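
For illustration, an email channel can also be created and attached to a policy from the command line. The following is a minimal sketch under stated assumptions, not the documented procedure: it assumes the `gcloud beta monitoring channels create` and `gcloud alpha monitoring policies update` commands are available in your gcloud installation, and the helper names `create_email_channel` and `attach_channel` are hypothetical.

```shell
# Sketch: create an email notification channel.
# create_email_channel is a hypothetical helper name.
create_email_channel() {
  display_name="$1"    # placeholder, for example: On-call email
  email_address="$2"   # placeholder, for example: oncall@example.com
  gcloud beta monitoring channels create \
    --display-name="${display_name}" \
    --type=email \
    --channel-labels="email_address=${email_address}"
}

# Sketch: attach an existing notification channel to an existing policy.
# attach_channel is a hypothetical helper name.
attach_channel() {
  policy_name="$1"    # form: projects/PROJECT_ID/alertPolicies/POLICY_ID
  channel_name="$2"   # form: projects/PROJECT_ID/notificationChannels/CHANNEL_ID
  gcloud alpha monitoring policies update "${policy_name}" \
    --add-notification-channels="${channel_name}"
}
```

The policy name is the `name` value reported by gcloud alpha monitoring policies list, and the channel name is the resource name reported when the channel is created. Other channel types, such as Slack, need their own labels and setup, which the notification channel documentation covers.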