Logging and monitoring for applications

This page shows how to configure a user cluster for Anthos clusters on VMware (GKE On-Prem) so that custom logs and metrics from user applications are sent to Cloud Logging and Cloud Monitoring.

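Starting with version 1.12, you can monitor workloads with Google Cloud Managed Service for Prometheus, which is available as a Preview feature. Managed Service for Prometheus is Google Cloud's fully managed storage and query service for Prometheus metrics. To use it, follow the steps below to enable Managed Service for Prometheus and Cloud Logging.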

Enable Managed Service for Prometheus for user applications (Preview)

The configuration for Managed Service for Prometheus is stored in a Stackdriver object named stackdriver.

Open the stackdriver object for editing:
```
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system edit stackdriver stackdriver
```
Replace USER_CLUSTER_KUBECONFIG with the path of your user cluster kubeconfig file.

Under spec, set enableGMPForApplications to true:

```
apiVersion: addons.gke.io/v1alpha1
kind: Stackdriver
metadata:
  name: stackdriver
  namespace: kube-system
spec:
  projectID: ...
  clusterName: ...
  clusterLocation: ...
  proxyConfigSecretName: ...
  enableGMPForApplications: true
  enableVPC: ...
  optimizedMetrics: true
```

Save your changes and close the file. This starts the Google-managed Prometheus (GMP) components in the cluster.
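If you prefer a non-interactive change, for example in a script, you can express the same setting as a merge patch. The following is a minimal sketch, assuming you save it in a file with the hypothetical name gmp-patch.yaml:

```yaml
# gmp-patch.yaml (hypothetical file name): a merge patch, written as YAML,
# that changes only the one field this section sets on the stackdriver object.
spec:
  enableGMPForApplications: true
```

Apply it with `kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system patch stackdriver stackdriver --type merge --patch-file gmp-patch.yaml`. The `--type merge` flag is needed because Stackdriver is a custom resource, and kubectl does not support its default strategic merge patch for custom resources. Fields that are not listed in the patch are left unchanged.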

To check the components, run the following command:

```
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace gmp-system get pods
```

The output is similar to the following:

```
NAME                                 READY   STATUS    RESTARTS        AGE
collector-abcde                      2/2     Running   1 (5d18h ago)   5d18h
collector-fghij                      2/2     Running   1 (5d18h ago)   5d18h
collector-klmno                      2/2     Running   1 (5d18h ago)   5d18h
gmp-operator-68d49656fc-abcde        1/1     Running   0               5d18h
rule-evaluator-7c686485fc-fghij      2/2     Running   1 (5d18h ago)   5d18h
```

Managed Service for Prometheus supports rule evaluation and alerting. To set up rule evaluation, see Rule evaluation.
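The following minimal Rules resource illustrates what rule evaluation looks like. It is modeled on the examples in the prometheus-engine repository. The resource name example-rules, the group name, and the recording rule are hypothetical. The gmp-test namespace is the one that you create for the example application later on this page. Check the apiVersion against the prometheus-engine/doc/api reference for your version:

```yaml
# Hypothetical Rules resource, evaluated by the rule-evaluator component.
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: example-rules        # hypothetical name
  namespace: gmp-test        # rules apply to metrics from this namespace
spec:
  groups:
  - name: example            # hypothetical rule group
    interval: 30s            # how often the rules in this group are evaluated
    rules:
    # Recording rule: aggregates the up series, dropping the instance label.
    - record: job:up:sum
      expr: sum without (instance) (up)
```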

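Run an example application

In this section, you create an application that emits Prometheus metrics, and you use Google-managed Prometheus to collect those metrics. For more information, see Google Cloud Managed Service for Prometheus.

Deploy the example application

Create the gmp-test namespace for the resources that you create for the example application: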
```
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG create ns gmp-test
```

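Managed Service for Prometheus provides a manifest for an example application that emits Prometheus metrics on its metrics port. The application runs three replicas.

To deploy the example application, run the following command:

```
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG -n gmp-test apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.4.1/examples/example-app.yaml
```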

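Configure a PodMonitoring resource

To ingest the metric data that the example application emits, you use target scraping. Managed Service for Prometheus uses PodMonitoring custom resources (CRs) to configure target scraping and metric ingestion. You can convert existing prometheus-operator resources to PodMonitoring CRs.

A PodMonitoring CR scrapes targets only in the namespace where the CR is deployed. To scrape targets in multiple namespaces, deploy the same PodMonitoring CR in each namespace. To verify that your PodMonitoring resources are installed in the intended namespaces, run the following command:

```
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get podmonitoring -A
```

For reference documentation on all Managed Service for Prometheus CRs, see the prometheus-engine/doc/api reference.

The following manifest defines a PodMonitoring resource named prom-example. When you apply it in the gmp-test namespace, the resource selects all Pods in that namespace that have the label app with the value prom-example. It scrapes each matching Pod every 30 seconds, on the port named metrics, at the HTTP path /metrics. This is the default path, because the manifest does not set one.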

```
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    interval: 30s
```
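The manifest above relies on `-n gmp-test` at apply time and on the default /metrics path. The following sketch shows the same resource with both made explicit. It adds a second copy to illustrate the rule that each namespace needs its own PodMonitoring CR. The namespace name other-ns is hypothetical:

```yaml
# Same PodMonitoring as above, with the namespace and scrape path set explicitly.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: gmp-test
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    path: /metrics       # the default value, shown here only for clarity
    interval: 30s
---
# A PodMonitoring scrapes only Pods in its own namespace, so covering a second
# namespace needs a second copy (other-ns is a hypothetical namespace).
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: other-ns
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
```

Because each document carries its own namespace, you can apply this file without `-n`.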

To apply this resource, run the following command:

```
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG -n gmp-test apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.4.1/examples/pod-monitoring.yaml
```

Managed Service for Prometheus is now scraping the matching Pods.

Query metric data

The simplest way to verify that your Prometheus data is being exported is to run a PromQL query in Metrics Explorer in the Google Cloud console.

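To run a PromQL query, do the following:

In the Google Cloud console, go to the Monitoring page.
In the navigation pane, select Metrics Explorer.
Use Prometheus Query Language (PromQL) to specify the data to display on the chart:
1. In the toolbar of the Select a metric pane, select Code Editor.
2. In the Language toggle, select PromQL. The language toggle is at the bottom of the Code Editor pane.
3. Enter your query in the query editor. For example, to chart the per-second CPU usage rate, computed over a one-hour window and averaged across all matching time series, use the following query: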
```
avg(rate(kubernetes_io:anthos_container_cpu_usage_seconds_total{monitored_resource="k8s_node"}[1h]))
```
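For more information about using PromQL, see PromQL in Cloud Monitoring.

The chart in the following screenshot shows the anthos_container_cpu_usage_seconds_total metric:

Figure: Managed Service for Prometheus chart of the Prometheus `anthos_container_cpu_usage_seconds_total` metric.

If you collect a large volume of data, you might want to filter the exported metrics to reduce costs.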
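The following is a minimal sketch of such a filter. It assumes that your cluster's version of Managed Service for Prometheus supports the OperatorConfig resource and its collection.filter.matchOneOf field, as described in the Managed Service for Prometheus documentation on filtering exported metrics. The selector in the example is a hypothetical placeholder:

```yaml
# Assumed resource: the OperatorConfig named "config" in the gmp-public namespace.
# When matchOneOf is set, only time series that match at least one listed
# selector are exported; other scraped series are not sent to Cloud Monitoring.
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: config
  namespace: gmp-public
collection:              # top-level field; OperatorConfig has no spec section
  filter:
    matchOneOf:
    - '{__name__=~"example_.+"}'   # hypothetical selector; replace with your own
```

If an OperatorConfig named config already exists, add the collection.filter section to it instead of replacing the object. For example, run `kubectl --kubeconfig USER_CLUSTER_KUBECONFIG -n gmp-public edit operatorconfig config`.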

Enable Cloud Logging for user applications (Preview)

The configuration for Logging is stored in a Stackdriver object named stackdriver.

Open the stackdriver object for editing:
```
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system edit stackdriver stackdriver
```
Replace USER_CLUSTER_KUBECONFIG with the path of your user cluster kubeconfig file.

Under spec, set enableCloudLoggingForApplications to true:

```
apiVersion: addons.gke.io/v1alpha1
kind: Stackdriver
metadata:
  name: stackdriver
  namespace: kube-system
spec:
  projectID: ...
  clusterName: ...
  clusterLocation: ...
  proxyConfigSecretName: ...
  enableCloudLoggingForApplications: true
  enableVPC: ...
  optimizedMetrics: true
```

Save your changes and close the file.
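To make this change without an interactive editor, you can use a merge patch instead. The following is a minimal sketch, assuming a file with the hypothetical name logging-patch.yaml:

```yaml
# logging-patch.yaml (hypothetical file name): merge patch that sets only
# enableCloudLoggingForApplications and leaves the other spec fields unchanged.
spec:
  enableCloudLoggingForApplications: true
```

Apply it with `kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system patch stackdriver stackdriver --type merge --patch-file logging-patch.yaml`. Use `--type merge` because kubectl's default strategic merge patch is not supported for custom resources such as Stackdriver.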

Run an example application

In this section, you create an application that writes custom logs.

Save the following Deployment manifest to a file named my-app.yaml:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "monitoring-example"
  namespace: "default"
  labels:
    app: "monitoring-example"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "monitoring-example"
  template:
    metadata:
      labels:
        app: "monitoring-example"
    spec:
      containers:
      - image: gcr.io/google-samples/prometheus-dummy-exporter:latest
        name: prometheus-example-exporter
        imagePullPolicy: Always
        command:
        - /bin/sh
        - -c
        - ./prometheus-dummy-exporter --metric-name=example_monitoring_up --metric-value=1 --port=9090
        resources:
          requests:
            cpu: 100m
```

Create the Deployment:

```
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f my-app.yaml
```

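View application logs

Console

Go to the Logs Explorer in the Google Cloud console.

Click Resource. Under ALL_RESOURCE_TYPES, select Kubernetes Container.
Under CLUSTER_NAME, select the name of your user cluster.
Under NAMESPACE_NAME, select default.
Click Add, and then click Run query.

Under Query results, you can see log entries from the monitoring-example Deployment. For example: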

```
{
  "textPayload": "2020/11/14 01:24:24 Starting to listen on :9090\n",
  "insertId": "1oa4vhg3qfxidt",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "pod_name": "monitoring-example-7685d96496-xqfsf",
      "cluster_name": ...,
      "namespace_name": "default",
      "project_id": ...,
      "location": "us-west1",
      "container_name": "prometheus-example-exporter"
    }
  },
  "timestamp": "2020-11-14T01:24:24.358600252Z",
  "labels": {
    "k8s-pod/pod-template-hash": "7685d96496",
    "k8s-pod/app": "monitoring-example"
  },
  "logName": "projects/.../logs/stdout",
  "receiveTimestamp": "2020-11-14T01:24:39.562864735Z"
}
```

gcloud

Run this command:

```
gcloud logging read 'resource.labels.project_id="PROJECT_ID" AND resource.type="k8s_container" AND resource.labels.namespace_name="default"'
```

Replace PROJECT_ID with the ID of your logging-monitoring project.

In the output, you can see log entries from the monitoring-example Deployment. For example:

```
insertId: 1oa4vhg3qfxidt
labels:
  k8s-pod/app: monitoring-example
  k8s-pod/pod-template-hash: 7685d96496
logName: projects/.../logs/stdout
receiveTimestamp: '2020-11-14T01:24:39.562864735Z'
resource:
  labels:
    cluster_name: ...
    container_name: prometheus-example-exporter
    location: us-west1
    namespace_name: default
    pod_name: monitoring-example-7685d96496-xqfsf
    project_id: ...
  type: k8s_container
textPayload: |
  2020/11/14 01:24:24 Starting to listen on :9090
timestamp: '2020-11-14T01:24:24.358600252Z'
```

Enable Logging and Monitoring for user applications

This section describes how to enable Logging and Monitoring without Managed Service for Prometheus. The configuration for Logging and Monitoring is stored in a Stackdriver object named stackdriver.

Open the stackdriver object for editing:
```
kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system edit stackdriver stackdriver
```
Replace USER_CLUSTER_KUBECONFIG with the path of your user cluster kubeconfig file.

Under spec, set enableStackdriverForApplications to true:

```
apiVersion: addons.gke.io/v1alpha1
kind: Stackdriver
metadata:
  name: stackdriver
  namespace: kube-system
spec:
  projectID: ...
  clusterName: ...
  clusterLocation: ...
  proxyConfigSecretName: ...
  enableStackdriverForApplications: true
  enableVPC: ...
  optimizedMetrics: true
```

Save your changes and close the file.
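You can also make this change non-interactively with a merge patch. The following is a minimal sketch, assuming a file with the hypothetical name stackdriver-apps-patch.yaml:

```yaml
# stackdriver-apps-patch.yaml (hypothetical file name): merge patch that sets
# only enableStackdriverForApplications; other spec fields are left unchanged.
spec:
  enableStackdriverForApplications: true
```

Apply it with `kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG --namespace kube-system patch stackdriver stackdriver --type merge --patch-file stackdriver-apps-patch.yaml`. The `--type merge` flag is required because kubectl does not support strategic merge patches for custom resources.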

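Annotate workloads

To collect custom metrics from an application, add the prometheus.io/scrape: "true" annotation to the application's Service or Pod manifest. Alternatively, add the same annotation to the spec.template section of a Deployment or DaemonSet manifest so that it is passed on to the Pods that the controller creates.

To prevent the system from garbage-collecting your metrics, we recommend a metric scrape interval of one minute.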
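For a Deployment, the annotation goes under spec.template.metadata.annotations so that every Pod created from the template carries it. The following fragment shows only the relevant fields; the name my-app is hypothetical, and a real manifest also needs the selector and container specification:

```yaml
# Fragment of a Deployment manifest (my-app is a hypothetical name).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # quoted: annotation values must be strings
```

The value is quoted because an unquoted `true` is parsed as a YAML boolean, and the Kubernetes API accepts only string values for annotations.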

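Run an example application

In this section, you create an application that writes custom logs and exposes a custom metric.

Save the following Service and Deployment manifests to a file named my-app.yaml. Notice that the Service has the annotation prometheus.io/scrape: "true":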

```
kind: Service
apiVersion: v1
metadata:
  name: "monitoring-example"
  namespace: "default"
  annotations:
    prometheus.io/scrape: "true"
spec:
  selector:
    app: "monitoring-example"
  ports:
    - name: http
      port: 9090
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "monitoring-example"
  namespace: "default"
  labels:
    app: "monitoring-example"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "monitoring-example"
  template:
    metadata:
      labels:
        app: "monitoring-example"
    spec:
      containers:
      - image: gcr.io/google-samples/prometheus-dummy-exporter:latest
        name: prometheus-example-exporter
        imagePullPolicy: Always
        command:
        - /bin/sh
        - -c
        - ./prometheus-dummy-exporter --metric-name=example_monitoring_up --metric-value=1 --port=9090
        resources:
          requests:
            cpu: 100m
```

Create the Deployment and Service:

```
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f my-app.yaml
```

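View application logs

Console

Go to the Logs Explorer in the Google Cloud console.

Click Resource. Under ALL_RESOURCE_TYPES, select Kubernetes Container.
Under CLUSTER_NAME, select the name of your user cluster.
Under NAMESPACE_NAME, select default.
Click Add, and then click Run query.

Under Query results, you can see log entries from the monitoring-example Deployment. For example: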

```
{
  "textPayload": "2020/11/14 01:24:24 Starting to listen on :9090\n",
  "insertId": "1oa4vhg3qfxidt",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "pod_name": "monitoring-example-7685d96496-xqfsf",
      "cluster_name": ...,
      "namespace_name": "default",
      "project_id": ...,
      "location": "us-west1",
      "container_name": "prometheus-example-exporter"
    }
  },
  "timestamp": "2020-11-14T01:24:24.358600252Z",
  "labels": {
    "k8s-pod/pod-template-hash": "7685d96496",
    "k8s-pod/app": "monitoring-example"
  },
  "logName": "projects/.../logs/stdout",
  "receiveTimestamp": "2020-11-14T01:24:39.562864735Z"
}
```

gcloud

Run this command:

```
gcloud logging read 'resource.labels.project_id="PROJECT_ID" AND resource.type="k8s_container" AND resource.labels.namespace_name="default"'
```

Replace PROJECT_ID with the ID of your logging-monitoring project.

In the output, you can see log entries from the monitoring-example Deployment. For example:

```
insertId: 1oa4vhg3qfxidt
labels:
  k8s-pod/app: monitoring-example
  k8s-pod/pod-template-hash: 7685d96496
logName: projects/.../logs/stdout
receiveTimestamp: '2020-11-14T01:24:39.562864735Z'
resource:
  labels:
    cluster_name: ...
    container_name: prometheus-example-exporter
    location: us-west1
    namespace_name: default
    pod_name: monitoring-example-7685d96496-xqfsf
    project_id: ...
  type: k8s_container
textPayload: |
  2020/11/14 01:24:24 Starting to listen on :9090
timestamp: '2020-11-14T01:24:24.358600252Z'
```

View application metrics in the Google Cloud console

Your example application exposes a custom metric named example_monitoring_up. You can view the value of this metric in the Google Cloud console.

Go to Metrics Explorer in the Google Cloud console.

For Resource type, select Kubernetes Pod or Kubernetes Container.
For Metric, select external.googleapis.com/prometheus/example_monitoring_up.
In the chart, you can see that example_monitoring_up reports a constant value of 1.