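Using the diagnostic collector

The diagnostic collector is a tool that captures, on demand, diagnostic data about the Kubernetes components of an Apigee Hybrid instance and stores that data in a Google Cloud Storage bucket. You invoke the diagnostic collector with the apigeectl diagnostic command.

What system data is captured?

The diagnostic collector captures the following types of data:

Log level changes.
Jstack.
Pod configuration YAML.
ps -ef output.
TCP dump.
top output.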

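What happens to the data?

When the diagnostic collector captures data, it uploads the data to a storage bucket in your Google Cloud project. You can view the stored data in the Google Cloud Platform: Cloud Storage browser.

You can choose to share this data with Google Apigee Support when you create a support ticket.

Prerequisites for running the diagnostic collector

Before you use the diagnostic collector, the following prerequisites must be met:

Google Cloud Storage bucket

Create a Google Cloud Storage bucket with a unique name in your Google Cloud project. You can create and manage buckets with gcloud storage commands or in the Google Cloud Platform: Cloud Storage browser.

For example: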

gcloud storage buckets create gs://apigee_diagnostic_data

Creating gs://apigee_diagnostic_data/...

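For instructions, see Creating storage buckets.

Service account

Create a service account with the Storage Admin role (roles/storage.admin) in your project, then download the service account .json key file.

The service account can have any unique name. This guide uses "apigee-diagnostic" as the service account name.

For example: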

gcloud config set project ${PROJECT_ID}

gcloud iam service-accounts create apigee-diagnostic

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:apigee-diagnostic@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/storage.admin"

gcloud iam service-accounts keys create ${PROJECT_ID}-apigee-diagnostic.json \
    --iam-account=apigee-diagnostic@${PROJECT_ID}.iam.gserviceaccount.com


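Using the diagnostic collector

The sequence for using the diagnostic collector is as follows:

Configure the diagnostic stanza in your overrides.yaml file to select the type of information, the Apigee container, and the individual pods you want diagnostic data from. See Configuring overrides.yaml for the diagnostic collector.
Run the diagnostic collector with the following apigeectl command.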
```
apigeectl diagnostic -f OVERRIDES_FILE
```
Where OVERRIDES_FILE is the path to your overrides.yaml file.
Check the logs:
1. Get the pods in the apigee-diagnostic namespace.
```
kubectl get pods -n apigee-diagnostic
```
2. Note the pod whose name contains diagnostic-collector.
3. Check the logs with the following command:
```
kubectl -n apigee-diagnostic logs -f POD_NAME
```
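  Where POD_NAME is the name of the diagnostic collector pod.

  You can also view the collected logs in the Google Cloud Platform: Cloud Storage browser.
After the data has been collected, delete the diagnostic collector. You cannot run it again until you delete it.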
```
apigeectl diagnostic delete -f OVERRIDES_FILE
```
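Steps 1 and 2 of checking the logs can be combined with a small filter. The following is a minimal sketch, not part of the Apigee tooling: find_collector_pods is a hypothetical helper name, and the sketch assumes the output format of kubectl get pods -o name (one pod/NAME entry per line).

```shell
# Hypothetical helper: read `kubectl get pods -o name` output on stdin and
# print only the entries whose name contains "diagnostic-collector".
find_collector_pods() {
  grep 'diagnostic-collector'
}

# Example usage against a live cluster (commented out, needs kubectl access):
# kubectl get pods -n apigee-diagnostic -o name | find_collector_pods
```

Each line of the output has the form pod/NAME and can be passed as-is to kubectl -n apigee-diagnostic logs -f.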

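Configuring `overrides.yaml` for the diagnostic collector

Before you run the diagnostic collector, you need to configure it in your overrides.yaml file.

For a complete reference of the diagnostic configuration properties, see Configuration property reference: diagnostic.

Required properties

The diagnostic collector requires the following properties in order to run.

diagnostic.serviceAccountPath: The path to the key file of the service account with the Storage Admin role described in Prerequisites.
diagnostic.operation: Specifies whether to collect all statistics or only logs.
Values are "ALL" or "LOGGING".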

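If you set diagnostic.operation to "LOGGING", the properties listed under Properties required only when the operation is set to LOGGING are also required.

diagnostic.bucket: The name of the Google Cloud Storage bucket where your diagnostic data will be stored. This is the bucket you created in Prerequisites.

diagnostic.container: Specifies the type of pod to capture data from. The value can be one of the following: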

`container` value	Apigee component	Kubernetes namespace	Example pod name for this container
`apigee-cassandra`	Cassandra	`apigee`	`apigee-cassandra-default-0`
`istio-proxy`	Istio ingress	`istio-system`	`istio-ingressgateway-696879cdf8-9zzzf`
`apigee-mart-server`	MART	`apigee`	`apigee-mart-hybrid-example-d89fed1-151-jj2ux-l7nlb`
`apigee-runtime`	Message Processor	`apigee`	`apigee-runtime-hybrid-example-3b2ebf3-151-s64bh-g9qmv`
`apigee-synchronizer`	Synchronizer	`apigee`	`apigee-synchronizer-hybrid-example-3b2ebf3-151-xx4z6cg78`
`apigee-udca`	UDCA	`apigee`	`apigee-udca-hybrid-example-3b2ebf3-151-q4g2c-vnzg9`
`apigee-watcher`	Watcher	`apigee`	`apigee-watcher-hybrid-example-d89fed1-151-cpu3s-sxxdf`

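diagnostic.namespace: The Kubernetes namespace of the pods you are collecting data from. The namespace must be the correct one for the container you specify with diagnostic.container.

diagnostic.podNames: The names of the individual pods you want to collect diagnostic data for. For example: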

diagnostic:
 …
 podNames:
 - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-2wcjn
 - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-6xzn2
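
Because diagnostic.namespace must match diagnostic.container, the pairing from the container table can be written down as a lookup. The following is a minimal sketch under that assumption; namespace_for_container is a hypothetical helper name, not an apigeectl feature, and it covers only the container values listed in the table.

```shell
# Hypothetical helper: print the Kubernetes namespace that matches a
# diagnostic.container value, following the container table in this guide.
namespace_for_container() {
  case "$1" in
    istio-proxy)
      echo "istio-system" ;;
    apigee-cassandra|apigee-mart-server|apigee-runtime|apigee-synchronizer|apigee-udca|apigee-watcher)
      echo "apigee" ;;
    *)
      # Anything not in the table is reported as an error.
      echo "unknown container value: $1" >&2
      return 1 ;;
  esac
}
```

For example, namespace_for_container istio-proxy prints istio-system, and an unlisted value returns a non-zero status.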

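Properties required only when the operation is set to `LOGGING`

The following properties are required to run the diagnostic collector only when diagnostic.operation is LOGGING.

diagnostic.loggingDetails.loggerNames: Specifies by name which loggers to collect data from. For Apigee Hybrid version 1.6.0, the only supported value is ALL, meaning all loggers. For example: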
```
diagnostic:
 …
 loggingDetails:
   loggerNames:
   - ALL
```
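diagnostic.loggingDetails.logLevel: Specifies the granularity of the logging data to collect. In Apigee Hybrid 1.6, only FINE is supported.
diagnostic.loggingDetails.logDuration: The duration of the log data collected, in milliseconds. A typical value is 30000.

Optional properties

The following properties are optional.

diagnostic.tcpDumpDetails.maxMsgs: Sets the maximum number of tcpDump messages to collect. Apigee recommends a maximum value of no more than 1000.
diagnostic.tcpDumpDetails.timeoutInSeconds: Sets the amount of time, in seconds, to wait for tcpDump to return messages.
diagnostic.threadDumpDetails.delayInSeconds: The delay, in seconds, between the collection of each thread dump. Must be used together with diagnostic.threadDumpDetails.iterations.
diagnostic.threadDumpDetails.iterations: The number of jstack thread dump iterations to collect. Must be used together with diagnostic.threadDumpDetails.delayInSeconds.

General example

The following is an example of the diagnostic stanza showing all possible entries: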

diagnostic:
  # required properties:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "ALL"
  bucket: "diagnostics_data"
  container: "apigee-runtime"
  namespace: "apigee"
  podNames:
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-2wcjn
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-6xzn2

  # required if operation is "LOGGING"
  loggingDetails:
    loggerNames:
    - ALL
    logLevel: FINE
    logDuration: 30000

  # optional properties:
  tcpDumpDetails:
    maxMsgs: 10
    timeoutInSeconds: 100

  threadDumpDetails:
    iterations: 5
    delayInSeconds: 2

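Common use cases

The following examples show how to configure and use the diagnostic collector in some common situations.

High proxy latency

In this scenario, apigee-runtime takes a long time to process requests, so customers see high proxy latency. You need to collect Jstack and top output.

Select any two runtime pods.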
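
One way to pick the two pods is to filter the pod list by name prefix and format the result for podNames. The following is a minimal sketch: pod_names_yaml is a hypothetical helper name, and the sketch assumes the output format of kubectl get pods -o name (one pod/NAME entry per line) and that runtime pod names begin with apigee-runtime-, as in the example pod names in this guide.

```shell
# Hypothetical helper: read `kubectl get pods -o name` output on stdin and
# print the first two pods whose names start with the given prefix,
# formatted as YAML list items suitable for pasting under podNames.
pod_names_yaml() {
  prefix="$1"
  # Strip the leading "pod/" and keep only names that start with the prefix.
  sed -n "s|^pod/\(${prefix}-.*\)\$|  - \1|p" | head -n 2
}

# Example usage against a live cluster (commented out, needs kubectl access):
# kubectl get pods -n apigee -o name | pod_names_yaml apigee-runtime
```

The same helper can be reused for the other scenarios by changing the namespace and prefix, for example apigee-synchronizer.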

Create the diagnostic stanza with the following structure:

diagnostic:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "ALL"
  bucket: "diagnostics_data"
  container: "apigee-runtime"
  namespace: "apigee"
  podNames:
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-2wcjn
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-6xzn2

  tcpDumpDetails:
    maxMsgs: 10

  threadDumpDetails:
    iterations: 15
    delayInSeconds: 1

After configuring the diagnostic stanza, run the diagnostic collector.
```
apigeectl diagnostic -f OVERRIDES_FILE
```

Collect the logs and delete the diagnostic collector.

apigeectl diagnostic delete -f OVERRIDES_FILE

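Network/connectivity issues

You need to run diagnostics on the apigee-runtime pods as well as on the ingress gateway pods.

Select any two runtime pods.

Create the diagnostic stanza with the following structure: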

diagnostic:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "ALL"
  bucket: "diagnostics_data"
  container: "apigee-runtime"
  namespace: "apigee"
  podNames:
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-2wcjn
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-6xzn2

  tcpDumpDetails:
    maxMsgs: 1000

After configuring the diagnostic stanza, run the diagnostic collector.
```
apigeectl diagnostic -f OVERRIDES_FILE
```

Collect the logs and delete the diagnostic collector.

apigeectl diagnostic delete -f OVERRIDES_FILE

Select two pods from the Istio ingress gateway.

Reconfigure the diagnostic stanza with the Istio ingress pods:

diagnostic:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "ALL"
  bucket: "diagnostics_data"
  container: "istio-proxy"
  namespace: "istio-system"
  podNames:
  - istio-ingressgateway-696879cdf8-9zzzf
  - istio-ingressgateway-696879cdf8-6abc7

  tcpDumpDetails:
    maxMsgs: 1000

After configuring the diagnostic stanza, run the diagnostic collector.
```
apigeectl diagnostic -f OVERRIDES_FILE
```

Collect the logs and delete the diagnostic collector.

apigeectl diagnostic delete -f OVERRIDES_FILE

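The proxy throws unexpected errors or a new contract is not applied

In this scenario, you need to change the log level for debugging for at least 5 minutes, or even 10 minutes (as in this example). This increases the log volume, but useful information will be logged. You run the diagnostic collector twice, once on the Apigee runtime and then again on the Apigee synchronizer.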
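
Because logDuration is expressed in milliseconds, it helps to convert the intended debugging window explicitly. The following sketch uses a hypothetical helper, minutes_to_ms, to show the values for 5 and 10 minutes. For comparison, the value 60000 used in the sample stanzas in this section corresponds to one minute.

```shell
# Hypothetical helper: convert a duration in minutes to the millisecond
# value expected by logDuration.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

minutes_to_ms 5    # prints 300000
minutes_to_ms 10   # prints 600000
```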

Select any two runtime pods.

Create the diagnostic stanza with the following structure:

diagnostic:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "LOGGING"
  bucket: "diagnostics_data"
  namespace: "apigee"
  container: "apigee-runtime"
  podNames:
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-2wcjn
  - apigee-runtime-hybrid-example-3b2ebf3-150-8vfoj-6xzn2

  loggingDetails:
    loggerNames:
    - ALL
    logLevel: FINE
    logDuration: 60000

After configuring the diagnostic stanza, run the diagnostic collector.
```
apigeectl diagnostic -f OVERRIDES_FILE
```

Collect the logs and delete the diagnostic collector.

apigeectl diagnostic delete -f OVERRIDES_FILE

Select any two synchronizer pods.

Create the diagnostic stanza with the following structure:

diagnostic:
  serviceAccountPath: "service-accounts/apigee-diagnostics.json"
  operation: "LOGGING"
  bucket: "diagnostics_data"
  namespace: "apigee"
  container: "apigee-synchronizer"
  podNames:
  - apigee-synchronizer-hybrid-example-3b2ebf3-150-xx4z-6cg78
  - apigee-synchronizer-hybrid-example-3b2ebf3-150-xx4z-1a2b3

  loggingDetails:
    loggerNames:
    - ALL
    logLevel: FINE
    logDuration: 60000

After configuring the diagnostic stanza, run the diagnostic collector.
```
apigeectl diagnostic -f OVERRIDES_FILE
```

Collect the logs and delete the diagnostic collector.

apigeectl diagnostic delete -f OVERRIDES_FILE
