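Deploy the OpenTelemetry Collector on Google Kubernetes Engine

This document describes how to run the OpenTelemetry Collector in a GKE cluster to collect OTLP logs, metrics, and traces from instrumented applications and export that data to Google Cloud.

Before you begin

Running the OpenTelemetry Collector on GKE requires the following resources: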

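- A Google Cloud project with the Cloud Monitoring API, Cloud Trace API, and Cloud Logging API enabled.
  - If you don't have a Google Cloud project, do the following:
    1. In the Google Cloud console, go to the New Project page.
    2. In the Project name field, enter a name for your project, and then click Create.
    3. Go to the Billing page.
    4. At the top of the page, select the project you just created, if it isn't already selected.
    5. When prompted, choose an existing payments profile or create a new one.

    By default, the Monitoring API, Trace API, and Logging API are enabled for new projects.
  - If you already have a Google Cloud project, ensure that the Monitoring API, Trace API, and Logging API are enabled:
    1. Go to the APIs & Services page.
    2. Select your project.
    3. Click Enable APIs and Services.
    4. Search for each API by name.
    5. In the search results, click the named API. The Monitoring API appears as "Stackdriver Monitoring API".
    6. If "API enabled" isn't displayed, click the Enable button.
- A Kubernetes cluster. If you don't have one, follow the instructions in the GKE quickstart.
- The following command-line tools:
  - gcloud
  - kubectl

The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see which gcloud CLI components you have installed, run the following command: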
```
gcloud components list
```

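Deploy the Collector

After replacing PROJECT_ID with the ID of your Google Cloud project, you can deploy the Collector pipeline directly from GitHub with the following commands: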

```
export GCLOUD_PROJECT=PROJECT_ID
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest.git/k8s/base | envsubst | kubectl apply -f -
```
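
The envsubst step in the pipeline above fills in environment variable references, such as GCLOUD_PROJECT, in the rendered manifests before they reach kubectl. The following Python sketch illustrates that kind of substitution. It is only an approximation: it handles just the braced `${NAME}` form, and the manifest line is hypothetical (the real manifests live in the GitHub repository).

```python
import re

# Matches braced references such as ${GCLOUD_PROJECT}.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text: str, env: dict[str, str]) -> str:
    """Replace ${NAME} references in text with values from env.

    In this sketch, a variable that is missing from env is replaced
    by an empty string.
    """
    return _VAR_PATTERN.sub(lambda match: env.get(match.group(1), ""), text)


# Hypothetical manifest line, used only for illustration.
manifest_line = "value: projects/${GCLOUD_PROJECT}"
rendered = substitute_env(manifest_line, {"GCLOUD_PROJECT": "my-project"})
print(rendered)  # value: projects/my-project
```

Because the pattern accepts only plain variable names, a reference such as `${env:MY_POD_IP}` is left untouched by this sketch.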

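Observe and debug the Collector

The OpenTelemetry Collector provides self-observability metrics out of the box. These metrics help you monitor its performance and ensure continued uptime of the OTLP ingestion pipeline.

To monitor the Collector, install the sample dashboard for the Collector. This dashboard gives you an at-a-glance view of several Collector metrics, including uptime, memory usage, and API calls to Google Cloud Observability.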

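To install the dashboard, do the following:

1. In the Google Cloud console, go to the Dashboards page. If you use the search bar to find this page, select the result whose subheading is Monitoring.
2. Select the Sample Library tab.
3. Select the OpenTelemetry Collector category.
4. Select the "OpenTelemetry Collector" dashboard.
5. Click Import.

For more information about the installation process, see Install sample dashboards.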

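Configure the Collector

The self-managed OTLP ingestion pipeline includes a default OpenTelemetry Collector configuration that is designed to deliver high volumes of OTLP metrics, logs, and traces with consistent GKE and Kubernetes metadata attached. It is also designed to prevent common ingestion issues.

However, you might have unique needs that require customizing the default configuration. This section describes the defaults shipped with the pipeline and how you can customize them to fit your needs.

The default Collector configuration is located on GitHub as config/collector.yaml: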

```
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exporters:
  googlecloud:
    log:
      default_log_name: opentelemetry-collector
    user_agent: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.106.0
  googlemanagedprometheus:
    user_agent: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.106.0

extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  filter/self-metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
        - otelcol_process_uptime
        - otelcol_process_memory_rss
        - otelcol_grpc_io_client_completed_rpcs
        - otelcol_googlecloudmonitoring_point_count
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  metricstransform/self-metrics:
    transforms:
    - action: update
      include: otelcol_process_uptime
      operations:
      - action: add_label
        new_label: version
        new_value: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.106.0

  # We need to add the pod IP as a resource label so the k8s attributes processor can find it.
  resource/self-metrics:
    attributes:
    - action: insert
      key: k8s.pod.ip
      value: ${env:MY_POD_IP}

  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        cors:
          allowed_origins:
          - http://*
          - https://*
        endpoint: ${env:MY_POD_IP}:4318
  prometheus/self-metrics:
    config:
      scrape_configs:
      - job_name: otel-self-metrics
        scrape_interval: 1m
        static_configs:
        - targets:
          - ${env:MY_POD_IP}:8888

service:
  extensions:
  - health_check
  pipelines:
    logs:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - resourcedetection
      - memory_limiter
      - batch
      receivers:
      - otlp
    metrics/otlp:
      exporters:
      - googlemanagedprometheus
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - transform/collision
      - batch
      receivers:
      - otlp
    metrics/self-metrics:
      exporters:
      - googlemanagedprometheus
      processors:
      - filter/self-metrics
      - metricstransform/self-metrics
      - resource/self-metrics
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - prometheus/self-metrics
    traces:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - otlp
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888
```

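Exporters

The default exporters are the googlecloud exporter (for logs and traces) and the googlemanagedprometheus exporter (for metrics).

The googlecloud exporter is configured with a default log name. The googlemanagedprometheus exporter doesn't require any default configuration. For more information about configuring this exporter, see Get started with the OpenTelemetry Collector in the Google Cloud Managed Service for Prometheus documentation.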

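Processors

The default configuration includes the following processors:

- batch: Configured to batch telemetry requests at the Google Cloud maximum number of entries per request (send_batch_size: 200 in the default configuration), or at the Google Cloud minimum interval of every 5 seconds, whichever comes first.
- k8sattributes: Automatically maps Kubernetes resource attributes to telemetry labels.
- memory_limiter: Caps the Collector's memory usage at a reasonable level and drops data points beyond that level, which prevents out-of-memory crashes.
- resourcedetection: Automatically detects Google Cloud resource labels such as the cluster name and project ID.
- transform (configured as transform/collision): Renames metric labels that collide with Google Cloud monitored-resource fields.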
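
To make the renaming done by the transform/collision processor concrete, here is a minimal Python sketch of the same idea. The label names come from the default configuration; the function name and the sample data point are hypothetical, and the sketch illustrates the effect rather than how the Collector implements it.

```python
# Labels that the default transform/collision statements rename.
RESERVED_LABELS = ("location", "cluster", "namespace", "job", "instance", "project_id")


def rename_colliding_labels(attributes: dict[str, str]) -> dict[str, str]:
    """Return a copy of attributes with each reserved label moved to exported_<label>."""
    renamed = dict(attributes)
    for label in RESERVED_LABELS:
        if label in renamed:
            # Equivalent in spirit to set(...) followed by delete_key(...).
            renamed[f"exported_{label}"] = renamed.pop(label)
    return renamed


# Hypothetical data point attributes, used only for illustration.
point = {"namespace": "checkout", "job": "api", "http_method": "GET"}
print(rename_colliding_labels(point))
```

In this sketch, labels that are not in the reserved list, such as http_method, pass through unchanged.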

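Receivers

For application telemetry, the default configuration includes only the otlp receiver. (The prometheus/self-metrics receiver in the configuration scrapes the Collector's own metrics.) For more information about instrumenting your applications to push OTLP traces and metrics to the Collector's OTLP endpoint, see Choose an instrumentation approach.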

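Next steps: collect and view telemetry

This section describes how to deploy a sample application, point it at the Collector's OTLP endpoint, and view the telemetry in Google Cloud. The sample application is a small generator that exports traces, logs, and metrics to the Collector.

If you already have an application that is instrumented with an OpenTelemetry SDK, you can point that application at the Collector's endpoint instead.

To deploy the sample application, run the following command: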

```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/main/sample/app.yaml
```

To point an existing application that uses an OpenTelemetry SDK at the Collector's endpoint, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to http://opentelemetry-collector.opentelemetry.svc.cluster.local:4317.
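
The endpoint above follows the usual in-cluster DNS pattern of service name, namespace, and port. The following Python sketch shows one way to assemble it and export it before an SDK is initialized. The helper name is hypothetical, and the defaults are taken from the URL in this document.

```python
import os


def collector_endpoint(
    service: str = "opentelemetry-collector",
    namespace: str = "opentelemetry",
    port: int = 4317,
) -> str:
    """Build the in-cluster URL of the Collector's OTLP endpoint."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


# Set the variable before the OpenTelemetry SDK creates its exporters,
# because SDKs typically read it at initialization time.
endpoint = collector_endpoint()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
print(endpoint)
```

In a Kubernetes workload, the variable is more commonly set in the Pod spec than in application code. The default configuration also has the otlp receiver listening for OTLP/HTTP on port 4318; whether the Collector's Service exposes that port is an assumption to verify against the deployed manifests.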

After a few minutes, telemetry generated by the application begins flowing through the Collector to the Google Cloud console for each signal.