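Get started with the OpenTelemetry Collector

This document describes how to set up the OpenTelemetry Collector to scrape standard Prometheus metrics and report those metrics to Google Cloud Managed Service for Prometheus. The OpenTelemetry Collector is an agent that you can deploy yourself and configure to export to Managed Service for Prometheus. The setup is similar to running Managed Service for Prometheus with self-deployed collection.

You might choose the OpenTelemetry Collector over self-deployed collection for the following reasons:

- The OpenTelemetry Collector lets you route your telemetry data to multiple backends by configuring different exporters in your pipeline.
- The Collector also supports signals from metrics, logs, and traces, so by using it you can handle all three signal types in one agent.
- OpenTelemetry's vendor-agnostic data format (the OpenTelemetry Protocol, or OTLP) supports a strong ecosystem of libraries and pluggable Collector components. This allows for a range of customizability options for receiving, processing, and exporting your data.

The trade-off for these benefits is that running an OpenTelemetry Collector requires a self-managed deployment and maintenance approach. Which approach you choose depends on your specific needs, but in this document we offer recommended guidelines for configuring the OpenTelemetry Collector using Managed Service for Prometheus as a backend.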

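Before you begin

This section describes the configuration needed for the tasks described in this document.

Set up projects and tools

To use Google Cloud Managed Service for Prometheus, you need the following resources: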

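- A Google Cloud project with the Cloud Monitoring API enabled.
  - If you don't have a Google Cloud project, then do the following:
    1. In the Google Cloud console, go to New Project:

       Create a New Project
    2. In the Project Name field, enter a name for your project and then click Create.
    3. Go to Billing:

       Go to Billing
    4. Select the project you have just created if it isn't already selected at the top of the page.
    5. You are prompted to choose an existing payments profile or to create a new one.
    The Monitoring API is enabled by default for new projects.
  - If you already have a Google Cloud project, then ensure that the Monitoring API is enabled:
    1. Go to APIs & services:

       Go to APIs & services
    2. Select your project.
    3. Click Enable APIs and Services.
    4. Search for "Monitoring".
    5. In the search results, click through to "Cloud Monitoring API".
    6. If "API enabled" is not displayed, then click the Enable button.
- A Kubernetes cluster. If you do not have a Kubernetes cluster, then follow the instructions in the Quickstart for GKE.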

You also need the following command-line tools:

- gcloud
- kubectl

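The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command: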

gcloud components list

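Configure your environment

To avoid repeatedly entering your project ID or cluster name, perform the following configuration:

Configure the command-line tools as follows:
- Configure the gcloud CLI to refer to the ID of your Google Cloud project: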
```
gcloud config set project PROJECT_ID
```
- Configure the kubectl CLI to use your cluster:
```
kubectl config set-cluster CLUSTER_NAME
```
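For more information about these tools, see the following:
- gcloud CLI overview
- kubectl commands

Set up a namespace

Create the NAMESPACE_NAME Kubernetes namespace for resources you create as part of the example application: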

kubectl create ns NAMESPACE_NAME

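Verify service account credentials

You can skip this section if your Kubernetes cluster has Workload Identity Federation for GKE enabled.

When running on GKE, Managed Service for Prometheus automatically retrieves credentials from the environment based on the Compute Engine default service account. By default, the default service account has the necessary roles, monitoring.metricWriter and monitoring.viewer. If you don't use Workload Identity Federation for GKE, and you have previously removed either of those roles from the default node service account, you must re-add those missing permissions before continuing.

If you are not running on GKE, see Provide credentials explicitly.

Configure a service account for Workload Identity Federation for GKE

You can skip this section if your Kubernetes cluster does not have Workload Identity Federation for GKE enabled.

Managed Service for Prometheus captures metric data by using the Cloud Monitoring API. If your cluster is using Workload Identity Federation for GKE, you must grant your Kubernetes service account permission to the Monitoring API. This section describes the following:

- Creating a dedicated Google Cloud service account, gmp-test-sa.
- Binding the Google Cloud service account to the default Kubernetes service account in a test namespace, NAMESPACE_NAME.
- Granting the necessary permission to the Google Cloud service account.

Create and bind the service account

This step appears in several places in the Managed Service for Prometheus documentation. If you have already performed this step as part of a prior task, then you don't need to repeat it. Skip ahead to Authorize the service account.

The following command sequence creates the gmp-test-sa service account and binds it to the default Kubernetes service account in the NAMESPACE_NAME namespace: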

gcloud config set project PROJECT_ID \
&&
gcloud iam service-accounts create gmp-test-sa \
&&
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE_NAME/default]" \
  gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
&&
kubectl annotate serviceaccount \
  --namespace NAMESPACE_NAME \
  default \
  iam.gke.io/gcp-service-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com

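If you are using a different GKE namespace or service account, adjust the commands appropriately.

Authorize the service account

Groups of related permissions are collected into roles, and you grant the roles to a principal, in this example, the Google Cloud service account. For more information about Monitoring roles, see Access control.

The following command grants the Google Cloud service account, gmp-test-sa, the Monitoring API role it needs to write metric data.

If you have already granted the Google Cloud service account a specific role as part of a prior task, then you don't need to do it again.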

gcloud projects add-iam-policy-binding PROJECT_ID\
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter

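Debug your Workload Identity Federation for GKE configuration

If you are having trouble getting Workload Identity Federation for GKE to work, see the documentation for verifying your Workload Identity Federation for GKE setup and the Workload Identity Federation for GKE troubleshooting guide.

As typos and partial copy-pastes are the most common sources of errors when configuring Workload Identity Federation for GKE, we strongly recommend using the editable variables and clickable copy-paste icons embedded in the code samples in these instructions.

Workload Identity Federation for GKE in production environments

The example described in this document binds the Google Cloud service account to the default Kubernetes service account and gives the Google Cloud service account all necessary permissions to use the Monitoring API.

In a production environment, you might want to use a finer-grained approach, with a service account for each component, each with minimal permissions. For more information on configuring service accounts for workload-identity management, see Using Workload Identity Federation for GKE.

Set up the OpenTelemetry Collector

This section guides you through setting up and using the OpenTelemetry Collector to scrape metrics from an example application and send the data to Google Cloud Managed Service for Prometheus. For detailed configuration information, see the following sections:

- Scrape Prometheus metrics
- Add processors
- Configure the googlemanagedprometheus exporter

The OpenTelemetry Collector is analogous to the Managed Service for Prometheus agent binary. The OpenTelemetry community regularly publishes releases including source code, binaries, and container images.

You can deploy these artifacts on VMs or Kubernetes clusters using the best-practice defaults, or you can use the Collector builder to build your own collector consisting of only the components you need. To build a collector for use with Managed Service for Prometheus, you need the following components:

- The Managed Service for Prometheus exporter, which writes your metrics to Managed Service for Prometheus.
- A receiver to scrape your metrics. This document assumes that you are using the OpenTelemetry Prometheus receiver, but the Managed Service for Prometheus exporter is compatible with any OpenTelemetry metrics receiver.
- Processors to batch and mark up your metrics to include important resource identifiers depending on your environment.

These components are enabled by using a configuration file that is passed to the Collector with the --config flag.

The following sections discuss how to configure each of these components in more detail. This document describes how to run the Collector on GKE and elsewhere.

Configure and deploy the Collector

Whether you are running your collection on Google Cloud or in another environment, you can still configure the OpenTelemetry Collector to export to Managed Service for Prometheus. The biggest difference is in how you configure the Collector. In non-Google Cloud environments, additional formatting of metric data might be needed for it to be compatible with Managed Service for Prometheus. On Google Cloud, however, the Collector can detect much of this formatting automatically.

Run the OpenTelemetry Collector on GKE

You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector on GKE: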

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

# Note that the googlemanagedprometheus exporter block is intentionally blank
exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resourcedetection, transform]
      exporters: [googlemanagedprometheus]

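The preceding config uses the Prometheus receiver and the Managed Service for Prometheus exporter to scrape the metrics endpoints on Kubernetes Pods and export those metrics to Managed Service for Prometheus. The pipeline processors format and batch the data.

For more details about what each part of this config does, along with configurations for different platforms, see the detailed sections below on scraping metrics and adding processors.

When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.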

You can modify this config based on your environment, provider, and the metrics you want to scrape, but the example config is a recommended starting point for running on GKE.
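To make the `__address__` relabeling rule in this config easier to follow, here is a minimal Python sketch of how a Prometheus `replace` action is evaluated: the `source_labels` values are joined with `;`, the regex must match the whole joined string, and `$1`/`$2` in the replacement refer to capture groups. The function name `relabel_replace` and the sample Pod address are hypothetical; this is a simplified model of the rule, not the receiver's implementation. Inside the Collector config the replacement is written `$$1:$$2` because of environment-variable escaping; once unescaped, it is `$1:$2`.

```python
import re


def relabel_replace(labels: dict, source_labels: list, regex: str,
                    replacement: str, target_label: str,
                    separator: str = ";") -> dict:
    """Apply a Prometheus-style `replace` relabel rule (simplified model)."""
    # Missing source labels contribute an empty string, as in Prometheus.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors relabel regexes, so the whole value must match.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Convert Prometheus-style $1, $2 references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical Pod discovered on container port 8080 but annotated
# (prometheus.io/port) to be scraped on port 9090.
pod_labels = {
    "__address__": "10.4.0.12:8080",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "9090",
}
rewritten = relabel_replace(
    pod_labels,
    source_labels=["__address__",
                   "__meta_kubernetes_pod_annotation_prometheus_io_port"],
    regex=r"(.+):(?:\d+);(\d+)",
    replacement="$1:$2",
    target_label="__address__",
)
print(rewritten["__address__"])  # 10.4.0.12:9090
```

If a Pod has no port annotation, the joined value ends in `;`, the regex does not match, and the discovered address is kept as-is.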

Run the OpenTelemetry Collector outside Google Cloud

Running the OpenTelemetry Collector outside Google Cloud, such as on-premises or on other cloud providers, is similar to running the Collector on GKE. However, the metrics you scrape are less likely to automatically include the data needed to format them for Managed Service for Prometheus. Therefore, you must take extra care to configure the Collector to format the metrics so they are compatible with Managed Service for Prometheus.

You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector for deployment on a non-GKE Kubernetes cluster:

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resource:
    attributes:
    - key: "cluster"
      value: "CLUSTER_NAME"
      action: upsert
    - key: "namespace"
      value: "NAMESPACE_NAME"
      action: upsert
    - key: "location"
      value: "REGION"
      action: upsert

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

exporters:
  googlemanagedprometheus:
    project: "PROJECT_ID"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resource, transform]
      exporters: [googlemanagedprometheus]

This config does the following:

- Sets up a Kubernetes service discovery scrape config for Prometheus. For more information, see Scrape Prometheus metrics.
- Manually sets cluster, namespace, and location resource attributes. For more information about resource attributes, including resource detection for Amazon EKS and Azure AKS, see Detect resource attributes.
- Sets the project option in the googlemanagedprometheus exporter. For more information about the exporter, see Configure the googlemanagedprometheus exporter.

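When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.

For best practices for configuring the Collector on other clouds, see Amazon EKS or Azure AKS.

Deploy the example application

The example application emits the example_requests_total counter metric and the example_random_numbers histogram metric (among others) on its metrics port. The manifest for this example defines three replicas.

To deploy the example application, run the following command: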

kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.13.0/examples/example-app.yaml

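Create your collector config as a ConfigMap

After you have created your config and placed it in a file called config.yaml, use that file to create a Kubernetes ConfigMap. When the Collector is deployed, it mounts the ConfigMap and loads the file.

To create a ConfigMap named otel-config with your config, use the following command: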

kubectl -n NAMESPACE_NAME create configmap otel-config --from-file config.yaml

Deploy the Collector

Create a file called collector-deployment.yaml with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: NAMESPACE_NAME:prometheus-test
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: NAMESPACE_NAME:prometheus-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: NAMESPACE_NAME:prometheus-test
subjects:
- kind: ServiceAccount
  namespace: NAMESPACE_NAME
  name: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.106.0
        args:
        - --config
        - /etc/otel/config.yaml
        - --feature-gates=exporter.googlemanagedprometheus.intToDouble
        volumeMounts:
        - mountPath: /etc/otel/
          name: otel-config
      volumes:
      - name: otel-config
        configMap:
          name: otel-config

Create the Collector deployment in your Kubernetes cluster by running the following command:

kubectl -n NAMESPACE_NAME create -f collector-deployment.yaml

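After the pod starts, it scrapes the example application and reports metrics to Managed Service for Prometheus.

For information about ways to query your data, see Query using Cloud Monitoring or Query using Grafana.

Provide credentials explicitly

When running on GKE, the OpenTelemetry Collector automatically retrieves credentials from the environment based on the node's service account. In non-GKE Kubernetes clusters, credentials must be explicitly provided to the OpenTelemetry Collector by using flags or the GOOGLE_APPLICATION_CREDENTIALS environment variable.

Set the context to your target project: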
```
gcloud config set project PROJECT_ID
```
Create a service account:
```
gcloud iam service-accounts create gmp-test-sa
```
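This step creates the service account that you might have already created in the Workload Identity Federation for GKE instructions.

Grant the required permissions to the service account: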

gcloud projects add-iam-policy-binding PROJECT_ID\
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter

Create and download a key for the service account:

gcloud iam service-accounts keys create gmp-test-sa-key.json \
  --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com

Add the key file as a secret to your non-GKE cluster:

kubectl -n NAMESPACE_NAME create secret generic gmp-test-sa \
  --from-file=key.json=gmp-test-sa-key.json

Open the OpenTelemetry Deployment resource for editing:

kubectl -n NAMESPACE_NAME edit deployment otel-collector

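Add the GOOGLE_APPLICATION_CREDENTIALS environment variable, the gmp-sa volume mount, and the gmp-sa volume shown in the following excerpt to the resource (the ... lines stand for existing content that you leave unchanged):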

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: NAMESPACE_NAME
  name: otel-collector
spec:
  template:
    spec:
      containers:
      - name: otel-collector
        env:
        - name: "GOOGLE_APPLICATION_CREDENTIALS"
          value: "/gmp/key.json"
...
        volumeMounts:
        - name: gmp-sa
          mountPath: /gmp
          readOnly: true
...
      volumes:
      - name: gmp-sa
        secret:
          secretName: gmp-test-sa
...

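Save the file and close the editor. After the change is applied, the pods are re-created and start authenticating to the metric backend with the given service account.

Scrape Prometheus metrics

This section and the subsequent section provide additional customization information for using the OpenTelemetry Collector. This information might be helpful in certain situations, but none of it is necessary to run the example described in Set up the OpenTelemetry Collector.

If your applications are already exposing Prometheus endpoints, the OpenTelemetry Collector can scrape those endpoints using the same scrape config format you would use with any standard Prometheus config. To do this, enable the Prometheus receiver in your Collector config.

A simple Prometheus receiver config for Kubernetes pods might look like the following: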

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

service:
  pipelines:
    metrics:
      receivers: [prometheus]

This is a simple service-discovery-based scrape config that you can modify as needed to scrape your applications.
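The `keep` and `labelmap` rules in this config decide which Pods are scraped and which Pod labels are copied onto the scraped series. The following is a minimal Python sketch of those two actions under a simplified model (full-string regex matching, source values joined with `;`); the helper names and the sample Pod metadata are hypothetical, and Prometheus's later removal of `__`-prefixed labels is not modeled.

```python
import re
from typing import Optional


def relabel_keep(labels: dict, source_labels: list, regex: str,
                 separator: str = ";") -> Optional[dict]:
    """Return the labels if the joined source values fully match, else None (target dropped)."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return dict(labels) if re.fullmatch(regex, value) else None


def relabel_labelmap(labels: dict, regex: str) -> dict:
    """Copy every label whose name fully matches `regex` to a new name taken from group 1."""
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            result[match.group(1)] = value
    return result


# Hypothetical discovered Pods: only the first carries prometheus.io/scrape: "true".
discovered = [
    {"__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
     "__meta_kubernetes_pod_label_app": "checkout"},
    {"__meta_kubernetes_pod_label_app": "batch-job"},
]
targets = []
for pod in discovered:
    kept = relabel_keep(
        pod, ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"], "true")
    if kept is not None:
        targets.append(relabel_labelmap(kept, r"__meta_kubernetes_pod_label_(.+)"))

print(len(targets), targets[0]["app"])  # 1 checkout
```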

When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. This is especially important to do for the replacement value within your relabel_configs section. For example, if you have the following relabel_config section:

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $1:$2
  target_label: __address__

Then rewrite it to be:

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $$1:$$2
  target_label: __address__


For more information, see the OpenTelemetry documentation.
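If you are porting a large existing Prometheus config, the `$` to `$$` rewrite can be scripted. A minimal Python sketch, assuming that every `$` in the scrape config is meant literally (for example, relabel replacement references) and that you don't rely on environment-variable substitution anywhere in that block; the function name is hypothetical:

```python
def escape_for_collector(prometheus_config: str) -> str:
    """Double every `$` so the Collector's env-var expansion leaves it literal."""
    if "$$" in prometheus_config:
        # Probably already escaped; escaping again would turn $$1 into $$$$1.
        raise ValueError("config already contains '$$'; refusing to escape twice")
    return prometheus_config.replace("$", "$$")


snippet = "replacement: $1:$2"
print(escape_for_collector(snippet))  # replacement: $$1:$$2
```

The guard against an existing `$$` is a design choice of this sketch: running the rewrite twice is an easy mistake, and it silently breaks the replacement values.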


Next, we strongly recommend that you use processors to format your metrics. In
many cases, processors must be used to properly format your metrics.

Add processors

OpenTelemetry processors modify telemetry data before it is exported. You can use the processors below to ensure that your metrics are written in a format compatible with Managed Service for Prometheus.

Detect resource attributes

The Managed Service for Prometheus exporter for OpenTelemetry uses the
prometheus_target monitored
resource
to uniquely identify time series data points. The exporter parses the required
monitored-resource fields from resource attributes on the metric data points.
The fields and the attributes from which the values are scraped are:


- project_id: auto-detected by Application Default Credentials, gcp.project.id, or project in the exporter config (see Configure the googlemanagedprometheus exporter)
- location: location, cloud.availability_zone, cloud.region
- cluster: cluster, k8s.cluster.name
- namespace: namespace, k8s.namespace.name
- job: service.name + service.namespace
- instance: service.instance.id


Failure to set these labels to unique values can result in "duplicate
timeseries" errors when exporting to Managed Service for Prometheus.
Note: The terms labels and attributes, when referring to metric data points,
  represent essentially the same concept in Prometheus and OpenTelemetry,
  respectively. In this context, a Prometheus metric with the label foo will
  be converted into an OpenTelemetry data point with an attribute foo. The
  specific labels/attributes listed above are converted into resource
  attributes, which are another OpenTelemetry concept for identifying data
  points specific to the source of the data. These resource attributes are then
  mapped to the monitored resource fields listed.
The Prometheus receiver automatically sets the service.name attribute
based on the job_name in the scrape config, and service.instance.id
attribute based on the scrape target's instance. The receiver also sets
k8s.namespace.name when using role: pod in the scrape config.
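To make the attribute-to-field mapping above concrete, here is a simplified Python model of how the `prometheus_target` fields could be resolved from a set of resource attributes by checking the listed attributes in order. It is an illustration, not the exporter's code: the precedence follows the list above, the `namespace/name` form used for `job` when `service.namespace` is set is an assumption, and `project_id` is omitted because it can also come from Application Default Credentials or the exporter config. The sample attribute values are hypothetical.

```python
from typing import Dict, List

# Candidate resource attributes for each prometheus_target field, in the order
# listed above (simplified model, not the exporter's implementation).
FIELD_SOURCES: Dict[str, List[str]] = {
    "location": ["location", "cloud.availability_zone", "cloud.region"],
    "cluster": ["cluster", "k8s.cluster.name"],
    "namespace": ["namespace", "k8s.namespace.name"],
    "instance": ["service.instance.id"],
}


def first_present(attrs: Dict[str, str], keys: List[str]) -> str:
    """Return the value of the first non-empty attribute in `keys`, or ""."""
    for key in keys:
        if attrs.get(key):
            return attrs[key]
    return ""


def resolve_target(attrs: Dict[str, str]) -> Dict[str, str]:
    target = {field: first_present(attrs, keys) for field, keys in FIELD_SOURCES.items()}
    # Assumption: job combines service.namespace and service.name as "namespace/name".
    name = attrs.get("service.name", "")
    ns = attrs.get("service.namespace", "")
    target["job"] = f"{ns}/{name}" if ns else name
    return target


attrs = {
    "cloud.availability_zone": "us-central1-c",
    "k8s.cluster.name": "demo-cluster",
    "k8s.namespace.name": "gmp-test",
    "service.name": "prom-example",
    "service.instance.id": "10.4.0.12:9090",
}
print(resolve_target(attrs))
```

In this model, two sources that resolve to identical values for every field and also share the same metric labels map to the same time series, which is how the "duplicate timeseries" errors described above can arise.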

We recommend populating the other attributes automatically using the resource detection processor. However, depending on your environment, some attributes might not be automatically detectable. In this case, you can use other processors to either manually insert these values or parse them from metric labels. The following sections illustrate configurations for doing this processing on various platforms.

GKE

When running OpenTelemetry on GKE, you only need to enable the
resource-detection processor to fill out the resource labels. Be sure that your
metrics don't already contain any of the reserved resource labels. If this is
unavoidable, see Avoid resource attribute collisions by renaming
attributes.

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s


This section can be copied directly into your config file, replacing the
processors section if it already exists.

Amazon EKS

The EKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:

processors:
  resourcedetection:
    detectors: [eks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-eks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert


You can also convert these values from metric labels using the groupbyattrs
processor (see move metric labels to resource labels below).

Azure AKS

The AKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:

processors:
  resourcedetection:
    detectors: [aks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-aks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert


You can also convert these values from metric labels by using the groupbyattrs
processor; see Move metric labels to resource labels.

On-premises and non-cloud environments

With on-premises or non-cloud environments, you probably can't
detect any of the necessary resource attributes automatically. In this case, you
can emit these labels in your metrics and move them to resource attributes (see
Move metric labels to resource labels), or manually set all
of the resource attributes as shown in the following example:

processors:
  resource:
    attributes:
    - key: "cluster"
      value: "my-on-prem-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert
    - key: "location"
      value: "us-east-1"
      action: upsert


For how to use this config, see Create your collector config as a ConfigMap. That section assumes you have put your config in a file called config.yaml.

The project_id resource attribute can still be automatically set when running
the Collector with Application Default
Credentials.
If your Collector does not have access to Application Default Credentials, see
Setting project_id.

Alternatively, you can manually set the resource attributes you need in an
environment variable, OTEL_RESOURCE_ATTRIBUTES, with a comma-separated list of
key/value pairs, for example:

export OTEL_RESOURCE_ATTRIBUTES="cluster=my-cluster,namespace=my-app,location=us-east-1"


Then use the env resource detector processor to set the resource attributes:

processors:
  resourcedetection:
    detectors: [env]


Avoid resource attribute collisions by renaming attributes

If your metrics already contain labels that collide with the required
resource attributes (such as location, cluster, or namespace), rename them
to avoid the collision. The Prometheus convention is to add the prefix exported_
to the label name. To add this prefix, use the transform
processor.

The following processors config renames any potential collisions and
resolves any conflicting keys from the metric:

processors:
  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")
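The effect of these transform statements on a single data point can be modeled in a few lines of Python. This is a hypothetical sketch of the resulting attribute changes, not OTTL itself; it assumes the OTTL behavior that `set` takes no action when its source attribute is absent, so no empty `exported_*` keys are created for labels the metric doesn't have.

```python
RESERVED = ("location", "cluster", "namespace", "job", "instance", "project_id")


def rename_reserved(attributes: dict) -> dict:
    """Move reserved data-point attributes to exported_<name>."""
    result = dict(attributes)
    for key in RESERVED:
        if key in result:
            # Equivalent of set(attributes["exported_<key>"], attributes["<key>"])
            # followed by delete_key(attributes, "<key>").
            result[f"exported_{key}"] = result.pop(key)
    return result


point = {"namespace": "team-a", "job": "kube-state-metrics", "pod": "web-1"}
print(rename_reserved(point))
# {'pod': 'web-1', 'exported_namespace': 'team-a', 'exported_job': 'kube-state-metrics'}
```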


Move metric labels to resource labels

In some cases, your metrics might be intentionally reporting labels such as namespace because your exporter is monitoring multiple namespaces, for example, when running the kube-state-metrics exporter.

In this scenario, these labels can be moved to resource attributes using the groupbyattrs processor:

processors:
  groupbyattrs:
    keys:
    - namespace
    - cluster
    - location


In the above example, given a metric with the labels namespace, cluster,
and/or location, those labels will be converted to the matching resource
attributes.
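Conceptually, groupbyattrs removes the listed keys from each data point and regroups the points under a resource that carries those keys as resource attributes, alongside the existing resource attributes. The following simplified Python model shows that regrouping; the dictionaries are hypothetical stand-ins for OpenTelemetry's resource and data-point hierarchy, and the sample values are made up.

```python
from typing import Dict, List, Tuple


def group_by_attrs(resource: Dict[str, str], points: List[dict],
                   keys: List[str]) -> Dict[Tuple, dict]:
    """Move `keys` from data-point attributes to resource attributes and regroup the points."""
    groups: Dict[Tuple, dict] = {}
    for point in points:
        attrs = dict(point["attributes"])
        moved = {k: attrs.pop(k) for k in keys if k in attrs}
        group_key = tuple(sorted(moved.items()))
        if group_key not in groups:
            # New resource = original resource attributes plus the moved keys.
            groups[group_key] = {"resource": {**resource, **moved}, "points": []}
        groups[group_key]["points"].append({**point, "attributes": attrs})
    return groups


resource = {"service.name": "kube-state-metrics"}
points = [
    {"value": 3, "attributes": {"namespace": "team-a", "pod": "web-1"}},
    {"value": 1, "attributes": {"namespace": "team-b", "pod": "db-0"}},
    {"value": 2, "attributes": {"namespace": "team-a", "pod": "web-2"}},
]
grouped = group_by_attrs(resource, points, ["namespace", "cluster", "location"])
for group in grouped.values():
    print(group["resource"], [p["attributes"]["pod"] for p in group["points"]])
```

Points that carry none of the listed keys end up under a copy of the original resource, so they are unaffected.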

Limit API requests and memory usage

Two other processors, the batch processor and the memory limiter processor, allow you to limit the resource consumption of your Collector.

Batch processing

Batching requests lets you define how many data points to send in a single
request. Note that Cloud Monitoring has a
limit of 200 time series per
request. Enable the batch processor by using the following settings:

processors:
  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s
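With send_batch_size and send_batch_max_size both set to 200, each export carries at most 200 data points, which lines up with the Cloud Monitoring limit of 200 time series per request when each series contributes one point. The following Python sketch models only the size-based splitting; the 5s timeout that flushes a partial batch is not modeled, and the function name is hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_batches(points: Iterable[T], max_size: int = 200) -> Iterator[List[T]]:
    """Yield consecutive batches of at most max_size items."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    batch: List[T] = []
    for point in points:
        batch.append(point)
        if len(batch) == max_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch (the real processor sends it on timeout)


print([len(b) for b in split_batches(range(450))])  # [200, 200, 50]
```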


Memory limiting

We recommend enabling the memory-limiter processor to prevent your collector
from crashing at times of high throughput. Enable the processing by using
the following settings:

processors:
  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20
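The percentage settings translate into absolute thresholds relative to the memory available to the Collector (for example, the container's memory limit). As the memory limiter processor's documentation describes them, the hard limit is limit_percentage of total memory and the soft limit is the hard limit minus the spike allowance; above the soft limit, the processor starts refusing data. A small Python calculation with a hypothetical 2 GiB container, rounded down to whole MiB in this sketch:

```python
def memory_limits_mib(total_mib: int, limit_percentage: int = 65,
                      spike_limit_percentage: int = 20) -> dict:
    """Compute memory_limiter thresholds (MiB) from percentage settings."""
    if spike_limit_percentage >= limit_percentage:
        raise ValueError("spike_limit_percentage must be less than limit_percentage")
    hard = total_mib * limit_percentage // 100
    spike = total_mib * spike_limit_percentage // 100
    return {"hard_limit": hard, "spike": spike, "soft_limit": hard - spike}


print(memory_limits_mib(2048))
# {'hard_limit': 1331, 'spike': 409, 'soft_limit': 922}
```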


Configure the googlemanagedprometheus exporter

By default, using the googlemanagedprometheus exporter on GKE
requires no additional configuration. For many use cases you only need to enable
it with an empty block in the exporters section:

exporters:
  googlemanagedprometheus:


However, the exporter does provide some optional configuration settings. The
following sections describe the other configuration settings.

Setting project_id

To associate your time series with a Google Cloud project, the
prometheus_target monitored resource must have project_id set.

When running OpenTelemetry on Google Cloud, the
Managed Service for Prometheus exporter defaults to setting this value
based on the Application Default
Credentials
it finds. If no credentials are available, or you want to override the default
project, you have two options:


- Set project in the exporter config
- Add a gcp.project.id resource attribute to your metrics.


We strongly recommend using the default (unset) value for project_id rather
than explicitly setting it, when possible.
Note: When changing the project_id, the Collector's Service Account must have
  the roles/monitoring.metricWriter IAM role for the destination
  project.
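The resulting choice of destination project can be summarized as a precedence order. A hypothetical Python sketch, assuming that a gcp.project.id resource attribute overrides the exporter's project setting, which in turn overrides the project found through Application Default Credentials; the function name and project IDs are placeholders:

```python
from typing import Dict, Optional


def effective_project(resource_attrs: Dict[str, str],
                      exporter_project: Optional[str],
                      adc_project: Optional[str]) -> str:
    """Pick the destination project for one resource (assumed precedence)."""
    for candidate in (resource_attrs.get("gcp.project.id"), exporter_project, adc_project):
        if candidate:
            return candidate
    raise ValueError("no project could be determined; set `project` or provide credentials")


print(effective_project({}, None, "adc-project"))  # adc-project
print(effective_project({}, "MY_PROJECT", "adc-project"))  # MY_PROJECT
print(effective_project({"gcp.project.id": "team-project"}, "MY_PROJECT", "adc-project"))  # team-project
```

Whichever project wins, the Collector's service account needs the roles/monitoring.metricWriter role in that project.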
Set project in the exporter config



The following config excerpt sends metrics to
Managed Service for Prometheus in the Google Cloud project MY_PROJECT:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    project: MY_PROJECT

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]


The only change from previous examples is the new line project: MY_PROJECT.
This setting is useful if you know that every metric coming through this
Collector should be sent to MY_PROJECT.

Set gcp.project.id resource attribute

You can set project association on a per-metric basis by adding a gcp.project.id resource attribute to your metrics. Set the value of the attribute to the ID of the project the metric should be associated with.

For example, if your metric already has a label project, this label can be
moved to a resource attribute and renamed to gcp.project.id by using
processors in the Collector config, as shown in the following example:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  groupbyattrs:
    keys:
    - project

  resource:
    attributes:
    - key: "gcp.project.id"
      from_attribute: "project"
      action: upsert

exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection, groupbyattrs, resource]
      exporters: [googlemanagedprometheus]


Setting client options

The googlemanagedprometheus exporter uses gRPC clients for
Managed Service for Prometheus. Therefore, optional settings
are available for configuring the gRPC client:


- compression: Enables gzip compression for gRPC requests, which is useful for minimizing data transfer fees when sending data from other clouds to Managed Service for Prometheus (valid values: gzip).
- user_agent: Overrides the user-agent string sent on requests to Cloud Monitoring; only applies to metrics. Defaults to the build and version number of your OpenTelemetry Collector, for example, opentelemetry-collector-contrib 0.106.0.
- endpoint: Sets the endpoint to which metric data is going to be sent.
- use_insecure: If true, connects to the endpoint over gRPC without transport security (TLS). Has an effect only when the endpoint value is not "".
- grpc_pool_size: Sets the size of the connection pool in the gRPC client.
- prefix: Configures the prefix of metrics sent to Managed Service for Prometheus. Defaults to prometheus.googleapis.com. Don't change this prefix; doing so causes metrics to not be queryable with PromQL in the Cloud Monitoring UI.


In most cases, you don't need to change these values from their
defaults. However, you can change them to accommodate special
circumstances.

All of these settings are set under a metric block in the
googlemanagedprometheus exporter section, as shown in the following example:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
      user_agent: opentelemetry-collector-contrib 0.106.0
      endpoint: ""
      use_insecure: false
      grpc_pool_size: 1
      prefix: prometheus.googleapis.com

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]


What's next




- Use PromQL in Cloud Monitoring to query Prometheus metrics.
- Use Grafana to query Prometheus metrics.
- Set up the OpenTelemetry Collector as a sidecar agent in Cloud Run.
The Cloud Monitoring Metrics Management page provides information that can help you control the amount you spend on billable metrics without affecting observability. The Metrics Management page reports the following information:

- Ingestion volumes for both byte- and sample-based billing, across metric domains and for individual metrics.
- Data about labels and cardinality of metrics.
- Number of reads for each metric.
- Use of metrics in alerting policies and custom dashboards.
- Rate of metric-write errors.

You can also use the Metrics Management page to exclude unneeded metrics, eliminating the cost of ingesting them.

For more information about the Metrics Management page, see View and manage metric usage.
