This document describes how to set up the OpenTelemetry Collector to scrape standard Prometheus metrics and report those metrics to Google Cloud Managed Service for Prometheus. The OpenTelemetry Collector is an agent that you can deploy yourself and configure to export to Managed Service for Prometheus. The setup is similar to running Managed Service for Prometheus with self-deployed collection.
You might choose the OpenTelemetry Collector over self-deployed collection for the following reasons:
- The OpenTelemetry Collector lets you route your telemetry data to multiple backends by configuring different exporters in your pipeline.
- The Collector also supports signals from metrics, logs, and traces, so by using it you can handle all three signal types in one agent.
- OpenTelemetry's vendor-agnostic data format (the OpenTelemetry Protocol, or OTLP) supports a strong ecosystem of libraries and pluggable Collector components. This allows a range of customizability options for receiving, processing, and exporting your data.
The tradeoff for these benefits is that running an OpenTelemetry Collector requires a self-managed deployment and maintenance approach. Which approach you choose depends on your specific needs, but in this document we offer recommended guidelines for configuring the OpenTelemetry Collector using Managed Service for Prometheus as a backend.
Before you begin
This section describes the setup needed for the tasks described in this document.
Set up projects and tools
To use Google Cloud Managed Service for Prometheus, you need the following resources:
A Google Cloud project with the Cloud Monitoring API enabled.
If you don't have a Google Cloud project, then do the following:
1. In the Google Cloud console, go to New Project.
2. In the Project Name field, enter a name for your project, and then click Create.
3. Go to Billing.
4. At the top of the page, select the project you just created if it isn't already selected.
5. You are prompted to choose an existing payments profile or to create a new one.
The Monitoring API is enabled by default for new projects.
If you already have a Google Cloud project, then make sure that the Monitoring API is enabled:
1. Go to APIs & services.
2. Select your project.
3. Click Enable APIs and Services.
4. Search for "Monitoring".
5. In the search results, click "Cloud Monitoring API".
6. If "API enabled" is not displayed, then click the Enable button.
A Kubernetes cluster. If you don't have a Kubernetes cluster, then follow the instructions in the GKE quickstart.
You also need the following command-line tools:
- gcloud
- kubectl
The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command:
gcloud components list
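If kubectl does not appear in the output, you can install it as a gcloud component:
gcloud components install kubectl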
Configure your environment
To avoid repeatedly entering your project ID or cluster name, perform the following configuration:
Configure the command-line tools as follows:
Configure the gcloud CLI to refer to the ID of your Google Cloud project:
gcloud config set project PROJECT_ID
Configure the kubectl CLI to use your cluster:
kubectl config set-cluster CLUSTER_NAME
For more information about these tools, see the gcloud CLI and kubectl reference documentation.
Set up a namespace
Create the NAMESPACE_NAME Kubernetes namespace for resources you create as part of the example application:
kubectl create ns NAMESPACE_NAME
Verify service account credentials
If your Kubernetes cluster has Workload Identity Federation for GKE enabled, you can skip this section.
When running on GKE, Managed Service for Prometheus automatically retrieves credentials from the environment based on the Compute Engine default service account. By default, the default service account has the necessary permissions, monitoring.metricWriter and monitoring.viewer. If you don't use Workload Identity Federation for GKE and you have previously removed either of those roles from the default node service account, you must re-add those missing permissions before continuing.
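For example, a sketch of re-adding both roles to the Compute Engine default service account might look like the following; it assumes the default account name PROJECT_NUMBER-compute@developer.gserviceaccount.com, where PROJECT_NUMBER is your project's numeric ID:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role=roles/monitoring.viewer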
If you are not running on GKE, see Provide credentials explicitly.
Configure a service account for Workload Identity Federation for GKE
If your Kubernetes cluster does not have Workload Identity Federation for GKE enabled, you can skip this section.
Managed Service for Prometheus captures metric data by using the Cloud Monitoring API. If your cluster is using Workload Identity Federation for GKE, you must grant your Kubernetes service account permission to the Monitoring API. This section describes the following:
- Creating a dedicated Google Cloud service account, gmp-test-sa.
- Binding the Google Cloud service account to the default Kubernetes service account in a test namespace, NAMESPACE_NAME.
- Granting the necessary permission to the Google Cloud service account.
Create and bind the service account
This step appears in several places in the Managed Service for Prometheus documentation. If you already performed this step as part of a prior task, you don't need to repeat it; skip ahead to Authorize the service account.
The following command sequence creates the gmp-test-sa service account and binds it to the default Kubernetes service account in the NAMESPACE_NAME namespace:
gcloud config set project PROJECT_ID \
&&
gcloud iam service-accounts create gmp-test-sa \
&&
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE_NAME/default]" \
  gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
&&
kubectl annotate serviceaccount \
  --namespace NAMESPACE_NAME \
  default \
  iam.gke.io/gcp-service-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
If you are using a different GKE namespace or service account, adjust the commands appropriately.
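To spot-check the binding, you can confirm that the annotation landed on the Kubernetes service account; the output should include the iam.gke.io/gcp-service-account annotation:
kubectl -n NAMESPACE_NAME get serviceaccount default -o yaml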
Authorize the service account
Groups of related permissions are collected into roles, and you grant the roles to a principal, in this example the Google Cloud service account. For more information about Monitoring roles, see Access control.
The following command grants the Google Cloud service account, gmp-test-sa, the Monitoring API role it needs to write metric data.
If you already granted the Google Cloud service account a specific role as part of a prior task, you don't need to do it again.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
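To verify that the role was granted, you can list the bindings that mention the service account, a sketch using standard gcloud filtering:
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"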
Debug your Workload Identity Federation for GKE configuration
If you are having trouble getting Workload Identity Federation for GKE to work, see the documentation for verifying your Workload Identity Federation for GKE setup and the Workload Identity Federation for GKE troubleshooting guide.
Because typos and partial copy-pastes are the most common sources of errors when configuring Workload Identity Federation for GKE, we strongly recommend using the editable variables and clickable copy-paste icons embedded in the code samples in these instructions.
Workload Identity Federation for GKE in production environments
The example described in this document binds the Google Cloud service account to the default Kubernetes service account and gives the Google Cloud service account all the permissions necessary to use the Monitoring API.
In a production environment, you might want to use a finer-grained approach, with a service account for each component, each with minimal permissions. For more information on configuring service accounts for workload-identity management, see Use Workload Identity Federation for GKE.
Set up the OpenTelemetry Collector
This section guides you through setting up and using the OpenTelemetry Collector to scrape metrics from an example application and send the data to Google Cloud Managed Service for Prometheus. For detailed configuration information, see the following sections:
The OpenTelemetry Collector is analogous to the Managed Service for Prometheus agent binary. The OpenTelemetry community regularly publishes releases, including source code, binaries, and container images.
You can either deploy these artifacts on VMs or Kubernetes clusters using the best-practice defaults, or use the Collector builder to build your own Collector consisting of only the components you need. To build a Collector for use with Managed Service for Prometheus, you need the following components:
- The Managed Service for Prometheus exporter, which writes your metrics to Managed Service for Prometheus.
- A receiver to scrape your metrics. This document assumes that you use the OpenTelemetry Prometheus receiver, but the Managed Service for Prometheus exporter is compatible with any OpenTelemetry metrics receiver.
- Processors to batch and mark up your metrics to include important resource identifiers, depending on your environment.
These components are enabled by using a configuration file that is passed to the Collector with the --config flag.
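For example, if you run a Collector binary directly on a VM, an invocation might look like the following; the binary name varies by distribution, and otelcol-contrib is the community contrib build:
./otelcol-contrib --config=config.yaml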
The following sections discuss how to configure each of these components in more detail. This document also describes how to run the Collector on GKE and elsewhere.
Configure and deploy the Collector
Whether you are running your collection on Google Cloud or in another environment, you can configure the OpenTelemetry Collector to export to Managed Service for Prometheus. The biggest difference is in how you configure the Collector: in non-Google Cloud environments, additional formatting of metric data might be needed for it to be compatible with Managed Service for Prometheus, but on Google Cloud much of this formatting can be detected automatically by the Collector.
Run the OpenTelemetry Collector on GKE
You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector on GKE:
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected. Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

# Note that the googlemanagedprometheus exporter block is intentionally blank
exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resourcedetection, transform]
      exporters: [googlemanagedprometheus]
The preceding config uses the Prometheus receiver and the Managed Service for Prometheus exporter to scrape the metrics endpoints on Kubernetes Pods and export those metrics to Managed Service for Prometheus. The processors in the pipeline format and batch the data.
For more details on what each part of this config does, along with configurations for different platforms, see the detailed sections below on scraping metrics and adding processors.
When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.
You can modify this config based on your environment, provider, and the metrics you want to scrape, but the example config is a recommended starting point for running on GKE.
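If you want to sanity-check the file before deploying, recent Collector builds include a validate subcommand that parses the config without starting any pipelines; this is a sketch, and the subcommand may not exist in older builds:
./otelcol-contrib validate --config=config.yaml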
Run the OpenTelemetry Collector outside Google Cloud
Running the OpenTelemetry Collector outside Google Cloud, such as on-premises or on other cloud providers, is similar to running the Collector on GKE. However, the metrics you scrape are less likely to automatically include the resource data that formats them appropriately for Managed Service for Prometheus. Therefore, you must take extra care to configure the Collector to format the metrics so that they are compatible with Managed Service for Prometheus.
You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector for deployment on a non-GKE Kubernetes cluster:
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resource:
    attributes:
    - key: "cluster"
      value: "CLUSTER_NAME"
      action: upsert
    - key: "namespace"
      value: "NAMESPACE_NAME"
      action: upsert
    - key: "location"
      value: "REGION"
      action: upsert

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected. Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

exporters:
  googlemanagedprometheus:
    project: "PROJECT_ID"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resource, transform]
      exporters: [googlemanagedprometheus]
This config does the following:
- Sets up a Kubernetes service discovery scrape config for Prometheus. For more information, see scraping Prometheus metrics.
- Manually sets the cluster, namespace, and location resource attributes. For more information about resource attributes, including resource detection for Amazon EKS and Azure AKS, see Detect resource attributes.
- Sets the project option in the googlemanagedprometheus exporter. For more information about the exporter, see Configure the googlemanagedprometheus exporter.
When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.
For best practices on configuring the Collector on other clouds, see Amazon EKS or Azure AKS.
Deploy the example application
The example application emits the example_requests_total counter metric and the example_random_numbers histogram metric (among others) on its metrics port. The manifest for the example defines three replicas.
To deploy the example application, run the following command:
kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.13.0/examples/example-app.yaml
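Because the manifest defines three replicas, you should see three pods once the deployment completes. A quick check, assuming the app.kubernetes.io/name=prom-example label used in the scrape config above:
kubectl -n NAMESPACE_NAME get pods -l app.kubernetes.io/name=prom-example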
Create your Collector config as a ConfigMap
After you create your config and place it in a file called config.yaml, use that file to create a Kubernetes ConfigMap. When the Collector is deployed, it mounts the ConfigMap and loads the file.
To create a ConfigMap named otel-config with your config, use the following command:
kubectl -n NAMESPACE_NAME create configmap otel-config --from-file config.yaml
Deploy the Collector
Create a file called collector-deployment.yaml with the following contents:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: NAMESPACE_NAME:prometheus-test
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: NAMESPACE_NAME:prometheus-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: NAMESPACE_NAME:prometheus-test
subjects:
- kind: ServiceAccount
  namespace: NAMESPACE_NAME
  name: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.106.0
        args:
        - --config
        - /etc/otel/config.yaml
        - --feature-gates=exporter.googlemanagedprometheus.intToDouble
        volumeMounts:
        - mountPath: /etc/otel/
          name: otel-config
      volumes:
      - name: otel-config
        configMap:
          name: otel-config
Create the Collector deployment in your Kubernetes cluster by running the following command:
kubectl -n NAMESPACE_NAME create -f collector-deployment.yaml
After the pod starts up, it scrapes the example application and reports metrics to Managed Service for Prometheus.
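To check that the Collector started cleanly and is scraping, you can tail its logs:
kubectl -n NAMESPACE_NAME logs deployment/otel-collector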
For information about techniques for querying your data, see Query using Cloud Monitoring or Query using Grafana.
Provide credentials explicitly
When running on GKE, the OpenTelemetry Collector automatically retrieves credentials from the environment based on the node's service account. In non-GKE Kubernetes clusters, credentials must be explicitly provided to the OpenTelemetry Collector by using flags or the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Set the context to your target project:
gcloud config set project PROJECT_ID
Create a service account:
gcloud iam service-accounts create gmp-test-sa
This step creates the service account that you might have already created in the Workload Identity Federation for GKE instructions.
Grant the required permissions to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
Create and download a key for the service account:
gcloud iam service-accounts keys create gmp-test-sa-key.json \
  --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
Add the key file as a secret to your non-GKE cluster:
kubectl -n NAMESPACE_NAME create secret generic gmp-test-sa \
  --from-file=key.json=gmp-test-sa-key.json
Open the OpenTelemetry Deployment resource for editing:
kubectl -n NAMESPACE_NAME edit deployment otel-collector
Add the text shown in bold to the resource:
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: NAMESPACE_NAME
  name: otel-collector
spec:
  template:
    spec:
      containers:
      - name: otel-collector
        env:
        - name: "GOOGLE_APPLICATION_CREDENTIALS"
          value: "/gmp/key.json"
...
        volumeMounts:
        - name: gmp-sa
          mountPath: /gmp
          readOnly: true
...
      volumes:
      - name: gmp-sa
        secret:
          secretName: gmp-test-sa
...
Save the file and close the editor. After the change is applied, the pods are re-created and start authenticating to the metric backend with the given service account.
Scrape Prometheus metrics
This section and subsequent sections provide additional customization information for using the OpenTelemetry Collector. This information might be helpful in certain situations, but none of it is necessary to run the example described in Set up the OpenTelemetry Collector.
If your applications are already exposing Prometheus endpoints, the OpenTelemetry Collector can scrape those endpoints using the same scrape-config format you would use with any standard Prometheus configuration. To do this, enable the Prometheus receiver in your Collector config.
A basic Prometheus receiver config for Kubernetes pods might look like the following:
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

service:
  pipelines:
    metrics:
      receivers: [prometheus]
This is a simple service-discovery-based scrape config that you can modify as needed to scrape your applications.
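Because the receiver accepts the standard Prometheus scrape-config format, you can also scrape fixed targets outside Kubernetes. For example, a minimal static-target sketch, where the host and port are illustrative:
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'my-static-app'
        static_configs:
        - targets: ['my-app.example.com:8080']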
When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. This is especially important to do for the replacement value within your relabel_configs section. For example, if you have the following relabel_config section:
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $1:$2
  target_label: __address__
Then rewrite it to be:
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $$1:$$2
  target_label: __address__
For more information, see the OpenTelemetry documentation.
Next, we strongly recommend that you use processors to format your metrics; in many cases, they are required for your metrics to be formatted properly.
Add processors
OpenTelemetry processors modify telemetry data before it is exported. You can use the processors below to ensure that your metrics are written in a format compatible with Managed Service for Prometheus.
Detect resource attributes
The Managed Service for Prometheus exporter for OpenTelemetry uses the prometheus_target monitored resource to uniquely identify time series data points. The exporter parses the required monitored-resource fields from resource attributes on the metric data points. The fields and the attributes from which the values are scraped are:
- project_id: auto-detected by Application Default Credentials, gcp.project.id, or project in exporter config (see configuring the exporter)
- location: location, cloud.availability_zone, cloud.region
- cluster: cluster, k8s.cluster.name
- namespace: namespace, k8s.namespace.name
- job: service.name + service.namespace
- instance: service.instance.id
Failure to set these labels to unique values can result in "duplicate timeseries" errors when exporting to Managed Service for Prometheus.
The Prometheus receiver automatically sets the service.name attribute based on the job_name in the scrape config, and the service.instance.id attribute based on the scrape target's instance. The receiver also sets k8s.namespace.name when using role: pod in the scrape config.
We recommend populating the other attributes automatically by using the resource detection processor. However, depending on your environment, some attributes might not be automatically detectable. In this case, you can use other processors to either manually insert these values or parse them from metric labels. The following sections illustrate configurations for doing this processing on various platforms.
GKE
When running OpenTelemetry on GKE, you only need to enable the resource-detection processor to fill out the resource labels. Be sure that your metrics don't already contain any of the reserved resource labels. If this is unavoidable, see Avoid resource attribute collisions by renaming attributes.
processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s
This section can be copied directly into your config file, replacing the processors section if it already exists.
Amazon EKS
The EKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:
processors:
  resourcedetection:
    detectors: [eks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-eks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert
You can also convert these values from metric labels by using the groupbyattrs processor; see Move metric labels to resource labels below.
Azure AKS
The AKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:
processors:
  resourcedetection:
    detectors: [aks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-aks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert
You can also convert these values from metric labels by using the groupbyattrs processor; see Move metric labels to resource labels.
On-premises and non-cloud environments
With on-premises or non-cloud environments, you probably can't detect any of the necessary resource attributes automatically. In this case, you can emit these labels in your metrics and move them to resource attributes (see Move metric labels to resource labels), or manually set all of the resource attributes as shown in the following example:
processors:
  resource:
    attributes:
    - key: "cluster"
      value: "my-on-prem-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert
    - key: "location"
      value: "us-east-1"
      action: upsert
Create your collector config as a ConfigMap describes how to use the config. That section assumes you have put your config in a file called config.yaml.
The project_id resource attribute can still be automatically set when running the Collector with Application Default Credentials. If your Collector does not have access to Application Default Credentials, see Setting project_id.
Alternatively, you can manually set the resource attributes you need in an environment variable, OTEL_RESOURCE_ATTRIBUTES, with a comma-separated list of key/value pairs, for example:
export OTEL_RESOURCE_ATTRIBUTES="cluster=my-cluster,namespace=my-app,location=us-east-1"
Then use the env resource detection processor to set the resource attributes:
processors:
  resourcedetection:
    detectors: [env]
Avoid resource attribute collisions by renaming attributes
If your metrics already contain labels that collide with the required resource attributes (such as location, cluster, or namespace), rename them to avoid the collision. The Prometheus convention is to add the prefix exported_ to the label name. To add this prefix, use the transform processor.
The following processors config renames any potential collisions and resolves any conflicting keys from the metric:
processors:
  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected. Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")
Move metric labels to resource labels
In some cases, your metrics might be intentionally reporting labels such as namespace because your exporter is monitoring multiple namespaces, for example, when running the kube-state-metrics exporter.
In this scenario, these labels can be moved to resource attributes by using the groupbyattrs processor:
processors:
  groupbyattrs:
    keys:
    - namespace
    - cluster
    - location
In the above example, given a metric with the labels namespace, cluster, or location, those labels are converted to the matching resource attributes.
Limit API requests and memory usage
Two other processors, the batch processor and the memory limiter processor, let you limit the resource consumption of your Collector.
Batch processing
Batching requests lets you define how many data points to send in a single request. Note that Cloud Monitoring has a limit of 200 time series per request. Enable the batch processor by using the following settings:
processors:
  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s
Memory limiting
We recommend enabling the memory limiter processor to prevent your Collector from crashing at times of high throughput. Enable the processor by using the following settings:
processors:
  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20
Configure the googlemanagedprometheus exporter
By default, using the googlemanagedprometheus exporter on GKE requires no additional configuration. For many use cases, you only need to enable it with an empty block in the exporters section:
exporters:
  googlemanagedprometheus:
However, the exporter does provide some optional configuration settings. The following sections describe the other configuration settings.
Setting project_id
To associate your time series with a Google Cloud project, the prometheus_target monitored resource must have project_id set.
When running OpenTelemetry on Google Cloud, the Managed Service for Prometheus exporter defaults to setting this value based on the Application Default Credentials it finds. If no credentials are available, or you want to override the default project, you have two options:
- Set project in the exporter config.
- Add a gcp.project.id resource attribute to your metrics.
We strongly recommend using the default (unset) value for project_id rather than explicitly setting it, when possible.
Set project in the exporter config
The following config excerpt sends metrics to Managed Service for Prometheus in the Google Cloud project MY_PROJECT:
receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    project: MY_PROJECT

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]
The only change from previous examples is the new line project: MY_PROJECT. This setting is useful if you know that every metric coming through this Collector should be sent to MY_PROJECT.
Set gcp.project.id resource attribute
You can set project association on a per-metric basis by adding a gcp.project.id resource attribute to your metrics. Set the value of the attribute to the name of the project the metric should be associated with.
For example, if your metric already has a label project, this label can be moved to a resource attribute and renamed to gcp.project.id by using processors in the Collector config, as shown in the following example:
receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  groupbyattrs:
    keys:
    - project

  resource:
    attributes:
    - key: "gcp.project.id"
      from_attribute: "project"
      action: upsert

exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection, groupbyattrs, resource]
      exporters: [googlemanagedprometheus]
Setting client options
The googlemanagedprometheus exporter uses gRPC clients for Managed Service for Prometheus. Therefore, optional settings are available for configuring the gRPC client:
- compression: Enables gzip compression for gRPC requests, which is useful for minimizing data transfer fees when sending data from other clouds to Managed Service for Prometheus (valid values: gzip).
- user_agent: Overrides the user-agent string sent on requests to Cloud Monitoring; only applies to metrics. Defaults to the build and version number of your OpenTelemetry Collector, for example, opentelemetry-collector-contrib 0.106.0.
- endpoint: Sets the endpoint to which metric data is going to be sent.
- use_insecure: If true, uses gRPC as the communication transport. Has an effect only when the endpoint value is not "".
- grpc_pool_size: Sets the size of the connection pool in the gRPC client.
- prefix: Configures the prefix of metrics sent to Managed Service for Prometheus. Defaults to prometheus.googleapis.com. Don't change this prefix; doing so causes metrics to not be queryable with PromQL in the Cloud Monitoring UI.
In most cases, you don't need to change these values from their defaults. However, you can change them to accommodate special circumstances.
All of these settings are set under a metric block in the googlemanagedprometheus exporter section, as shown in the following example:
receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
      user_agent: opentelemetry-collector-contrib 0.106.0
      endpoint: ""
      use_insecure: false
      grpc_pool_size: 1
      prefix: prometheus.googleapis.com

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]
What's next
- Use PromQL in Cloud Monitoring to query Prometheus metrics.
- Use Grafana to query Prometheus metrics.
- Set up the OpenTelemetry Collector as a sidecar agent in Cloud Run.
The Cloud Monitoring Metrics Management page provides information that can help you control the amount you spend on billable metrics without affecting observability. The Metrics Management page reports the following information:
- Ingestion volumes for both byte- and sample-based billing, across metric domains and for individual metrics.
- Data about labels and cardinality of metrics.
- Number of reads for each metric.
- Use of metrics in alerting policies and custom dashboards.
- Rate of metric-write errors.
You can also use the Metrics Management page to exclude unneeded metrics, eliminating the cost of ingesting them. For more information about the Metrics Management page, see View and manage metric usage.