Deploy the OpenTelemetry Collector on Google Kubernetes Engine

This document shows how to run the OpenTelemetry Collector in a GKE cluster to collect OTLP logs, metrics, and traces from instrumented applications and export that data to Google Cloud.

Before you begin

Running the OpenTelemetry Collector on GKE requires the following resources:

  • A Google Cloud project with the Cloud Monitoring API, Cloud Trace API, and Cloud Logging API enabled.

    • If you don't have a Google Cloud project, then do the following:

      1. In the Google Cloud console, go to New Project:

        Create a New Project

      2. In the Project Name field, enter a name for your project and then click Create.

      3. Go to Billing:

        Go to Billing

      4. Select the project you just created if it isn't already selected at the top of the page.

      5. You are prompted to choose an existing payments profile or to create a new one.

      The Monitoring API, Trace API, and Logging API are enabled by default for new projects.

    • If you already have a Google Cloud project, then ensure that the Monitoring API, Trace API, and Logging API are enabled, either in the console by using the following steps or from the command line as shown after these steps:

      1. Go to APIs & services:

        Go to APIs & services

      2. Select your project.

      3. Click Enable APIs and services.

      4. Search for each API by name.

      5. In the search results, click the named API. The Monitoring API appears as "Stackdriver Monitoring API".

      6. If "API enabled" is not displayed, then click the Enable button.
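
    Alternatively, you can enable all three APIs from the command line. The following gcloud command uses the public service names of the Monitoring, Trace, and Logging APIs:

      gcloud services enable monitoring.googleapis.com \
          cloudtrace.googleapis.com \
          logging.googleapis.com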

  • A Kubernetes cluster. If you don't have a Kubernetes cluster, then follow the instructions in the Quickstart for GKE.

  • The following command-line tools:

    • gcloud
    • kubectl

    The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command:

    gcloud components list
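
    If kubectl isn't installed, you can add it as a gcloud CLI component:

    gcloud components install kubectl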
    

Deploy the Collector

You can deploy the Collector pipeline directly from GitHub with the following commands. Replace PROJECT_ID with the ID of your Google Cloud project:

export GCLOUD_PROJECT=PROJECT_ID
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest.git/k8s/base | envsubst | kubectl apply -f -
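
After the manifests are applied, verify that the Collector is up. The namespace and Deployment names below assume the defaults from the repository's manifests; adjust them if you have customized the deployment:

kubectl get pods -n opentelemetry
kubectl -n opentelemetry logs deploy/opentelemetry-collector --tail=50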

Observe and debug the Collector

The OpenTelemetry Collector provides self-observability metrics out of the box to help you monitor its performance and ensure continued uptime of the OTLP ingestion pipeline.

To monitor the Collector, install its sample dashboard. The dashboard offers at-a-glance insights into several Collector metrics, including uptime, memory usage, and API calls to Google Cloud Observability.

To install the dashboard, do the following:

  1. In the Google Cloud console, go to the Dashboards page:

    Go to Dashboards

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Select the Sample Library tab.
  3. Select the OpenTelemetry Collector category.
  4. Select the "OpenTelemetry Collector" dashboard.
  5. Click Import.

For more information about the installation process, see Install sample dashboards.
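
Because the default configuration exports the Collector's self-metrics through Managed Service for Prometheus (see the filter/self-metrics processor in the configuration below), you can also query them directly in Metrics Explorer with PromQL. For example, the following query shows how long each Collector process has been up; depending on the exporter's metric-naming settings, the counter may appear with a _total suffix:

otelcol_process_uptime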

Configure the Collector

The self-managed OTLP ingest pipeline includes a default OpenTelemetry Collector configuration that is designed to deliver high volumes of OTLP metrics, logs, and traces with consistent GKE and Kubernetes metadata attached. It's also designed to prevent common ingestion issues.

However, you might have unique needs that require customizing the default configuration. This section describes the defaults shipped with the pipeline and how you can customize them to fit your needs.
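
One way to keep such customizations out of band is a local kustomize overlay on top of the published manifests. The sketch below only re-applies the upstream base; the commented generator shows where you might substitute a locally edited copy of config/collector.yaml, but the ConfigMap name is an assumption, so check the resource names in k8s/base before using it:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- https://github.com/GoogleCloudPlatform/otlp-k8s-ingest.git/k8s/base

# Uncomment to replace the Collector configuration with a local copy.
# The ConfigMap name "collector-config" is an assumption; verify it
# against the manifests in k8s/base.
# configMapGenerator:
# - name: collector-config
#   behavior: replace
#   files:
#   - collector.yaml

You can then deploy the overlay with the same pipeline as before: kubectl kustomize . | envsubst | kubectl apply -f -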

The default Collector configuration is located on GitHub as config/collector.yaml:

# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exporters:
  googlecloud:
    log:
      default_log_name: opentelemetry-collector
    user_agent: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.105.0
  googlemanagedprometheus:
    user_agent: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.105.0

extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  filter/self-metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
        - otelcol_process_uptime
        - otelcol_process_memory_rss
        - otelcol_grpc_io_client_completed_rpcs
        - otelcol_googlecloudmonitoring_point_count
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  metricstransform/self-metrics:
    transforms:
    - action: update
      include: otelcol_process_uptime
      operations:
      - action: add_label
        new_label: version
        new_value: Google-Cloud-OTLP manifests:0.1.0 otel/opentelemetry-collector-contrib:0.105.0

  # We need to add the pod IP as a resource label so the k8s attributes processor can find it.
  resource/self-metrics:
    attributes:
    - action: insert
      key: k8s.pod.ip
      value: ${env:MY_POD_IP}

  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        cors:
          allowed_origins:
          - http://*
          - https://*
        endpoint: ${env:MY_POD_IP}:4318
  prometheus/self-metrics:
    config:
      scrape_configs:
      - job_name: otel-self-metrics
        scrape_interval: 1m
        static_configs:
        - targets:
          - ${env:MY_POD_IP}:8888

service:
  extensions:
  - health_check
  pipelines:
    logs:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - resourcedetection
      - memory_limiter
      - batch
      receivers:
      - otlp
    metrics/otlp:
      exporters:
      - googlemanagedprometheus
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - transform/collision
      - batch
      receivers:
      - otlp
    metrics/self-metrics:
      exporters:
      - googlemanagedprometheus
      processors:
      - filter/self-metrics
      - metricstransform/self-metrics
      - resource/self-metrics
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - prometheus/self-metrics
    traces:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - otlp
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888

Exporters

The default exporters include the googlecloud exporter (for logs and traces) and the googlemanagedprometheus exporter (for metrics).

The googlecloud exporter is configured with a default log name. The googlemanagedprometheus exporter requires no additional configuration by default. For more information on configuring this exporter, see Get started with the OpenTelemetry Collector in the Google Cloud Managed Service for Prometheus documentation.
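
For example, to write application logs under a different log name, you can override default_log_name in your copy of the configuration; the value below is a placeholder:

exporters:
  googlecloud:
    log:
      # Placeholder; choose a log name that suits your application.
      default_log_name: my-app-logs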

Processors

The default configuration includes the following processors; an example of tuning them follows the list:

  • batch: Batches telemetry requests at the Google Cloud maximum number of entries per request (200), or every 5 seconds, whichever comes first.
  • k8sattributes: Automatically maps Kubernetes resource attributes to telemetry labels.
  • memory_limiter: Caps the Collector's memory usage, dropping data points beyond the limit to prevent out-of-memory crashes.
  • resourcedetection: Automatically detects Google Cloud resource labels such as cluster name and project ID.
  • transform: Renames metric labels that would collide with Google Cloud monitored resource fields.
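
As an illustration, if your Collector pods have more memory headroom and you can tolerate slightly staler data, you might relax the batch timeout and memory limits in your copy of the configuration. The values below are examples, not recommendations:

processors:
  batch:
    # Still flush at the Google Cloud per-request maximum of 200 entries,
    # but wait up to 10 seconds before flushing a partial batch.
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    # Raise the soft memory cap from the default 65% of available memory.
    limit_percentage: 75
    spike_limit_percentage: 20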

Receivers

The default configuration only includes the otlp receiver. See Choose an instrumentation approach for detailed instructions on instrumenting your applications to push OTLP traces and metrics to the Collector's OTLP endpoint.
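
To smoke-test the OTLP endpoint from your workstation, one option is to port-forward the Collector service (the service and namespace names assume the defaults from the manifests) and send an empty OTLP/HTTP request, which a healthy receiver accepts:

kubectl -n opentelemetry port-forward svc/opentelemetry-collector 4318:4318

# In another terminal: an empty resourceSpans payload is valid OTLP JSON
# and returns a 200 response with an empty partialSuccess object.
curl -sS -X POST -H "Content-Type: application/json" \
    -d '{"resourceSpans":[]}' \
    http://localhost:4318/v1/traces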

Next steps: collect and view telemetry

This section describes how to deploy a sample application, point it at the Collector's OTLP endpoint, and view the telemetry in Google Cloud. The sample application is a small generator that exports traces, logs, and metrics to the Collector.

If you already have an application instrumented with an OpenTelemetry SDK, then you can point your application to the Collector's endpoint instead.

To deploy the sample application, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/main/sample/app.yaml

To point existing applications that use the OpenTelemetry SDK at the Collector's endpoint, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to http://opentelemetry-collector.opentelemetry.svc.cluster.local:4317.
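
In Kubernetes, that typically means setting the variable on your workload's container spec. The Deployment fragment below is a sketch; the names and image are placeholders for your own application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-instrumented-app   # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-instrumented-app
  template:
    metadata:
      labels:
        app: my-instrumented-app
    spec:
      containers:
      - name: app
        image: IMAGE   # replace with your application image
        env:
        # Point the OpenTelemetry SDK at the Collector's OTLP gRPC endpoint.
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://opentelemetry-collector.opentelemetry.svc.cluster.local:4317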

After a few minutes, telemetry generated by the application flows through the Collector and appears in the Google Cloud console for each signal.
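
To spot-check the exported data from the command line, you can read back recent log entries; the filter below matches the default_log_name from the default Collector configuration, and PROJECT_ID is your Google Cloud project ID:

gcloud logging read 'logName:"opentelemetry-collector"' --limit=5 --project=PROJECT_ID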