Deploy Google-Built OpenTelemetry Collector on Google Kubernetes Engine

This document describes how to run the Google-Built OpenTelemetry Collector on Google Kubernetes Engine to collect OTLP logs, metrics, and traces from instrumented applications and then export that data to Google Cloud.

Before you begin

Running the Google-Built OpenTelemetry Collector requires the following resources:

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Telemetry, Cloud Logging, Cloud Monitoring, and Cloud Trace APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Telemetry, Cloud Logging, Cloud Monitoring, and Cloud Trace APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

A Kubernetes cluster. If you don't have a Kubernetes cluster, then follow the instructions in the Quickstart for GKE.
The following command-line tools:
- gcloud
- kubectl
The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command:
```
        gcloud components list
        
```

Configure permissions for the Collector

If you have disabled GKE workload identity, then you can skip this section.

To ensure that the OpenTelemetry Collector's kubernetes service account has the necessary permissions to export telemetry, ask your administrator to grant the OpenTelemetry Collector's kubernetes service account the following IAM roles on your project:

Monitoring Metric Writer (roles/monitoring.metricWriter)
Logging Log Writer (roles/logging.logWriter)
Cloud Trace Agent (roles/cloudtrace.agent)

For more information about granting roles, see Manage access to projects, folders, and organizations.

Your administrator might also be able to give the OpenTelemetry Collector's kubernetes service account the required permissions through custom roles or other predefined roles.

To configure the permissions, use the following add-iam-policy-binding commands:

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/logging.logWriter \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector
gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/monitoring.metricWriter \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector
gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/cloudtrace.agent \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector

Before running the commands, replace the following variables:

PROJECT_ID: The identifier of the project.
PROJECT_NUMBER: The Google Cloud project number.

Deploy the Collector

The Collector pipeline can be deployed directly from the vetted examples provided by the Self-Managed OTLP Kubernetes Ingestion repo. You can deploy directly from GitHub with the following commands after replacing PROJECT_ID with the ID of your Google Cloud project:

export GOOGLE_CLOUD_PROJECT=PROJECT_ID
export PROJECT_NUMBER=PROJECT_NUMBER
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest.git/k8s/base | envsubst | kubectl apply -f -

Before running the commands, replace the following variables:

PROJECT_ID: The identifier of the project.
PROJECT_NUMBER: The numeric identifier of the project.

Configure the Collector

We provide an OpenTelemetry Collector configuration for you to use with the Google-built Collector. This configuration is designed to deliver high volumes of OTLP metrics, logs, and traces with consistent GKE and Kubernetes metadata attached. This configuration is also designed to prevent common ingestion issues. You can add to the configuration, but we strongly recommend that you don't remove elements.

This section describes the provided configuration, the key components like exporters, processors, receivers, and other available components.

Provided Collector configuration

You can find the Collector configuration for Kubernetes environments in the otlp-k8s-ingest repository:

# # # # # # # # #

The Collector configuration tinclude-code"> suppresswarning="suppresswarning" translate="no" class="devsite-click-to-copy" track-metadata-position="GoogleCloudPlatform/otlp-k8s-ingest/config/collector.yaml/HEAD/" data-code-snippet="true" data-github-includecode-link="https://github.com/GoogleCloudPlatform/otlp-k8s-ingest/blob/HEAD/config/collector.yaml" language="yaml" data-github-path="GoogleCloudPlatform/otlp-k8s-ingest/config/collector.yaml" data-git-revision="HEAD" dir="ltr" is-upgraded syntax="YAML"># Copyright 2024 Google LLC ># Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at ># http://www.apache.org/licenses/LICENSE-2.0 ># Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. >exporters: googlecloud: log: default_log_name: opentelemetry-collector user_agent: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64) googlemanagedprometheus: user_agent: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64) # The otlphttp exporter is used to send traces to Google Cloud Trace using OTLP http/proto # The otlp exporter could also be used to send them using OTLP grpc otlphttp: encoding: proto endpoint: https://telemetry.googleapis.com # Use the googleclientauth extension to authenticate with Google credentials auth: authenticator: googleclientauth >extensions: health_check: endpoint: ${env:MY_POD_IP}:13133 googleclientauth: >processors: filter/self-metrics: metrics: include: match_type: strict metric_names: - otelcol_process_uptime - otelcol_process_memory_rss - otelcol_grpc_io_client_completed_rpcs - otelcol_googlecloudmonitoring_point_count batch: send_batch_max_size: 200 send_batch_size: 200 timeout: 5s k8sattributes: extract: metadata: - k8s.namespace.name - k8s.deployment.name - k8s.statefulset.name - k8s.daemonset.name - k8s.cronjob.name - k8s.job.name - k8s.replicaset.name - k8s.node.name - k8s.pod.name - k8s.pod.uid - k8s.pod.start_time passthrough: false pod_association: - sources: - from: resource_attribute name: k8s.pod.ip - sources: - from: resource_attribute name: k8s.pod.uid - sources: - from: connection memory_limiter: check_interval: 1s limit_percentage: 65 spike_limit_percentage: 20 metricstransform/self-metrics: transforms: - action: update include: otelcol_process_uptime operations: - action: add_label new_label: version new_value: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64) resourcedetection: detectors: [gcp] timeout: 10s transform/collision: metric_statements: - context: datapoint statements: - set(attributes["exported_location"], attributes["location"]) - delete_key(attributes, "location") - set(attributes["exported_cluster"], attributes["cluster"]) - delete_key(attributes, "cluster") - set(attributes["exported_namespace"], attributes["namespace"]) - delete_key(attributes, "namespace") - set(attributes["exported_job"], attributes["job"]) - delete_key(attributes, "job") - set(attributes["exported_instance"], attributes["instance"]) - delete_key(attributes, "instance") - set(attributes["exported_project_id"], attributes["project_id"]) - delete_key(attributes, "project_id") # The relative ordering of statements between ReplicaSet & Deployment and Job & CronJob are important. # The ordering of these controllers is decided based on the k8s controller documentation available at # https://kubernetes.io/docs/concepts/workloads/controllers. # The relative ordering of the other controllers in this list is inconsequential since they directly # create pods. transform/aco-gke: metric_statements: - context: datapoint statements: - set(attributes["top_level_controller_type"], "ReplicaSet") where resource.attributes["k8s.replicaset.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.replicaset.name"]) where resource.attributes["k8s.replicaset.name"] != nil - set(attributes["top_level_controller_type"], "Deployment") where resource.attributes["k8s.deployment.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil - set(attributes["top_level_controller_type"], "DaemonSet") where resource.attributes["k8s.daemonset.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.daemonset.name"]) where resource.attributes["k8s.daemonset.name"] != nil - set(attributes["top_level_controller_type"], "StatefulSet") where resource.attributes["k8s.statefulset.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.statefulset.name"]) where resource.attributes["k8s.statefulset.name"] != nil - set(attributes["top_level_controller_type"], "Job") where resource.attributes["k8s.job.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.job.name"]) where resource.attributes["k8s.job.name"] != nil - set(attributes["top_level_controller_type"], "CronJob") where resource.attributes["k8s.cronjob.name"] != nil - set(attributes["top_level_controller_name"], resource.attributes["k8s.cronjob.name"]) where resource.attributes["k8s.cronjob.name"] != nil # When sending telemetry to the GCP OTLP endpoint, the gcp.project_id resource attribute is required to be set to your project ID. resource/gcp_project_id: attributes: - key: gcp.project_id # MAKE SURE YOU REPLACE THIS WITH YOUR PROJECT ID value: ${GOOGLE_CLOUD_PROJECT} action: insert # The metricstarttime processor is important to include if you are using the prometheus receiver to ensure the start time is set properly. # It is a no-op otherwise. metricstarttime: strategy: subtract_initial_point >receivers: # This collector is configured to accept OTLP metrics, logs, and traces, and is designed to receive OTLP from workloads running in the cluster. otlp: protocols: grpc: endpoint: ${env:MY_POD_IP}:4317 http: cors: allowed_origins: - http://* - https://* endpoint: ${env:MY_POD_IP}:4318 otlp/self-metrics: protocols: grpc: endpoint: ${env:MY_POD_IP}:14317 >service: extensions: - health_check - googleclientauth pipelines: logs: exporters: - googlecloud processors: - k8sattributes - resourcedetection - memory_limiter - batch receivers: - otlp metrics/otlp: exporters: - googlemanagedprometheus processors: - k8sattributes - memory_limiter - metricstarttime - resourcedetection - transform/collision - transform/aco-gke - batch receivers: - otlp metrics/self-metrics: exporters: - googlemanagedprometheus processors: - filter/self-metrics - metricstransform/self-metrics - k8sattributes - memory_limiter - resourcedetection - batch receivers: - otlp/self-metrics traces: exporters: - otlphttp processors: - k8sattributes - memory_limiter - resource/gcp_project_id - resourcedetection - batch receivers: - otlp telemetry: logs: encoding: json metrics: readers: - periodic: exporter: otlp: protocol: grpc endpoint: ${env:MY_POD_IP}:14317 data-text="Exporters" tabindex="-1">Exporters

 includes the following exporters:


googlecloud
exporter, for logs and traces. This exporter is configured with a default
log name.
googlemanagedprometheus
exporter, for metrics. This exporter does not require any configuration,
but there are configuration options. For information about configuration
options for the googlemanagedprometheus exporter, see
Get started with the
OpenTelemetry Collector
in the Google Cloud Managed Service for Prometheus documentation.


Processors

The Collector configuration includes the following processors:


batch:
Configured to batch telemetry requests at the Google Cloud maximum number of
entries per request, or at the Google Cloud minimum interval of every 5
seconds (whichever comes first).
memory_limiter:
Caps the Collector's memory usage to prevent out-of-memory crashes by
dropping data points when the limit is exceeded.
resourcedetection:
Automatically detects Google Cloud resource labels such as project_id and cluster_name.

k8sattributes:
Automatically maps Kubernetes resource attributes to telemetry labels.
transform:
Renames metric labels that collide with labels on Google Cloud
monitored resources.



Receivers

The Collector configuration includes only the
otlp receiver.
For information about instrumenting your applications to send OTLP traces
and metrics to the Collector's OTLP endpoint, see
Choose an instrumentation
approach.

Available components

The Google-Built OpenTelemetry Collector contains the components that most users will need to
enable a rich experience within Google Cloud Observability. For a complete list of
available components, see
Components
in the opentelemetry-operations-collector repository.

To request any changes or additions to the available components,
open a feature request.
in the opentelemetry-operations-collector repository.

Generate telemetry

This section describes deploying a sample application and pointing that
application to the Collector's OTLP endpoint, and viewing the telemetry in
Google Cloud. The sample application is a small generator that exports traces,
logs, and metrics to the Collector.

If you already have an application instrumented with an OpenTelemetry SDK, then
you can point your application to the Collector's endpoint instead.

To deploy the sample application, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/main/sample/app.yaml


To point existing applications that use the OpenTelemetry SDK at the Collector's
endpoint, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to
http://opentelemetry-collector.opentelemetry.svc.cluster.local:4317.

After a few minutes, telemetry generated by the application begins flowing
through the Collector to the Google Cloud console for each signal.

View telemetry

The Google-Built OpenTelemetry Collector sends metrics, logs, and traces from your instrumented
applications to Google Cloud Observability. The Collector also sends self-observability
metrics. The following sections describe how to view this telemetry.





View your metrics


The Google-Built OpenTelemetry Collector collects Prometheus metrics that you can view by using
the Metrics Explorer. The metrics collected depend
on the instrumentation of the app, although the Google-built Collector also
writes some self-metrics.












To view the metrics collected by the Google-Built OpenTelemetry Collector,
do the following:



  



In the Google Cloud console, go to the
   leaderboard Metrics explorer page:
   
Go to Metrics explorer


If you use the search bar to find this page, then select the result whose subheading is
Monitoring.
  In the toolbar of the Google Cloud console, select your Google Cloud project.
      For App Hub configurations, select the
      App Hub host project or the app-enabled folder's management project.
  In the Metric element, expand the Select a metric menu,
    enter Prometheus Target
    in the filter bar, and then use the submenus to select a specific resource type and metric:
    
    
      In the Active resources menu, select Prometheus Target.
      To select a metric, use the Active metric categories and Active metrics menus.
        
        Metrics collected by the Google-Built OpenTelemetry Collector have the
prefix prometheus.googleapis.com.
        
      
      Click Apply.
    
    
  
  
  To add filters, which remove time series from the query results, use the
      Filter element.
  
  
  
  Configure how the data is viewed. 
When the measurements for a metric are
cumulative, Metrics Explorer  automatically normalizes the measured data by
the alignment period, which results in the chart displaying a rate. For
more information, see Kinds, types, and conversions.
When integer or double values are measured, such as with
counter metrics, Metrics Explorer automatically sums all time series.
To change this behavior, set the first menu of the Aggregation entry
to None.
     For more information about configuring a chart, see
      Select metrics when using Metrics Explorer.
  
  



View your traces


To view your trace data, do the following:




In the Google Cloud console, go to the Trace explorer page:
   
Go to Trace explorer


You can also find this page by using the search bar.
In the toolbar of the Google Cloud console,
select your Google Cloud project. For App Hub
configurations, select the App Hub host project or management project.

In the table section of the page, select a row.
In the Gantt chart on the Trace details panel,
select a span.

A panel opens that displays information about the traced request. These
details include the method, status code, number of bytes, and the
user agent of the caller.
To view the logs associated with this trace,
select the Logs & Events tab.

The tab shows individual logs. To view the details of the log entry,
expand the log entry. You can also click View Logs and view the log
by using the Logs Explorer.


For more information about using the Cloud Trace explorer, see
Find and explore traces.

View your logs


From the Logs Explorer, you can inspect your logs, and you can also
view associated traces, when they exist.




In the Google Cloud console, go to the Logs Explorer page:
   
Go to Logs Explorer


If you use the search bar to find this page, then select the result whose subheading is
Logging.
Locate a log entry from your instrumented app.
To view the details, expand the log entry.
Click  Traces on a log entry with a trace
message, and then select View trace details.

A Trace details panel opens and displays the selected trace.


For more information about using the Logs Explorer, see
View logs by using the Logs Explorer.



Observe and debug the Collector


The Google-Built OpenTelemetry Collector automatically provides self-observability metrics to
help you monitor its performance and ensure continued uptime of the OTLP
ingestion pipeline.

To monitor the Collector, install the sample dashboard for the Collector. This
dashboard offers at-a-glance insights into several metrics from the Collector,
including uptime, memory usage, and API calls to Google Cloud Observability.

To install the dashboard, do the following:













In the Google Cloud console, go to the  Dashboards page:
   
Go to Dashboards


If you use the search bar to find this page, then select the result whose subheading is
Monitoring.
Click Dashboard Templates.
Search for the OpenTelemetry Collector dashboard.
Optional: To preview the dashboard, select it.
Click playlist_add Add dashboard to your list and
then complete the dialog.

The dialog lets you select the name of the dashboard,
and add labels to the dashboard.


For more information about installing dashboards, see Install a dashboard
template.