Write OTLP metrics by using an OpenTelemetry sidecar


This tutorial shows how to write, deploy, and call a Cloud Run service that reports custom OTLP metrics to Google Cloud Managed Service for Prometheus by using the OpenTelemetry sidecar.

If you have a Cloud Run service that reports Prometheus metrics, then use the Prometheus sidecar for Cloud Run instead.

Objectives

  • Write, build, and deploy a service to Cloud Run with the OpenTelemetry sidecar.
  • Generate custom metrics and report them to Google Cloud Managed Service for Prometheus.

Costs

In this document, you use the following billable components of Google Cloud:

  • Cloud Run
  • Artifact Registry
  • Cloud Build
  • Cloud Monitoring

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Run, Cloud Monitoring, Artifact Registry, and Cloud Build APIs.

    Enable the APIs

  5. Install and initialize the gcloud CLI.
  6. Update the Google Cloud CLI: gcloud components update

Required roles

To get the permissions that you need to complete the tutorial, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Also note that the Cloud Run service account needs the Monitoring Metric Writer (roles/monitoring.metricWriter) role. The Compute Engine default service account has this role by default, but you might need to add it if you have changed its permissions or are using a different service account.
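
If you need to add it, you can grant the role with a command like the following sketch, where SERVICE_ACCOUNT_EMAIL is a placeholder for the email address of the service account that your Cloud Run service uses:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/monitoring.metricWriter"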

Setting up gcloud defaults

To configure gcloud with defaults for your Cloud Run service:

  1. Set your default project:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with the name of the project you created for this tutorial.

  2. Configure gcloud for your chosen region:

    gcloud config set run/region REGION

    Replace REGION with the supported Cloud Run region of your choice.

Cloud Run locations

Cloud Run is regional, which means the infrastructure that runs your Cloud Run services is located in a specific region and is managed by Google to be redundantly available across all the zones within that region.

Latency, availability, and durability requirements are the primary factors for selecting the region where your Cloud Run services run. You can generally select the region nearest to your users, but you should also consider the location of the other Google Cloud products that your Cloud Run service uses. Using Google Cloud products together across multiple locations can affect your service's latency as well as its cost.

Cloud Run is available in the following regions:

Subject to Tier 1 pricing

  • asia-east1 (Taiwan)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • europe-north1 (Finland) Low CO2
  • europe-southwest1 (Madrid)
  • europe-west1 (Belgium) Low CO2
  • europe-west4 (Netherlands)
  • europe-west8 (Milan)
  • europe-west9 (Paris) Low CO2
  • me-west1 (Tel Aviv)
  • us-central1 (Iowa) Low CO2
  • us-east1 (South Carolina)
  • us-east4 (Northern Virginia)
  • us-east5 (Columbus)
  • us-south1 (Dallas)
  • us-west1 (Oregon) Low CO2

Subject to Tier 2 pricing

  • africa-south1 (Johannesburg)
  • asia-east2 (Hong Kong)
  • asia-northeast3 (Seoul, South Korea)
  • asia-southeast1 (Singapore)
  • asia-southeast2 (Jakarta)
  • asia-south1 (Mumbai, India)
  • asia-south2 (Delhi, India)
  • australia-southeast1 (Sydney)
  • australia-southeast2 (Melbourne)
  • europe-central2 (Warsaw, Poland)
  • europe-west10 (Berlin)
  • europe-west12 (Turin)
  • europe-west2 (London, UK) Low CO2
  • europe-west3 (Frankfurt, Germany) Low CO2
  • europe-west6 (Zurich, Switzerland) Low CO2
  • me-central1 (Doha)
  • me-central2 (Dammam)
  • northamerica-northeast1 (Montreal) Low CO2
  • northamerica-northeast2 (Toronto) Low CO2
  • southamerica-east1 (Sao Paulo, Brazil) Low CO2
  • southamerica-west1 (Santiago, Chile) Low CO2
  • us-west2 (Los Angeles)
  • us-west3 (Salt Lake City)
  • us-west4 (Las Vegas)

If you already created a Cloud Run service, you can view the region in the Cloud Run dashboard in the Google Cloud console.

Creating an Artifact Registry image repository

Create an Artifact Registry Docker repository to host the sample service image:

gcloud artifacts repositories create run-otel \
    --repository-format=docker \
    --location=REGION \
    --project=PROJECT_ID

Replace the following:

  • PROJECT_ID with the name of the project you created for this tutorial.
  • REGION with the supported Cloud Run region of your choice.
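
To confirm that the repository was created, you can optionally describe it:

gcloud artifacts repositories describe run-otel \
    --location=REGION \
    --project=PROJECT_ID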

Retrieving the code sample

To retrieve the code sample for use:

  1. Clone the sample app repository to your local machine:

    Go

    git clone https://github.com/GoogleCloudPlatform/golang-samples.git

    Alternatively, you can download the sample as a zip file and extract it.

  2. Change to the directory that contains the Cloud Run sample code:

    Go

    cd golang-samples/run/custom-metrics/

Reviewing the code

The code for this tutorial consists of the following:

  • A server that handles incoming requests and generates a metric named sidecar_sample_counter.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

var counter metric.Int64Counter

func main() {
	ctx := context.Background()
	shutdown := setupCounter(ctx)
	defer shutdown(ctx)

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
		log.Printf("defaulting to port %s", port)
	}

	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}

func handler(w http.ResponseWriter, r *http.Request) {
	counter.Add(context.Background(), 100)
	fmt.Fprintln(w, "Incremented sidecar_sample_counter metric!")
}

func setupCounter(ctx context.Context) func(context.Context) error {
	serviceName := os.Getenv("K_SERVICE")
	if serviceName == "" {
		serviceName = "sample-cloud-run-app"
	}
	r, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		),
	)
	if err != nil {
		log.Fatalf("Error creating resource: %v", err)
	}

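	// The OTLP gRPC exporter sends metrics to the endpoint named in the
	// OTEL_EXPORTER_OTLP_ENDPOINT environment variable, which service.yaml
	// sets to http://localhost:4317, where the collector sidecar listens.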
	exporter, err := otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("Error creating exporter: %s", err)
	}
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
		sdkmetric.WithResource(r),
	)

	meter := provider.Meter("example.com/metrics")
	counter, err = meter.Int64Counter("sidecar-sample-counter")
	if err != nil {
		log.Fatalf("Error creating counter: %s", err)
	}
	return provider.Shutdown
}
  • A Dockerfile that defines the operating environment for the service.
FROM golang:1.21 as builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o sample-app

FROM alpine:3
RUN apk add --no-cache ca-certificates
COPY --from=builder /app/sample-app /sample-app
CMD ["/sample-app"]

The sample also includes files under the collector subdirectory for building a custom OpenTelemetry Collector:

  • A config file for the OpenTelemetry Collector.

    receivers:
      otlp:
        protocols:
          grpc:
          http:
    
    processors:
      batch:
        # batch metrics before sending to reduce API usage
        send_batch_max_size: 200
        send_batch_size: 200
        timeout: 5s
    
      memory_limiter:
        # drop metrics if memory usage gets too high
        check_interval: 1s
        limit_percentage: 65
        spike_limit_percentage: 20
    
      # automatically detect Cloud Run resource metadata                                                                                                                                               
      resourcedetection:
        detectors: [env, gcp]
        timeout: 2s
        override: false
    
      resource:
        attributes:
          # add instance_id as a resource attribute                                                                                                                                                    
        - key: service.instance.id
          from_attribute: faas.id
          action: upsert
          # parse service name from K_SERVICE Cloud Run variable                                                                                                                                       
        - key: service.name
          value: ${env:K_SERVICE}
          action: insert
    
    exporters:
      googlemanagedprometheus: # Note: this is intentionally left blank   
    
    extensions:
      health_check:
    
    service:
      extensions: [health_check]
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch, memory_limiter, resourcedetection, resource]
          exporters: [googlemanagedprometheus]
  • A Dockerfile that bundles the provided config into an upstream Collector image.

    FROM otel/opentelemetry-collector-contrib:0.75.0
    
    COPY collector-config.yaml /etc/otelcol-contrib/config.yaml

Shipping the code

Shipping code consists of three steps: building a container image with Cloud Build, uploading the container image to Artifact Registry, and deploying the container image to Cloud Run.

To ship your code:

  1. Build your sample service container and publish on Artifact Registry:

    gcloud builds submit --tag REGION-docker.pkg.dev/PROJECT_ID/run-otel/sample-metrics-app

    Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Artifact Registry and can be re-used if desired.

  2. Build your Collector container and publish on Artifact Registry:

    gcloud builds submit collector --tag REGION-docker.pkg.dev/PROJECT_ID/run-otel/otel-collector-metrics

    Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Artifact Registry and can be re-used if desired.

  3. Deploy your application:

    YAML

    1. Create a new file called service.yaml with the following:

      apiVersion: serving.knative.dev/v1
      kind: Service
      metadata:
        name: SERVICE-NAME
        annotations:
          run.googleapis.com/launch-stage: BETA
      spec:
        template:
          metadata:
            annotations:
              run.googleapis.com/container-dependencies: "{app:[collector]}"
          spec:
            containers:
            - image: REGION-docker.pkg.dev/PROJECT_ID/run-otel/sample-metrics-app
              name: app
              ports:
              - containerPort: CONTAINER_PORT
              env:
              - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
                value: "http://localhost:4317"
            - image: REGION-docker.pkg.dev/PROJECT_ID/run-otel/otel-collector-metrics
              name: collector
              startupProbe:
                httpGet:
                  path: /
                  port: 13133
      
    2. Replace the following:

       • SERVICE-NAME with the name you want to give your service.
       • REGION and PROJECT_ID with the same values you used when building the container images.
       • CONTAINER_PORT with the port your application container listens on (the sample app defaults to 8080).

    3. Create the new service with the following command:

    gcloud run services replace service.yaml

    This command returns a service URL. Use this URL to try out the sample application in Trying it out.
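
    To verify that both the app and collector containers deployed, you can optionally inspect the service:

    gcloud run services describe SERVICE-NAME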

Trying it out

Using the URL from the gcloud run command in Shipping the code, connect to the service to generate some sample metrics (you can run this command several times to generate more interesting data):

curl -H \
"Authorization: Bearer $(gcloud auth print-identity-token)" \
SERVICE_URL

Replace SERVICE_URL with the URL of your service.
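
If you need to look up the URL again, you can retrieve it with the following command, using the service name you set in service.yaml:

gcloud run services describe SERVICE-NAME --format='value(status.url)'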

Next, navigate to the Metrics Explorer within the Cloud Monitoring section of the Google Cloud console and select the sidecar_sample_counter metric.

Custom metric shown in the Metrics Explorer UI

You can also query the metrics with PromQL. For example, the query below will filter metrics based on the Cloud Run instance ID:

sidecar_sample_counter{instance="INSTANCE_ID"}

Replace INSTANCE_ID with the ID of any instance for your service (available in the instance logs or from the metadata server).
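
One way to find an instance ID is to read it from the Cloud Run metadata server inside the container, for example with a call like the following sketch that you could add to the sample app:

curl -H "Metadata-Flavor: Google" \
    http://metadata.google.internal/computeMetadata/v1/instance/id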

This query produces a chart like the one below:

Custom metric queried by PromQL

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and want to keep it without the changes added in this tutorial, delete the resources created for the tutorial.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting tutorial resources

  1. Delete the Cloud Run service you deployed in this tutorial:

    gcloud run services delete SERVICE-NAME

    Where SERVICE-NAME is your chosen service name.

    You can also delete Cloud Run services from the Google Cloud console.

  2. Remove the gcloud default region configuration you added during tutorial setup:

     gcloud config unset run/region
    
  3. Remove the project configuration:

     gcloud config unset project
    
  4. Delete other Google Cloud resources created in this tutorial:
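
     For example, to delete the Artifact Registry repository you created earlier, along with the container images stored in it:

     gcloud artifacts repositories delete run-otel \
         --location=REGION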

What's next

More examples, including examples for traces and logs, are available on GitHub.