Enabling distributed tracing

This page applies to Apigee and Apigee hybrid.

This page demonstrates the steps required to configure distributed tracing for your Apigee runtime. If you are new to using distributed tracing systems and would like more information, see Understanding Distributed Tracing.

Introduction

Distributed tracing systems let you track a request in a software system distributed across multiple applications, services, and databases, as well as intermediaries like proxies. These tracing systems generate reports showing the time taken by a request at each step. Tracing reports can also provide a granular view of the various services called during a request, enabling a deeper understanding of what happens at each step in your software system.

The trace tool in Apigee Edge and the debug tool in Apigee are useful for troubleshooting and monitoring your API proxies. However, these tools do not send any data to distributed tracing servers like Cloud Trace or Jaeger. To view Apigee runtime data in a distributed tracing report, you must explicitly enable distributed tracing in your Apigee runtime. Once tracing is enabled, the runtime can send trace data to distributed tracing servers and participate in an existing trace. As a result, you can view data from inside and outside of your Apigee ecosystem from a single location.

You can view the following information in distributed tracing reports:

  • Execution time of an entire flow.
  • Time at which the request is received.
  • Time at which the request is sent to the target.
  • Time at which the response is received from the target.
  • Execution time of each policy in a flow.
  • Execution time of service callouts and target flows.
  • Time at which the response is sent to the client.

In the distributed tracing report, you can view the execution details of the flows as spans. A span refers to the time taken by a flow in a trace. The time taken to execute a flow is displayed as an aggregate of the time required to execute each policy in the flow. You can view each of the following flows as individual spans:

  • Request
    • Proxy
      • PreFlow
      • PostFlow
    • Target
      • PreFlow
      • PostFlow
  • Response
    • Proxy
      • PreFlow
      • PostFlow
    • Target
      • PreFlow
      • PostFlow

Once you enable distributed tracing, the Apigee runtime will trace a set of predefined variables by default. For more information, see Default trace variables in tracing report. You can use the TraceCapture policy to extend the default runtime behavior and trace additional flow, policy, or custom variables. For more information, see the TraceCapture policy.

Default trace variables in tracing report

Once distributed tracing is enabled, you can view the following set of pre-defined variables in the tracing report. The variables are visible in the following spans:

  • POST_RESP_SENT: This span is added after a response is received from the target server.
  • POST_CLIENT_RESP_SENT: This span is added after the proxy response is sent to the client.

Variables in POST_RESP_SENT span

The following variables are visible in the POST_RESP_SENT span:
  • REQUEST_URL (request.url)
  • REQUEST_VERB (request.verb)
  • RESPONSE_STATUS_CODE (response.status.code)
  • ROUTE_NAME (route.name)
  • ROUTE_TARGET (route.target)
  • TARGET_BASE_PATH (target.basepath)
  • TARGET_HOST (target.host)
  • TARGET_IP (target.ip)
  • TARGET_NAME (target.name)
  • TARGET_PORT (target.port)
  • TARGET_RECEIVED_END_TIMESTAMP (target.received.end.timestamp)
  • TARGET_RECEIVED_START_TIMESTAMP (target.received.start.timestamp)
  • TARGET_SENT_END_TIMESTAMP (target.sent.end.timestamp)
  • TARGET_SENT_START_TIMESTAMP (target.sent.start.timestamp)
  • TARGET_SSL_ENABLED (target.ssl.enabled)
  • TARGET_URL (target.url)

Variables in POST_CLIENT_RESP_SENT span

The following variables are visible in the POST_CLIENT_RESP_SENT span:
  • API_PROXY_REVISION (apiproxy.revision)
  • APIPROXY_NAME (apiproxy.name)
  • CLIENT_RECEIVED_END_TIMESTAMP (client.received.end.timestamp)
  • CLIENT_RECEIVED_START_TIMESTAMP (client.received.start.timestamp)
  • CLIENT_SENT_END_TIMESTAMP (client.sent.end.timestamp)
  • CLIENT_SENT_START_TIMESTAMP (client.sent.start.timestamp)
  • ENVIRONMENT_NAME (environment.name)
  • FAULT_SOURCE (message.header + InternalHeaders.FAULT_SOURCE)
  • IS_ERROR (is.error)
  • MESSAGE_ID (message.id)
  • MESSAGE_STATUS_CODE (message.status.code)
  • PROXY_BASE_PATH (proxy.basepath)
  • PROXY_CLIENT_IP (proxy.client.ip)
  • PROXY_NAME (proxy.name)
  • PROXY_PATH_SUFFIX (proxy.pathsuffix)
  • PROXY_URL (proxy.url)

Supported distributed tracing systems

The Apigee runtime supports the following distributed tracing systems:

  • Cloud Trace
  • Jaeger

You can configure your Apigee runtime to send trace data to either a Cloud Trace or a Jaeger system.

Because tracing every API call in the Apigee runtime would impact performance, Apigee lets you configure a probabilistic sampling rate. The sampling rate specifies the fraction of API calls that are sent for distributed tracing. For example, a sampling rate of 0.4 means that approximately 40% of the API calls are sent for tracing. For more information, see Performance considerations.
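As a rough illustration of what a sampling rate means in practice, the following shell sketch estimates how many calls would be traced for a given traffic volume. The function name expected_traced_calls is hypothetical and not part of Apigee. Because sampling is probabilistic, real counts will vary around this estimate.

```shell
# Hypothetical helper (not an Apigee command): estimate how many API calls
# are traced for a given call volume and probabilistic sampling rate.
expected_traced_calls() {
  total_calls="$1"    # number of API calls in the period
  sampling_rate="$2"  # decimal between 0 and 1, for example 0.4
  awk -v n="$total_calls" -v r="$sampling_rate" 'BEGIN { printf "%.0f\n", n * r }'
}

# With a sampling rate of 0.4, roughly 40% of 1000 calls are traced.
expected_traced_calls 1000 0.4
```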

Configure Apigee runtimes for Cloud Trace

Both the Apigee runtime and the Apigee hybrid runtime support distributed tracing using Cloud Trace. If you are using Jaeger, you can skip this section and proceed to Enable distributed tracing for Jaeger.

Configure Apigee runtime for Cloud Trace

To use Cloud Trace with an Apigee runtime, your Google Cloud project must have the Cloud Trace API enabled. This setting lets your Google Cloud project receive trace data from authenticated sources.

To confirm that the Cloud Trace API is enabled, do the following:

  1. From the Google Cloud console, go to APIs and Services:

    Go to APIs and Services

  2. Click Enable APIs and Services.
  3. In the search bar, enter Trace API.
  4. If API enabled is displayed, this API is already enabled and there is nothing for you to do. Otherwise, click Enable.

Configure Apigee hybrid runtime for Cloud Trace

The Cloud Trace API must be enabled to use Cloud Trace with an Apigee hybrid runtime. To confirm that the Cloud Trace API is enabled, follow the steps in Configure Apigee runtime for Cloud Trace.

In addition to enabling the Cloud Trace API, you must add an iam.gserviceaccount.com service account to use Cloud Trace with the hybrid runtime. To add the service account, along with the required roles and keys, perform the following steps:

  1. Create a new service account:
    gcloud iam service-accounts create \
        apigee-runtime --display-name "Service Account Apigee hybrid runtime" \
        --project PROJECT_ID
  2. Add an IAM policy binding to the service account:
    gcloud projects add-iam-policy-binding \
        PROJECT_ID --member "serviceAccount:apigee-runtime@PROJECT_ID.iam.gserviceaccount.com" \
        --role=roles/cloudtrace.agent --project PROJECT_ID
  3. Create a service account key:
    gcloud iam service-accounts keys \
        create ~/apigee-runtime.json --iam-account apigee-runtime@PROJECT_ID.iam.gserviceaccount.com
  4. Add the service account to the overrides.yaml file:
    envs:
      - name: ENV_NAME
        serviceAccountPaths:
          runtime: apigee-runtime.json
          synchronizer: apigee-sync.json
          udca: apigee-udca.json
  5. Apply the changes to the runtime:
    apigeectl apply -f overrides.yaml --env=ENV_NAME
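Before moving on, it can be useful to confirm that the role binding took effect. The following is a minimal sketch, assuming the apigee-runtime account name used in the steps above. The function names are hypothetical, and list_runtime_sa_roles is only defined here, not run, because it needs gcloud and permission to read the project's IAM policy.

```shell
# Build the service account email used in the steps above from a project ID.
runtime_sa_email() {
  printf 'apigee-runtime@%s.iam.gserviceaccount.com\n' "$1"
}

# List the roles bound to the runtime service account in the given project.
# Requires an authenticated gcloud CLI; defined here but not called.
list_runtime_sa_roles() {
  project_id="$1"
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$(runtime_sa_email "$project_id")" \
    --format="value(bindings.role)"
}

# Show the email that the check would look for.
runtime_sa_email my-project
```

If the binding exists, running list_runtime_sa_roles with your project ID should include roles/cloudtrace.agent in its output.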

Enable distributed tracing

Before enabling distributed tracing for Cloud Trace or Jaeger, create the following environment variables:

TOKEN="Authorization: Bearer $(gcloud auth print-access-token)"
ENV_NAME=YOUR_ENVIRONMENT_NAME
PROJECT_ID=YOUR_GOOGLE_CLOUD_PROJECT_ID

Where:

  • TOKEN defines the Authorization header with a bearer token. You use this header when calling Apigee APIs. For more information, see the reference page for the print-access-token command.
  • ENV_NAME is the name of an environment in your organization.
  • PROJECT_ID is the ID of your Google Cloud project.
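The API calls in the following sections fail in confusing ways if one of these variables is empty. A small pre-flight check, sketched below, can catch that early. The function name check_required_vars is hypothetical and not part of any Google Cloud tool.

```shell
# Hypothetical pre-flight check: report each named variable that is unset
# or empty. Returns a non-zero status if at least one is missing.
check_required_vars() {
  missing=0
  for var_name in "$@"; do
    eval "value=\${$var_name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $var_name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before calling the Apigee API:
#   check_required_vars TOKEN ENV_NAME PROJECT_ID || exit 1
```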

Enable distributed tracing for Cloud Trace

The following example shows you how to enable the distributed tracing for Cloud Trace:

  1. Execute this Apigee API call:
    curl -H "$TOKEN" \
        -H "Content-Type: application/json" \
        https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig \
        -X PATCH \
        -d '{"exporter":"CLOUD_TRACE","endpoint": "'"$PROJECT_ID"'",
        "samplingConfig": {"sampler": "PROBABILITY","samplingRate": 0.1}}'

    The example request body consists of the following elements:

    • The samplingRate is set to 0.1. This means approximately 10% of the API calls are sent to distributed tracing. For more information on setting a samplingRate for your runtime environment, see Performance considerations.
    • The exporter parameter is set to CLOUD_TRACE.
    • The endpoint is set to the Google Cloud project where you want the trace data to be sent. Note: This project must match the project of the service account created in the configuration step.

    A successful response looks similar to the following:

    {
      "exporter": "CLOUD_TRACE",
      "endpoint": "staging",
      "samplingConfig": {
        "sampler": "PROBABILITY",
        "samplingRate": 0.1
      }
    }
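The nested quoting in the request body above is easy to get wrong. One possible alternative, sketched here, is to build the body with printf and pass it to curl with -d. The function name trace_config_body and the project ID my-project are illustrative assumptions. The helper does no JSON escaping, so it assumes the values contain no quotes or backslashes.

```shell
# Hypothetical helper: print a traceConfig request body.
# Arguments: exporter, endpoint, sampler, sampling rate.
# No JSON escaping is performed on the arguments.
trace_config_body() {
  printf '{"exporter":"%s","endpoint":"%s","samplingConfig":{"sampler":"%s","samplingRate":%s}}\n' \
    "$1" "$2" "$3" "$4"
}

# Body equivalent to the Cloud Trace example, for a project named my-project.
trace_config_body CLOUD_TRACE my-project PROBABILITY 0.1
```

The output could then be sent with -d "$(trace_config_body CLOUD_TRACE "$PROJECT_ID" PROBABILITY 0.1)" in place of the inline JSON.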

Enable distributed tracing for Jaeger

The following example shows you how to enable the distributed tracing for Jaeger:

curl -s -H "$TOKEN" \
    "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig" \
    -X PATCH \
    -H "content-type:application/json" -d '{
    "samplingConfig": {
    "samplingRate": 0.4,
    "sampler": "PROBABILITY"},
    "endpoint": "http://DOMAIN:9411/api/v2/spans",
    "exporter": "JAEGER"
    }'

In this example:

  • The samplingRate is set to 0.4. This means approximately 40% of the API calls are sent to distributed tracing.
  • The exporter parameter is set to JAEGER.
  • The endpoint is set to where Jaeger is installed and configured.

When you run the command, you can see a response similar to the following:

{
  "exporter": "JAEGER",
  "endpoint": "http://DOMAIN:9411/api/v2/spans",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.4
  }
}
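Because the sampling rate is a probability, a quick sanity check before sending the request can catch typos such as 40 instead of 0.4. The sketch below assumes that acceptable values are plain decimals from 0 to 1 inclusive, which this page does not state explicitly. The function name is hypothetical.

```shell
# Hypothetical check: succeed only for plain decimals between 0 and 1
# inclusive (for example 0, 0.05, 0.4, 1, 1.0).
is_valid_sampling_rate() {
  printf '%s\n' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'
}

for rate in 0.4 1.5; do
  if is_valid_sampling_rate "$rate"; then
    echo "$rate: ok"
  else
    echo "$rate: rejected"
  fi
done
```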

View the distributed tracing configuration

To view the existing distributed tracing configuration in your runtime, log in to your runtime and then run the following command:

curl -H "$TOKEN" \
    -H "Content-Type: application/json" \
    https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.1
  }
}
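If you only need one value from this response in a script, a simple text filter may be enough. The following sketch reads the JSON on standard input and prints the samplingRate value. The function name is hypothetical, and the sed pattern assumes a response shaped like the example above. A JSON-aware tool would be more robust.

```shell
# Hypothetical helper: print the samplingRate value from a traceConfig
# JSON response supplied on standard input.
get_sampling_rate() {
  sed -n 's/.*"samplingRate"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p'
}

# Example with a response shaped like the one shown above.
get_sampling_rate <<'EOF'
{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.1
  }
}
EOF
```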

Update the distributed tracing configuration

The following command shows you how to update the existing distributed tracing configuration for Cloud Trace:

curl -s \
    -H "$TOKEN" \
    "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig?updateMask=endpoint,samplingConfig,exporter" \
    -X PATCH -H "content-type:application/json" \
    -d '{"samplingConfig": {"samplingRate": 0.05, "sampler":"PROBABILITY"},
    "endpoint":"staging", "exporter":"CLOUD_TRACE"}'

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.05
  }
}

In this example, the sampling rate is updated to 0.05.
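The updateMask query parameter in the command above is a comma-separated list of the fields being updated. The sketch below assembles that URL from a list of field names. The function name and the my-org and test values are hypothetical. The example on this page lists all three fields, and whether the API accepts a subset is not stated here.

```shell
# Hypothetical helper: build the traceConfig update URL.
# Arguments: organization, environment, then one or more field names.
trace_config_update_url() {
  org="$1"
  env="$2"
  shift 2
  # Join the remaining arguments with commas inside a subshell.
  mask=$(IFS=,; printf '%s' "$*")
  printf 'https://apigee.googleapis.com/v1/organizations/%s/environments/%s/traceConfig?updateMask=%s\n' \
    "$org" "$env" "$mask"
}

trace_config_update_url my-org test endpoint samplingConfig exporter
```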

Disable the distributed tracing configuration

The following example shows how to disable distributed tracing configured for Cloud Trace:

curl -H "$TOKEN" \
    -H "Content-Type: application/json" \
    https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig \
    -X PATCH -d '{"exporter": "CLOUD_TRACE","endpoint": "'"$PROJECT_ID"'","samplingConfig":
    {"sampler": "OFF","samplingRate": 0.1}}'

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "OFF",
    "samplingRate": 0.1
  }
}

Override trace settings for API proxies

When you enable distributed tracing in your Apigee runtime, all the API proxies in the runtime use the same configuration for tracing. However, you can override the distributed tracing configuration for an API proxy or a group of API proxies. This provides you more granular control over the tracing configuration.

The following example overrides the distributed tracing configuration for the hello-world API proxy:

curl -s -H "$TOKEN" \
     "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides" \
     -X POST \
     -H "content-type:application/json" \
     -d '{"apiProxy": "hello-world","samplingConfig": {"sampler": "PROBABILITY","samplingRate": 0.1}}'

You can override the configuration to troubleshoot problems specific to an API proxy without having to change the configuration of all the API proxies.
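When several proxies need the same override, one option is to generate a request body per proxy and send one request for each. The following is a sketch under that assumption. The function name override_body and the proxy name orders are hypothetical, and the helper does no JSON escaping.

```shell
# Hypothetical helper: print an override request body for one API proxy.
# Arguments: proxy name, sampling rate.
override_body() {
  printf '{"apiProxy":"%s","samplingConfig":{"sampler":"PROBABILITY","samplingRate":%s}}\n' \
    "$1" "$2"
}

# Print one body per proxy; each could be passed to curl with -d.
for proxy in hello-world orders; do
  override_body "$proxy" 0.1
done
```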

Update trace settings overrides

To update an override of the tracing configuration for an API proxy or group of API proxies, use the following steps:

  1. Use the following command to retrieve any existing overrides of the tracing configuration:
    curl -s -H "$TOKEN" \
        "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides" \
        -X GET

    This command should return a response similar to the following, which contains a "name" field that identifies the proxy or proxies governed by the override:

    {
      "traceConfigOverrides": [
        {
          "name": "dc8437ea-4faa-4b57-a14f-4b8d3a15fec1",
          "apiProxy": "proxy1",
          "samplingConfig": {
            "sampler": "PROBABILITY",
            "samplingRate": 0.25
          }
        }
      ]
    }
  2. To update the override, use the value of the "name" field to send a POST request to the override configuration for that proxy, along with the updated field values. For example:
    curl -s -H "$TOKEN" \
        "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides/dc8437ea-4faa-4b57-a14f-4b8d3a15fec1" \
        -X POST \
        -H "content-type:application/json" \
        -d '{"apiProxy": "proxy1","samplingConfig": {"sampler": "PROBABILITY","samplingRate": 0.05}}'
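Copying the "name" value by hand is error-prone when there are many overrides. The sketch below looks up the override name for a given proxy from the list response. The function name is hypothetical. The awk script assumes the response is pretty-printed as in step 1, with each "name" line appearing before the matching "apiProxy" line.

```shell
# Hypothetical helper: read the overrides list response on standard input
# and print the "name" of the override whose "apiProxy" matches $1.
find_override_name() {
  awk -v proxy="$1" '
    /"name"[ \t]*:/ {
      name = $0
      sub(/.*"name"[ \t]*:[ \t]*"/, "", name)
      sub(/".*/, "", name)
    }
    /"apiProxy"[ \t]*:/ {
      p = $0
      sub(/.*"apiProxy"[ \t]*:[ \t]*"/, "", p)
      sub(/".*/, "", p)
      if (p == proxy) print name
    }
  '
}

# Example with the response from step 1.
find_override_name proxy1 <<'EOF'
{
  "traceConfigOverrides": [
    {
      "name": "dc8437ea-4faa-4b57-a14f-4b8d3a15fec1",
      "apiProxy": "proxy1",
      "samplingConfig": {
        "sampler": "PROBABILITY",
        "samplingRate": 0.25
      }
    }
  ]
}
EOF
```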

Delete trace setting overrides

To delete an override of the tracing configuration for an API proxy or group of API proxies, use the following steps:

  1. Use the following command to retrieve any existing overrides of the tracing configuration:
    curl -s -H "$TOKEN" \
        "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides" \
        -X GET

    This command should return a response similar to the following, which contains a "name" field that identifies the proxy or proxies governed by the override:

    {
      "traceConfigOverrides": [
        {
          "name": "dc8437ea-4faa-4b57-a14f-4b8d3a15fec1",
          "apiProxy": "proxy1",
          "samplingConfig": {
            "sampler": "PROBABILITY",
            "samplingRate": 0.25
          }
        }
      ]
    }
  2. To delete the override, use the value of the "name" field to send a DELETE request to the override configuration for that proxy. For example:
    curl -s -H "$TOKEN" \
        "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides/dc8437ea-4faa-4b57-a14f-4b8d3a15fec1" \
        -X DELETE

Performance considerations

A performance impact is expected when you enable distributed tracing for an Apigee runtime environment. The impact can result in increased memory usage, increased CPU requirements, and increased latency. The magnitude of the impact depends in part on the complexity of the API proxy (for example, the number of policies) and the probabilistic sampling rate (set as the samplingRate). The higher the sampling rate, the higher the impact on performance. Although the impact depends on a number of factors, you can expect a 10-20% drop in performance when using distributed tracing.

For environments with high traffic and low latency requirements, the recommended probabilistic sampling rate is less than or equal to 10%. If you want to use distributed tracing to troubleshoot, consider increasing the probabilistic sampling (samplingRate) only for specific API proxies.
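For capacity planning, the 10-20% figure can be turned into a rough range. The sketch below interprets "performance" as throughput in requests per second, which is an assumption and not something this page defines. The function name is hypothetical.

```shell
# Hypothetical estimate: given a baseline throughput (requests per second),
# print the range expected after a 20% and a 10% drop.
throughput_range() {
  baseline="$1"
  low=$(( baseline * 80 / 100 ))   # after a 20% drop
  high=$(( baseline * 90 / 100 ))  # after a 10% drop
  echo "$low-$high"
}

# A baseline of 1000 requests per second gives a range of 800-900.
throughput_range 1000
```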