Enabling distributed tracing

This document shows you how to configure distributed tracing for your Apigee runtime. It assumes you have a basic understanding of how distributed tracing works. If you are new to distributed tracing, start with What is Distributed Tracing?

Introduction

Distributed tracing systems let you track a request in a software system distributed across multiple applications, services, and databases, as well as intermediaries like proxies. These tracing systems generate reports that show you the time taken by a request at each step. The reports also show you the various services called during a request at a granular level. By viewing the reports, you can get a deeper understanding of what is happening within your software system.

The trace tool in Apigee Edge and the debug tool in Apigee X let you troubleshoot and monitor your API proxies only. The trace and debug tools do not send any data to distributed tracing servers such as Cloud Trace or Jaeger. To view Apigee runtime data in your distributed tracing report, you must explicitly enable distributed tracing in your Apigee runtime. After you enable tracing, the runtime sends the trace data to your distributed tracing servers. The Apigee runtime can also participate in an existing trace. As a result, you can view data from both within and outside your Apigee ecosystem in a single location.

You can view the following information in your distributed tracing reports:

  • Execution time of an entire flow.
  • Time at which the request is received.
  • Time at which the request is sent to the target.
  • Time at which the response is received from the target.
  • Execution time of each policy in a flow.
  • Execution time of service callouts and target flows.
  • Time at which the response is sent to the client.

In your distributed tracing report, you can view the execution details of the flows as spans. A span refers to the time taken by a flow in a trace. The time taken to execute a flow is displayed as an aggregate of the time required to execute each policy in the flow. You can view each of the following flows as individual spans:

  • Request
    • Proxy
      • PreFlow
      • PostFlow
    • Target
      • PreFlow
      • PostFlow
  • Response
    • Proxy
      • PreFlow
      • PostFlow
    • Target
      • PreFlow
      • PostFlow
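
The aggregation described above, where a flow's span time is the sum of the time taken by each policy in that flow, can be sketched in a few lines of Python. This is only an illustration: the policy names and millisecond values are invented, and the snippet does not reflect how the Apigee runtime computes span durations internally.

```python
# Hypothetical per-policy execution times (milliseconds) for one flow.
# The policy names and values are invented for illustration only.
proxy_request_preflow = {
    "Verify-API-Key": 3.2,
    "Spike-Arrest": 0.4,
    "Assign-Message": 1.1,
}


def flow_duration_ms(policy_times_ms):
    """Return a flow's span duration as the sum of its policy times."""
    return sum(policy_times_ms.values())


print(f"PreFlow span: {flow_duration_ms(proxy_request_preflow):.1f} ms")
```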

If you have enabled distributed tracing for the Apigee runtime, the runtime, by default, traces a set of predefined variables. For more information, see Default trace variables in tracing report. However, if you want the Apigee runtime to trace additional flow, policy, or custom variables, use the TraceCapture policy. For more information, see the TraceCapture policy.

Default trace variables in tracing report

If you have enabled distributed tracing, even without using the TraceCapture policy, you can view a set of pre-defined variables in the tracing report. The variables are visible in the following spans:

  • POST_RESP_SENT: This span is added after a response is received from the target server.
  • POST_CLIENT_RESP_SENT: This span is added after the proxy response is sent to the client.

Variables in POST_RESP_SENT span

The following variables are visible in the POST_RESP_SENT span:
  • REQUEST_URL (request.url)
  • REQUEST_VERB (request.verb)
  • RESPONSE_STATUS_CODE (response.status.code)
  • ROUTE_NAME (route.name)
  • ROUTE_TARGET (route.target)
  • TARGET_BASE_PATH (target.basepath)
  • TARGET_HOST (target.host)
  • TARGET_IP (target.ip)
  • TARGET_NAME (target.name)
  • TARGET_PORT (target.port)
  • TARGET_RECEIVED_END_TIMESTAMP (target.received.end.timestamp)
  • TARGET_RECEIVED_START_TIMESTAMP (target.received.start.timestamp)
  • TARGET_SENT_END_TIMESTAMP (target.sent.end.timestamp)
  • TARGET_SENT_START_TIMESTAMP (target.sent.start.timestamp)
  • TARGET_SSL_ENABLED (target.ssl.enabled)
  • TARGET_URL (target.url)

Variables in POST_CLIENT_RESP_SENT span

The following variables are visible in the POST_CLIENT_RESP_SENT span:
  • API_PROXY_REVISION (apiproxy.revision)
  • APIPROXY_NAME (apiproxy.name)
  • CLIENT_RECEIVED_END_TIMESTAMP (client.received.end.timestamp)
  • CLIENT_RECEIVED_START_TIMESTAMP (client.received.start.timestamp)
  • CLIENT_SENT_END_TIMESTAMP (client.sent.end.timestamp)
  • CLIENT_SENT_START_TIMESTAMP (client.sent.start.timestamp)
  • ENVIRONMENT_NAME (environment.name)
  • FAULT_SOURCE (message.header + InternalHeaders.FAULT_SOURCE)
  • IS_ERROR (is.error)
  • MESSAGE_ID (message.id)
  • MESSAGE_STATUS_CODE (message.status.code)
  • PROXY_BASE_PATH (proxy.basepath)
  • PROXY_CLIENT_IP (proxy.client.ip)
  • PROXY_NAME (proxy.name)
  • PROXY_PATH_SUFFIX (proxy.pathsuffix)
  • PROXY_URL (proxy.url)
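
The timestamp variables listed above can be combined to estimate where a request spent its time. The following Python sketch assumes that the timestamps are epoch milliseconds and that you have copied the variables from both spans into one dictionary; both are assumptions made for illustration, and the sample values are invented.

```python
def latency_breakdown_ms(span_vars):
    """Derive durations from timestamp variables (assumed epoch milliseconds)."""
    # Total time: from the first byte received from the client to the
    # last byte of the response sent back to the client.
    total = (span_vars["client.sent.end.timestamp"]
             - span_vars["client.received.start.timestamp"])
    # Target time: from the start of the request sent to the target to
    # the end of the response received from the target.
    target = (span_vars["target.received.end.timestamp"]
              - span_vars["target.sent.start.timestamp"])
    return {"total": total, "target": target, "proxy_overhead": total - target}


# Invented sample values for illustration.
sample = {
    "client.received.start.timestamp": 1700000000000,
    "target.sent.start.timestamp": 1700000000020,
    "target.received.end.timestamp": 1700000000140,
    "client.sent.end.timestamp": 1700000000165,
}
print(latency_breakdown_ms(sample))
```

Here, proxy_overhead is simply the total time minus the time spent waiting on the target, which gives a rough view of the time spent inside the proxy itself.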

Supported distributed tracing systems

The Apigee runtime supports the following distributed tracing systems:

  • Cloud Trace
  • Jaeger

You can configure your Apigee runtime to send the trace data to either a Cloud Trace or a Jaeger system.

Tracing all the API calls in the Apigee runtime impacts performance. Hence, Apigee lets you configure a probabilistic sampling rate. By using the sampling rate, you can limit the number of API calls that are sent for distributed tracing. For example, if you specify the sampling rate as 0.4, approximately 40% of the API calls are sent for tracing. For more information, see Performance considerations.
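
To build intuition for what a probabilistic sampling rate means, the following Python sketch simulates a sampling decision for each call. It is a generic illustration of probabilistic sampling, not the sampler that the Apigee runtime uses; the function names are hypothetical.

```python
import random


def should_trace(sampling_rate, rng):
    """Decide whether a single API call is sampled for tracing."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    # rng.random() is uniform in [0.0, 1.0), so this comparison is True
    # with probability equal to sampling_rate.
    return rng.random() < sampling_rate


def sampled_fraction(sampling_rate, calls, seed=0):
    """Simulate `calls` API calls and return the fraction that was sampled."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    sampled = sum(should_trace(sampling_rate, rng) for _ in range(calls))
    return sampled / calls


if __name__ == "__main__":
    # With a rate of 0.4, the printed fraction should be close to 0.4.
    print(f"{sampled_fraction(0.4, 100_000):.3f}")
```

Because each decision is random, the sampled fraction only approximates the configured rate, and the approximation improves as the number of calls grows.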

Configuring Apigee runtime

Both the Apigee X runtime and the Apigee Hybrid runtime support distributed tracing. If you are using Jaeger, you can skip this section. Jaeger does not require any additional configuration.

Configuring Apigee X runtime for Cloud Trace

For the Apigee X runtime, you must enable the Cloud Trace API in your Google Cloud project. After enabling the API, you can enable distributed tracing for Cloud Trace. For more information, see Enabling distributed tracing for Cloud Trace.

Configuring Apigee Hybrid runtime for Cloud Trace

For the Apigee Hybrid runtime, in addition to enabling the Cloud Trace API, you must add a service account for the runtime and grant it the roles/cloudtrace.agent role. To add the service account, perform the following steps:

  1. Create a new service account, grant it the roles/cloudtrace.agent role, and create a key for it:
    gcloud iam service-accounts create \
      apigee-runtime --display-name "Service Account Apigee hybrid runtime" --project ${GOOGLE_CLOUD_PROJECT_ID}
    gcloud projects add-iam-policy-binding \
      ${GOOGLE_CLOUD_PROJECT_ID} --member "serviceAccount:apigee-runtime@${GOOGLE_CLOUD_PROJECT_ID}.iam.gserviceaccount.com" \
      --role=roles/cloudtrace.agent --project ${GOOGLE_CLOUD_PROJECT_ID}
    gcloud iam service-accounts keys \
      create ~/apigee-runtime.json --iam-account apigee-runtime@${GOOGLE_CLOUD_PROJECT_ID}.iam.gserviceaccount.com
  2. Add the service account to the overrides.yaml file:
    envs:
      - name: ENV_NAME
        serviceAccountPaths:
          runtime: apigee-runtime.json
          synchronizer: apigee-sync.json
          udca: apigee-udca.json
  3. Apply the changes to the runtime:
    apigeectl apply -f overrides.yaml --env=$ENV_NAME

Enabling distributed tracing for Cloud Trace

To enable distributed tracing for a Cloud Trace system, log in to your runtime and issue a PATCH request to the following API:

https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig

The following example shows you how to enable distributed tracing for Cloud Trace:

# The endpoint value is spliced in as '"$VAR"' so that the shell expands it
# inside the otherwise single-quoted JSON body.
curl -H "$AUTH" \
    -H "Content-Type: application/json" \
    "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig" \
    -X PATCH \
    -d '{"exporter":"CLOUD_TRACE","endpoint": "'"$GOOGLE_CLOUD_PROJECT_ID"'",
    "samplingConfig": {"sampler": "PROBABILITY","samplingRate": 0.5}}'

In this example:

  • The samplingRate is set to 0.5. This means approximately 50% of the API calls are sent to distributed tracing.
  • The exporter parameter is set to CLOUD_TRACE.
  • The endpoint is set to the Google Cloud project where you want the trace to be sent. NOTE: This must match the project of the service account that you set up in the configuration step.

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.5
  }
}
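
If you generate the request body from a script instead of typing it by hand, a small helper can catch typos before the request is sent. The following Python sketch builds the same JSON body as the example above. The helper name is hypothetical, and the allowed exporter and sampler values are only the ones that appear in this document; the API may accept others.

```python
import json

ALLOWED_EXPORTERS = {"CLOUD_TRACE", "JAEGER"}  # exporters named in this document
ALLOWED_SAMPLERS = {"PROBABILITY", "OFF"}      # samplers that appear in this document


def build_trace_config(exporter, endpoint, sampling_rate, sampler="PROBABILITY"):
    """Return a traceConfig request body as a dictionary."""
    if exporter not in ALLOWED_EXPORTERS:
        raise ValueError(f"unexpected exporter: {exporter}")
    if sampler not in ALLOWED_SAMPLERS:
        raise ValueError(f"unexpected sampler: {sampler}")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    return {
        "exporter": exporter,
        "endpoint": endpoint,
        "samplingConfig": {"sampler": sampler, "samplingRate": sampling_rate},
    }


# "my-project-id" is a placeholder for your Google Cloud project ID.
body = json.dumps(build_trace_config("CLOUD_TRACE", "my-project-id", 0.5))
print(body)
```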

Enabling distributed tracing for Jaeger

To enable distributed tracing for a Jaeger system, log in to your runtime and issue a PATCH request to the following API:

https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig

The following example shows you how to enable distributed tracing for Jaeger:

curl -s -H "$AUTH" \
    "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig" \
    -X PATCH \
    -H "content-type:application/json" -d '{
    "samplingConfig": {
    "samplingRate": 0.4,
    "sampler": "PROBABILITY"},
    "endpoint": "http://'"$DOMAIN"':9411/api/v2/spans",
    "exporter": "JAEGER"
    }'

In this example:

  • The samplingRate is set to 0.4. This means approximately 40% of the API calls are sent to distributed tracing.
  • The exporter parameter is set to JAEGER.
  • The endpoint is set to where Jaeger is installed and configured.

When you run the command, you can see a response similar to the following:

{
  "exporter": "JAEGER",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.4
  }
}
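
The Jaeger endpoint in the request above follows the pattern http://$DOMAIN:9411/api/v2/spans. As a rough sketch, the following Python helpers build an endpoint in that shape and run a basic sanity check on it before you send the request. The port and path are taken from the example above; the helper names and the host name are hypothetical, and your Jaeger installation may use a different port or path.

```python
from urllib.parse import urlsplit


def jaeger_endpoint(domain, port=9411, path="/api/v2/spans"):
    """Build a span endpoint URL in the shape used by the example request."""
    return f"http://{domain}:{port}{path}"


def looks_like_span_endpoint(url):
    """Basic sanity check: an http(s) URL with a host and a path ending in /spans."""
    parts = urlsplit(url)
    return (parts.scheme in ("http", "https")
            and bool(parts.hostname)
            and parts.path.endswith("/spans"))


# "jaeger.example.internal" is a placeholder host name.
endpoint = jaeger_endpoint("jaeger.example.internal")
print(endpoint, looks_like_span_endpoint(endpoint))
```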

Viewing distributed tracing configuration

To view the existing distributed tracing configuration in your runtime, log in to your runtime and then run the following command:

curl -H "$AUTH" \
    -H "Content-Type: application/json" \
    "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig"

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.5
  }
}
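
If you check the configuration from a script, you may want a one-line summary instead of raw JSON. The following Python sketch parses a response shaped like the one above. The function name is hypothetical, and treating the OFF sampler as "disabled" is based on the disabling procedure described in this document.

```python
import json


def describe_trace_config(response_text):
    """Summarize a traceConfig JSON response in one line."""
    config = json.loads(response_text)
    sampling = config.get("samplingConfig", {})
    sampler = sampling.get("sampler", "UNKNOWN")
    rate = sampling.get("samplingRate", 0.0)
    if sampler == "OFF":
        state = "disabled"
    else:
        state = f"enabled, about {rate:.0%} of calls sampled"
    return f"{config.get('exporter')} -> {config.get('endpoint')}: {state}"


# Sample response copied from the example above.
sample_response = """
{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {"sampler": "PROBABILITY", "samplingRate": 0.5}
}
"""
print(describe_trace_config(sample_response))
```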

Updating distributed tracing configuration

To update the distributed tracing configuration, log in to your runtime and issue a PATCH request to the following API:

https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig?updateMask=endpoint,samplingConfig,exporter

The following command shows you how to update the existing distributed tracing configuration for Cloud Trace:

curl -s \
  -H "$AUTH" \
  "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig?updateMask=endpoint,samplingConfig,exporter" \
  -X PATCH -H "content-type:application/json" \
  -d '{"samplingConfig": {"samplingRate": 0.3, "sampler":"PROBABILITY"},
  "endpoint":"staging", "exporter":"CLOUD_TRACE"}'

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "PROBABILITY",
    "samplingRate": 0.3
  }
}

In this example, the sampling rate is updated to 0.3.
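
The updateMask query parameter in the URL above is a comma-separated list of the fields being updated. The following Python sketch assembles that URL from a list of field names. The function name and the sample organization and environment names are hypothetical; the field names are the ones shown in the example URL, and this sketch does not check which fields the API accepts.

```python
from urllib.parse import urlencode

API_ROOT = "https://apigee.googleapis.com/v1"


def trace_config_update_url(org, env, fields):
    """Build the traceConfig URL with an updateMask for the given fields."""
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode({"updateMask": ",".join(fields)}, safe=",")
    return f"{API_ROOT}/organizations/{org}/environments/{env}/traceConfig?{query}"


# "my-org" and "test" are placeholder organization and environment names.
print(trace_config_update_url("my-org", "test",
                              ["endpoint", "samplingConfig", "exporter"]))
```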

Disabling distributed tracing configuration

To disable the distributed tracing configuration, log in to your runtime and issue a PATCH request to the following API:

https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig

The following example shows you how to disable distributed tracing configured for Cloud Trace:

curl -H "$AUTH" \
  -H "Content-Type: application/json" \
  "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig" \
  -X PATCH -d '{"exporter": "CLOUD_TRACE","endpoint": "'"$GOOGLE_CLOUD_PROJECT_ID"'","samplingConfig":
  {"sampler": "OFF","samplingRate": 0.5}}'

When you run the command, you can see a response similar to the following:

{
  "exporter": "CLOUD_TRACE",
  "endpoint": "staging",
  "samplingConfig": {
    "sampler": "OFF",
    "samplingRate": 0.5
  }
}
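
As the example above shows, disabling tracing changes only the sampler value and leaves the exporter, endpoint, and sampling rate in place. The following Python sketch makes that explicit by deriving the "disabled" body from an existing configuration. The function name is hypothetical, and the snippet only prepares the request body; it does not call the API.

```python
import copy


def with_sampler(config, sampler):
    """Return a copy of a traceConfig dictionary with a different sampler."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    updated["samplingConfig"]["sampler"] = sampler
    return updated


current = {
    "exporter": "CLOUD_TRACE",
    "endpoint": "staging",
    "samplingConfig": {"sampler": "PROBABILITY", "samplingRate": 0.5},
}
disabled = with_sampler(current, "OFF")
print(disabled)
```

Keeping the other fields unchanged means you can turn tracing back on later by switching the sampler to PROBABILITY again.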

Overriding trace settings for API proxies

When you enable distributed tracing in your Apigee runtime, all the API proxies in the runtime use the same configuration for tracing. However, you can override the distributed tracing configuration for an API proxy or a group of API proxies. This provides you more granular control over the tracing configuration.

To override the configuration for an API proxy, issue a POST request to the following API:

https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides

The following example overrides the distributed tracing configuration for the hello-world API proxy:

curl -s -H "$AUTH" \
 "https://apigee.googleapis.com/v1/organizations/$GOOGLE_CLOUD_PROJECT_ID/environments/$ENV_NAME/traceConfig/overrides" \
 -X POST \
 -H "content-type:application/json" \
 -d '{"apiProxy": "hello-world","samplingConfig": {"sampler": "PROBABILITY","samplingRate": 0.1}}'

You can override the configuration to troubleshoot problems specific to an API proxy without having to change the configuration of all the API proxies.
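
If you create overrides for several API proxies, you can generate one request body per proxy. The following Python sketch builds bodies in the same shape as the hello-world example above. The function name and the second proxy name are hypothetical, and each body would still need to be sent in its own POST request.

```python
import json


def build_override(api_proxy, sampling_rate, sampler="PROBABILITY"):
    """Return a trace override request body for a single API proxy."""
    if not api_proxy:
        raise ValueError("api_proxy must not be empty")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    return {
        "apiProxy": api_proxy,
        "samplingConfig": {"sampler": sampler, "samplingRate": sampling_rate},
    }


# "orders-v1" is an invented proxy name used only for illustration.
for proxy, rate in [("hello-world", 0.1), ("orders-v1", 0.25)]:
    print(json.dumps(build_override(proxy, rate)))
```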

Performance considerations

There is a performance impact when you enable distributed tracing in an Apigee runtime environment. That impact is proportional to the complexity of the API proxy (number of policies) and the probabilistic sampling rate. The higher the sampling rate, the higher the impact on performance (memory, CPU, and increased latency). The exact magnitude of the impact depends on various factors, but you can expect a 10%-20% drop in performance.

For environments with high traffic and low latency requirements, the recommended probabilistic sampling rate is less than or equal to 10%. If you want to troubleshoot, consider increasing the probabilistic sampling rate only for specific API proxies.
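
The guidance above can be turned into a simple check that runs before a configuration is applied. The following Python sketch flags sampling rates above 10% for latency-sensitive environments. The function name and the wording of the message are hypothetical; only the 10% threshold comes from this document.

```python
RECOMMENDED_MAX_RATE = 0.10  # "less than or equal to 10%" from the guidance above


def review_sampling_rate(sampling_rate, latency_sensitive=True):
    """Return "ok" or a warning when a rate exceeds the recommendation."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    if latency_sensitive and sampling_rate > RECOMMENDED_MAX_RATE:
        return (f"samplingRate {sampling_rate} exceeds the recommended maximum "
                f"of {RECOMMENDED_MAX_RATE}; consider a per-proxy override instead")
    return "ok"


print(review_sampling_rate(0.05))
print(review_sampling_rate(0.5))
```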