Best practices for Google Cloud Apigee support cases


Providing detailed, complete information in a support case makes it easier for the Google Cloud Support team to respond quickly and efficiently. When a case is missing critical details, we need to ask for more information, which can involve several rounds of back-and-forth and delay resolution. This guide describes the information we need to resolve your technical support case faster.

Describing the issue

A support case should explain what happened versus what was expected to happen, as well as when and how it happened. A good support case contains the following key information for each of the Apigee products:

Product: Specific Apigee product in which the problem is being observed, including version information where applicable.
  Apigee hybrid:
    • Hybrid version
Problem Details: Clear and detailed problem description which outlines the issue, including the complete error message, if any.
  Apigee on Google Cloud:
    • Error message
    • Debug tool output
    • Steps to reproduce problem
    • Complete API request/command
  Apigee hybrid:
    • Error message
    • Debug tool output
    • Steps to reproduce problem
    • Complete API request/command
    • Component diagnostic logs
    • Cloud Monitoring metrics
Time: The specific timestamp when the issue started and how long it lasted.
  Apigee on Google Cloud and Apigee hybrid:
    • Date, time, and timezone of problem occurrence
    • Duration of the problem
Setup: Detailed information where the problem is being observed.
    • Org name
    • Env name
    • API proxy name
    • Revision

The following sections describe these concepts in greater detail.

Product

Because there are different Apigee products (Apigee on Google Cloud and Apigee hybrid), we need to know which product is having the issue.

The following table provides some examples showing complete information in the DOs column, and incomplete information in the DON'Ts column:

DO:
Deployment of API proxy OAuth2 failed on our Apigee on Google Cloud org ...

DON'T:
Deployment of API proxy failed
(We need to know the Apigee product in which you are seeing the issue.)

DO:
We are getting the following error while accessing Cassandra using cqlsh on Apigee hybrid version 1.3 ...

DON'T:
We are unable to access Cassandra using cqlsh.
(Hybrid version information is missing.)

Problem Details

Provide precise information about the issue being observed, including the complete error message (if any) and the expected and actual behavior.

The following table provides some examples showing complete information in the DOs column, and incomplete information in the DON'Ts column:

DO:
New edgemicro proxy edgemicro_auth is failing with the following error:

{"error":"missing_authorization","error_description":"Missing Authorization header"}

DON'T:
New edgemicro proxy created today not working
(The proxy name is unknown. It is not clear whether the proxy is returning an error or any unexpected response.)

DO:
Our clients are getting 500 errors with the following error message while making requests to API proxy:

{"fault":{"faultstring":"Execution of JSReadResponse failed with error: Javascript runtime error: \"TypeError: Cannot read property \"content\" from undefined. (JSReadResponse.js:23)","detail":{"errorcode":"steps.javascript.ScriptExecutionFailed"}}}

DON'T:
Our clients are getting 500 Errors while making requests to API proxy.
(Just conveying 500 Errors doesn't provide adequate information for us to investigate the issue. We need to know the actual error message and error code that is being observed.)
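
One way to capture the complete request and the full response in a single attachment is to wrap the failing call in a small shell function. The following is only a sketch: the function name `capture_request` and the default file name `request-capture.txt` are hypothetical, and it assumes the failing call is a plain GET that can be reproduced with curl.

```shell
# Record a reproducible request together with the full response.
# The function name and default output file are illustrative only.
capture_request() {
  # $1 = URL to call
  # $2 = file to write (defaults to request-capture.txt)
  out="${2:-request-capture.txt}"
  {
    # Time of the attempt in UTC, ISO 8601 format.
    echo "Time (UTC): $(date -u +"%Y-%m-%dT%H:%M:%SZ")"
    # The exact command, so the Support Engineer can rerun it.
    echo "Command: curl -sS -i -X GET '$1'"
    echo "Response:"
    # -i includes the response status line and headers; -sS hides the
    # progress meter but still reports errors.
    curl -sS -i -X GET "$1" 2>&1
  } > "$out"
}
```

For example, `capture_request 'https://myorg-dev.apigee.net/v1/myproxy' trace-500.txt` would write the timestamp, the command, and the status line, headers, and body of the response to `trace-500.txt`, which can then be attached to the case.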

Time

Time is a critical piece of information. The Support Engineer needs to know when you first noticed the issue, how long it lasted, and whether it is still occurring.

The Support Engineer resolving the issue may not be in your timezone, so relative statements about time make the problem harder to diagnose. We therefore recommend using the ISO 8601 format for date and time stamps so that the exact time the issue was observed is unambiguous.

The following table provides some examples showing accurate time and duration for which the problem occurred in the DOs column, and ambiguous or unclear information on when the problem occurred in the DON'Ts column:

DO:
Huge number of 503s were observed yesterday between 2020-11-06 17:30 PDT and 2020-11-06 17:35 PDT...

DON'T:
Huge number of 503s were observed yesterday at 5:30pm for 5 mins.
(We are forced to use the implied date and it is also unclear in which timezone this issue was observed.)

DO:
High latencies were observed on the following API Proxies from 2020-11-09 15:30 IST to 2020-11-09 18:10 IST ...

DON'T:
High latencies were observed on some API Proxies last week.
(It is unclear on which day and for how long this issue was observed last week.)
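
If you record the time at the moment you observe the problem, the standard `date` utility can print it in ISO 8601 form directly. The following is a minimal sketch; the function name `iso8601_utc_now` is hypothetical, and it reports the time in UTC so that no timezone abbreviation is needed.

```shell
# Print the current time in UTC as an ISO 8601 timestamp,
# in the form YYYY-MM-DDThh:mm:ssZ (the trailing Z means UTC).
iso8601_utc_now() {
  date -u +"%Y-%m-%dT%H:%M:%SZ"
}
```

Running `iso8601_utc_now` when the errors start and again when they stop gives a start and end time that can be pasted into the case. A local time with an explicit UTC offset, such as 2020-11-09T15:30:00+05:30, is equally unambiguous.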

Setup

We need to know the details about where exactly you are seeing the issue. Depending on the product you are using, we need the following information:

  • If you are using Apigee on Google Cloud, you may have more than one organization, so we need to know the specific org and other details where you are observing the issue:
    • Organization and Environment names
    • API proxy name and revision numbers (for API request failures)
  • If you are using hybrid, you may be using one of the many supported hybrid platforms and installation topologies, so we need to know which hybrid platform and topology you are using, including details such as the number of data centers and nodes.

The following table provides some examples showing complete information in the DOs column, and incomplete information in the DON'Ts column:

DO:
401 Errors have increased on Apigee on Google Cloud since 2020-11-06 09:30 CST.

Apigee setup details:

Details of the failing API are as follows:
  Org names: myorg
  Env names: test
  API proxy names: myproxy
  Revision numbers: 3

Error:

{"fault":{"faultstring":"Failed to resolve API Key variable request.header.X-APP-API_KEY","detail":{"errorcode":"steps.oauth.v2.FailedToResolveAPIKey"}}}

DON'T:
401 Errors have increased.
(It does not give any information about the product being used, when the issue started, or any setup details.)

DO:
Debug is failing with the following error on Apigee hybrid version 1.3

Error:

Error while Creating trace session for corp-apigwy-discovery, revision 3, environment dev.

Failed to create DebugSession {apigee-hybrid-123456 dev corp-apigwy-discovery 3 ca37384e-d3f4-4971-9adb-dcc36c392bb1}

Apigee hybrid setup details:

  • Apigee hybrid platform:
      Anthos GKE on-prem version 1.4.0
  • Google Cloud project, hybrid organization and environment
      Google Cloud Project ID: apigee-hybrid-123456
      Apigee hybrid org: apigee-hybrid-123456
      Apigee hybrid env: dev
  • Kubernetes cluster name details
      k8sCluster:
      name: user-cluster-1
      region: us-east1
  • Network topology
    Attached the file network-topology.png.

DON'T:
Debug is failing on Apigee hybrid.

Useful artifacts

Providing us with artifacts related to the issue will speed up the resolution, as it helps us understand the exact behavior you are observing and get more insights into it.

This section describes artifacts that are useful for all Apigee products, followed by additional artifacts that are specific to Apigee hybrid.

Common artifacts for all Apigee products

The following artifacts are useful for both Apigee on Google Cloud and Apigee hybrid:

Artifact: Description
Debug tool output: The Debug tool output contains detailed information about the API requests flowing through Apigee products. This is useful for any runtime errors such as 4XX, 5XX, and latency issues.
Screenshots: Screenshots help relay the context of the actual behavior or error being observed. They are helpful for any errors or issues observed, such as in the UI or Analytics.
HAR (HTTP Archive): A HAR file is captured by HTTP session tools for debugging any UI-related issues. It can be captured using browsers such as Chrome, Firefox, or Internet Explorer.
tcpdumps: The tcpdump tool captures TCP/IP packets sent or received over the network. This is useful for any network-related issues such as TLS handshake failures, 502 errors, and latency issues.

Additional artifacts for hybrid

For hybrid, we may need some additional artifacts that will facilitate faster diagnosis of issues.

Artifact Description
Apigee hybrid platform Specify any of the following supported hybrid platforms that are used:
  • GKE
  • GKE on-prem
  • AKS (Azure Kubernetes Service)
  • Amazon EKS
  • GKE on AWS
Apigee hybrid and dependent component versions
  • Apigee hybrid CLI version:
    apigeectl version
  • Apigee Connect Agent version:
    kubectl -n=apigee get pods -l app=apigee-connect-agent -o=json | jq '.items[].spec.containers[].image'
  • Apigee MART version:
    kubectl -n=apigee get pods -l app=apigee-mart -o=json | jq '.items[].spec.containers[].image'
  • Apigee Synchronizer version:
    kubectl -n=apigee get pods -l app=apigee-synchronizer -o=json | jq '.items[].spec.containers[].image'
  • Apigee Cassandra version:
    kubectl -n=apigee get pods -l app=apigee-cassandra -o=json | jq '.items[].spec.containers[].image'
  • Apigee Runtime version:
    kubectl -n=apigee get pods -l app=apigee-runtime -o=json | jq '.items[].spec.containers[].image'
  • Kubernetes CLI and server versions:
    kubectl version
  • Istio CLI and server versions:
    istioctl version
Network topology The Apigee installation topology diagram describing your hybrid setup including all the data centers, Kubernetes clusters, namespaces, and pods.
Overrides YAML File The overrides.yaml file used in each data center for installing Apigee hybrid runtime plane.
Status of Apigee hybrid deployment

The output of the following commands in each data center/Kubernetes cluster:

kubectl get pods -A
kubectl get services -A

Apigee hybrid component logs

Provide links to the Cloud Logging (formerly Stackdriver) logs for the hybrid components. Alternatively, fetch the Apigee hybrid component logs using the following commands in each data center/Kubernetes cluster and share them with us:

kubectl -n {namespace} get pods
kubectl -n {namespace} logs {pod-name}

  • Apigee Connect Agent logs:
    kubectl -n {namespace} get pods
    kubectl -n {namespace} logs {apigee-connect-agent-pod-name}
  • MART logs:
    kubectl -n {namespace} get pods
    kubectl -n {namespace} logs {apigee-mart-pod-name}
  • Synchronizer logs:
    kubectl -n {namespace} get pods
    kubectl -n {namespace} logs {synchronizer-pod-name}
  • Apigee Cassandra logs:
    kubectl -n {namespace} get pods
    kubectl -n {namespace} logs {apigee-cassandra-pod-name}
  • MP/Apigee Runtime logs (of all apigee-runtime pods):
    kubectl -n {namespace} get pods
    kubectl -n {namespace} logs {apigee-runtime-pod-name}
Describe logs

Detailed information about the pod.

This is especially useful if you are observing issues such as pods getting stuck in the CrashLoopBackOff state.

kubectl -n apigee describe pod {pod-name}

Cloud Monitoring
  • Link to your metrics Dashboard
  • Snapshots of any dashboards related to Cloud Monitoring Metrics.
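
The per-component version, log, and describe commands listed above can be combined into a small helper so that the same artifacts are gathered for every component. The following is a sketch under stated assumptions, not an official Apigee tool: the function names `component_images` and `collect_component`, the output directory layout, and the default `apigee` namespace are all illustrative. It uses kubectl's built-in `-o jsonpath` output in place of the `jq` pipeline shown earlier, and it assumes the component pods carry the `app=` labels used in those commands.

```shell
# Print the container images (and therefore versions) for one component.
component_images() {
  # $1 = value of the "app" label, for example apigee-runtime
  # $2 = namespace (defaults to apigee)
  kubectl -n "${2:-apigee}" get pods -l "app=$1" \
    -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}'
}

# Save images, logs, and describe output for one component into a directory.
collect_component() {
  app="$1"                       # for example apigee-mart
  ns="${2:-apigee}"              # Kubernetes namespace
  out="${3:-apigee-artifacts}"   # hypothetical output directory
  mkdir -p "$out"
  component_images "$app" "$ns" > "$out/$app-images.txt"
  # Loop over every pod of the component, for example all apigee-runtime pods.
  for pod in $(kubectl -n "$ns" get pods -l "app=$app" -o name); do
    name="${pod#pod/}"           # strip the "pod/" prefix from the resource name
    kubectl -n "$ns" logs "$pod" --all-containers > "$out/$name.log"
    kubectl -n "$ns" describe "$pod" > "$out/$name-describe.txt"
  done
}
```

Calling `collect_component apigee-mart`, `collect_component apigee-synchronizer`, and so on in each Kubernetes cluster would leave one image list per component and one log and describe file per pod in `apigee-artifacts`, ready to be archived and attached to the case.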

Case templates and sample cases

This section provides case templates and sample cases for different products based on the best practices described in this document:

Apigee on Google Cloud

Template

This section provides a sample template for Apigee on Google Cloud.

Problem:

<Provide detailed description of the problem or the behaviour being observed at your end. Include the product name and version where applicable.>

Error message:

<Include the complete error message observed (if any)>

Problem start time (ISO 8601 format):

Problem end time (ISO 8601 format):

Apigee setup details:
  Org names:
  Env names:
  API proxy names:
  Revision numbers:

Steps to reproduce:

<Provide steps to reproduce the issue where possible>

Diagnostic information:

<List of files attached>

Sample case

This section provides a sample case for Apigee on Google Cloud.

Problem:

We are seeing a high number of 503 Service Unavailable errors in our Apigee on Google Cloud org. Can you please look into the issue and resolve it or advise us how to resolve it?

Error message:

{"fault":{"faultstring":"The Service is temporarily unavailable", "detail":{"errorcode":"messaging.adaptors.http.flow.ServiceUnavailable"}}}

Problem start time (ISO 8601 format): 2020-10-04 06:30 IST

Problem end time (ISO 8601 format): The issue is still happening.

Apigee setup details:
  Org names: myorg
  Env names: dev
  API proxy names: myproxy
  Revision numbers: 3

Steps to reproduce:

Run the following curl command to reproduce the issue:

curl -X GET 'https://myorg-dev.apigee.net/v1/myproxy'

Diagnostic information:

Debug tool output (trace-503.xml)

Hybrid

Template

This section provides a sample template for Apigee hybrid.

Problem:

<Provide detailed description of the problem or the behaviour being observed at your end. Include the product name and version where applicable.>

Error message:

<Include the complete error message observed (if any)>

Problem start time (ISO 8601 format):

Problem end time (ISO 8601 format):

Apigee hybrid setup details:

  • Apigee hybrid platform:

    <Provide the information about the Platform where you have installed hybrid and its version.>

  • Google Cloud project, hybrid Organization and Environment:
      Google Cloud project ID:
      <If you are using Google Kubernetes Engine (GKE), ensure you provide the project ID where the clusters are located. If you are using GKE on-prem, Azure Kubernetes Service or Amazon EKS, then provide the project ID where you are sending the logs.>
      Apigee hybrid org:
      Apigee hybrid env:
  • Apigee hybrid and other CLI versions:
      Apigee hybrid CLI (apigeectl) version:
      Kubectl version:
  • Kubernetes cluster name details:
      k8sCluster:
      name:
      region:
  • Network topology:
    <Attach the network topology describing the setup of your Apigee hybrid including data centers, Kubernetes clusters, namespaces, and pods.>
  • Overrides YAML File:
    <Attach the Overrides YAML file.>

Steps to reproduce:

<Provide steps to reproduce the issue where possible>

Diagnostic information:

<List of files attached>

Sample case

This section provides a sample case for Apigee hybrid.

Problem:

We are getting errors when executing management APIs on Apigee hybrid version 1.3.

Error message:

[ERROR] 400 Bad Request
{
"error": {
"code": 400,
"message": "Error processing MART request: INTERNAL_ERROR",
"errors": [
{
"message": "Error processing MART request: INTERNAL_ERROR",
"domain": "global",
"reason": "failedPrecondition"
}
],
"status": "FAILED_PRECONDITION"
}
}

Problem start time (ISO 8601 format): Since 2020-10-24 10:30 PDT

Problem end time (ISO 8601 format): Continuing to observe the issue.

Apigee hybrid setup details:

  • Apigee hybrid platform
    GKE version 1.15.1
  • Google Cloud project, hybrid Organization and Environment
      Google Cloud project ID: apigee-hybrid-123456
      Note: This is the project ID where the clusters are located.
      Apigee hybrid org: apigee-hybrid-123456
      Apigee hybrid env: dev
  • Apigee hybrid and other CLI versions:
      Apigee hybrid CLI (apigeectl) version:
        Version: 1.2.0
        Commit: ac09109
        Build ID: 214
        Build Time: 2020-03-30T20:23:36Z
        Go Version: go1.12

      Kubectl version:
        Client Version:
          version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}

        Server Version:
          version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.36", GitCommit:"34a615f32e9a0c9e97cdb9f749adb392758349a6", GitTreeState:"clean",
  • Kubernetes cluster name details:
      k8sCluster:
      name: user-cluster-1
      region: us-east1
  • Network topology
    Attached the file network-topology.png
  • Overrides YAML File
    Attached the file overrides.yaml

Steps to reproduce:

Run the following management API to observe the error:

curl -X GET --header "Authorization: Bearer <TOKEN>" "https://apigee.googleapis.com/v1/organizations/apigee-hybrid-123456/environments/dev/keyvaluemaps"

Diagnostic information:

Attached the following files:

  • network-topology.png
  • overrides.yaml
  • MART logs
  • Synchronizer logs