Insufficient CPU

Symptom

During startup, the telemetry pods repeatedly enter and leave the CrashLoopBackOff state. Each restart can leave periodic gaps in your metrics and graphs, and analytics data can show discrepancies because some sections of data are missing.

Error messages

When you use kubectl to view the pod states, one or more metrics pods appear in the CrashLoopBackOff state. Use the following command:

kubectl get pods -n APIGEE_NAMESPACE

Where APIGEE_NAMESPACE is the Kubernetes namespace for your Apigee hybrid components. For more information, see Create the apigee namespace.

Sample Output

NAME                                                      READY   STATUS             RESTARTS   AGE
apigee-metrics-default-telemetry-proxy-1104-hvwoo-zlmlw   0/1     CrashLoopBackOff   10         10m
apigee-metrics-adapter-apigee-telemetry-1104-7fyff-tts65  0/1     CrashLoopBackOff   10         10m
apigee-metrics-default-telemetry-proxy-1104-hvwoo-zlmlw   0/1     FailedScheduling   0          12m
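If many pods run in the namespace, filtering this output can make the affected pods easier to spot. The following is a minimal sketch, not part of the Apigee tooling: the function name crashlooping_metrics_pods is hypothetical, and it assumes the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE) and that the affected pods have metrics or telemetry in their names.

```shell
# Hypothetical helper (not an Apigee tool): reads `kubectl get pods` output on
# stdin and prints the names of metrics/telemetry pods whose STATUS column is
# CrashLoopBackOff. The comparison is case-insensitive.
crashlooping_metrics_pods() {
  awk '
    NR == 1 && $1 == "NAME" { next }    # skip the header row
    $1 ~ /metrics|telemetry/ && tolower($3) == "crashloopbackoff" { print $1 }
  '
}
```

For example, pipe kubectl get pods -n APIGEE_NAMESPACE into the function; each printed name can then be used as POD_NAME in the diagnosis steps.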

Common diagnosis steps

  1. Check the events in the namespace for issues with the telemetry pods:
    kubectl -n APIGEE_NAMESPACE get events

    Sample Output

    LAST SEEN   TYPE      REASON           OBJECT                                                           MESSAGE
    53m         Normal    SuccessfulCreate job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251940 Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-292519fkt7j
    53m         Normal    Completed        job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251940 Job completed
    43m         Normal    SuccessfulCreate job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251950 Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-292519l87m8
    43m         Normal    Completed        job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251950 Job completed
    33m         Normal    SuccessfulCreate job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251960 Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-29251962ncc
        
  2. To view the events for a specific telemetry pod in the CrashLoopBackOff state, describe the pod:
    kubectl -n APIGEE_NAMESPACE describe pod POD_NAME

    Where POD_NAME is the name of the pod that is in a CrashLoopBackOff state.

    Sample Output

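    apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv

  3. Check the CPU utilization that the HorizontalPodAutoscaler (HPA) reports for the telemetry pods. A target of <unknown> means the HPA cannot calculate the pods' CPU utilization:
    kubectl -n APIGEE_NAMESPACE get hpa | grep unknown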

    Sample Output

    apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv   ReplicaSet/apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv   <unknown>/80%                       2         10        2          8h
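To get just the names of the affected HPAs, for example to pass them to kubectl -n APIGEE_NAMESPACE describe hpa, the filter can be wrapped in a small function. This is a hypothetical sketch (hpas_with_unknown_target is an illustrative name). It matches <unknown> anywhere in the row rather than in a fixed column, on the assumption that the layout of the TARGETS column can differ between kubectl versions.

```shell
# Hypothetical helper (not an Apigee tool): reads `kubectl get hpa` output on
# stdin and prints the name of every HPA whose row contains <unknown>, that is,
# an HPA that could not compute the current value of its metric.
hpas_with_unknown_target() {
  awk '
    NR == 1 && $1 == "NAME" { next }         # skip the header row, if present
    index($0, "<unknown>") > 0 { print $1 }  # NAME is the first column
  '
}
```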
     

Possible causes

Cause: metrics.app.resources.requests.cpu and metrics.app.resources.limits.cpu are missing
Description: Both CPU values must be specified in the overrides.yaml file.
Troubleshooting instructions applicable for: Apigee hybrid

Cause

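The CPU request and limit for the metrics app container are not set in the overrides.yaml file, so both values are undefined. Without a CPU request, the HorizontalPodAutoscaler cannot calculate CPU utilization for the pods and reports its CPU target as <unknown>.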
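For background, the Kubernetes HorizontalPodAutoscaler documentation describes CPU utilization as a percentage of the CPU requested by the pods, and derives the desired replica count from the ratio of current to target utilization. The following simplified sketch of that relationship reflects general Kubernetes behavior, not an Apigee-specific formula:

```latex
\text{utilization} = 100\% \times \frac{\sum_{p} \text{usage}_p}{\sum_{p} \text{request}_p},
\qquad
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{utilization}}{\text{targetUtilization}} \right\rceil
```

For example, 2 replicas running at 120% utilization against an 80% target give ceil(2 × 120 / 80) = 3 replicas. When the pods have no CPU request, the denominator of the utilization term is undefined, so the HPA has no value to report.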

Diagnosis

Check your overrides.yaml file to confirm that both metrics.app.resources.requests.cpu and metrics.app.resources.limits.cpu are defined.
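This check can also be scripted. The following is a hypothetical sketch: yaml_paths and check_metrics_cpu are illustrative names, not Apigee tools. It handles only simple block-style YAML indented with spaces and ignores lists, flow style, anchors, and multi-document files, so inspect the file by hand if the result looks wrong.

```shell
# Hypothetical helper: prints the dotted path of every mapping key in a simple
# block-style YAML file (space-indented, no lists or flow style).
yaml_paths() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)                  # drop comments (naive: also cuts a hash inside quoted values)
      if (line ~ /^[ \t]*$/) next           # skip blank and comment-only lines
      key = line
      sub(/^ */, "", key)
      indent = length(line) - length(key)   # indentation = number of leading spaces
      if (key !~ /^[A-Za-z0-9_.-]+:/) next  # only plain "key:" lines
      sub(/:.*/, "", key)
      while (depth > 0 && indents[depth] >= indent) depth--   # leave sibling or deeper scopes
      depth++
      indents[depth] = indent
      keys[depth] = key
      path = keys[1]
      for (i = 2; i <= depth; i++) path = path "." keys[i]
      print path
    }
  ' "$1"
}

# Hypothetical helper: reports whether both metrics CPU settings are present.
# Returns 0 if both are defined, 1 if either is missing, 2 if the file is unreadable.
check_metrics_cpu() {
  paths=$(yaml_paths "$1") || return 2
  missing=0
  for p in metrics.app.resources.requests.cpu metrics.app.resources.limits.cpu; do
    if printf '%s\n' "$paths" | grep -qxF "$p"; then
      echo "OK      $p"
    else
      echo "MISSING $p"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_metrics_cpu OVERRIDES_FILE prints one OK or MISSING line per setting.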

Resolution

If either CPU setting for metrics is missing from your overrides.yaml file, define both values in the file.

  1. Add the following configuration under the metrics section in your overrides.yaml file:

    metrics:
      app: # The apigee-prometheus-app container in the "app" pod
        resources:
          requests:
            memory: 512Mi # Default value: 512Mi
            cpu: 500m # Default value: 500m
          limits:
            memory: 2Gi # Default value: 1Gi
            cpu: 500m # Default value: 500m
      

  2. Apply changes using the following command:
    helm upgrade ENV_RELEASE_NAME apigee-env/ \
    --install \
    --namespace APIGEE_NAMESPACE \
    --set env=ENV_NAME \
    -f OVERRIDES_FILE
    • Where ENV_RELEASE_NAME is a unique name used to track installations and upgrades of the apigee-env chart. It is typically the same as ENV_NAME, but it must be different if your environment has the same name as your environment group. For example, if both are named dev, you could use dev-env-release and dev-envgroup-release to distinguish them.

    • Where APIGEE_NAMESPACE is the Kubernetes namespace for your Apigee hybrid components. For more information, see Create the apigee namespace.

    • Where ENV_NAME is the name you used when you created the environment in the UI.

    • Where OVERRIDES_FILE is the path to the overrides.yaml file that you use for installation and upgrades.
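Before you apply the change, you may want to confirm that the CPU request does not exceed the CPU limit, because Kubernetes rejects a container whose CPU request is greater than its CPU limit. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and they accept only plain decimal quantities with an optional m (millicore) suffix, such as the 500m used in the configuration example.

```shell
# Hypothetical helper: converts a CPU quantity such as "500m", "1", or "0.5"
# to millicores. Other Kubernetes quantity formats are rejected.
cpu_to_millicores() {
  awk -v q="$1" 'BEGIN {
    if (q !~ /^([0-9]+m|[0-9]+(\.[0-9]+)?)$/) {
      print "invalid CPU quantity: " q > "/dev/stderr"
      exit 1
    }
    if (q ~ /m$/) {
      sub(/m$/, "", q)
      printf "%d\n", q + 0               # already in millicores
    } else {
      printf "%d\n", q * 1000 + 0.5      # cores to millicores, rounded
    }
  }'
}

# Hypothetical helper: fails if the CPU request is greater than the CPU limit.
# Returns 0 if request <= limit, 1 if not, 2 if a value cannot be parsed.
check_cpu_request_limit() {
  req=$(cpu_to_millicores "$1") || return 2
  lim=$(cpu_to_millicores "$2") || return 2
  if [ "$req" -gt "$lim" ]; then
    echo "CPU request $1 (${req}m) is greater than CPU limit $2 (${lim}m)"
    return 1
  fi
  echo "OK: CPU request ${req}m <= limit ${lim}m"
}
```

For example, check_cpu_request_limit 500m 500m succeeds for the values in the configuration example, while check_cpu_request_limit 1 500m fails.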

For more information, see Configuration property reference.

Must gather diagnostic information

If the problem persists after you follow the instructions above, gather the following diagnostic information and then contact Google Cloud Customer Care:

  1. The overrides.yaml file.
  2. The output from the Apigee hybrid must-gather script.