This document describes how to reset Apigee hybrid components when they are stuck in a creating or releasing state.
Run the following command to list the main components of the Apigee hybrid installation:
kubectl get crd | grep apigee
apigeeorganization (apigeeorganizations.apigee.cloud.google.com)
apigeeenvironment (apigeeenvironments.apigee.cloud.google.com)
apigeedatastore (apigeedatastores.apigee.cloud.google.com)
apigeetelemetries (apigeetelemetries.apigee.cloud.google.com)
apigeeredis (apigeeredis.apigee.cloud.google.com)
Run the following command to display the current state:
kubectl get apigeedatastore -n NAMESPACE
When fully functional, each of these components will be in a running state. For example:

NAME      STATE     AGE
default   running   5d6h
If the installation is not successful, components may be stuck in a creating (or releasing) state. For example:

NAME      STATE      AGE
default   creating   5d6h
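If you want a single view across every Apigee component, you can loop over the Apigee CRDs. This is a convenience sketch, not part of the official procedure; it assumes a bash shell, and NAMESPACE is a placeholder for your Apigee namespace (typically apigee):

# List the state of every Apigee custom resource in the namespace.
for crd in $(kubectl get crd -o name | grep apigee); do
  echo "--- ${crd##*/}"
  kubectl get "${crd##*/}" -n NAMESPACE
done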
Identify the problem
To identify the cause of the issue, begin by describing each component. The components are structured as follows:

Each ApigeeOrganization custom resource is represented by the following hierarchy:
ApigeeOrganization/HASHED_VALUE
├─ApigeeDeployment/apigee-connect-agent-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-connect-agent-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-connect-agent-HASHED_VALUE
│ ├─ReplicaSet/apigee-connect-agent-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-connect-agent-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-mart-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-mart-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-mart-HASHED_VALUE
│ ├─ReplicaSet/apigee-mart-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-mart-HASHED_VALUE-VER-xxxx
└─ApigeeDeployment/apigee-watcher-HASHED_VALUE
  ├─HorizontalPodAutoscaler/apigee-watcher-HASHED_VALUE-VER-xxxx
  ├─PodDisruptionBudget/apigee-watcher-HASHED_VALUE
  ├─ReplicaSet/apigee-watcher-HASHED_VALUE-VER-xxxx
  │ └─Pod/apigee-watcher-HASHED_VALUE-VER-xxxx
Each ApigeeEnvironment custom resource is represented by the following hierarchy:
ApigeeEnvironment/HASHED_VALUE
├─ApigeeDeployment/apigee-runtime-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-runtime-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-runtime-HASHED_VALUE
│ ├─ReplicaSet/apigee-runtime-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-runtime-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-synchronizer-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-synchronizer-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-synchronizer-HASHED_VALUE
│ ├─ReplicaSet/apigee-synchronizer-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-synchronizer-HASHED_VALUE-VER-xxxx
└─ApigeeDeployment/apigee-udca-HASHED_VALUE
  ├─HorizontalPodAutoscaler/apigee-udca-HASHED_VALUE-VER-xxxx
  ├─PodDisruptionBudget/apigee-udca-HASHED_VALUE
  ├─ReplicaSet/apigee-udca-HASHED_VALUE-VER-xxxx
  │ └─Pod/apigee-udca-HASHED_VALUE-VER-xxxx
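These hierarchies follow standard Kubernetes owner references, so if you have the kubectl tree krew plugin installed, it can render them directly. This is an optional convenience, not a required step:

kubectl tree apigeeorganization HASHED_VALUE -n NAMESPACE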
Begin problem identification by describing the root component. For example:
kubectl describe apigeeorganization -n NAMESPACE COMPONENT_NAME
Check to see if the State of the component is running:
Replicas:
  Available:  1
  Ready:      1
  Total:      1
  Updated:    1
State:        running
  State:      running
Events:       <none>
If there are no events logged at this level, repeat the process with apigeedeployments followed by ReplicaSet. For example:
kubectl get apigeedeployment -n NAMESPACE AD_NAME
If apigeedeployments and ReplicaSet do not show any errors, focus on the pods that are not ready:
kubectl get pods -n NAMESPACE
NAME                                                     READY   STATUS
apigee-cassandra-default-0                               1/1     Running
apigee-connect-agent-apigee-b56a362-150rc2-42gax-dbrrn   1/1     Running
apigee-logger-apigee-telemetry-s48kb                     1/1     Running
apigee-mart-apigee-b56a362-150rc2-bcizm-7jv6w            0/2     Running
apigee-runtime-apigee-test-0d59273-150rc2-a5mov-dfb29    0/1     Running
In this example, mart and runtime are not ready. Inspect the pod logs to determine errors:
kubectl logs -n NAMESPACE POD_NAME
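If a pod has restarted, the current logs may not contain the original failure. The following standard kubectl flags can help; CONTAINER_NAME is a placeholder you can read from the kubectl describe pod output:

# Logs from the previous (crashed) container instance
kubectl logs -n NAMESPACE POD_NAME --previous

# Logs from one specific container in a multi-container pod
kubectl logs -n NAMESPACE POD_NAME -c CONTAINER_NAME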
Deleting components
If you've made a mistake with any of these components, simply delete the component and re-apply its configuration with apigeectl. For example, to delete an environment:
kubectl delete -n apigee apigeeenv HASHED_ENV_NAME
Then re-create the environment (after making the necessary corrections):
apigeectl apply -f overrides.yaml --env=$ENV
Inspect the controller
If there are no obvious error messages in the pod, but the component has not transitioned to the running state, inspect the apigee-controller for error messages. The apigee-controller runs in the apigee-system namespace.
kubectl logs -n apigee-system $(kubectl get pods -n apigee-system | sed -n '2p' | awk '{print $1}') | grep -i error
This lets you see why the controller was unable to process the request (create, delete, update, and so on).
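The sed-based pod selection above simply picks the first pod in the listing. As an alternative sketch, you can select the controller pod by label instead; this assumes the controller pod carries an app=apigee-controller label, which you should verify first with kubectl get pods -n apigee-system --show-labels:

kubectl logs -n apigee-system -l app=apigee-controller --tail=1000 | grep -i error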
Apigee datastore
Apache Cassandra is implemented as a StatefulSet. Each Cassandra instance contains:
ApigeeDatastore/default
├─Certificate/apigee-cassandra-default
│ └─CertificateRequest/apigee-cassandra-default-wnd7s
├─Secret/config-cassandra-default
├─Service/apigee-cassandra-default
│ ├─EndpointSlice/apigee-cassandra-default-7m9kx
│ └─EndpointSlice/apigee-cassandra-default-gzqpr
└─StatefulSet/apigee-cassandra-default
  ├─ControllerRevision/apigee-cassandra-default-6976b77bd
  ├─ControllerRevision/apigee-cassandra-default-7fc76588cb
  └─Pod/apigee-cassandra-default-0
This example shows one pod; however, typical production installs contain three or more pods.
If the state for Cassandra is creating or releasing, the state MUST be reset. Certain problems (such as Cassandra password changes) and problems not related to networking may require that you delete components. It is quite possible that in such cases you cannot delete the instance (for example, with kubectl delete apigeedatastore -n NAMESPACE default). Using --force or --grace-period=0 does not help either.
The objective of reset is to change the state of the component (apigeedatastore) from creating or releasing back to running. Changing the state in this way typically will not solve the underlying problem. In most cases, the component should be deleted after a reset.
Attempt a delete (this won't be successful):
kubectl delete -n NAMESPACE apigeedatastore default
It is common for this command to not complete. Use Ctrl+C to terminate the call.
Reset the state:
On Window 1:
kubectl proxy
On Window 2:
curl -X PATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" --data '[{"op": "replace", "path": "/status/nestedState", "value": ""},{"op": "replace", "path": "/status/state", "value": "running"}]' 'http://127.0.0.1:8001/apis/apigee.cloud.google.com/v1alpha1/namespaces/apigee/apigeedatastores/default/status'
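On clusters where kubectl 1.24 or later is available, the same status patch can be applied without kubectl proxy by using the --subresource flag. This is an equivalent alternative, not an additional step:

kubectl patch apigeedatastore default -n apigee --subresource=status --type=json \
  -p='[{"op": "replace", "path": "/status/nestedState", "value": ""},{"op": "replace", "path": "/status/state", "value": "running"}]'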
Remove the finalizer (Window 2):
kubectl edit -n NAMESPACE apigeedatastore default
Look for the following two lines and delete them:
finalizers:
- apigeedatastore.apigee.cloud.google.com
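If you prefer a non-interactive edit, the finalizer can also be removed with a JSON patch. Note that this sketch removes the entire finalizers list, so only use it when the finalizer above is the only entry:

kubectl patch apigeedatastore default -n NAMESPACE --type=json \
  -p='[{"op": "remove", "path": "/metadata/finalizers"}]'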
Common error scenarios
Proxy configuration not available with runtime
This error can manifest in one of two ways:
- The runtime is not in the ready state.
- The runtime has not received the latest version of the API.
1. Start with the synchronizer pods. Inspect the logs for the synchronizer. Common errors are as follows:
   - Lack of network connectivity (to *.googleapis.com)
   - Incorrect IAM access (the service account is not available or has not been granted the Apigee Synchronizer Manager role)
   - The setSyncAuthorization API was not invoked (see the verification sketch after these steps)
2. Inspect the runtime pods. Inspecting the logs from the runtime pods will show why the runtime did not load the configuration. The control plane attempts to prevent most configuration mistakes from even reaching the data plane. In cases where a validation is either impossible or not correctly implemented, the runtime will fail to load it.
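To verify whether setSyncAuthorization was invoked, you can read the current authorization back from the control plane with the getSyncAuthorization API. ORG_NAME is a placeholder for your Apigee organization name:

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://apigee.googleapis.com/v1/organizations/ORG_NAME:getSyncAuthorization" \
  -d '{}'

The response lists the identities authorized to synchronize; the synchronizer service account must appear in that list.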
"No runtime pods" in the control plane
1. Start with the synchronizer pods. Inspect the logs for the synchronizer. Common errors are as follows:
   - Lack of network connectivity (to *.googleapis.com)
   - Incorrect IAM access (the service account is not available or has not been granted the Apigee Synchronizer Manager role)
   - The setSyncAuthorization API was not invoked; perhaps the configuration never made it to the data plane.
2. Inspect the runtime pods. Inspecting the logs from the runtime pods will show why the runtime did not load the configuration.
3. Inspect the watcher pods. It is the watcher component that configures the ingress (routing) and reports proxy and ingress deployment status to the control plane. Inspect these logs to find out why the watcher is not reporting the status. Common reasons include a mismatch between the names in the overrides.yaml file and the control plane for the environment name and/or environment group name (see the comparison sketch after these steps).
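To check for such a mismatch, compare what the control plane has registered with the names in your overrides.yaml. A sketch using the Apigee API; ORG_NAME is a placeholder:

# Environment group names known to the control plane
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://apigee.googleapis.com/v1/organizations/ORG_NAME/envgroups"

# Environment names known to the control plane
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://apigee.googleapis.com/v1/organizations/ORG_NAME/environments"

The names returned must exactly match the envs[].name and virtualhosts[].name entries in overrides.yaml.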
Debug session is not appearing in the control plane
1. Start with the synchronizer pods. Inspect the logs for the synchronizer. Common errors are as follows:
   - Lack of network connectivity (to *.googleapis.com)
   - Incorrect IAM access (the service account is not available or has not been granted the Apigee Synchronizer Manager role)
   - The setSyncAuthorization API was not invoked.
2. Inspect the runtime pods. Inspecting the logs from the runtime pods will show why the runtime is not sending debug logs to UDCA.
3. Inspect the UDCA pods. Inspecting the logs from the UDCA pods will show why UDCA is not sending debug session information to the control plane (see the log-collection sketch after these steps).
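To pull logs from all UDCA pods at once, you can select them by label. This sketch assumes the standard Apigee hybrid app=apigee-udca pod label; verify it first with kubectl get pods -n NAMESPACE --show-labels:

kubectl logs -n NAMESPACE -l app=apigee-udca --tail=500 | grep -i error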
Cassandra returning large cache responses
The following warning message indicates that Cassandra is receiving read or write requests with a payload larger than the warning threshold. It can be safely ignored: the threshold is deliberately set to a low value so that large response payload sizes are surfaced in the logs.
Batch for [cache_ahg_gap_prod_hybrid.cache_map_keys_descriptor, cache_ahg_gap_prod_hybrid.cache_map_entry] is of size 79.465KiB, exceeding specified threshold of 50.000KiB by 29.465KiB
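The 50.000KiB figure in this message comes from Cassandra's batch_size_warn_threshold_in_kb setting in cassandra.yaml. If you want to confirm the configured value, the following sketch can help; the cassandra.yaml path inside the pod is an assumption and may differ in your installation:

# The cassandra.yaml path below is an assumption; locate the file first if this fails.
kubectl exec -n NAMESPACE apigee-cassandra-default-0 -- \
  grep batch_size_warn_threshold_in_kb /etc/cassandra/cassandra.yaml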