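Symptom
The Apigee hybrid UI shows API proxy deployment failures together with the warning No active runtime pods.
Error messages
On the API Proxies page, the No active runtime pods warning appears in the Details dialog next to the error message Deployment issues on ENVIRONMENT: REVISION_NUMBER.
This issue can surface as different errors on other resource pages of the UI. Some example error messages follow.
Hybrid UI error message #1: Datastore Error
You may see the Datastore Error on the API Products and Apps pages of the hybrid UI.
Hybrid UI error message #2: Internal Server Error
You may see the Internal Server Error on the Developers page of the UI.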
Kubectl command output
In the kubectl get pods command output, you may see the apigee-mart, apigee-runtime, and apigee-synchronizer pods change to the CrashLoopBackOff status.
Component log error messages
In Apigee hybrid 1.4.0 or later, the apigee-runtime pod logs show the following liveness probe failure errors:
{"timestamp":"1621575431454","level":"ERROR","thread":"qtp365724939-205","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Error occurred : probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}\n\n\tcom.apigee.probe.ProbeAPI.getResponse(ProbeAPI.java:66)\n\tcom.apigee.probe.ProbeAPI.getLiveStatus(ProbeAPI.java:55)\n\tsun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)\n\tsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\t","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}
{"timestamp":"1621575431454","level":"ERROR","thread":"qtp365724939-205","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Returning error response : ErrorResponse{errorCode = probe.ProbeRunError, errorMessage = probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}}","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}
In Apigee hybrid 1.4.0 or later, the apigee-synchronizer pod logs show the following Cannot build a cluster without contact points error:
{"timestamp":"1621575636434","level":"ERROR","thread":"main","logger":"KERNEL.DEPLOYMENT","message":"ServiceDeployer.deploy() : Got a life cycle exception while starting service [SyncService, Cannot build a cluster without contact points] : {}","context":"apigee-service-logs","exception":"java.lang.IllegalArgumentException: Cannot build a cluster without contact points\n\tat com.datastax.driver.core.Cluster.checkNotEmpty(Cluster.java:134)\n\tat com.datastax.driver.core.Cluster.<init>(Cluster.java:127)\n\tat com.datastax.driver.core.Cluster.buildFrom(Cluster.java:193)\n\tat com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1350)\n\tat io.apigee.persistence.PersistenceContext.newCluster(PersistenceContext.java:214)\n\tat io.apigee.persistence.PersistenceContext.<init>(PersistenceContext.java:48)\n\tat io.apigee.persistence.ApplicationContext.<init>(ApplicationContext.java:19)\n\tat io.apigee.runtimeconfig.service.RuntimeConfigServiceImpl.<init>(RuntimeConfigServiceImpl.java:75)\n\tat io.apigee.runtimeconfig.service.RuntimeConfigServiceFactory.newInstance(RuntimeConfigServiceFactory.java:99)\n\tat io.apigee.common.service.AbstractServiceFactory.initializeService(AbstractServiceFactory.java:301)\n\tat ...","severity":"ERROR","class":"com.apigee.kernel.service.deployment.ServiceDeployer","method":"startService"}
In Apigee hybrid 1.4.0 or later, the apigee-mart pod logs show the following liveness probe failure errors:
{"timestamp":"1621576757592","level":"ERROR","thread":"qtp991916558-144","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Error occurred : probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}\n\n\tcom.apigee.probe.ProbeAPI.getResponse(ProbeAPI.java:66)\n\tcom.apigee.probe.ProbeAPI.getLiveStatus(ProbeAPI.java:55)\n\tsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\t","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}
{"timestamp":"1621576757593","level":"ERROR","thread":"qtp991916558-144","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Returning error response : ErrorResponse{errorCode = probe.ProbeRunError, errorMessage = probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}}","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}
About the No active runtime pods error
Apigee hybrid 1.4.0 added a liveness probe to the apigee-runtime and apigee-mart pods that checks the status of the Cassandra pods. If all Cassandra pods become unavailable, the liveness probes of the apigee-runtime and apigee-mart pods fail. Those pods then enter the CrashLoopBackOff state, so API proxy deployments fail with the warning No active runtime pods. Because the Cassandra pods are unavailable, the apigee-synchronizer pod also enters the CrashLoopBackOff state.
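To confirm that a crashing pod is failing for this reason, its logs can be scanned for the two signatures quoted in the component logs. The following is a minimal sketch, not an official tool: the function name count_datastore_errors is hypothetical, and the usage comment assumes the apigee namespace used elsewhere on this page.

```shell
# Hypothetical helper (not part of kubectl or apigeectl).
# Reads log lines on stdin and prints how many of them contain either
# Cassandra-connectivity signature quoted in the component logs.
count_datastore_errors() {
  # -F matches fixed strings; "|| true" keeps the exit status 0 when the count is 0.
  grep -c -F \
    -e 'cps.common.datastoreConnectionNotHealthy' \
    -e 'Cannot build a cluster without contact points' || true
}

# Usage against a live cluster (POD_NAME is a placeholder):
#   kubectl -n apigee logs POD_NAME --all-containers=true | count_datastore_errors
```

A non-zero count suggests that the pod cannot reach Cassandra, which points to the causes described next.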
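Possible causes
This error can have the following causes:
Cause | Description |
---|---|
Cassandra pods are down | The Cassandra pods are down, so the apigee-runtime pods cannot communicate with the Cassandra database. |
Cassandra replicas are configured with only one pod | A single Cassandra pod can become a single point of failure. |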
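Cause: Cassandra pods are down
During API proxy deployment, the apigee-runtime pods connect to the Cassandra database to fetch resources defined in the API proxy, such as key value maps (KVMs) and caches. If no Cassandra pods are running, the apigee-runtime pods cannot connect to the Cassandra database, and the API proxy deployment fails.
Diagnosis
- List the Cassandra pods: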
kubectl -n apigee get pods -l app=apigee-cassandra
Sample output 1:
NAME                         READY   STATUS    RESTARTS   AGE
apigee-cassandra-default-0   0/1     Pending   0          9m23s
Sample output 2:
NAME                 READY   STATUS             RESTARTS   AGE
apigee-cassandra-0   0/1     CrashLoopBackOff   0          10m
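- Verify the status of each Cassandra pod. All Cassandra pods should be in the Running status. If any Cassandra pod is in a different status, that may be the cause of this issue. Perform the following steps to resolve it.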
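The status check in the diagnosis steps can be scripted. The sketch below is an illustration under stated assumptions: not_running_pods is a hypothetical helper name, and it assumes the default kubectl get pods table layout in which STATUS is the third column.

```shell
# Hypothetical helper (not part of kubectl or apigeectl).
# Reads the default `kubectl get pods` table on stdin, prints "NAME STATUS"
# for every pod that is not Running, and returns 1 if it found any.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3; bad = 1 } END { exit bad + 0 }'
}

# Usage against a live cluster:
#   kubectl -n apigee get pods -l app=apigee-cassandra | not_running_pods
```

Empty output with exit status 0 means every listed Cassandra pod reports Running; any printed line names a pod to investigate.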
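Resolution
- If any Cassandra pod is in the Pending state, see Cassandra pods are stuck in the Pending state to troubleshoot and resolve the issue.
- If any Cassandra pod is in the CrashLoopBackOff state, see Cassandra pods are stuck in the CrashLoopBackoff state to troubleshoot and resolve the issue.
Sample output showing the apigee-runtime, apigee-mart, and apigee-synchronizer pods in the Running state: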
kubectl -n apigee get pods -l app=apigee-runtime
NAME                                                           READY   STATUS    RESTARTS   AGE
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-2gnch   1/1     Running   13         43m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-42jdv   1/1     Running   13         45m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-l7wq7   1/1     Running   13         43m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-q2thb   1/1     Running   8          38m
kubectl -n apigee get pods -l app=apigee-mart
NAME                                                  READY   STATUS    RESTARTS   AGE
apigee-mart-apigee-hybrid-s-2664b3e-143-u0a5c-rtg69   2/2     Running   8          28m
kubectl -n apigee get pods -l app=apigee-synchronizer
NAME                                                              READY   STATUS    RESTARTS   AGE
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp269nb   2/2     Running   10         29m
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp2w2jp   2/2     Running   0          4m40s
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpkfkvq   2/2     Running   0          4m40s
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpxmzhn   2/2     Running   0          4m40s
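Cause: Cassandra replicas are configured with only one pod
If the Cassandra replica count is configured as 1, only one Cassandra pod is available in the runtime. If that pod becomes unavailable for any period of time, the apigee-runtime pods may encounter connectivity issues.
Diagnosis
- Get the Cassandra stateful set and check the current replica count: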
kubectl -n apigee get statefulsets -l app=apigee-cassandra
Sample output:
NAME                       READY   AGE
apigee-cassandra-default   1/1     21m
- If the replica count is configured as 1, perform the following steps to increase it.
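The replica count can also be read programmatically from the same command. In the kubectl get statefulsets table, the READY column has the form ready/desired, so the number after the slash is the configured replica count. The sketch below relies on that layout; statefulset_replicas is a hypothetical helper name.

```shell
# Hypothetical helper (not part of kubectl or apigeectl).
# Reads the default `kubectl get statefulsets` table on stdin and prints
# "NAME DESIRED", where DESIRED is the number after "/" in the READY column.
statefulset_replicas() {
  awk 'NR > 1 { split($2, ready, "/"); print $1, ready[2] }'
}

# Usage against a live cluster:
#   kubectl -n apigee get statefulsets -l app=apigee-cassandra | statefulset_replicas
```

For the sample output above, this would print apigee-cassandra-default 1, which indicates a single-replica configuration.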
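Resolution
Apigee hybrid non-production deployments may have the Cassandra replica count set to 1. If high availability of Cassandra is important in a non-production deployment, increase the replica count to 3 to resolve this issue.
Perform the following steps to resolve this issue:
- Update the overrides.yaml file and set the Cassandra replica count to 3:
cassandra:
  replicaCount: 3
For Cassandra configuration information, see the Configuration property reference.
- Apply the above configuration using the apigeectl CLI:
cd path/to/hybrid-files
apigeectl apply -f overrides/overrides.yaml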
- Get the Cassandra stateful set and check the current replica count:
kubectl -n apigee get statefulsets -l app=apigee-cassandra
Sample output:
NAME                       READY   AGE
apigee-cassandra-default   3/3     27m
- Get the Cassandra pods and check the current instance count. If not all pods are ready and in the Running state yet, wait for the new Cassandra pods to be created and activated:
kubectl -n apigee get pods -l app=apigee-cassandra
Sample output:
NAME                         READY   STATUS    RESTARTS   AGE
apigee-cassandra-default-0   1/1     Running   0          29m
apigee-cassandra-default-1   1/1     Running   0          21m
apigee-cassandra-default-2   1/1     Running   0          19m
Sample output showing the apigee-runtime, apigee-mart, and apigee-synchronizer pods in the Running state:
kubectl -n apigee get pods -l app=apigee-runtime
NAME                                                           READY   STATUS    RESTARTS   AGE
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-2gnch   1/1     Running   13         43m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-42jdv   1/1     Running   13         45m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-l7wq7   1/1     Running   13         43m
apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-q2thb   1/1     Running   8          38m
kubectl -n apigee get pods -l app=apigee-mart
NAME                                                  READY   STATUS    RESTARTS   AGE
apigee-mart-apigee-hybrid-s-2664b3e-143-u0a5c-rtg69   2/2     Running   8          28m
kubectl -n apigee get pods -l app=apigee-synchronizer
NAME                                                              READY   STATUS    RESTARTS   AGE
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp269nb   2/2     Running   10         29m
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp2w2jp   2/2     Running   0          4m40s
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpkfkvq   2/2     Running   0          4m40s
apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpxmzhn   2/2     Running   0          4m40s
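Waiting for the new Cassandra pods after the replica count is raised can be automated with a small polling check. This is a sketch under assumptions rather than a prescribed procedure: all_pods_ready is a hypothetical helper, the expected count of 3 mirrors the replicaCount set in overrides.yaml, and the 10-second interval is arbitrary.

```shell
# Hypothetical helper (not part of kubectl or apigeectl).
# Reads the default `kubectl get pods` table on stdin and succeeds only when
# at least $1 pods are listed and every listed pod is Running with READY n/n.
all_pods_ready() {
  awk -v want="$1" '
    NR > 1 {
      split($2, ready, "/")
      total++
      if ($3 == "Running" && ready[1] == ready[2]) ok++
    }
    END { exit !((total + 0) >= (want + 0) && (ok + 0) == (total + 0)) }
  '
}

# Usage against a live cluster, polling every 10 seconds until 3 pods are ready:
#   until kubectl -n apigee get pods -l app=apigee-cassandra | all_pods_ready 3; do
#     sleep 10
#   done
```

Parsing the table keeps the check easy to try against saved output; on a live cluster, kubectl wait with --for=condition=Ready is an alternative.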
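Must gather diagnostic information
If the problem persists after you follow the instructions above, gather the following diagnostic information and then contact Google Cloud Customer Care.
- Google Cloud project ID
- Apigee hybrid/Apigee organization
- For Apigee hybrid: the overrides.yaml file, with any sensitive information masked
- Kubernetes pod status in all namespaces: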
kubectl get pods -A > kubectl-pod-status`date +%Y.%m.%d_%H.%M.%S`.txt
- Kubernetes cluster-info dump:
# generate kubernetes cluster-info dump
kubectl cluster-info dump -A --output-directory=/tmp/kubectl-cluster-info-dump
# zip kubernetes cluster-info dump
zip -r kubectl-cluster-info-dump`date +%Y.%m.%d_%H.%M.%S`.zip /tmp/kubectl-cluster-info-dump/*