API proxy deployment fails with the No active runtime pods warning

You are viewing the Apigee and Apigee Hybrid documentation. For Apigee Edge, see the Apigee Edge documentation.

Symptoms

API proxy deployments fail, and the Apigee Hybrid UI shows the No active runtime pods warning.

Error messages

On the API Proxies page, the No active runtime pods warning appears in the Details dialog next to the error message Deployment issues on ENVIRONMENT: REVISION_NUMBER.

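This issue can surface as different errors on other resource pages of the UI. Here are some example error messages:

Hybrid UI error message 1: Datastore Error

You may see a Datastore Error on the API Products and Apps pages of the Hybrid UI.

Hybrid UI error message 2: Internal Server Error

You may see an Internal Server Error on the Developers page of the UI.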

Kubectl command output

In the output of the kubectl get pods command, you may see the status of the apigee-mart, apigee-runtime, and apigee-synchronizer pods change to CrashLoopBackOff.

Component log error messages

In Apigee Hybrid release 1.4.0 and later, the apigee-runtime pod logs show the following liveness probe failure errors:

{"timestamp":"1621575431454","level":"ERROR","thread":"qtp365724939-205","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Error occurred : probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}\n\n\tcom.apigee.probe.ProbeAPI.getResponse(ProbeAPI.java:66)\n\tcom.apigee.probe.ProbeAPI.getLiveStatus(ProbeAPI.java:55)\n\tsun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)\n\tsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\t","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}

{"timestamp":"1621575431454","level":"ERROR","thread":"qtp365724939-205","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Returning error response : ErrorResponse{errorCode = probe.ProbeRunError, errorMessage = probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}}","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}

In Apigee Hybrid release 1.4.0 and later, the apigee-synchronizer pod logs show the following Cannot build a cluster without contact points error:

{"timestamp":"1621575636434","level":"ERROR","thread":"main","logger":"KERNEL.DEPLOYMENT","message":"ServiceDeployer.deploy() : Got a life cycle exception while starting service [SyncService, Cannot build a cluster without contact points] : {}","context":"apigee-service-logs","exception":"java.lang.IllegalArgumentException: Cannot build a cluster without contact points\n\tat com.datastax.driver.core.Cluster.checkNotEmpty(Cluster.java:134)\n\tat com.datastax.driver.core.Cluster.<init>(Cluster.java:127)\n\tat com.datastax.driver.core.Cluster.buildFrom(Cluster.java:193)\n\tat com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1350)\n\tat io.apigee.persistence.PersistenceContext.newCluster(PersistenceContext.java:214)\n\tat io.apigee.persistence.PersistenceContext.<init>(PersistenceContext.java:48)\n\tat io.apigee.persistence.ApplicationContext.<init>(ApplicationContext.java:19)\n\tat io.apigee.runtimeconfig.service.RuntimeConfigServiceImpl.<init>(RuntimeConfigServiceImpl.java:75)\n\tat io.apigee.runtimeconfig.service.RuntimeConfigServiceFactory.newInstance(RuntimeConfigServiceFactory.java:99)\n\tat io.apigee.common.service.AbstractServiceFactory.initializeService(AbstractServiceFactory.java:301)\n\tat ...","severity":"ERROR","class":"com.apigee.kernel.service.deployment.ServiceDeployer","method":"startService"}

In Apigee Hybrid release 1.4.0 and later, the apigee-mart pod logs show the following liveness probe failure errors:

{"timestamp":"1621576757592","level":"ERROR","thread":"qtp991916558-144","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Error occurred : probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}\n\n\tcom.apigee.probe.ProbeAPI.getResponse(ProbeAPI.java:66)\n\tcom.apigee.probe.ProbeAPI.getLiveStatus(ProbeAPI.java:55)\n\tsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\t","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}

{"timestamp":"1621576757593","level":"ERROR","thread":"qtp991916558-144","mdc":{"targetpath":"/v1/probes/live"},"logger":"REST","message":"Returning error response : ErrorResponse{errorCode = probe.ProbeRunError, errorMessage = probe failed Probe cps-datastore-connectivity-liveliness-probe failed due to com.apigee.probe.model.ProbeFailedException{ code = cps.common.datastoreConnectionNotHealthy, message = Datastore connection not healthy, associated contexts = []}}","context":"apigee-service-logs","severity":"ERROR","class":"com.apigee.rest.framework.container.ExceptionMapper","method":"toResponse"}
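
The two error signatures quoted above can be searched for mechanically. The following is a minimal sketch, not an official tool: the function name classify_log and the labels it prints are hypothetical, and POD_NAME in the usage comment is a placeholder.

```shell
# Scan pod log lines read from stdin for the two error signatures quoted
# in this section and print one (hypothetical) label per signature found.
classify_log() {
  awk '
    index($0, "cps.common.datastoreConnectionNotHealthy")      { probe = 1 }
    index($0, "Cannot build a cluster without contact points") { contact = 1 }
    END {
      if (probe)   print "datastore-liveness-probe-failed"
      if (contact) print "no-cassandra-contact-points"
    }'
}

# Usage (POD_NAME is a placeholder; add -c CONTAINER if the pod runs
# more than one container):
#   kubectl -n apigee logs POD_NAME | classify_log
```

If neither signature is present, the function prints nothing, which suggests the pod is failing for a reason other than the one described here.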

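About the No active runtime pods error

In Apigee Hybrid release 1.4.0, a liveness probe was added to the apigee-runtime and apigee-mart pods to check the status of the Cassandra pods. If all Cassandra pods become unavailable, the liveness probes of the apigee-runtime and apigee-mart pods fail. The apigee-runtime and apigee-mart pods then go into the CrashLoopBackOff state, which causes API proxy deployments to fail with the warning No active runtime pods. Because the Cassandra pods are unavailable, the apigee-synchronizer pod also goes into the CrashLoopBackOff state.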
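
To see whether the liveness probe is what is restarting the pods, the namespace events can be filtered. This is a rough sketch under two assumptions: that the kubelet reports failed probes with the text "Liveness probe failed" (its usual wording, which may vary by Kubernetes version), and that the helper name liveness_failures is hypothetical.

```shell
# Print event lines that report a failed liveness probe for
# apigee-runtime or apigee-mart pods.
# Reads the output of `kubectl get events` from stdin.
liveness_failures() {
  awk 'index($0, "Liveness probe failed") && /apigee-(runtime|mart)/'
}

# Usage:
#   kubectl -n apigee get events | liveness_failures
```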

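Possible causes

Here are some possible causes of this error:

Cause                                            Description
Cassandra pods are down                          The Cassandra pods are down, so the apigee-runtime pods cannot communicate with the Cassandra database.
Cassandra replica configured with only one pod   A single Cassandra pod can become a single point of failure.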

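Cause: Cassandra pods are down

During API proxy deployment, the apigee-runtime pods connect to the Cassandra database to fetch resources defined in the API proxy, such as key value maps (KVMs) and caches. If no Cassandra pods are running, the apigee-runtime pods cannot connect to the Cassandra database, and the API proxy deployment fails.

Diagnosis

  1. List the Cassandra pods: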
    kubectl -n apigee get pods -l app=apigee-cassandra
    

    Example output 1

    NAME                         READY   STATUS    RESTARTS   AGE
    apigee-cassandra-default-0   0/1     Pending   0          9m23s
    

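    Example output 2

    NAME                 READY   STATUS            RESTARTS   AGE
    apigee-cassandra-0   0/1     CrashLoopBackOff  0          10m
  2. Verify the status of each Cassandra pod. All Cassandra pods should be in the Running state. If any Cassandra pod is in a different state, that may be the cause of this issue. Follow the steps in the Resolution section to fix it.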
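
The status check in the steps above can be scripted by filtering the tabular output. The following is a sketch only; the helper name unhealthy_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) shown in the example outputs.

```shell
# Print "NAME STATUS" for every pod that is not Running or whose READY
# column shows fewer ready containers than expected (for example 0/1).
# Reads the output of `kubectl get pods` from stdin.
unhealthy_pods() {
  awk '$1 != "NAME" {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
  }'
}

# Usage:
#   kubectl -n apigee get pods -l app=apigee-cassandra | unhealthy_pods
```

An empty result means every listed Cassandra pod is Running and fully ready.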

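Resolution

  1. If any Cassandra pod is in the Pending state, see Cassandra pods are stuck in the Pending state to troubleshoot and resolve the issue.
  2. If any Cassandra pod is in the CrashLoopBackOff state, see Cassandra pods are stuck in the CrashLoopBackoff state to troubleshoot and resolve the issue.

    Example output (apigee-runtime, apigee-mart, and apigee-synchronizer pods in the Running state):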

    kubectl -n apigee get pods -l app=apigee-runtime
    NAME                                                           READY   STATUS    RESTARTS   AGE
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-2gnch   1/1     Running   13         43m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-42jdv   1/1     Running   13         45m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-l7wq7   1/1     Running   13         43m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-q2thb   1/1     Running   8          38m
    
    kubectl -n apigee get pods -l app=apigee-mart
    NAME                                                  READY   STATUS    RESTARTS   AGE
    apigee-mart-apigee-hybrid-s-2664b3e-143-u0a5c-rtg69   2/2     Running   8          28m
    
    kubectl -n apigee get pods -l app=apigee-synchronizer
    NAME                                                              READY   STATUS    RESTARTS   AGE
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp269nb   2/2     Running   10         29m
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp2w2jp   2/2     Running   0          4m40s
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpkfkvq   2/2     Running   0          4m40s
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpxmzhn   2/2     Running   0          4m40s
    

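Cause: Cassandra replica configured with only one pod

If the Cassandra replica count is set to 1, only one Cassandra pod is available at runtime. As a result, the apigee-runtime pods may run into connectivity issues whenever that Cassandra pod is unavailable for a period of time.

Diagnosis

  1. Get the Cassandra StatefulSet and check the current replica count: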
    kubectl -n apigee get statefulsets -l app=apigee-cassandra
    

    Example output:

    NAME                               READY           AGE
    apigee-cassandra-default           1/1             21m
  2. If the replica count is set to 1, follow the steps in the Resolution section to increase it.
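
The replica check above can also be done by reading the READY column, whose right-hand number is the desired replica count. This is a minimal sketch; the helper name low_replica_statefulsets is hypothetical, and the threshold of 3 is taken from the replica count recommended in this document.

```shell
# Flag StatefulSets whose desired replica count (right-hand side of the
# READY column) is below 3.
# Reads the output of `kubectl get statefulsets` from stdin.
low_replica_statefulsets() {
  awk '$1 != "NAME" {
    split($2, ready, "/")
    if (ready[2] + 0 < 3) print $1, "replicas=" ready[2]
  }'
}

# Usage:
#   kubectl -n apigee get statefulsets -l app=apigee-cassandra | low_replica_statefulsets
```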

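Resolution

Non-production Apigee Hybrid deployments may have the Cassandra replica count set to 1. If high availability of Cassandra matters in a non-production deployment, increase the replica count to 3 to resolve this issue.

Follow these steps to resolve the issue:

  1. Update the overrides.yaml file and set the Cassandra replica count to 3: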
    cassandra:
      replicaCount: 3

    For information about Cassandra configuration, see the Configuration property reference.

  2. Apply the above configuration using the apigeectl CLI:
    cd path/to/hybrid-files
    apigeectl apply -f overrides/overrides.yaml
    
  3. Get the Cassandra StatefulSet and check the current replica count:
    kubectl -n apigee get statefulsets -l app=apigee-cassandra
    

    Example output:

    NAME                              READY         AGE
    apigee-cassandra-default          3/3           27m
    
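  4. Get the Cassandra pods and check the current instance count. If not all pods are ready and in the Running state yet, wait for the new Cassandra pods to be created and activated:
    kubectl -n apigee get pods -l app=apigee-cassandra

    Example output: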

    NAME                         READY   STATUS    RESTARTS   AGE
    apigee-cassandra-default-0   1/1     Running   0          29m
    apigee-cassandra-default-1   1/1     Running   0          21m
    apigee-cassandra-default-2   1/1     Running   0          19m
    

    Example output (apigee-runtime, apigee-mart, and apigee-synchronizer pods in the Running state):

    kubectl -n apigee get pods -l app=apigee-runtime
    NAME                                                           READY   STATUS    RESTARTS   AGE
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-2gnch   1/1     Running   13         43m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-42jdv   1/1     Running   13         45m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-l7wq7   1/1     Running   13         43m
    apigee-runtime-apigee-hybrid-s-test1-8b64f12-143-501i7-q2thb   1/1     Running   8          38m
    
    kubectl -n apigee get pods -l app=apigee-mart
    NAME                                                  READY   STATUS    RESTARTS   AGE
    apigee-mart-apigee-hybrid-s-2664b3e-143-u0a5c-rtg69   2/2     Running   8          28m
    
    kubectl -n apigee get pods -l app=apigee-synchronizer
    NAME                                                              READY   STATUS    RESTARTS   AGE
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp269nb   2/2     Running   10         29m
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zp2w2jp   2/2     Running   0          4m40s
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpkfkvq   2/2     Running   0          4m40s
    apigee-synchronizer-apigee-hybrid-s-test1-8b64f12-143-96zpxmzhn   2/2     Running   0          4m40s
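
Instead of polling by hand while the new Cassandra pods come up, kubectl rollout status can block until the StatefulSet is ready. The wrapper below is a sketch under assumptions: the function name wait_for_cassandra, the KUBECTL override variable, and the default timeout of 600s are hypothetical, and rollout status only works for StatefulSets that use the RollingUpdate update strategy.

```shell
# Block until the Cassandra StatefulSet reports all replicas ready.
# Arguments (all optional): namespace, StatefulSet name, timeout.
# KUBECTL may point at a stub command for a dry run.
wait_for_cassandra() {
  namespace=${1:-apigee}
  statefulset=${2:-apigee-cassandra-default}
  timeout=${3:-600s}
  "${KUBECTL:-kubectl}" -n "$namespace" rollout status \
    "statefulset/$statefulset" --timeout="$timeout"
}

# Usage:
#   wait_for_cassandra apigee apigee-cassandra-default 900s
```

The function returns the exit status of kubectl, so a non-zero result indicates a timeout or a failed rollout.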
    

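Must gather diagnostic information

If the problem persists after you follow the instructions above, gather the following diagnostic information and then contact Google Cloud Customer Care.

  1. The Google Cloud project ID
  2. The Apigee Hybrid/Apigee organization
  3. For Apigee Hybrid: the overrides.yaml file, with any sensitive information masked
  4. The Kubernetes pod status in all namespaces: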
    kubectl get pods -A > kubectl-pod-status`date +%Y.%m.%d_%H.%M.%S`.txt
    
  5. A Kubernetes cluster-info dump:
    # generate kubernetes cluster-info dump
    kubectl cluster-info dump -A --output-directory=/tmp/kubectl-cluster-info-dump
    # zip kubernetes cluster-info dump
    zip -r kubectl-cluster-info-dump`date +%Y.%m.%d_%H.%M.%S`.zip /tmp/kubectl-cluster-info-dump/*
    

References