Resolving workload startup issues in Cloud Service Mesh

This document explains common Cloud Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.
Connection refused when reaching a Cloud Service Mesh endpoint
You might intermittently experience connection refused (ECONNREFUSED) errors with communication from your clusters to your endpoints, for example Memorystore Redis, Cloud SQL, or any external service your application workload needs to reach.
This can occur when your application workload starts faster than the istio-proxy (Envoy) container and then tries to reach an external endpoint. Because istio-init (initContainer) has already executed at this stage, iptables rules are in place that redirect all outgoing traffic to Envoy. Since istio-proxy is not ready yet, those rules redirect traffic to a sidecar proxy that has not started, and the application therefore gets the ECONNREFUSED error.

The following steps detail how to check whether this is the error you are experiencing:

1. Check the Stackdriver logs with the following filter to identify which pods had the problem.

   The following example shows a typical error message:

       Error: failed to create connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect: connection refused
       [ioredis] Unhandled error event: Error: connect ECONNREFUSED

2. Search for an occurrence of the problem. If you are using legacy Stackdriver, then use resource.type="container".

       resource.type="k8s_container"
       textPayload:"$ERROR_MESSAGE$"

3. Expand the latest occurrence to obtain the name of the pod and then make note of the pod_name under resource.labels.

4. Obtain the first occurrence of the issue for that pod:

       resource.type="k8s_container"
       resource.labels.pod_name="$POD_NAME$"

   Example output:

       E 2020-03-31T10:41:15.552128897Z
       post-feature-service post-feature-service-v1-67d56cdd-g7fvb failed to create
       connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect:
       connection refused post-feature-service post-feature-service-v1-67d56cdd-g7fvb

5. Make note of the timestamp of the first error for this pod.

6. Use the following filter to see the pod startup events:

       resource.type="k8s_container"
       resource.labels.pod_name="$POD_NAME$"

   Example output:
       I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Container image "docker.io/istio/proxyv2:1.3.3" already present on machine spec.containers{istio-proxy}
       I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Created container spec.containers{istio-proxy}
       I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Started container spec.containers{istio-proxy}
       I 2020-03-31T10:41:15Z spec.containers{APP-CONTAINER-NAME} Created container spec.containers{APP-CONTAINER-NAME}
       W 2020-03-31T10:41:17Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}
       W 2020-03-31T10:41:26Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}
       W 2020-03-31T10:41:28Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}
       W 2020-03-31T10:41:31Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}
       W 2020-03-31T10:41:58Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}
7. Use the timestamps of the errors and of the istio-proxy startup events to confirm that the errors happen while Envoy is not ready.

   If the errors occur while the istio-proxy container is not yet ready, it is normal to get connection refused errors. In the preceding example, the pod was trying to connect to Redis as soon as 2020-03-31T10:41:15.552128897Z, but by 2020-03-31T10:41:58Z istio-proxy was still failing readiness probes.

   Even though the istio-proxy container started first, it is possible that it did not become ready fast enough before the app tried to connect to the external endpoint.

   If this is the problem you are experiencing, then continue through the following troubleshooting steps.
8. Annotate the config at the pod level. This is only available at the pod level, not at a global level.

       annotations:
         proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
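   For context, the following is a minimal sketch of where this annotation sits in a Deployment manifest; the deployment, label, container, and image names are hypothetical:

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: my-app                # hypothetical name
       spec:
         selector:
           matchLabels:
             app: my-app
         template:
           metadata:
             labels:
               app: my-app
             annotations:
               # Hold the application container until the Envoy sidecar is ready
               proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
           spec:
             containers:
             - name: app                         # hypothetical container name
               image: gcr.io/example/app:latest  # hypothetical image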
9. Modify the application code so that it checks whether Envoy is ready before it makes any requests to external services. For example, on application start, initiate a loop that makes requests to the istio-proxy health endpoint and only continues once a 200 is returned; a sketch follows the endpoint below. The istio-proxy health endpoint is as follows:
http://localhost:15020/healthz/ready
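A minimal sketch of such a startup gate, assuming a Node.js 18+ application written in TypeScript (matching the ioredis client in the earlier error output); the timeout and poll interval are arbitrary choices:

    // Poll the istio-proxy readiness endpoint before opening connections
    // to external services such as Redis.
    async function waitForEnvoy(timeoutMs = 60_000): Promise<void> {
      const deadline = Date.now() + timeoutMs;
      while (Date.now() < deadline) {
        try {
          const res = await fetch("http://localhost:15020/healthz/ready");
          if (res.status === 200) return; // Envoy is ready; safe to proceed
        } catch {
          // Connection refused while the sidecar is still starting; keep retrying
        }
        await new Promise((resolve) => setTimeout(resolve, 1000)); // wait one second between attempts
      }
      throw new Error("istio-proxy did not become ready in time");
    }

    // Example usage at application start, before creating the Redis client:
    //   await waitForEnvoy();
    //   const redis = new Redis({ host: "feature-store-redis" }); // hypothetical host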
Race condition during sidecar injection between Vault and Cloud Service Mesh
When using vault for secrets management, vault sometimes injects its sidecar before istio does, causing Pods to get stuck in Init status. When this happens, the Pods created after you restart a deployment or deploy a new one get stuck in Init status. For example:
    E 2020-03-31T10:41:15.552128897Z
    post-feature-service post-feature-service-v1-67d56cdd-g7fvb failed to create
    connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect:
    connection refused post-feature-service post-feature-service-v1-67d56cdd-g7fvb
This issue is caused by a race condition: both Istio and vault inject sidecars, and Istio must be the last to do so, because the istio proxy does not run during the init containers. The istio init container sets up iptables rules that redirect all traffic to the proxy. Since the proxy is not running yet, those rules redirect to nothing, blocking all traffic. This is why the istio init container must run last, so that the proxy is up and running immediately after the iptables rules are set up. Unfortunately, the order is not deterministic, so if Istio is injected first, it breaks.
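One way to confirm the injection order in a stuck pod is to list its init containers with kubectl; the container names in the example output (istio-init, vault-agent-init) are the injectors' defaults and may differ in your setup:

    kubectl get pod POD_NAME -o jsonpath='{.spec.initContainers[*].name}'
    # Example output (hypothetical): istio-init vault-agent-init
    # If istio-init runs before a Vault init container that needs network access,
    # that container's traffic is redirected to an Envoy proxy that is not
    # running yet, and the pod stays stuck in Init status.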
To troubleshoot this condition, allow the IP address of vault so that traffic going to the Vault IP is not redirected to the Envoy proxy, which is not ready yet and would otherwise block the communication. To achieve this, add a new annotation named excludeOutboundIPRanges.
For managed Cloud Service Mesh, this is only possible at the Deployment or Pod level, under spec.template.metadata.annotations, for example:
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Resolving workload startup issues in Cloud Service Mesh\n=======================================================\n\nThis document explains common Cloud Service Mesh problems and how to resolve\nthem. If you need additional assistance, see\n[Getting support](/service-mesh/v1.22/docs/getting-support).\n\nConnection Refused when reaching a Cloud Service Mesh endpoint\n--------------------------------------------------------------\n\nYou might intermittently experience connection refused (`ECONNREFUSED`) errors\nwith communication from your clusters to your endpoints, for example\nMemorystore Redis, Cloud SQL, or any external service your application\nworkload needs to reach.\n\nThis can occur when your application workload initiates faster than the\nistio-proxy (`Envoy`) container and tries to reach an external endpoint. Because\nat this stage istio-init (`initContainer`) has already executed, there are\niptables rules in place redirecting all outgoing traffic to `Envoy`. Since\nistio-proxy is not ready yet, the iptables rules will redirect traffic to a\nsidecar proxy that is not yet started and therefore, the application gets the\n`ECONNREFUSED` error.\n\nThe following steps detail how to check if this is the error you are\nexperiencing:\n\n1. Check the stackdriver logs with the following Filter to identify which pods\n had the problem.\n\n The following example shows a typical error message: \n\n Error: failed to create connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect: connection refused\n [ioredis] Unhandled error event: Error: connect ECONNREFUSED\n\n2. Search for an occurrence of the problem. If you are using legacy Stackdriver,\n then use `resource.type=\"container\"`.\n\n resource.type=\"k8s_container\"\n textPayload:\"$ERROR_MESSAGE$\"\n\n3. Expand the latest occurrence to obtain the name of the pod and then make note\n of the `pod_name` under `resource.labels`.\n\n4. Obtain the first occurrence of the issue for that pod:\n\n resource.type=\"k8s_container\"\n resource.labels.pod_name=\"$POD_NAME$\"\n\n Example output: \n\n E 2020-03-31T10:41:15.552128897Z\n post-feature-service post-feature-service-v1-67d56cdd-g7fvb failed to create\n connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect:\n connection refused post-feature-service post-feature-service-v1-67d56cdd-g7fvb\n\n5. Make note of the timestamp of the first error for this pod.\n\n6. 
Use the following filter to see the pod startup events.\n\n resource.type=\"k8s_container\"\n resource.labels.pod_name=\"$POD_NAME$\"\n\n Example output: \n\n I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Container image \"docker.io/istio/proxyv2:1.3.3\" already present on machine spec.containers{istio-proxy}\n I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Created container spec.containers{istio-proxy}\n I 2020-03-31T10:41:15Z spec.containers{istio-proxy} Started container spec.containers{istio-proxy}\n I 2020-03-31T10:41:15Z spec.containers{APP-CONTAINER-NAME} Created container spec.containers{APP-CONTAINER-NAME}\n W 2020-03-31T10:41:17Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}\n W 2020-03-31T10:41:26Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}\n W 2020-03-31T10:41:28Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}\n W 2020-03-31T10:41:31Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}\n W 2020-03-31T10:41:58Z spec.containers{istio-proxy} Readiness probe failed: HTTP probe failed with statuscode: 503 spec.containers{istio-proxy}\n\n7. Use the timestamps of errors and istio-proxy startup events to confirm the\n errors are happening when `Envoy` is not ready.\n\n If the errors occur while the istio-proxy container is not ready yet, it is\n normal to obtain connection refused errors. In the preceding example, the pod\n was trying to connect to Redis as soon as `2020-03-31T10:41:15.552128897Z`\n but by `2020-03-31T10:41:58Z` istio-proxy was still failing readiness probes.\n\n Even though the istio-proxy container started first, it is possible that it\n did not become ready fast enough before the app was already trying to connect\n to the external endpoint.\n\n If this is the problem you are experiencing, then continue through the\n following troubleshooting steps.\n8. Annotate the config at the pod level. This is *only* available at the pod\n level and not at a global level.\n\n annotations:\n proxy.istio.io/config: '{ \"holdApplicationUntilProxyStarts\": true }'\n\n9. Modify the application code so that it checks if `Envoy` is ready before it\n tries to make any other requests to external services. For example, on\n application start, initiate a loop that makes requests to the istio-proxy\n health endpoint and only continues once a 200 is obtained. The istio-proxy\n health endpoint is as follows:\n\n http://localhost:15020/healthz/ready\n\nRace condition during sidecar injection between Vault and Cloud Service Mesh\n----------------------------------------------------------------------------\n\nWhen using `vault` for secrets management, sometimes `vault` injects sidecar\nbefore `istio`, causing that Pods get stuck in `Init` status. When this happens,\nthe Pods created get stuck in Init status after restarting any deployment or\ndeploying a new one. 
For example: \n\n E 2020-03-31T10:41:15.552128897Z\n post-feature-service post-feature-service-v1-67d56cdd-g7fvb failed to create\n connection to feature-store redis, err=dial tcp 192.168.9.16:19209: connect:\n connection refused post-feature-service post-feature-service-v1-67d56cdd-g7fvb\n\nThis issue is caused by a race condition, both Istio and `vault` inject the\nsidecar and Istio must be the last doing this, the `istio` proxy is not running\nduring init containers. The `istio` init container sets up iptables rules to\nredirect all traffic to the proxy. Since it is not running yet, those rules\nredirect to nothing, blocking all traffic. This is why the init container must\nbe last, so the proxy is up and running immediately after the iptables rules are\nset up. Unfortunately, the order is not deterministic, so if Istio is injected\nfirst it breaks.\n\nTo troubleshoot this condition, allow the IP address of `vault` so the traffic\ngoing to the Vault IP is not redirected to the Envoy Proxy which is not ready\nyet and therefore blocking the communication. To achieve this, a new annotation\nnamed `excludeOutboundIPRanges` should be added.\n\nFor managed Cloud Service Mesh, this is only possible at Deployment or Pod\nlevel under `spec.template.metadata.annotations`, for example: \n\n apiVersion: apps/v1\n kind: Deployment\n ...\n ...\n ...\n spec:\n template:\n metadata:\n annotations:\n traffic.sidecar.istio.io/excludeOutboundIPRanges:\n\nFor in-cluster Cloud Service Mesh, there is an option to set it as a global\none with an IstioOperator under `spec.values.global.proxy.excludeIPRanges`, for\nexample: \n\n apiVersion: install.istio.io/v1alpha1\n kind: IstioOperator\n spec:\n values:\n global:\n proxy:\n excludeIPRanges: \"\"\n\nAfter adding the annotation, restart your workloads."]]
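As an illustration, the managed-mesh annotation with a value filled in might look like the following; 10.1.2.3/32 stands for a hypothetical Vault service IP, and the field accepts a comma-separated list of CIDR blocks:

    annotations:
      traffic.sidecar.istio.io/excludeOutboundIPRanges: 10.1.2.3/32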