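If the preceding queries return empty results and GKE Pods can't communicate with external IP addresses, use the following table to help troubleshoot your configuration. Each entry lists a Cloud NAT configuration, followed by its troubleshooting guidance: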
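Configuration: Cloud NAT is configured to apply only to the subnet's primary IP address range.

Troubleshooting: When Cloud NAT is configured only for the subnet's primary IP address range, packets sent from the cluster to external IP addresses must have a source node IP address. In this Cloud NAT configuration:

- Pods can send packets to an external IP address only if that destination is subject to IP masquerading. When you deploy the ip-masq-agent, verify that the nonMasqueradeCIDRs list doesn't include a CIDR range that covers the destination IP address. Packets sent to those destinations are first translated to the source node IP address and then processed by Cloud NAT.
- To allow Pods to connect to all external IP addresses with this Cloud NAT configuration, make sure that the ip-masq-agent is deployed and that the nonMasqueradeCIDRs list contains only the cluster's node and Pod IP address ranges. Packets sent to destinations outside the cluster are first translated to the source node IP address and then processed by Cloud NAT.
- To prevent Pods from sending packets to certain external IP addresses, explicitly exclude those addresses from masquerading. With the ip-masq-agent deployed, add the external IP addresses that you want to block to the nonMasqueradeCIDRs list. Packets sent to those destinations leave the node with their original Pod IP address as the source. Pod IP addresses come from a secondary IP address range of the cluster's subnet, and in this configuration Cloud NAT doesn't operate on that secondary range, so those packets are not translated.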
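The masquerading rule described in the preceding list can be sketched in Python. This is an illustrative model only, not the ip-masq-agent implementation: the function names and all IP ranges are hypothetical, and the sketch ignores link-local handling and any cluster default behavior. It assumes that a destination covered by nonMasqueradeCIDRs keeps its Pod source address, and that everything else is rewritten to the node address.

```python
import ipaddress


def source_after_node(dest_ip, non_masq_cidrs):
    """Return "pod" if the packet keeps its Pod source IP, else "node".

    Hypothetical helper: a destination covered by any entry in
    non_masq_cidrs is assumed to be exempt from masquerading.
    """
    dest = ipaddress.ip_address(dest_ip)
    for cidr in non_masq_cidrs:
        if dest in ipaddress.ip_network(cidr):
            return "pod"
    return "node"


def handled_by_primary_only_nat(dest_ip, non_masq_cidrs):
    """Cloud NAT on the primary range only sees node-sourced packets."""
    return source_after_node(dest_ip, non_masq_cidrs) == "node"


# Hypothetical ranges: node range, Pod range, and one blocked external range.
cidrs = ["10.128.0.0/20", "10.4.0.0/14", "198.51.100.0/24"]

print(handled_by_primary_only_nat("203.0.113.10", cidrs))  # masqueraded destination
print(handled_by_primary_only_nat("198.51.100.7", cidrs))  # blocked destination
```

Running a check like this against your own nonMasqueradeCIDRs list can show quickly whether a destination that Pods can't reach is being excluded from masquerading.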
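Configuration: Cloud NAT is configured to apply only to the subnet's secondary IP address range used for Pod IP addresses.

Troubleshooting: When Cloud NAT is configured only for the subnet's secondary IP address range used by the cluster's Pod IP addresses, packets sent from the cluster to external IP addresses must have a source Pod IP address. In this Cloud NAT configuration:

- Using an IP masquerade agent causes packets to lose their source Pod IP address before Cloud NAT processes them. To keep the source Pod IP address, specify the destination IP address ranges in the nonMasqueradeCIDRs list. With the ip-masq-agent deployed, all packets sent to destinations in the nonMasqueradeCIDRs list keep their source Pod IP address and are then processed by Cloud NAT.
- To allow Pods to connect to all external IP addresses with this Cloud NAT configuration, make sure that the ip-masq-agent is deployed and that the nonMasqueradeCIDRs list is as broad as possible (0.0.0.0/0 specifies all destination IP addresses). Packets sent to all destinations keep their source Pod IP address before being processed by Cloud NAT.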
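To check whether a nonMasqueradeCIDRs list is broad enough for this configuration, a small script can help. The following Python sketch tests whether a list of CIDR ranges covers the whole IPv4 address space and renders the list as a JSON document. The function names are hypothetical, and the sketch assumes that the agent's configuration file may be written in JSON with a top-level `nonMasqueradeCIDRs` key; verify this against the ip-masq-agent version that you deploy.

```python
import ipaddress
import json


def covers_all_ipv4(cidrs):
    """Return True if the given CIDR ranges together cover 0.0.0.0/0."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    # collapse_addresses rejects mixed IP versions, so keep IPv4 only.
    ipv4 = [n for n in nets if n.version == 4]
    collapsed = list(ipaddress.collapse_addresses(ipv4))
    return collapsed == [ipaddress.ip_network("0.0.0.0/0")]


def build_agent_config(cidrs):
    """Render a minimal agent configuration as JSON text (assumed format)."""
    return json.dumps({"nonMasqueradeCIDRs": list(cidrs)}, indent=2)


candidate = ["0.0.0.0/0"]
if covers_all_ipv4(candidate):
    print(build_agent_config(candidate))
else:
    print("Some destinations would still be masqueraded to the node IP address.")
```

A list that fails this check leaves some destinations subject to masquerading, and a Cloud NAT gateway that applies only to the Pod secondary range doesn't translate those node-sourced packets.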
Last updated (UTC): 2025-09-03.