Cloud NAT is dropping or limiting egress connectivity

Problem

Cloud NAT is dropping or limiting egress connectivity.

Environment

Private Google Compute Engine instances using Cloud NAT for outbound access.

Solution

Depending on the root cause, the following can resolve the issue:

  • Assign more NAT IP addresses to the Cloud NAT gateway.
  • Increase the Minimum ports per VM instance setting of the Cloud NAT gateway.
  • Disable Endpoint-Independent Mapping, which is enabled by default.

Cause

Cloud NAT is dropping outbound packets because of port exhaustion or an Endpoint-Independent Mapping conflict.

To verify that packets are being dropped, search Cloud Logging with the following filter (Cloud NAT logging must be enabled on the gateway for these entries to exist):

resource.type="nat_gateway"

jsonPayload.allocation_status="DROPPED"
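
The two filter lines above are combined with an implicit AND. As a minimal sketch, the same filter can be assembled programmatically and narrowed to one gateway or a time window. The function name and its parameters are hypothetical, and the resource.labels.gateway_name label is an assumption about the nat_gateway resource that you should check against your own log entries.

```python
def build_nat_drop_filter(gateway_name=None, since=None):
    """Build a Cloud Logging filter string for dropped Cloud NAT allocations.

    gateway_name and since are optional, illustrative narrowing parameters;
    since is expected to be an RFC 3339 timestamp string.
    """
    # The two clauses taken from the article.
    clauses = [
        'resource.type="nat_gateway"',
        'jsonPayload.allocation_status="DROPPED"',
    ]
    if gateway_name:
        # Assumption: the gateway is exposed as a resource label of this name.
        clauses.append(f'resource.labels.gateway_name="{gateway_name}"')
    if since:
        clauses.append(f'timestamp>="{since}"')
    # Cloud Logging accepts clauses joined with an explicit AND.
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_nat_drop_filter())
    print(build_nat_drop_filter(gateway_name="my-nat", since="2024-01-01T00:00:00Z"))
```

The resulting string can be pasted into the Logs Explorer query box or passed to whatever logging client you already use.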

In Metrics Explorer, set the following metric query: Resource: Cloud NAT Gateway, Metric: Sent packets dropped count. The reason field shows why packets were dropped, which is one of the following reasons or a combination:

  • OUT_OF_RESOURCES, which means one of the following, or a combination:
    • The Cloud NAT gateway has run out of ports to allocate to the instances it serves. You can confirm this in Metrics Explorer by setting the following metric query: Resource: Cloud NAT Gateway, Metric: NAT Allocation failed. If the metric shows failed allocations, more NAT IP addresses need to be added to the Cloud NAT gateway. You can calculate the number of NAT IP addresses your setup requires from the Port reservation procedure. This is not a problem if you have selected Automatic NAT IP allocation.
    • The ports assigned to an instance are not enough to serve all its simultaneous connections. Cloud NAT is a cone NAT, meaning that an instance can hold only as many simultaneous connections to a given destination as it has ports allocated. This makes it easy to exceed the Minimum ports per VM instance value, especially when the instance opens many short-lived connections. In this case, the Minimum ports per VM instance setting of the Cloud NAT gateway needs to be increased.
  • ENDPOINT_INDEPENDENCE_CONFLICT, which reduces the number of simultaneous connections a client VM can open to the same destination 3-tuple (IP address, port, and protocol), even when the client VM has enough free NAT source IP address and source port tuples. As a result, the VM fails to allocate ports for new connections to that destination. The Cloud NAT documentation describes Endpoint-Independent Mapping conflicts in more detail and gives an example. Since Endpoint-Independent Mapping is enabled by default in Cloud NAT, disabling it should resolve the issue.
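
For the OUT_OF_RESOURCES case, the number of NAT IP addresses needed can be estimated with a short calculation. The sketch below assumes static port allocation, that each NAT IP address provides 64,512 usable ports (65,536 minus the 1,024 well-known ports), and that every VM reserves exactly the Minimum ports per VM instance value. The function name is hypothetical, and the Port reservation procedure remains the authoritative reference.

```python
import math

# Assumption: usable source ports per NAT IP address (65,536 - 1,024).
PORTS_PER_NAT_IP = 64512


def nat_ips_required(vm_count, min_ports_per_vm):
    """Estimate how many NAT IP addresses a gateway needs.

    Simplified model: each VM reserves exactly min_ports_per_vm ports,
    and a single VM's reservation is not split across NAT IP addresses.
    """
    if vm_count < 0 or min_ports_per_vm <= 0:
        raise ValueError("vm_count must be >= 0 and min_ports_per_vm must be > 0")
    # Whole VMs that fit on one NAT IP address under this model.
    vms_per_ip = PORTS_PER_NAT_IP // min_ports_per_vm
    if vms_per_ip == 0:
        raise ValueError("min_ports_per_vm exceeds the ports of a single NAT IP")
    return math.ceil(vm_count / vms_per_ip)


if __name__ == "__main__":
    for vms, ports in [(1008, 64), (1009, 64), (100, 1024)]:
        print(f"{vms} VMs at {ports} ports each: {nat_ips_required(vms, ports)} NAT IP(s)")
```

Under these assumptions, 1,008 VMs at 64 ports each fit on one NAT IP address, while raising the setting to 1,024 ports leaves room for only 63 VMs per address. Increasing Minimum ports per VM instance can therefore require adding NAT IP addresses at the same time.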