Troubleshoot configuration

This guide can help you solve common issues with Cloud NAT.

Common issues

VMs can reach the internet unexpectedly, without Cloud NAT

If your virtual machine (VM) instances or container instances can reach the internet without Cloud NAT, but you don't want them to, check for the following issues:

Determine if the VM's network interface has an external IP address. If the network interface has an external IP address assigned to it, Google Cloud automatically performs one-to-one NAT for packets whose sources match the interface's primary internal IP address. For more information, see Cloud NAT specifications.

To determine if a VM has an external IP address, see changing or assigning an external IP address to an existing instance.
Ensure that your Google Kubernetes Engine (GKE) cluster is a private cluster. Each node VM in a non-private cluster has an external IP address, so each node can use routes in your Virtual Private Cloud (VPC) network whose next hop is the default internet gateway without relying on Cloud NAT. For more information, including how non-private clusters interact with Cloud NAT gateways, see Compute Engine interaction.
List routes in your Virtual Private Cloud network, looking for ones that could provide internet connectivity through a next hop other than the default internet gateway. For example:
- Static routes whose next hops are VMs, internal passthrough Network Load Balancers, or Cloud VPN tunnels might indirectly provide internet connectivity. For example, the next hop VMs or backend VMs for an internal passthrough Network Load Balancer might have external IP addresses themselves, or a Cloud VPN tunnel might connect to a network that offers internet access.
- Dynamic routes learned from on-premises networks by Cloud Routers in your VPC network might connect to a network that offers internet access.
Keep in mind that other custom routes in your VPC network might have higher priorities than routes whose next hops are default internet gateways. For information about how Google Cloud evaluates routes, see routing applicability and order.
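As a rough way to run the first check from the command line, the following sketch wraps `gcloud compute instances describe`. The function names `vm_external_ip` and `check_vm` are hypothetical, and the sketch only inspects the first access config of the first network interface, so VMs with several interfaces need additional checks.

```shell
# Hypothetical helper: print the external IP address of a VM's first
# network interface, or nothing if none is assigned.
# Usage: vm_external_ip VM_NAME ZONE
vm_external_ip() {
  gcloud compute instances describe "$1" --zone "$2" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Report whether the VM can reach the internet through its own external
# IP address instead of through Cloud NAT.
# Usage: check_vm VM_NAME ZONE
check_vm() {
  ip=$(vm_external_ip "$1" "$2")
  if [ -n "$ip" ]; then
    echo "$1 has external IP $ip; its primary internal IP does not use Cloud NAT"
  else
    echo "$1 has no external IP on its first interface"
  fi
}
```

Call it as `check_vm VM_NAME ZONE`, replacing both placeholders with your own values.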

No logs are generated

Verify that NAT logging is enabled.
Double-check that your view of the logs isn't filtering out the logs that you are looking for. For instructions, see Viewing logs.
Make sure that a firewall rule isn't blocking traffic. Firewall rules that block egress (outbound) traffic are applied before the traffic would have been sent to the NAT gateway. You can use Firewall Rules Logging to see if your custom egress rules are blocking outbound traffic.
Review Types of Cloud NAT. The destination for your traffic might not be handled by NAT.
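The first check can be sketched from the command line as follows. The function name `nat_logging_enabled` is hypothetical, and the sketch assumes that the gateway description exposes a `logConfig.enable` field; treat it as a starting point rather than a definitive procedure.

```shell
# Hypothetical helper: print "True" if logging is enabled on the gateway.
# Usage: nat_logging_enabled NAT_NAME ROUTER_NAME REGION
nat_logging_enabled() {
  gcloud compute routers nats describe "$1" --router "$2" --region "$3" \
    --format='value(logConfig.enable)'
}

# If logging turns out to be disabled, it can be enabled with:
#   gcloud compute routers nats update NAT_NAME \
#     --router=ROUTER_NAME --region=REGION --enable-logging
```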

Certain logs are excluded

Verify that NAT logging is enabled and that your log filter is not excluding logs that you want to keep. You can clear a logs filter so that nothing is excluded.
Cloud NAT does not log every single event. During periods of heavy egress traffic, NAT logging is throttled, proportional to the machine type of the VM. Translation or error logs might be dropped, and it is not possible to determine what is omitted during throttling.

Packets dropped with reason: out of resources

If you see packet loss from VMs that use Cloud NAT, this might be because there are not enough available NAT source IP address and source port tuples for the VM to use at the time of the packet loss (port exhaustion). A five-tuple (NAT source IP address, source port, and destination 3-tuple) cannot be reused within the TCP TIME_WAIT timeout.

If there aren't enough available NAT tuples, the dropped_sent_packets_count reason is OUT_OF_RESOURCES. For more information about metrics, see Using VM instance metrics.
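To get a feel for how quickly a small allocation can be exhausted, a back-of-the-envelope estimate helps. The following sketch assumes that every connection goes to the same destination 3-tuple and that each NAT tuple is unavailable for the full TIME_WAIT period after its connection closes. The 64-port, 1,024-port, and 120-second values are illustrative assumptions; check your gateway's actual settings.

```shell
# Rough estimate of new TCP connections per minute that a VM can sustain
# to ONE destination 3-tuple, given its allocated NAT ports and the
# TIME_WAIT timeout in seconds.
# Usage: conns_per_minute PORTS TIME_WAIT_SECONDS
conns_per_minute() {
  echo $(( $1 * 60 / $2 ))
}

conns_per_minute 64 120     # prints 32
conns_per_minute 1024 120   # prints 512
```

Under these assumptions, a workload that opens more than about 32 short-lived connections per minute to one destination could see OUT_OF_RESOURCES drops with 64 ports.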

See Reduce your port usage for ways to reduce port usage.

If you use dynamic port allocation, see the following section for ways to reduce packet drops when dynamic port allocation is used.

Packets dropped when dynamic port allocation is configured

Dynamic port allocation detects when a VM is close to being out of ports, and doubles the number of ports that are allocated to the VM. This helps ensure that ports aren't wasted, but can result in dropped packets while the number of allocated ports is increasing.
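Because the allocation grows by doubling, a VM that starts at a low minimum needs several allocation rounds before it reaches a high port count. The following sketch counts those rounds under a simplified model in which every round exactly doubles the allocation. The function name and the 64 and 1,024 values are hypothetical.

```shell
# Count how many doublings it takes to grow from a starting number of
# ports to at least a target number of ports.
# Usage: doubling_steps START_PORTS TARGET_PORTS
doubling_steps() {
  ports=$1
  steps=0
  while [ "$ports" -lt "$2" ]; do
    ports=$((ports * 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

doubling_steps 64 1024    # prints 4 (64 -> 128 -> 256 -> 512 -> 1024)
doubling_steps 512 1024   # prints 1
```

A higher minimum shortens the ramp for bursty workloads, at the cost of reserving more ports for idle VMs.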

To reduce the number of dropped packets, consider the following:

If you can ramp up connections more slowly, Cloud NAT has more time to allocate more ports.
If VMs are making TCP connections, you can configure the VMs with a larger value for tcp_syn_retries, which gives the system more time to establish the connection and increases the chances for the connection to succeed.

For example, for Linux VMs, you can view the current setting:
```
sysctl net.ipv4.tcp_syn_retries
```
If needed, you can increase the setting, replacing NUM with the new number of retries:
```
sudo sysctl -w net.ipv4.tcp_syn_retries=NUM
```
Note: Increasing the value for tcp_syn_retries can increase connection latency, because packets continue to be dropped until a retry succeeds.
If you have bursty workloads and need to quickly allocate more ports, you might need to adjust the minimum number of ports per VM. View your port usage and determine an appropriate minimum number of ports per VM.

Packets dropped with reason: endpoint independence conflict

If you see packet loss from VMs that use Public NAT, and you have Endpoint-Independent Mapping turned on, the packet loss might be caused by an endpoint independence conflict. If it is, the dropped_sent_packets_count reason is ENDPOINT_INDEPENDENCE_CONFLICT. For more information about metrics, see Using VM instance metrics.

You can reduce the chances of endpoint independence conflicts by using the following techniques:

Turn off Endpoint-Independent Mapping. This allows the new connection from a given source IP address and port to use a different NAT source IP address and port than it used before. Disabling or enabling Endpoint-Independent Mapping does not interrupt established connections.
Increase the minimum number of NAT ports per VM instance, so that the port reservation procedure can assign more NAT source IP address and source port tuples to each client VM. This decreases the probability that two or more client IP address and ephemeral source port tuples are assigned the same NAT source IP address and source port tuple.

Check how many ephemeral source ports are being used:

For Linux VMs:

netstat -an | grep -E 'ESTABLISHED|TIME_WAIT|CLOSE_WAIT' | wc -l

For Windows VMs:

netstat -tan | findstr "ESTABLISHED TIME_WAIT CLOSE_WAIT" | find /c /v ""

Configure your VM instances to use a larger set of ephemeral source ports:
- For Linux VMs:
  - You can view what port range is configured with this command:
```
cat /proc/sys/net/ipv4/ip_local_port_range
```
  - You can set the ip_local_port_range to the maximum number of ephemeral source ports (64,512) with this command:
```
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```
- For Windows VMs:
  - You can view what port ranges are configured with these commands:
```
netsh int ipv4 show dynamicport tcp
netsh int ipv4 show dynamicport udp
```
  - You can set the number of ephemeral source TCP and UDP ports to the maximum possible (64,512) with these commands:
```
netsh int ipv4 set dynamicport tcp start=1024 num=64512
netsh int ipv4 set dynamicport udp start=1024 num=64512
```
- On Google Kubernetes Engine nodes, you can automate this configuration by using a privileged DaemonSet.
For GKE clusters, disable the source NAT performed on each node for packets sent to destinations of interest. You can do this in one of two ways:
- By deploying the ip-masq-agent and adding the destinations of interest to the list of nonMasqueradeCIDRs.
- By disabling SNAT for the default non-masquerade destinations with the --disable-default-snat flag when you create a cluster.
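The 64,512 figure in the port range steps above is the size of the inclusive range 1024 to 65535. The following sketch shows the arithmetic so that you can compare it with the range currently configured on a VM. The function name is hypothetical, and 32768 to 60999 is used only as an example of a common Linux default.

```shell
# Number of ports in an inclusive range.
# Usage: port_range_size LOW HIGH
port_range_size() {
  echo $(( $2 - $1 + 1 ))
}

port_range_size 1024 65535    # prints 64512
port_range_size 32768 60999   # prints 28232

# On a Linux VM, the configured range can be passed straight through:
#   port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)
```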

Dropped received packets

A Cloud NAT gateway maintains a connection tracking table to store active connection details and IP address and port mappings—how VM IP addresses and ports translate to NAT IP addresses and ports. A Cloud NAT gateway drops an ingress data packet if the connection tracking table doesn't contain any entry for the connection.

The absence of the connection entry in the table can be due to any of the following reasons:

An established TCP connection timed out because the TCP Established Connection Idle Timeout expired due to inactivity.
An external endpoint fails to establish a new connection before the TCP Transitory Connection Idle Timeout expired. For example, a Google Cloud resource initiates a connection with TCP SYN, but the external endpoint fails to respond with a SYN ACK.
An external endpoint, such as a prober, tries to connect to a NAT IP address and port. Cloud NAT doesn't accept unsolicited inbound connections, so the connection tracking table has no entries for these types of connections and any received packets are dropped.
If you remove NAT IP addresses from your gateway while NAT connections are still active, the NAT mappings become invalid and these connections are immediately removed from the connection tracking table. Any return traffic is dropped.

Before you address the ingress packet drops, confirm whether the drops actually affect your application. To confirm, check your application for errors whenever spikes in dropped ingress packets occur.

If the ingress packet drops do impact your application, try using the following techniques to address the issue:

Use keepalive mechanisms in your application, so that long-running connections can stay open for a longer period.
Increase the value for TCP Transitory Connection Idle Timeout, so that external endpoints that receive traffic (initiated by Google Cloud resources) through a Cloud NAT gateway get more time to respond and to establish the connection.
Increase the value for TCP Established Connection Idle Timeout if you have significantly decreased the default value.
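For the keepalive technique to help, probes have to be sent before the NAT connection entry expires. The following sketch compares a keepalive interval with the gateway's TCP Established Connection Idle Timeout. The 1,200-second timeout is an assumption; use the value configured on your gateway. The Linux setting shown, net.ipv4.tcp_keepalive_time, only affects sockets on which the application has enabled keepalive.

```shell
# Succeeds (exit status 0) if the first keepalive probe is sent before the
# NAT idle timeout expires.
# Usage: keepalive_fits KEEPALIVE_SECONDS NAT_IDLE_TIMEOUT_SECONDS
keepalive_fits() {
  [ "$1" -lt "$2" ]
}

nat_idle_timeout=1200   # assumption: your gateway's established idle timeout

for keepalive in 7200 600; do
  if keepalive_fits "$keepalive" "$nat_idle_timeout"; then
    echo "tcp_keepalive_time=$keepalive: probes arrive before the NAT entry expires"
  else
    echo "tcp_keepalive_time=$keepalive: the NAT entry can expire before the first probe"
  fi
done

# On a Linux VM, view and (if needed) lower the value:
#   sysctl net.ipv4.tcp_keepalive_time
#   sudo sysctl -w net.ipv4.tcp_keepalive_time=600
```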

Need to allocate more IP addresses

Sometimes your VMs are unable to reach the internet because you don't have enough NAT IP addresses. Multiple factors can cause this problem. For more information, see the following table.

Root cause	Symptom	Solution
You've manually allocated addresses, but you haven't allocated enough of them, given your current port usage.	The Google Cloud console displays an error that reads You need to allocate at least 'X' more IP addresses to allow all instances to access the internet. The value of the `nat_allocation_failed` metric is `true`.	Do one of the following: Minimize your port usage, as described in Reduce your port usage. Manually add more IP addresses, as described in Update external IP addresses associated with NAT.
You have surpassed a hard limit for NAT IP addresses.	The value of the `nat_allocation_failed` metric is `true`.	Minimize your port usage, as described in Reduce your port usage.

To monitor failures caused by an insufficient number of IP addresses, create an alert for the nat_allocation_failed metric. This metric is set to true if Google Cloud is unable to allocate sufficient IP addresses for any VM in your NAT gateway. For information about alert policies, see Defining alerting policies.

Reduce your port usage

You can minimize the number of ports that each VM uses in situations where allocating more NAT IP addresses is not possible or desirable.

To reduce port usage, complete the following steps:

Disable Endpoint-Independent Mapping.
Enable dynamic port allocation. To use dynamic port allocation, you set a minimum number of ports per VM and a maximum number of ports per VM. Cloud NAT automatically allocates a number of NAT source IP address and source port tuples between the minimum and maximum number of ports, inclusive. Using a low number for the minimum number of ports reduces wasting NAT source IP address and source port tuples on VMs with fewer active connections. If you encounter connection timeouts while ports are being allocated, see Reduce packet drops with dynamic port allocation.
Determine the lowest possible minimum number of ports to meet your needs. There are several methods to do this, and most rely on reviewing the number of used ports (compute.googleapis.com/nat/port_usage) as input to the decision-making process. For information about how to find port usage, see View port usage. The following are two example methods to determine a minimum number of ports:
- Consider the average value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs.
- Consider the most frequently occurring value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs.
Determine the lowest possible maximum number of ports to meet your needs. Once again, review compute.googleapis.com/nat/port_usage as input to your decision-making process. Consider the maximum value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs as a starting point for the maximum number of ports. Keep in mind that setting the maximum number too high can prevent other VMs from receiving NAT source IP address and source port tuples.
Finding the right values for minimum and maximum ports involves iterative testing. For steps to change minimum and maximum port numbers, see Change minimum or maximum ports when dynamic port allocation is configured.
Review the NAT timeouts, their meanings, and their default values. If you need to rapidly create a series of TCP connections to the same destination 3-tuple, consider reducing the TCP TIME_WAIT timeout so that Cloud NAT can more quickly reuse NAT source IP address and source port tuples. This allows Cloud NAT to reuse the same 5-tuple sooner instead of needing a unique 5-tuple, which might require allocation of additional NAT source IP address and source port tuples for each sending VM. For steps to change NAT timeouts, see Change NAT timeouts.
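The average, most-frequent-value, and maximum methods described in the steps above can be sketched with standard shell tools. The sample values are made up for illustration; in practice you would export them from the compute.googleapis.com/nat/port_usage metric. The `next_pow2` helper reflects the assumption that dynamic port allocation expects power-of-two values, which you should confirm against the current Cloud NAT limits.

```shell
# Hypothetical port_usage samples (one value per VM per interval).
samples="12 40 40 64 18 40 250 31"

# Integer average of the arguments (fraction truncated).
average() { printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%d\n", s / NR }'; }

# Most frequently occurring argument (smallest value wins ties).
mode() {
  printf '%s\n' "$@" | sort -n | uniq -c | sort -k1,1nr -k2,2n |
    awk 'NR == 1 { print $2 }'
}

# Largest argument.
maximum() { printf '%s\n' "$@" | sort -n | tail -n 1; }

# Round up to the next power of two.
next_pow2() {
  p=1
  while [ "$p" -lt "$1" ]; do p=$((p * 2)); done
  echo "$p"
}

# $samples is intentionally unquoted so that it splits into arguments.
echo "average: $(average $samples)"                                    # 61
echo "mode: $(mode $samples)"                                          # 40
echo "max: $(maximum $samples)"                                        # 250
echo "candidate minimum ports: $(next_pow2 "$(mode $samples)")"        # 64
echo "candidate maximum ports: $(next_pow2 "$(maximum $samples)")"     # 256
```

These candidates are only starting points for the iterative testing described above.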

Frequently asked questions

Regional restriction for Cloud NAT

Can I use the same Cloud NAT gateway in more than one region?

No. A Cloud NAT gateway cannot be associated with more than one region, VPC network, or Cloud Router.

If you need to provide connectivity for other regions or VPC networks, create additional Cloud NAT gateways for them.

Are the external NAT IP addresses used by Cloud NAT gateways global or regional?

Cloud NAT gateways use regional external IP addresses as NAT IP addresses. Even though they are regional, they are publicly routable. For information about different ways that NAT IP addresses can be allocated or assigned, see NAT IP addresses.

When Cloud NAT can and cannot be used

Does Cloud NAT apply to instances, including GKE node VMs, that have external IP addresses?

Generally, no. If the network interface of a VM has an external IP address, Google Cloud always performs 1-to-1 NAT for packets sent from the primary internal IP address of the network interface without using Cloud NAT. However, Cloud NAT could still provide NAT services to packets sent from alias IP address ranges of that same network interface. For additional details, see Cloud NAT specifications and Compute Engine interaction.

Does Public NAT let a source VM whose network interface lacks an external IP address send traffic to a destination VM or load balancer that has an external IP address, even when the source and destination are in the same VPC network?

Yes. The network path involves sending traffic out of the VPC network through a default internet gateway, and then receiving it in the same network.

When the source VM sends a packet to the destination, Public NAT performs source NAT (SNAT) before delivering the packet to the destination. Public NAT performs destination NAT (DNAT) for responses from the destination to the source VM. For a step-by-step example, see Basic Public NAT configuration and workflow.

Can I use Private NAT for communication between VMs in the same VPC network?

No, Private NAT doesn't perform NAT on traffic between VMs in the same VPC network.

Unsolicited incoming connections not supported

Does Cloud NAT allow for inbound connections (for example, SSH) to instances without external IP addresses?

No, Cloud NAT does not support unsolicited incoming connections. For more information, see Cloud NAT specifications. However, Google Cloud's network edge might respond to pings if the destination IP address is a Cloud NAT gateway external IP address that has active port mappings to at least one VM instance. To see IP addresses assigned to a Cloud NAT gateway, use the gcloud compute routers get-nat-ip-info command. External IP addresses marked as IN_USE might respond to pings.

If you need to connect to a VM that doesn't have an external IP address, see Choose a connection option for internal-only VMs. For example, as part of the Cloud NAT example Compute Engine setup, you connect to a VM without an external IP address by using Identity-Aware Proxy.

Cloud NAT and ports

Why does a VM have a fixed number of ports (`64` by default)?

When a Cloud NAT gateway provides NAT for a VM, it reserves source address and source port tuples according to the port reservation procedure.

For more information, see port reservation examples.

Can I change the minimum number of ports reserved for a VM?

Yes. You can increase or decrease the minimum number of ports per VM when you create a new Cloud NAT gateway or by editing it later. Each Cloud NAT gateway reserves source address and source port tuples according to the port reservation procedure.

For additional information about decreasing the minimum number of ports, see the next question.

Can I decrease the minimum number of ports per VM after creating the Cloud NAT gateway?

Yes; however, decreasing the minimum number of ports could result in the port reservation procedure reserving a smaller number of ports per VM. When this happens, existing TCP connections might be reset and, if so, must be re-established.

When switching NAT mapping from Primary and Secondary ranges to Primary range only, are additional ports allocated to each instance immediately released?

No. Any additional ports used by secondary ranges are retained by instances until the minimum ports per VM setting is reduced. When Cloud NAT is configured to map Secondary (alias) ranges for subnets, Cloud NAT assigns a minimum of 1,024 ports per instance, based on the port reservation procedure.

When you switch to Primary ranges only, Cloud NAT preserves the additional allocated ports for instances that have already had those ports assigned. After you change the ranges for which Cloud NAT is applied to Primary only, the actual number of ports assigned to those instances does not change until the minimum ports per VM setting is also reduced.

To reduce the number of ports allocated to those instances, reduce the minimum ports per VM setting after switching to primary ranges. After that value is reduced, Cloud NAT automatically adjusts the number of ports allocated per instance down, which reduces port consumption.

Cloud NAT and other Google services

Does Cloud NAT enable access to Google APIs and services?

When you enable Cloud NAT for a subnet's primary IP range, Google Cloud automatically enables Private Google Access. For more information, see Private Google Access interaction.

What's next

Get support