Troubleshooting
Use the following guide to troubleshoot common issues with Cloud Router:
- Configuration issues related to configuring and establishing BGP sessions.
- Cloud Router issues related to the Cloud Router itself.
- Route processing issues related to route propagation or route priorities.
For additional help, see the following documentation:
- For information about BGP session states, see BGP session states.
- For information about diagnostic messages and session states related to Bidirectional Forwarding Detection (BFD), see BFD diagnostic messages and session states.
- For issues related to using Cloud Router with Router appliance, see Troubleshooting Router appliance in the Network Connectivity Center documentation.
Configuration issues
BGP session failed to establish
Check that the settings on your on-premises BGP router and the settings on your Cloud Router are correct. For detailed information, view the Cloud Router logs.
If you're creating a Cloud VPN tunnel, check that the status of the tunnel is ESTABLISHED. If it isn't, to troubleshoot the issue, see Cloud VPN troubleshooting.
IP addresses for BGP sessions
The IP addresses that you can use for a BGP session depend on which network connectivity product you use. For complete details, see BGP IP addresses.
Invalid value for the field resource.bgp.asn
You may get the following error:
"Invalid value for field resource.bgp.asn: ######. Local ASN conflicts with peer ASN specified by a router in the same region and network."
The Cloud Router is attempting to establish a BGP session with an on-premises device that has the same ASN as the Cloud Router. To resolve this issue, change the ASN of your device or Cloud Router.
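You can catch this conflict before you apply a configuration. The following is a minimal sketch, not Cloud Router's own validation logic. The function name and the sample ASNs are hypothetical.

```python
def asn_conflicts(cloud_router_asn: int, peer_asns: list[int]) -> list[int]:
    """Return the peer ASNs that are equal to the Cloud Router's local ASN."""
    return [asn for asn in peer_asns if asn == cloud_router_asn]


# Hypothetical values: 65001 is the Cloud Router ASN, the list holds peer ASNs.
conflicts = asn_conflicts(65001, [65002, 65001])
if conflicts:
    print(f"Change the ASN of the Cloud Router or of the peers using: {conflicts}")
```

An empty result means that none of the listed peers reuses the local ASN.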
iBGP between Cloud Routers in a single region doesn't work
Although you can create two Cloud Routers with the same ASN, iBGP isn't supported.
Cloud Router issues
BGP resets that originate from Google Cloud appear on your router
Cloud Router tasks are software processes in the Google Cloud control plane that are normally migrated from machine to machine. During such migrations, the Cloud Router might be down for periods of up to 60 seconds. Normal migrations do not cause traffic to be dropped.
The Cloud Router is not located in the data path and is not acting as a Layer 3 switch, but as a manager for route programming. Routing is actually handled by the VLAN attachment or the Cloud VPN tunnel.
NOTIFICATION_RECEIVED message appears in Cloud Router logs
A NOTIFICATION_RECEIVED message appears in the Cloud Router logs when the Cloud Router has received a NOTIFICATION message from the BGP peer. The NOTIFICATION message is a signal to Cloud Router that it should stop the BGP session.
When the Cloud Router receives a NOTIFICATION message from its BGP peer, Cloud Router closes the BGP connection with that peer and removes all its learned routes.
The BGP peer can send NOTIFICATION messages for a variety of reasons. For example, the peer might send a "Hold Timer Expired" message.
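The reason is carried in the error code of the NOTIFICATION message. As an illustration, the following sketch maps the error codes defined in RFC 4271 to their names. It does not describe how Cloud Router logs the reason, and the function name is hypothetical.

```python
# BGP NOTIFICATION error codes as defined in RFC 4271, section 4.5.
NOTIFICATION_ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def describe_notification(error_code: int) -> str:
    """Return the RFC 4271 name for a NOTIFICATION error code."""
    return NOTIFICATION_ERROR_CODES.get(error_code, f"Unknown error code {error_code}")


print(describe_notification(4))
```

A "Hold Timer Expired" code usually points at missed keepalives, so the timers on both sides are a reasonable first thing to compare.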
CONFIG_DISABLED message appears in Cloud Router logs
A CONFIG_DISABLED message indicates that Cloud Router has intentionally stopped the BGP session. By stopping the BGP session, Cloud Router is attempting to immediately communicate an error state to its peer.
This message can appear due to any of the following reasons:
- A user has disabled the BGP session by using the Cloud Router API, Google Cloud console, or the Google Cloud CLI. See Disabling or removing BGP sessions.
- For a BGP session set up for Cloud VPN, the VPN tunnel associated with the BGP session has not established an IKE and child security association (SA). To troubleshoot VPN connectivity, see Cloud VPN troubleshooting.
- For a BGP session set up for Cloud Interconnect, the VLAN attachment is not configured or is in an admin-down state. To troubleshoot the issue further, see Troubleshooting in the Cloud Interconnect documentation.
- For a BFD-enabled BGP session, the BFD control detection timer on the Cloud Router has expired. When this occurs, the BGP session is stopped. For more information on BFD session states, see BFD diagnostic messages and session states.
LINK_DOWN message appears in Cloud Router logs
A LINK_DOWN message appears in the Cloud Router logs when the link between the Google peering edge router and your VLAN attachment for Cloud Interconnect is down. The peering edge router is networking equipment that is managed by Google inside the colocation facility where you have provisioned your Cloud Interconnect connection.
The LINK_DOWN message is a signal that the corresponding BGP peer status is down. This message applies only to Cloud Interconnect-based BGP sessions.
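When you scan exported log text, it can help to attach the meanings described in this guide to each message. The following sketch assumes that each log entry is available as a plain string that contains the message name. The actual log entry structure is not described in this guide, so the sample line and the function name are hypothetical.

```python
from typing import Optional

# Meanings summarized from this guide, keyed by the message name.
LOG_MESSAGE_MEANINGS = {
    "NOTIFICATION_RECEIVED": "The BGP peer sent a NOTIFICATION message.",
    "CONFIG_DISABLED": "Cloud Router intentionally stopped the BGP session.",
    "LINK_DOWN": "The link to the VLAN attachment for Cloud Interconnect is down.",
}


def classify_log_line(line: str) -> Optional[str]:
    """Return the meaning of the first known message name found in the line."""
    for name, meaning in LOG_MESSAGE_MEANINGS.items():
        if name in line:
            return meaning
    return None


# Hypothetical log text, used only to exercise the function.
print(classify_log_line("BGP session peer-1 went down: LINK_DOWN"))
```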
On-premises router experiences BGP flap
BGP flaps can be caused by various issues, including Cloud Router software maintenance and automated task restarts.
To get details about completed maintenance events, see Identifying router maintenance events. To get details about other Cloud Router events, see Viewing Cloud Router logs and metrics.
A Cloud Router maintenance event is not indicative of a problem if your on-premises router is configured as follows:
- The on-premises router can process graceful restart notifications.
- The on-premises router's hold timer is set to at least 60 seconds.
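The two conditions above can be expressed as a simple check. This is a sketch only. The function and parameter names are hypothetical, and you still need to read the actual values from your on-premises router's configuration.

```python
# Minimum hold timer from this guide: migrations can take up to 60 seconds.
MIN_HOLD_TIMER_SECONDS = 60


def tolerates_maintenance(graceful_restart: bool, hold_timer_seconds: int) -> bool:
    """Return True if both conditions for tolerating maintenance are met."""
    return graceful_restart and hold_timer_seconds >= MIN_HOLD_TIMER_SECONDS


print(tolerates_maintenance(graceful_restart=True, hold_timer_seconds=30))
```

If the check fails, a maintenance event can look like a BGP flap on the on-premises router.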
For a comprehensive overview of timer settings, see Managing BGP timers.
For help monitoring connectivity, see Verify connectivity between the on-premises router and the Cloud Router.
Authentication issues
The following sections describe problems that can occur with MD5 authentication.
BGP peer status is MD5_AUTH_INTERNAL_PROBLEM
Sometimes the status of a BGP peer includes the following values:
- md5AuthEnabled: true
- statusReason: MD5_AUTH_INTERNAL_PROBLEM
The first value indicates that you have successfully configured MD5 authentication. However, the second value, a statusReason value of MD5_AUTH_INTERNAL_PROBLEM, indicates that an internal error has prevented Cloud Router from being able to configure MD5 authentication. For that reason, the BGP session status is DOWN. In this case, you do not need to do anything. Cloud Router tries to recover and bring the session back up. If the session takes more than one hour to come back up, contact Google Cloud Support.
For information about how to check the peer's status, see Check authentication status.
Cloud Router and peer use different MD5 keys
When you set up MD5 authentication, the Cloud Router and its peer router must use the same secret authentication key. If a mismatch occurs, the two routers cannot communicate. If you think that there's been a mismatch, one solution is to update the key that is used by the Cloud Router. For information about how to make this change, see Update the authentication key.
If you're not sure whether there's been a key mismatch, look for troubleshooting solutions in your peer router's documentation. Many routers have logs that record whether or not there's been a key mismatch.
Auto-generated MD5 key is longer than the on-premises device can support
You can auto-generate the MD5 key by clicking Generate and Copy in the Google Cloud console. For more information, see Add authentication to an existing session. If the auto-generated MD5 key is longer than your on-premises device can support, you can configure the MD5 key manually by using the Google Cloud console, the Google Cloud CLI, or the API.
Route processing issues
On-premises routes without a MED value are taking priority
If the Cloud Router receives an on-premises route that doesn't have a MED value, the Cloud Router follows the behavior described in RFC 4271. The Cloud Router treats the route as having the highest priority by assuming the lowest possible MED value (0).
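The effect can be seen in a small sketch that orders routes by MED and substitutes 0 when no MED was received. The type, the function names, and the sample values are hypothetical, and the sketch models only the MED comparison, not the full route selection.

```python
from typing import NamedTuple, Optional


class LearnedRoute(NamedTuple):
    prefix: str
    next_hop: str
    med: Optional[int]  # None means that the peer sent no MED value


def effective_med(route: LearnedRoute) -> int:
    """A missing MED is treated as the lowest possible value, 0."""
    return 0 if route.med is None else route.med


def by_priority(routes: list[LearnedRoute]) -> list[LearnedRoute]:
    """Order routes from most preferred (lowest MED) to least preferred."""
    return sorted(routes, key=effective_med)


routes = [
    LearnedRoute("10.0.0.0/8", "peer-a", 100),
    LearnedRoute("10.0.0.0/8", "peer-b", None),
]
print(by_priority(routes)[0].next_hop)
```

In this example the route from peer-b sorts first because it has no MED. To avoid that outcome, set an explicit MED value on every route that your on-premises router advertises.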
You can't send and learn MED values over a Layer 3 Partner Interconnect connection
If you are using a Partner Interconnect connection where a Layer 3 service provider handles BGP for you, Cloud Router can't learn MED values from your on-premises router or send MED values to that router. This is because MED values can't pass through autonomous systems. Over this type of connection, you can't set route priorities for routes advertised by Cloud Router to your on-premises router. In addition, you can't set route priorities for routes advertised by your on-premises router to your VPC network.
Some on-premises IP prefixes aren't available
If some on-premises IP prefixes aren't available, check quotas and limits or overlapping subnet ranges.
Custom learned routes are inactive
If you have configured a custom learned route but are experiencing traffic loss, ping errors, or other problems related to the route, do the following:
- Make sure that the route is configured properly on the BGP session.
- Make sure that the BGP session is up.
For more information, see Check the status of custom learned routes.
Check quotas and limits
Check that your Cloud Routers haven't exceeded the limits for learned routes. To view the number of learned routes for a Cloud Router, view its status.
For information about the limits, related log messages, and metrics, and how to resolve issues, see the following table.
Limits | Guidance |
---|---|
About the limits | There are two limits for learned routes. These limits don't directly define a maximum number of learned routes. Instead, they define the maximum number of unique destination prefixes:
The first limit is relevant regardless of the dynamic routing mode used by the VPC network. The second limit only makes sense if the VPC network uses global dynamic routing mode. For details about Cloud Router limits, see Limits. |
Logs | When you encounter either of these limits, you'll see a limit-exceeded message in Cloud Logging. For information about how to create an advanced query to view this message, see the related query in the logging documentation for Cloud Router. |
Metrics | You can also use the following metrics to understand your current
limits and usage. These metrics are prepended with
These metrics are available through the |
Resolving issues | You can do the following to resolve route limit issues. In situations where the number of routes exceeds the limits by a large amount, it makes sense to do both:
|
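Because the limits count unique destination prefixes and not individual routes, the same destination learned from several next hops counts once. The following sketch shows that counting rule. It assumes that you have already collected the learned routes as (destination, next hop) pairs. The function name and the sample data are hypothetical.

```python
import ipaddress


def count_unique_destinations(learned_routes: list[tuple[str, str]]) -> int:
    """Count distinct destination prefixes among (destination, next_hop) pairs."""
    return len({ipaddress.ip_network(dest) for dest, _ in learned_routes})


# Hypothetical learned routes: one destination is learned from two peers.
learned = [
    ("10.1.0.0/16", "peer-a"),
    ("10.1.0.0/16", "peer-b"),
    ("10.2.0.0/16", "peer-a"),
]
print(count_unique_destinations(learned))
```

Compare the result with the limits that apply to your Cloud Router and VPC network.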
Check overlapping subnet ranges
Ensure that the IP address ranges for a VPC subnet don't fully overlap with advertised routes from your on-premises network. Overlapping IP ranges can cause routes to be dropped. This also applies to custom static routes that overlap with a dynamic route learned by a Cloud Router. Prefixes received by Cloud Routers are ignored (custom dynamic routes are not created) in the following scenarios:
- When the prefix learned exactly matches a primary or secondary IP address range of a subnet in your VPC network.
- When the prefix learned exactly matches the destination of a custom static route in your VPC network.
- When the prefix learned is more specific (has a longer subnet mask) than a primary or secondary IP address range of a subnet in your VPC network.
- When the prefix learned is more specific (has a longer subnet mask) than the destination of a custom static route in your VPC network.
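The four scenarios above reduce to one test: a learned prefix is ignored when it exactly matches, or is more specific than, a subnet range or a custom static route destination. The following sketch applies that test with Python's standard ipaddress module. The function name and the sample ranges are hypothetical.

```python
import ipaddress


def is_ignored(learned_prefix: str, local_ranges: list[str]) -> bool:
    """Return True if the learned prefix matches or is more specific than a local range.

    local_ranges holds subnet primary and secondary ranges and the
    destinations of custom static routes in the VPC network.
    """
    learned = ipaddress.ip_network(learned_prefix)
    for local in map(ipaddress.ip_network, local_ranges):
        # subnet_of() is True for an exact match and for a more specific prefix.
        # It requires both networks to be the same IP version.
        if learned.version == local.version and learned.subnet_of(local):
            return True
    return False


print(is_ignored("10.1.2.0/24", ["10.1.0.0/16"]))
```

A less specific learned prefix, such as 10.0.0.0/8 against a 10.1.0.0/16 subnet, is not ignored by this test.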
For more information, see Applicability and order of routes in the VPC Routes overview.
Routes learned from an on-premises network aren't propagating to other VPC networks
A single Cloud Router can't re-advertise routes learned from one BGP peer to other BGP peers, including to Cloud Routers in other VPC networks.
For example, in the following hub and spoke topology, Cloud Router cannot support route advertisement between multiple VPC networks.
To review recommendations for network topologies in Google Cloud, see Best practices and reference architectures for VPC design.
In addition, to build and manage hub and spoke topologies in Google Cloud, you can use Network Connectivity Center.
Prefixes aren't getting imported into BGP sessions (AS path prepending)
AS path prepending is irrelevant to the control plane and VPC network. AS path length is only considered within each Cloud Router software task as described in the following scenarios.
If a single Cloud Router software task learns the same destination from two or more BGP sessions:
- The software task picks a next hop BGP session that has the shortest AS path length.
- The software task submits destination, next hop, and MED information to the Cloud Router control plane.
- The control plane uses the information to create one or more candidate routes. Each candidate's base priority is set to the MED received.
If two or more Cloud Router software tasks learn the same destination from two or more BGP sessions:
- Each software task picks a next hop BGP session that has the shortest AS path length.
- Each software task submits destination, next hop, and MED information to the Cloud Router control plane.
- The control plane uses the information to create two or more candidate routes. Each candidate's base priority is set to the MED received.
The Cloud Router control plane then installs one or more dynamic routes in the VPC network, according to the VPC network's dynamic routing mode. In global dynamic routing mode, the priority of each regional dynamic route is adjusted in regions different from the Cloud Router region. For details about how Google Cloud selects a route, see Routing order in the VPC documentation.
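The following sketch models the two scenarios: AS path length is compared only among sessions of the same software task, and the MED of the chosen session becomes the candidate's base priority. It is a simplified illustration, not Cloud Router's implementation. The type names, task identifiers, and session names are hypothetical, and on a tie in AS path length the sketch keeps the first session it saw.

```python
from typing import NamedTuple


class Advertisement(NamedTuple):
    task: str            # hypothetical identifier of a Cloud Router software task
    session: str         # BGP session that would become the next hop
    destination: str
    as_path_length: int
    med: int


class Candidate(NamedTuple):
    destination: str
    next_hop: str
    base_priority: int


def candidate_routes(ads: list[Advertisement]) -> list[Candidate]:
    """Pick the shortest AS path per (task, destination), then use MED as priority."""
    best: dict[tuple[str, str], Advertisement] = {}
    for ad in ads:
        key = (ad.task, ad.destination)
        if key not in best or ad.as_path_length < best[key].as_path_length:
            best[key] = ad
    return [Candidate(ad.destination, ad.session, ad.med) for ad in best.values()]


ads = [
    Advertisement("task-1", "session-a", "192.168.0.0/24", 3, 100),
    Advertisement("task-1", "session-b", "192.168.0.0/24", 1, 200),
    Advertisement("task-2", "session-c", "192.168.0.0/24", 2, 50),
]
for candidate in candidate_routes(ads):
    print(candidate)
```

In this example, session-b wins inside task-1 because of its shorter AS path, even though its MED is higher. The session-c path is longer than the session-b path, but it is never compared with it because the two sessions belong to different tasks. Both candidates are passed on with their MED as the base priority.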
On a multi-NIC VM, each NIC gets different routes
This is the expected behavior. You must configure each network interface controller (NIC) for a multi-NIC VM in a unique VPC network. Each Cloud Router creates custom dynamic routes in one VPC network. Thus, the routes learned by one Cloud Router are only applicable to one network interface of a multi-NIC VM. Packets sent from a VM's network interface use only the routes applicable to the VPC network for that interface.
Traffic is being routed asymmetrically
Traffic is routed asymmetrically when ingress and egress traffic use different paths. For example, you might have two Cloud VPN tunnels. Egress traffic from your VPC network might use the first tunnel, while ingress traffic into your VPC network might use the second tunnel.
Asymmetric routing happens when the preferred paths advertised by your on-premises router and by the Cloud Router don't align. For ingress traffic into your VPC network, use the Cloud Router to configure advertised route priorities. For more information, see Learned routes.
Check your device documentation for how the BGP best path selection works because other attributes (such as router ID or origin ASN) can affect it. For example, see the following resources:
- Cisco: BGP Best Path Selection
- Fortinet: BGP Handbook
- Juniper: BGP Path Selection
For egress traffic out of your VPC network, check your on-premises router's MED value.
The default route (0.0.0.0/0 or ::/0) is sending traffic to the internet gateway
When you create a VPC network, Google Cloud automatically creates a default route with a priority of 1000 whose next hop is the default internet gateway.
Routes with a next hop of the default internet gateway can only be used by VMs that meet internet access requirements.
Using routes with a next hop of the default internet gateway is also required to access Google APIs and services—for example, when using Private Google Access.
The following examples describe situations that can cause traffic to the internet or to Google APIs and services to be blocked:
- If you delete the automatically created default route (the route with a next hop of the default internet gateway).
- If you replace the automatically created default route, and the next hop of the replacement route is different from the default internet gateway.
- If a Cloud Router learns a route with destination 0.0.0.0/0 or ::/0 that has a higher priority than the automatically created default route.
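The last case can be checked with a short comparison. In VPC routing, a lower priority number means a higher priority, so a learned default route takes precedence when its priority number is below 1000. The following is a sketch under that assumption. The function name and the sample priorities are hypothetical.

```python
# Priority of the automatically created default route, from this guide.
DEFAULT_ROUTE_PRIORITY = 1000
DEFAULT_DESTINATIONS = {"0.0.0.0/0", "::/0"}


def overrides_default_route(destination: str, priority: int) -> bool:
    """Return True if a learned route would be preferred over the default route."""
    # A lower number means a higher priority.
    return destination in DEFAULT_DESTINATIONS and priority < DEFAULT_ROUTE_PRIORITY


print(overrides_default_route("0.0.0.0/0", 100))
```

If the check returns True for a route that your on-premises router advertises, traffic to the internet or to Google APIs and services follows the learned route and not the default internet gateway.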
The next hop isn't clear
To learn how Google Cloud's route selection algorithm works, see Applicability and order in the VPC Routes documentation.
IPv6 traffic is not being routed
If you are experiencing difficulty connecting to IPv6 hosts, do the following:
- Verify that IPv4 routes are being correctly advertised. If IPv4 routes are not being advertised, perform the general troubleshooting procedures listed in this document.
- Inspect firewall rules to ensure that you are allowing IPv6 traffic between your VPC network and your on-premises network.
- Verify that you do not have overlapping IPv6 subnet ranges in your VPC network and your on-premises network. See Check overlapping subnet ranges.
- Determine whether you have exceeded any quotas and limits for your learned routes. If you have exceeded your quota for learned routes, IPv6 prefixes are dropped before IPv4 prefixes. See Check quotas and limits.
- Verify that all components that require IPv6 configuration have been configured correctly.
  - The VPC subnet is configured to use the IPV4_IPV6 stack type.
  - The VPC subnet has --ipv6-access-type set to INTERNAL.
  - The Compute Engine VMs on the subnet are configured with IPv6 addresses.
  - The HA VPN gateway or the VLAN attachment for Dedicated Interconnect is configured to use the IPV4_IPV6 stack type.
  - The BGP peer is enabled to use IPv6, and correct IPv6 next hop addresses are configured for the BGP session.
    - To view Cloud Router status and routes, see View Cloud Router status and routes.
    - To view BGP session configuration, see View BGP session configuration.
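The quota behavior in the checklist above explains why IPv6 connectivity can fail while IPv4 still works. The following sketch illustrates the stated order, in which IPv6 prefixes are dropped before IPv4 prefixes. Which prefixes are dropped within one address family is not described in this guide, so keeping the input order is an assumption made only for the example. The function name and the sample prefixes are hypothetical.

```python
import ipaddress


def prefixes_kept(prefixes: list[str], limit: int) -> list[str]:
    """Keep at most `limit` prefixes, dropping IPv6 prefixes before IPv4 prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # sorted() is stable: IPv4 (version 4) comes first, and the input order
    # is preserved within each address family.
    ordered = sorted(networks, key=lambda network: network.version)
    return [str(network) for network in ordered[:limit]]


print(prefixes_kept(["2001:db8::/32", "10.0.0.0/8", "192.168.0.0/16"], 2))
```

With a limit of 2, both IPv4 prefixes survive and the IPv6 prefix is dropped.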
What's next
- For more information about how to use Cloud Logging to monitor Cloud Router, see View logs and metrics.
- For additional support, see Getting support.