Identifying router maintenance events

Cloud Router uses Border Gateway Protocol (BGP) to exchange routes between your Virtual Private Cloud (VPC) network and your on-premises network. Google Cloud periodically performs software maintenance and automated task restarts. During maintenance, on-premises routers typically log a BGP down event followed by a BGP up event (collectively, a BGP flap).
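
As a rough illustration of what a BGP flap looks like in on-premises logs, the following sketch pairs each down event with the next up event for the same peer. The tuple layout, the `find_flaps` name, and the 60-second `max_gap` threshold are all assumptions made for this example; real router logs need vendor-specific parsing first.

```python
from datetime import timedelta


def find_flaps(events, max_gap=timedelta(seconds=60)):
    """Pair each BGP 'down' event with the next 'up' event for the same peer.

    events: iterable of (timestamp, peer, state) tuples, sorted by timestamp,
            where state is "down" or "up" (hypothetical, pre-parsed format).
    max_gap: longest down-to-up interval still counted as a flap
             (assumed value, not taken from any product documentation).
    Returns a list of (peer, down_time, up_time) tuples.
    """
    pending = {}  # peer -> timestamp of the earliest unmatched "down"
    flaps = []
    for ts, peer, state in events:
        if state == "down":
            # Keep the first down time if several downs arrive in a row.
            pending.setdefault(peer, ts)
        elif state == "up" and peer in pending:
            down_ts = pending.pop(peer)
            # Longer outages are dropped here; they are not short flaps.
            if ts - down_ts <= max_gap:
                flaps.append((peer, down_ts, ts))
    return flaps
```

A short down-to-up interval that lines up with a Cloud Router maintenance event is the pattern described above; a long outage is filtered out by `max_gap` and deserves separate investigation.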

Identify events using the router task activated message

In the Google Cloud Console, the following message appears: Router Event: Router task activated. This message indicates that a Cloud Router task has started for a specific Cloud Router and is ready to establish BGP sessions. The message appears in the logs when a Cloud Router is first created and again each time the Cloud Router undergoes maintenance.

The Router task activated message indicates that a Cloud Router event occurred during the time period that you are investigating. The event can be caused by migrating, restarting, or upgrading a Cloud Router task. Cloud Router tasks are software processes in the Google Cloud control plane that are routinely migrated from machine to machine. During a migration, the Cloud Router might be down for a few seconds. Because migrations take place outside of the data plane, normal migrations do not cause traffic to be dropped.

Identify events using a log-based metric

Console

  1. In the Google Cloud Console, go to the Cloud Routers page.

    Go to Cloud Routers

  2. Select the Cloud Router from the list of Cloud Routers.

  3. In the Logs column, click View.

    The default query appears in the query builder.

  4. In the default query, note the value of resource.labels.router_id. This value is the router ID.

  5. Build a new query as follows, replacing ROUTER_ID with the resource.labels.router_id value from the default query:

    resource.labels.router_id=ROUTER_ID
    textPayload=~"Router task activated"
    
  6. Use the previous query to create an alert that notifies you of maintenance events.

    The alert fires when the router is first created and again during every maintenance event.

    For more information about creating an alert, see Creating an alert policy on a counter metric.
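
The query from step 5 can also be assembled programmatically, for example when you manage several Cloud Routers. The following is a minimal sketch: the function names are hypothetical, the router ID is quoted as a string literal (an assumption; the query in step 5 uses the bare value), and `is_maintenance_entry` only mimics the filter locally on a dictionary with `resource.labels` and `textPayload` fields. It does not call any Google Cloud API.

```python
import re

MAINTENANCE_PATTERN = "Router task activated"


def build_maintenance_filter(router_id):
    """Return the two-line log query from step 5 for one router ID."""
    rid = str(router_id).strip()
    if not rid:
        raise ValueError("router_id must not be empty")
    # One comparison per line, in the same order as the query in step 5.
    return "\n".join([
        f'resource.labels.router_id="{rid}"',
        f'textPayload=~"{MAINTENANCE_PATTERN}"',
    ])


def is_maintenance_entry(entry, router_id):
    """Local approximation of the filter for an already-fetched log entry.

    entry: dict with optional "resource" -> "labels" -> "router_id"
           and optional "textPayload" keys (assumed shape).
    """
    labels = entry.get("resource", {}).get("labels", {})
    text = entry.get("textPayload") or ""
    same_router = labels.get("router_id") == str(router_id)
    # =~ in the query is a regular-expression match; re.search is the analogue.
    return same_router and re.search(MAINTENANCE_PATTERN, text) is not None
```

For example, `build_maintenance_filter("1234567890")` (a made-up ID) produces a string that you can paste into the query builder.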

Verify connectivity between the on-premises router and the Cloud Router

To confirm that a BGP flap was not caused by lost connectivity between the on-premises router and the Cloud Router, verify the connectivity by using one of the following methods:

  • For Cloud Router used with Cloud VPN, set up a dashboard for network/received_packets_count and network/sent_packets_count to monitor any loss in connectivity. For more information, see Viewing VPN metrics.
  • For Cloud Router used with Cloud Interconnect, set up a dashboard for network/attachment/sent_packets_count and network/attachment/received_packets_count to monitor the connectivity of VLAN attachments. For more information, see Interconnect metrics.

During Cloud Router maintenance events, the dashboard is not expected to show any connectivity gaps that coincide with the events. If maintenance events do correlate with packet loss, file a ticket with Google Cloud support for further investigation.
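
If you export the maintenance event times and the packet-count samples, the correlation check can be sketched as follows. This is an illustration only: the function name, the `(timestamp, packet_count)` sample format, the two-minute window, and the rule that a zero sample counts as a gap are all assumptions, not behavior defined by Cloud Monitoring.

```python
from datetime import timedelta


def events_with_packet_gaps(event_times, samples, window=timedelta(minutes=2)):
    """Return the maintenance events that have a zero-packet sample nearby.

    event_times: list of datetime objects, one per "Router task activated" log.
    samples: list of (timestamp, packet_count) tuples exported from a
             sent or received packets metric (hypothetical export format).
    window: how close a sample must be to an event to count (assumed value).
    """
    flagged = []
    for event in event_times:
        # Collect the packet counts sampled within the window around the event.
        nearby = [count for ts, count in samples if abs(ts - event) <= window]
        # Assumption: a sample of zero packets means connectivity was lost.
        if any(count == 0 for count in nearby):
            flagged.append(event)
    return flagged
```

An empty result matches the expected behavior described above. Any flagged event is a candidate to include in a support ticket. Note that this sketch ignores events with no samples at all inside the window, which you might want to treat as a gap as well.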

What's next