Management Tools

Introducing Pub/Sub as a new notification channel in Cloud Monitoring

Around the world, operations teams are working to automate their monitoring and alerting workflows, looking to reduce the time they spend on rote operational work (what we call “toil”), so they can spend more time on valuable work. For instance, Google’s Site Reliability Engineering organization aims to keep toil below 50% of an SRE’s time, freeing them up to work on more impactful engineering projects.

Our Cloud Monitoring service can alert you through a variety of notification channels. One of these channels, webhook, can be used for building automation into your systems with programmatic ways to respond to errors or anomalies in your applications.

Today, we’re announcing a new notification channel type via Cloud Pub/Sub. Now in beta, this integration lets you create automated and programmatic workflows in response to alerts. Pub/Sub is a publish/subscribe queue that lets you build loosely coupled applications. Using Pub/Sub as your notification channel makes it easy to integrate with third-party providers for alerting and incident response workflows, you can use the Pub/Sub as a notification channel through the API and the Google Cloud Console.
pub:sub.gif

Pub/Sub vs webhook

It’s true that you can also use a webhook to integrate with third-party packages, so when should you use Pub/Sub rather than a webhook? Both of these methods can be used for different scenarios. The main differences between Pub/Sub and a webhook center around notions of implicit vs. explicit invocation, durability and authentication methods.

When to use a webhook

Webhooks are aimed at explicit invocation where the publisher (client) retains full control of the webhook’s execution. This means that the execution timing and the processing of the alert message are the server’s responsibility. Moveover, although a webhook does retry deliveries a few times upon failure, if the target endpoint is unavailable for too long, notifications are dropped entirely. Finally, this method supports only basic and token authentication. 

An example use case for a webhook is when you already have a central Incident Response Management (IRM) solution in place. The web server exposes endpoint URLs that third-party alerting solutions can invoke. Once the client invokes the webhook, the server receives the request, parses and processes the message and can create an incident, update or resolve it. And because the server is responsible for parsing the messages, multiple third parties, each sending a different JSON payload, can use a single endpoint. Another option is to expose separate endpoints for each third party or message format.

Since different third parties can use this webhook, you can use basic or token-based authentication to authenticate the caller. And because you maintain and manage the server, you can ensure that the server is always available to receive incoming messages.

when to use webhooks.jpg

When to use Pub/Sub

Pub/Sub supports both explicit (push) and implicit (pull) invocation. In pull mode, the subscriber has control over when to pull the message from the queue and how to process an alert message. Pub/Sub provides a durable queue in which messages wait as long as necessary until the subscriber pulls the message. 

Pub/Sub is an access-controlled queue, with access managed by Cloud Identity and Access Management, meaning that the subscriber needs to be able to authenticate using a user or service account. Messages delivered to Pub/Sub never leave the Google network.

An example of where to use Pub/Sub is if an alert message needs to be transformed before it is sent to be processed. Consider this scenario: an uptime check that you configured to check the health of your load balancers is failing. As a result, an alert is fired and a message is published to your Pub/Sub channel. A Cloud Function is triggered as soon as a new message hits the Pub/Sub topic. The function reads the message and identifies the failing load balancer. The function then sends a command to change the DNS record to point to a failover load balancer.

pub_sub.jpg

We built our Pub/Sub notification channels capabilities to give you flexibility when defining your alerting notifications and to help reduce toil. You can get started with Pub/Sub notification channels using this guide. Also, be sure to join the discussion on our mailing list.