Event types

This document describes the event types that can be shown as annotations on charts. An event is an activity, such as a reboot or a crash, that affects the operation of a system. Showing events can help you correlate data from different sources when you're troubleshooting an issue.

For each event, links to references or troubleshooting documentation are provided along with information about how to query for the event. For example, when events are identified by analyzing your logs, a query suitable for use with the Logs Explorer or with a log-based alerting policy is provided.

To add annotations to your charts, you configure the dashboard or the tab that displays the chart. For example, you can configure most dashboards listed on the Dashboards page of the Google Cloud console to show events. Similarly, you can configure some service-specific Observability tabs, such as those for Compute Engine and Google Kubernetes Engine, to show events. For configuration information, see Show events on a dashboard.

The following screenshot illustrates a chart that is showing several events that were identified by analyzing log entries, and one Service Health event:

Chart showing warning and informational event annotations.

Each annotation can list multiple events. In the previous screenshot, an event for a GKE deployment is listed.

Alert event types

This section describes the alert event types that can be shown on a dashboard.

Alert opened

Alert opened events help you correlate your charted data to when incidents opened. An alert opened event is shown when the following are true:

  • The corresponding incident was opened during the time range specified by the dashboard.
  • The corresponding incident isn't closed.

There are no annotations made for incidents that were opened outside the time range specified by the dashboard aren't shown. Similarly, an alert opened event isn't shown when the corresponding incident was opened and then closed within the time range specified by the dashboard.

The tooltip for an alert opened event includes the following:

  • Name of the alerting policy.
  • Summary information, when this information is available. For example, this information might include the threshold and the measured value.
  • The duration of the incident, and the date and time when the incident was opened.
  • Metric and resource labels. The tooltip might not show all labels.
  • A View button, which opens the Details page for the incident.

Google Kubernetes Engine event types

This section describes the Google Kubernetes Engine event types that can be shown on a dashboard.

Patched or updated GKE workload

This event type helps you troubleshoot GKE workload deployment or statefulset changes, as these events can correlate with performance regressions or other performance issues. This event type is shown when a workload is created, updated, or deleted.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=k8s_cluster protoPayload.methodName=(
    io.k8s.apps.v1.deployments.create OR io.k8s.apps.v1.deployments.patch OR
    io.k8s.apps.v1.deployments.update OR io.k8s.apps.v1.deployments.delete OR
    io.k8s.apps.v1.deployments.deletecollection OR io.k8s.apps.v1.statefulsets.create OR
    io.k8s.apps.v1.statefulsets.patch OR io.k8s.apps.v1.statefulsets.update OR
    io.k8s.apps.v1.statefulsets.delete OR io.k8s.apps.v1.statefulsets.deletecollection OR
    io.k8s.apps.v1.daemonsets.create OR io.k8s.apps.v1.daemonsets.patch OR
    io.k8s.apps.v1.daemonsets.update OR io.k8s.apps.v1.daemonsets.delete OR
    io.k8s.apps.v1.daemonsets.deletecollection
)
-protoPayload.authenticationInfo.principalEmail="system:addon-manager"
-protoPayload.request.metadata.namespace=(kube-system OR gmp-system OR gmp-public OR gke-gmp-system)

For additional information, see Overview of deploying workloads and View observability metrics.

Crash of a GKE pod

This event type helps you identify and troubleshoot GKE pod crashes. Pod crashes can be caused by memory exhaustion or an application error. This event type is shown when any of the following occur:

  • Pod status is CrashLoopBackoff
  • Pod terminates with a non-zero exit code.
  • Pod terminates with an out-of-memory condition.
  • Pod is evicted.
  • Readiness/Liveliness probe fails.

If you want to create a log-based alerting policy for this event type, then use the following query:

(
    log_id(events)
    (
        (resource.type=k8s_pod jsonPayload.reason=(BackOff OR Unhealthy OR Killing OR Evicted)) OR
        (resource.type=k8s_node jsonPayload.reason=OOMKilling)
    )
    severity=WARNING
) OR (
    log_id(cloudaudit.googleapis.com%2Factivity) resource.type=k8s_cluster
    (protoPayload.methodName=io.k8s.core.v1.pods.eviction.create OR
        (protoPayload.methodName=io.k8s.core.v1.pods.delete
        protoPayload.response.status.containerStatuses.state.terminated.exitCode:*
        -protoPayload.response.status.containerStatuses.state.terminated.exitCode=0
        )
    )
)

For troubleshooting information, see Troubleshoot: CrashLoopBackOff.

Failure to schedule a GKE pod

This event type helps you identify and troubleshoot when pods that can't be scheduled on a node. This event type is shown when pod scheduling fails for any of the following reasons:

  • Insufficient node CPU.
  • Insufficient node memory.
  • No nodes for taints or tolerations.
  • Nodes at the maximum pod limit.
  • Node pool at maximum size.

If you want to create a log-based alerting policy for this event type, then use the following query:

(
    log_id(events) resource.type=k8s_pod jsonPayload.reason=(NotTriggerScaleUp OR FailedScheduling)
) OR (
    log_id(container.googleapis.com/cluster-autoscaler-visibility)
    resource.type=k8s_cluster jsonPayload.noDecisionStatus.noScaleUp:*
)

For troubleshooting information, see Troubleshoot: Pod unschedulable.

Failure to create a GKE container

This event type helps you identify and troubleshoot failures to create a GKE container. Container creation might fail due to reasons such as failed volume mounts or failed image pulls.

If you want to create a log-based alerting policy for this event type, then use the following query:

log_id(events) resource.type=k8s_pod jsonPayload.reason=(Failed OR FailedMount) severity=WARNING

For troubleshooting information, see Troubleshoot: ImagePullBackOff and ErrImagePull.

Pod autoscaler scale up and down

This event gives you visibility into Horizontal Pod Autoscaler rescales, which increase or decrease the number of running pods for a workload. For more information, see Horizontal Pod autoscaling.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=k8s_cluster log_id(events)
jsonPayload.involvedObject.kind=HorizontalPodAutoscaler jsonPayload.reason=SuccessfulRescale

Cluster autoscaler scale up and down

This event gives you visibility into when the cluster autoscaler scales up or down the number of nodes in a node pool of your cluster. For more information, see About cluster autoscaling and Viewing cluster autoscaler events.

If you want to create a log-based alerting policy for this event type, then use the following query:

(resource.type=k8s_cluster log_id(container.googleapis.com%2Fcluster-autoscaler-visibility)
jsonPayload.decision:*)

Cluster create and delete

This event tracks GKE cluster create and deletion actions. For more information, see Create an Autopilot cluster, Creating a zonal cluster, and Deleting a cluster.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gke_cluster log_id(cloudaudit.googleapis.com%2Factivity)
protoPayload.methodName=(
    google.container.v1alpha1.ClusterManager.CreateCluster OR
    google.container.v1beta1.ClusterManager.CreateCluster OR
    google.container.v1.ClusterManager.CreateCluster OR
    google.container.v1alpha1.ClusterManager.DeleteCluster OR
    google.container.v1beta1.ClusterManager.DeleteCluster OR
    google.container.v1.ClusterManager.DeleteCluster
)
operation.first=true

Cluster update

This event tracks GKE cluster updates. Updates include automatic control plane version upgrades as well as manual upgrades and cluster configuration changes. For more information, see Manually upgrading a cluster or node pool and Standard cluster upgrades.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gke_cluster log_id(cloudaudit.googleapis.com%2Factivity)
(
    protoPayload.methodName=(
        google.container.internal.ClusterManagerInternal.PatchCluster OR
        google.container.internal.ClusterManagerInternal.UpdateClusterInternal OR
        google.container.internal.ClusterManagerInternal.UpdateCluster
    )
) OR (
    protoPayload.methodName=(
        google.container.v1beta1.ClusterManager.UpdateCluster OR
        google.container.v1.ClusterManager.UpdateCluster
    )
    operation.first=true
)
protoPayload.metadata.operationType=(UPGRADE_MASTER OR REPAIR_CLUSTER OR UPDATE_CLUSTER)

Node pool update

This event tracks GKE node pool updates. Updates include automatic node pool version upgrades as well as manual upgrades, configuration changes, and resizes. For more information, see Manually upgrading a cluster or node pool and Standard cluster upgrades.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gke_nodepool log_id(cloudaudit.googleapis.com%2Factivity)
(
    protoPayload.methodName=(
        google.container.internal.ClusterManagerInternal.UpdateClusterInternal OR
        google.container.internal.ClusterManagerInternal.RepairNodePool
    )
) OR (
    protoPayload.methodName=(
        google.container.v1beta1.ClusterManager.UpdateNodePool OR
        google.container.v1.ClusterManager.UpdateNodePool OR
        google.container.v1beta1.ClusterManager.SetNodePoolSize OR
        google.container.v1.ClusterManager.SetNodePoolSize OR
        google.container.v1beta1.ClusterManager.SetNodePoolManagement OR
        google.container.v1.ClusterManager.SetNodePoolManagement OR
        google.container.v1beta1.ClusterManager.SetNodePoolAutoscaling OR
        google.container.v1.ClusterManager.SetNodePoolAutoscaling
    )
    operation.first=true
)

Cloud Run event types

This section describes the Cloud Run event types that can be shown on a dashboard.

Cloud Run deployment

This event type helps you identify and troubleshoot Cloud Run deployment failures. Deployment might fail due to reasons such as deleted service account, incorrect permissions, the import of a container failed, or a container failed to start.

If you want to create a log-based alerting policy for this event type, then use the following query:

log_id(cloudaudit.googleapis.com%2Factivity) resource.type=cloud_run_revision
protoPayload.methodName=google.cloud.run.v1.Services.ReplaceService

For troubleshooting information, see Troubleshoot: Cloud Run issues.

Cloud SQL event types

This section describes the Cloud SQL event types that can be shown on a dashboard.

Cloud SQL failover

This event type helps you identify when manual or automatic failovers occur. A failover occurs when there is an instance or zone failure and the standby instance becomes the new primary instance. During a failover, Cloud SQL automatically switches to serving data from the standby instance.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=cloudsql_database
(
    (
        log_id(cloudaudit.googleapis.com%2Factivity)
        protoPayload.methodName=cloudsql.instances.failover
        operation.last=true
    ) OR (
        log_id(cloudaudit.googleapis.com%2Fsystem_event)
        protoPayload.methodName=cloudsql.instances.autoFailover
    )
)

For additional information, see About high availability.

Cloud SQL start or stop

This event type helps you identify a Cloud SQL instance has been manually started, stopped, or restarted. When an instance is stopped, all connections, open files, and running operations are also stopped.

If you want to create a log-based alerting policy for this event type, then use the following query:

log_id(cloudaudit.googleapis.com%2Factivity) resource.type=cloudsql_database
protoPayload.methodName=cloudsql.instances.update operation.last=true
protoPayload.metadata.intents.intent=(START_INSTANCE OR STOP_INSTANCE)

For additional information, see About high availability and Start, stop, and restart instances.

Cloud SQL storage

This event type helps you identify events related to Cloud SQL storage, including when database storage is full, and when a database is shut down due to reaching storage capacity. Databases at storage capacity and without automatic storage enabled might be shut down to prevent data corruption.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=cloudsql_database
(
    (
        (log_id(cloudsql.googleapis.com%2Fpostgres.log) OR log_id(cloudsql.googleapis.com%2Fmysql.err))
        textPayload=~"No space left on device"
        severity=(ERROR OR EMERGENCY)
    ) OR (
        log_id(cloudaudit.googleapis.com%2Fsystem_event)
        protoPayload.methodName=cloudsql.instances.databaseShutdownOutOfStorage
    )
)

Compute Engine event types

This section describes the Compute Engine event types that can be shown on a dashboard.

Virtual machine terminations

This event type helps you identify virtual machine (VM) terminations, including manually triggered resets and stops, guest OS terminations, maintenance terminations, and host errors.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gce_instance
(
    (
        log_id(cloudaudit.googleapis.com%2Factivity)
        protoPayload.methodName=(
            beta.compute.instances.reset OR v1.compute.instances.reset OR
            beta.compute.instances.stop OR v1.compute.instances.stop
        )
        operation.first=true
    ) OR (
        log_id(cloudaudit.googleapis.com%2Fsystem_event)
        protoPayload.methodName=(
            compute.instances.hostError OR
            compute.instances.guestTerminate OR
            compute.instances.terminateOnHostMaintenance
        )
    )
)

For additional information, see Stop and start a VM and Troubleshooting VM shutdowns and reboots.

VM instance start failure

This event tracks Compute Engine VM instance start failures. The event shows start failures due to stockouts, IP space exhaustion, quota exceeded, or shielded-VM integrity errors.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gce_instance
(
    (
        log_id(cloudaudit.googleapis.com%2Factivity)
        protoPayload.methodName=(beta.compute.instances.insert OR v1.compute.instances.insert)
        protoPayload.status.message=(ZONE_RESOURCE_POOL_EXHAUSTED OR IP_SPACE_EXHAUSTED OR QUOTA_EXCEEDED)
    ) OR (
        log_id(compute.googleapis.com%2Fshielded_vm_integrity)
        severity="ERROR"
    )
)

VM instance guest OS error

This event tracks specific Compute Engine VM instance Guest OS errors as recorded by the serial console logs. The errors tracked are disk full, file system mount failed and boot failures that activate Linux emergency mode.

For these events to be visible, you must enable serial port output logging to Cloud Logging by setting serial-port-logging-enable=true in the VM or in the project metadata. For more information, see Enabling and disabling serial port output logging.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gce_instance
log_id(serialconsole.googleapis.com%2Fserial_port_1_output)
textPayload=~("No space left on device" OR "Failed to mount" OR "You are in emergency mode")

Managed instance group update

This event type helps you identify when your Managed Instance Group (MIG) has been updated. For example, VMs have been added or removed, or the size limit has been updated. For more information, see Automatically apply VM configuration updates in a MIG.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=gce_instance_group_manager
log_id(cloudaudit.googleapis.com%2Factivity) operation.first=true
protoPayload.methodName=(beta.compute.instanceGroupManagers.patch OR v1.compute.instanceGroupManagers.patch)

For additional information, see Work with managed instances and Troubleshooting managed instance groups.

Managed instance group autoscaler

This event tracks scaling decisions made by the autoscaler of a MIG. These decisions could include changes in the recommended size for a MIG, or a change in the status of the autoscaler itself. For more information, see Autoscaling groups of instances.

If you want to create a log-based alerting policy for this event type, then use the following query:

resource.type=autoscaler log_id(cloudaudit.googleapis.com%2Fsystem_event)
protoPayload.methodName=(compute.autoscalers.resize OR compute.autoscalers.changeStatus)

Personalized Service Health event types

This section describes the Personalized Service Health types that can be displayed on a dashboard.

Google Cloud incident

When you are troubleshooting, you might want to differentiate between failures caused by a service that you own and one caused by a Google Cloud service that you use. When you enable Personalized Service Health annotations on a dashboard, you can view disruptions, or service health events, for Google Cloud services. For a list of services that are integrated with Service Health, see Supported Google products.

Unlike other event types, Google Cloud incidents aren't identified by analyzing your log entries. If you want to get notified when these events occurs, then create an alerting policy. You can select a preconfigured alerting policy by using options on the Service Health Dashboard page. For more information, see Quickstart: Set up an alert.

Monitoring identifies Google Cloud incidents by issuing a request to the Service Health API, and then filtering the response to those incidents that are relevant to the data you are viewing. The request has the following configuration:

  • The Relevance enumeration is set to RELATED, IMPACTED, or PARTIALLY_RELATED. This restriction ensures that your dashboard only displays events for the Google Cloud services that your Google Cloud project is using.

  • The DetailedState enumeration isn't set to FALSE_POSITIVE.

Service Health annotations are displayed with a start time and a duration. The duration is displayed by changing the background color of the chart. The tooltip for a Google Cloud incident identifies the following:

  • The Google Cloud service.
  • Whether the incident is open or resolved.
  • Date and start time of the event.
  • Chips that display the number of affected products and locations. To list the affected products or locations, place your pointer on the corresponding chip.
  • A View button, which when selected opens the details page for the incident.

For information about how to issue a request to the Service Health API, see Check for disruptions with Service Health.

For troubleshooting information, see Troubleshoot common issues in Service Health.

Uptime check event types

This section describes the uptime check event types that can be shown on a dashboard.

Uptime check failure

This event type helps you identify uptime check failures from configured regions.

If you want to create a log-based alerting policy for this event type, then use the following query:

log_id(monitoring.googleapis.com%2Fuptime_checks)
(
  resource.type=uptime_url OR resource.type=gce_instance OR
  resource.type=gae_app OR resource.type=k8s_service OR
  resource.type=servicedirectory_service OR resource.type=cloud_run_revision OR
  resource.type=aws_ec2_instance OR resource.type=aws_elb_load_balancer
)
labels.uptime_result_type=UptimeCheckResult
severity=NOTICE

For troubleshooting information, see Troubleshoot synthetic monitors and uptime checks.

What's next

To learn how to show events on your dashboards, see Show events on a dashboard.