Monitor a Google Cloud Managed Service for Apache Kafka cluster

You can use the Google Cloud console or the Cloud Monitoring API to monitor Managed Service for Apache Kafka.

This document provides an overview of the metrics available for monitoring Managed Service for Apache Kafka. It also shows you how to monitor your Managed Service for Apache Kafka usage in the Google Cloud console by using Monitoring.

  • If you want to view metrics from other Google Cloud resources in addition to the complete set of Managed Service for Apache Kafka metrics, use Monitoring.

  • Otherwise, you can use the monitoring dashboards provided within Managed Service for Apache Kafka, which show a selection of metrics. For more information, see the following topics:

Overview of the Managed Service for Apache Kafka metrics

Managed Service for Apache Kafka exports several metrics available in the open-source Kafka distribution, as well as service-specific metrics such as consumer group offset lag. In Monitoring, the Managed Service for Apache Kafka service is identified by its service URL, managedkafka.googleapis.com.

Managed Service for Apache Kafka metrics are organized into four resource categories:

  • Cluster: These metrics are intended for maintaining overall cluster health.

  • Topic: These metrics include producer and consumer rates and errors. They help you monitor the overall health of Kafka applications and detect issues specific to a broker.

  • Topic Partition: These metrics are intended for monitoring and debugging performance problems specific to individual partitions, such as problems caused by uneven key distribution.

  • Topic Partition Consumer Group: These metrics monitor consumer application health, primarily consumer lag. Open-source Kafka error metrics for consumer groups are available only at the topic level, not per partition.

Some metrics can be grouped by broker. Although the Managed Service for Apache Kafka service does not expose brokers as a resource, monitoring them is essential for detecting failure scenarios such as latency caused by overloaded brokers.
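As a rough illustration of per-broker monitoring, the following Python sketch flags brokers whose latency is well above the cluster median. It assumes you have already retrieved one latency value per broker, for example a high percentile of request_latencies grouped by broker. The function name, input shape, and 2x threshold are hypothetical choices, not service defaults.

```python
from statistics import median


def flag_overloaded_brokers(latency_ms_by_broker, factor=2.0):
    """Return brokers whose latency exceeds `factor` times the median.

    `latency_ms_by_broker` is a hypothetical mapping of broker ID to a
    latency in milliseconds (for example, a p99 from request_latencies
    grouped by broker). The threshold rule is an illustrative assumption.
    """
    if not latency_ms_by_broker:
        return []
    threshold = factor * median(latency_ms_by_broker.values())
    # Sorted so that the output order is deterministic.
    return sorted(
        broker
        for broker, latency in latency_ms_by_broker.items()
        if latency > threshold
    )


print(flag_overloaded_brokers({"broker-0": 10.0, "broker-1": 12.0, "broker-2": 50.0}))
# → ['broker-2']
```

Comparing against the median rather than the mean keeps a single slow broker from inflating the baseline it is measured against.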

Metric names follow a convention that combines the service API URL, the monitored resource, and the metric name. For example, the identifier of the topic message_in_count metric is managedkafka.googleapis.com/Topic/message_in_count.
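The following Python sketch builds and parses metric identifiers that follow this naming convention. The helper names are hypothetical, and only the Topic resource segment is confirmed by the example; spellings of the other resource segments are assumptions. Splitting with at most two splits keeps metric names that contain a slash, such as cpu/core_usage_time, intact.

```python
SERVICE = "managedkafka.googleapis.com"


def metric_identifier(resource, metric):
    """Join service API URL, monitored resource, and metric name.

    Only the "Topic" resource segment is confirmed by the documented
    example; other segment spellings are assumptions.
    """
    if not resource or not metric:
        raise ValueError("resource and metric must be non-empty")
    return f"{SERVICE}/{resource}/{metric}"


def parse_metric_identifier(identifier):
    """Split an identifier back into (resource, metric).

    maxsplit=2 keeps metric names that contain "/" in one piece.
    """
    service, resource, metric = identifier.split("/", 2)
    if service != SERVICE:
        raise ValueError(f"unexpected service: {service}")
    return resource, metric


print(metric_identifier("Topic", "message_in_count"))
# → managedkafka.googleapis.com/Topic/message_in_count
```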

To access these metrics, see View a single Managed Service for Apache Kafka metric.

Before you begin

Before you use Monitoring, ensure that you've prepared a Managed Service for Apache Kafka project with billing enabled. One way to do this is to complete the Quickstart for Managed Service for Apache Kafka.

Required roles and permissions

To get the permissions that you need to view monitoring charts, ask your administrator to grant you the Managed Kafka Viewer (roles/managedkafka.viewer) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

For more information about this role, see Managed Service for Apache Kafka predefined roles.

View a single Managed Service for Apache Kafka metric

To view a single Managed Service for Apache Kafka metric by using the Google Cloud console, perform the following steps:

  1. In the Google Cloud console, go to the Monitoring page.

  2. In the navigation pane, select Metrics explorer.

  3. In the Configuration section, click Select a metric.

  4. In the filter, enter Apache Kafka.

  5. In Active resources, select one of the following:

    • Apache Kafka Cluster

    • Apache Kafka Topic

    • Apache Kafka Topic Partition

    • Apache Kafka Topic Partition Consumer Group

  6. Select a metric and click Apply.

    The page for a specific metric opens.

You can learn more about the monitoring dashboard by reading the Cloud Monitoring documentation.
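If you query metrics through the Cloud Monitoring API instead of Metrics Explorer, you select a metric with a filter expression. The following Python sketch builds such a filter and a one-hour time window. The helper names are hypothetical, and resource label keys such as topic_id are placeholders; check the monitored resource definition for the real label names.

```python
from datetime import datetime, timedelta, timezone


def build_time_series_filter(metric_type, **resource_labels):
    """Build a Cloud Monitoring filter string for one metric type.

    Keyword arguments become resource label clauses. Label names such as
    "topic_id" are hypothetical placeholders.
    """
    clauses = [f'metric.type = "{metric_type}"']
    # Sorted so that the generated filter is stable.
    for key, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(clauses)


def last_hour_interval(now=None):
    """Return (start, end) RFC 3339 strings covering the past hour.

    `now`, if given, must be a UTC datetime.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


print(build_time_series_filter("managedkafka.googleapis.com/Topic/message_in_count"))
# → metric.type = "managedkafka.googleapis.com/Topic/message_in_count"
```

You could pass the filter string as the filter field of a ListTimeSeries request, for example through MetricServiceClient.list_time_series in the google-cloud-monitoring Python client library. The REST method's interval.startTime and interval.endTime parameters accept RFC 3339 timestamps like the ones this sketch returns.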

Cluster metrics

  • cpu/core_usage_time
    Description: Cumulative CPU usage of the cluster, in vCPU seconds. This can be useful for understanding the overall cost of operating the cluster.
    Equivalent MBean name: N/A

  • cpu/limit
    Description: Current number of vCPUs configured for the cluster. Can be used to monitor CPU utilization as a ratio with the rate of the cpu/core_usage_time metric.
    Equivalent MBean name: N/A

  • memory/usage
    Description: Current RAM usage of the cluster. Can be used to monitor RAM utilization as a ratio with the memory/limit metric.
    Equivalent MBean name: N/A

  • memory/limit
    Description: Current configured RAM size of the cluster. Can be used to monitor RAM utilization as a ratio with the memory/usage metric.
    Equivalent MBean name: N/A

  • cluster_byte_in_count
    Description: The total number of bytes sent by clients to all topics.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec

  • cluster_byte_out_count
    Description: The total number of bytes sent to clients from all topics.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

  • cluster_message_in_count
    Description: The total number of messages published to all topics in the cluster.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

  • request_count
    Description: The total number of requests made to the broker.
    Equivalent MBean name: kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower},version=([0-9]+)

  • request_byte_count
    Description: The total size, in bytes, of requests made to the cluster.
    Equivalent MBean name: kafka.network:type=RequestMetrics,name=RequestBytes,request=([-.\w]+)

  • partitions
    Description: The current number of partitions handled by this cluster, broken down by broker.
    Equivalent MBean name: kafka.server:type=ReplicaManager,name=PartitionCount

  • request_latencies
    Description: The time taken for each request, in milliseconds, at various percentiles.
    Equivalent MBean name: kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}

  • consumer_groups
    Description: The current number of consumer groups consuming from the broker.
    Equivalent MBean name: kafka.server:type=GroupMetadataManager,name=NumGroups
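Because cpu/core_usage_time and the *_count metrics are cumulative, utilization and throughput come from the difference between two samples. The following Python sketch shows the arithmetic, assuming that cpu/core_usage_time is reported in vCPU seconds and cpu/limit as a vCPU count. The function names are hypothetical.

```python
def counter_rate(previous, current, elapsed_seconds):
    """Per-second rate from two samples of a cumulative counter,
    such as cluster_byte_in_count or cpu/core_usage_time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (current - previous) / elapsed_seconds


def cpu_utilization(prev_core_seconds, curr_core_seconds, elapsed_seconds, cpu_limit):
    """Fraction of configured vCPUs used over the interval.

    Assumes cpu/core_usage_time is cumulative vCPU seconds and cpu/limit
    is a vCPU count.
    """
    return counter_rate(prev_core_seconds, curr_core_seconds, elapsed_seconds) / cpu_limit


def memory_utilization(memory_usage, memory_limit):
    """Fraction of configured RAM in use: memory/usage divided by memory/limit."""
    return memory_usage / memory_limit


# 300 vCPU seconds used over 60 seconds on a 10-vCPU cluster.
print(cpu_utilization(1200.0, 1500.0, 60.0, 10))
# → 0.5
```

The sketch does not handle counter resets. In practice, Cloud Monitoring can compute rates for you with an aligner such as ALIGN_RATE.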

Topic metrics

  • message_in_count
    Description: The total number of messages published to the topic.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\w]+)

  • byte_in_count
    Description: The total number of bytes sent by clients to the topic.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\w]+)

  • topic_request_count
    Description: The total number of requests made to the topic, including producer and consumer requests.
    Equivalent MBean names: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=([-.\w]+) and kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=([-.\w]+)

  • topic_error_count
    Description: The total number of failed requests made to the topic, including producer and consumer requests.
    Equivalent MBean names: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec,topic=([-.\w]+) and kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec,topic=([-.\w]+)

  • byte_out_count
    Description: The total number of bytes sent to clients from the topic.
    Equivalent MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=([-.\w]+)
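Combining two topic counters yields derived signals. As one example, the following Python sketch estimates the average size of messages published to a topic from the deltas of byte_in_count and message_in_count over an interval. The function name is hypothetical, and the result is approximate because byte counts may include record-batch overhead.

```python
def average_message_size(prev_bytes, curr_bytes, prev_messages, curr_messages):
    """Approximate average bytes per message published to a topic.

    Uses deltas of the cumulative byte_in_count and message_in_count
    counters. Returns None when no messages arrived in the interval.
    """
    messages = curr_messages - prev_messages
    if messages <= 0:
        return None
    return (curr_bytes - prev_bytes) / messages


print(average_message_size(0, 50_000, 0, 100))
# → 500.0
```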

Partition metrics

  • log_segments
    Description: The current number of log segments. This is useful for verifying that storage tiering remains healthy.
    Equivalent MBean name: kafka.log:type=Log,name=NumLogSegments,topic=([-.\w]+),partition=([0-9]+)

  • first_offset
    Description: The first offset for each partition in the topic. Combined with last_offset, it gives an upper bound on the total number of messages stored. It also identifies the actual offset of the oldest message.
    Equivalent MBean name: kafka.log:type=Log,name=LogStartOffset,topic=([-.\w]+),partition=([0-9]+)

  • last_offset
    Description: The last offset in the partition. Tracking it over time shows which offset each partition had reached at a given moment, which helps you identify the specific offset needed to reprocess data starting from a particular time in the past.
    Equivalent MBean name: kafka.log:type=Log,name=LogEndOffset,topic=([-.\w]+),partition=([0-9]+)

  • byte_size
    Description: The size of the partition on disk, in bytes.
    Equivalent MBean name: N/A
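The offset arithmetic described for first_offset and last_offset can be sketched in Python as follows. The sketch assumes that last_offset reports LogEndOffset, as its MBean suggests, which is the offset the next message will receive. The helper names and the (timestamp, last_offset) sample format are hypothetical.

```python
from bisect import bisect_right


def stored_message_upper_bound(first_offset, last_offset):
    """Upper bound on messages retained in a partition.

    Assumes last_offset is LogEndOffset (the next offset to be written),
    so the difference counts retained offsets. Compaction and transaction
    markers mean the real message count can be lower.
    """
    return max(0, last_offset - first_offset)


def offset_at_or_before(samples, timestamp):
    """Return last_offset from the latest sample taken at or before `timestamp`.

    `samples` is a list of (timestamp, last_offset) pairs sorted by
    timestamp. Reprocessing from the returned offset covers every message
    written after that sample, so it can include a few messages written
    shortly before `timestamp`. Returns None if every sample is later.
    """
    times = [t for t, _ in samples]
    index = bisect_right(times, timestamp)
    if index == 0:
        return None
    return samples[index - 1][1]


samples = [(100, 10), (200, 25), (300, 40)]
print(offset_at_or_before(samples, 250))
# → 25
```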

Consumer group metrics

  • offset_lag
    Description: The number of messages on the partition that the consumer group has not yet committed.
    Equivalent MBean name: N/A
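Consumer lag is conventionally computed in Kafka as the partition's log end offset minus the consumer group's committed offset. The following Python sketch applies that definition per partition. It is an assumption that offset_lag is computed the same way, and the function name and input shapes are hypothetical.

```python
def offset_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset.

    Both arguments map partition number to an offset. Partitions with no
    committed offset are skipped; how the service reports them is not
    modeled here.
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            continue
        # Clamp at zero in case the two values were sampled at different times.
        lag[partition] = max(0, end_offset - committed)
    return lag


lag = offset_lag({0: 100, 1: 50, 2: 30}, {0: 90, 1: 50})
print(lag, sum(lag.values()))
# → {0: 10, 1: 0} 10
```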

What's next