You can use the Google Cloud console or the Cloud Monitoring API to monitor Managed Service for Apache Kafka.
This document provides an overview of the metrics available for Managed Service for Apache Kafka. It also shows you how to monitor your Managed Service for Apache Kafka usage in the Google Cloud console by using Monitoring.
If you want to view metrics from other Google Cloud resources in addition to the complete set of Managed Service for Apache Kafka metrics, use Monitoring.
Otherwise, you can use the monitoring dashboards with a selection of metrics provided within Managed Service for Apache Kafka. For more information, see the following topics:
Overview of the Managed Service for Apache Kafka metrics
Managed Service for Apache Kafka exports several metrics available in the open-source Kafka distribution, as well as service-specific metrics like consumer group offset lag. For monitoring, the Managed Service for Apache Kafka service is identified by the service URL managedkafka.googleapis.com.
Managed Service for Apache Kafka metrics are organized into four resource categories:
Cluster: These metrics are intended for maintaining overall cluster health.
Topic: These metrics include publisher and consumer rates and errors. These metrics monitor the overall health of Kafka applications and issues specific to a broker.
Topic Partition: These metrics are intended for monitoring and debugging performance problems specific to individual partitions. An example is uneven key distribution.
Topic Partition Consumer Group: These metrics monitor consumer application health, primarily consumer lag. Open-source Kafka error metrics for consumer groups are available only at the topic level, not per partition.
Some metrics can be grouped by broker. While the Managed Service for Apache Kafka service itself does not expose brokers as a resource, monitoring them is essential to detect failure scenarios like latency due to overloaded brokers.
Metric names follow a convention that combines the service API URL, the monitored resource, and the metric name. For example, the identifier for the topic message_in_count metric is managedkafka.googleapis.com/Topic/message_in_count.
To access these metrics, see View a single Managed Service for Apache Kafka metric.
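The naming convention can be illustrated with a short Python sketch. The helper name metric_identifier is hypothetical and exists only for this example; it joins the three parts exactly as described in this document. Verify the resulting string against the metric list in Monitoring before you rely on it.

```python
# Service URL that identifies Managed Service for Apache Kafka in Monitoring.
SERVICE = "managedkafka.googleapis.com"


def metric_identifier(resource: str, metric: str) -> str:
    """Join the service URL, monitored resource, and metric name with slashes.

    Hypothetical helper: it mirrors the convention described in this
    document and does not call any Google Cloud API.
    """
    return f"{SERVICE}/{resource}/{metric}"


print(metric_identifier("Topic", "message_in_count"))
# managedkafka.googleapis.com/Topic/message_in_count
```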
Before you begin
Before you use Monitoring, ensure that you've prepared a Managed Service for Apache Kafka project with billing enabled. One way to do this is to complete the Quickstart for Managed Service for Apache Kafka.
Required roles and permissions
To get the permissions that you need to view monitoring charts, ask your administrator to grant you the Managed Kafka Viewer (roles/managedkafka.Viewer) IAM role on your project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
For more information about this role, see Managed Service for Apache Kafka predefined roles.
View a single Managed Service for Apache Kafka metric
To view a single Managed Service for Apache Kafka metric by using the Google Cloud console, perform the following steps:
In the Google Cloud console, go to the Monitoring page.
In the navigation pane, select Metrics explorer.
In the Configuration section, click Select a metric.
In the filter, enter Apache Kafka.
In Active resources, select one of the following:
Apache Kafka Cluster
Apache Kafka Topic
Apache Kafka Topic Partition
Apache Kafka Topic Partition Consumer Group
Select a metric and click Apply.
The page for a specific metric opens.
You can learn more about the monitoring dashboard by reading the Cloud Monitoring documentation.
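If you prefer the Cloud Monitoring API to the console, the following Python sketch shows one way to build a request URL for the timeSeries.list REST method. It only constructs the URL and does not send the request or handle authentication. The function name time_series_url, the project ID my-project, and the one-hour window are assumptions made for this example, and the metric type is the identifier used in this document, so confirm the exact string that the API accepts in Metrics explorer.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Timestamp format expected by the API (RFC 3339, UTC).
RFC3339 = "%Y-%m-%dT%H:%M:%SZ"


def time_series_url(project_id: str, metric_type: str, end: datetime,
                    window: timedelta = timedelta(hours=1)) -> str:
    """Build a timeSeries.list URL for one metric over a time window.

    Hypothetical helper: `end` must be a timezone-aware UTC datetime.
    """
    start = end - window
    params = {
        # Monitoring filter that selects a single metric type.
        "filter": f'metric.type = "{metric_type}"',
        "interval.startTime": start.strftime(RFC3339),
        "interval.endTime": end.strftime(RFC3339),
    }
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    return f"{base}?{urlencode(params)}"


url = time_series_url(
    "my-project",  # placeholder project ID
    "managedkafka.googleapis.com/Topic/message_in_count",
    end=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(url)
```

You would send this URL as an authenticated GET request, for example with an OAuth 2.0 access token in the Authorization header.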
Cluster metrics
Metric | Description | Equivalent MBean name |
---|---|---|
cpu/usage | Current CPU usage for the cluster in vCPUs. Can be used to monitor CPU utilization as a ratio with the cpu/limit metric. | N/A |
cpu/core_usage_time | Cumulative CPU usage of the cluster in vCPU. This can be useful for understanding the overall cost of operation for the cluster. | N/A |
cpu/limit | Current CPU count configured for the cluster. Can be used to monitor CPU utilization as a ratio with the cpu/usage metric. | N/A |
memory/usage | Current RAM usage on the cluster. Can be used to monitor RAM utilization as a ratio with the memory/limit metric. | N/A |
memory/limit | Current configured RAM size of the cluster. Can be used to monitor RAM utilization as a ratio with the memory/usage metric. | N/A |
cluster_byte_in_count | The total number of bytes from clients sent to all topics. | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec |
cluster_byte_out_count | The total number of bytes sent to clients from all topics. | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec |
cluster_message_in_count | The total number of messages that have been published to all topics in the cluster. | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\\w]+) |
request_count | The total number of requests made to the broker. | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower},version=([0-9]+) |
request_byte_count | The total size, in bytes, of requests made to the cluster. | kafka.network:type=RequestMetrics,name=RequestBytes,request=([-.\\w]+) |
partitions | The current number of partitions handled by this cluster, broken down by broker. | kafka.server:type=ReplicaManager,name=PartitionCount |
request_latencies | The number of milliseconds taken for each request, at various percentiles. | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower} |
consumer_groups | The current number of consumer groups consuming from the broker. | kafka.server:type=GroupMetadataManager,name=NumGroups |
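The cpu/usage and cpu/limit metrics, and likewise memory/usage and memory/limit, are meant to be read as a ratio. The following Python sketch shows that arithmetic. The function name utilization and the sample values are hypothetical; in practice you would take the values from the most recent data points of each metric.

```python
def utilization(usage: float, limit: float) -> float:
    """Return usage as a fraction of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return usage / limit


# Sample values, not real measurements.
cpu = utilization(usage=1.5, limit=3.0)                  # vCPUs
mem = utilization(usage=6 * 1024**3, limit=8 * 1024**3)  # bytes

print(f"CPU utilization: {cpu:.0%}")     # CPU utilization: 50%
print(f"Memory utilization: {mem:.0%}")  # Memory utilization: 75%
```

A sustained ratio close to 1.0 suggests that the cluster is running near its configured capacity.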
Topic metrics
Metric | Description | Equivalent MBean name |
---|---|---|
message_in_count | The total number of messages published to the topic. | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\\w]+) |
byte_in_count | The total number of bytes from clients sent to the topic. | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\\w]+) |
topic_request_count | The total number of requests made to the topic, including producer and consumer requests. | kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=([-.\\w]+) kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=([-.\\w]+) |
topic_error_count | The total number of failed requests made to the topic, including producer and consumer requests. | kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec,topic=([-.\\w]+) kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec,topic=([-.\\w]+) |
byte_out_count | The total number of bytes sent to clients. | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=([-.\\w]+) |
Partition metrics
Metric | Description | Equivalent MBean name |
---|---|---|
log_segments | The current number of log segments. This is useful to make sure storage tiering remains healthy. | kafka.log:type=Log,name=NumLogSegments,topic=([-.\\w]+),partition=([0-9]+) |
first_offset | The first offset for each partition in the topic. In combination with last_offset, it can be used to monitor an upper bound on the total number of messages stored as well as to find the actual offset of the oldest message. | kafka.log:type=Log,name=LogStartOffset,topic=([-.\\w]+),partition=([0-9]+) |
last_offset | The last offset in the partition. This can be used to find the latest offset for each partition over time. This can be useful in identifying the specific offset needed to reprocess data starting from a particular time in the past. | kafka.log:type=Log,name=LogEndOffset,topic=([-.\\w]+),partition=([0-9]+) |
byte_size | The size of the partition on disk in bytes. | N/A |
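The upper bound mentioned for first_offset and last_offset is the difference between the two values. The following Python sketch shows the calculation. The function name max_stored_messages and the sample offsets are hypothetical. The result is an upper bound rather than an exact count because some offsets in the range might not correspond to a stored message, for example on compacted topics.

```python
def max_stored_messages(first_offset: int, last_offset: int) -> int:
    """Return an upper bound on the messages stored in one partition."""
    if last_offset < first_offset:
        raise ValueError("last_offset must not be less than first_offset")
    return last_offset - first_offset


# Sample offsets for a single partition, not real measurements.
print(max_stored_messages(first_offset=1000, last_offset=1500))  # 500
```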
Consumer group metrics
Metric | Description | Equivalent MBean name |
---|---|---|
offset_lag | The number of messages that the consumer group has not yet committed on the partition. | N/A |
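Because offset lag is reported per partition, you might want to combine the values to judge the health of a whole consumer group. The following Python sketch sums the per-partition lag and lists partitions above a threshold. The function name summarize_lag, the sample values, and the threshold of 1,000 messages are assumptions for this example, not recommended settings.

```python
def summarize_lag(lag_by_partition: dict[int, int],
                  threshold: int) -> tuple[int, list[int]]:
    """Return the total lag and the partitions whose lag exceeds threshold."""
    total = sum(lag_by_partition.values())
    lagging = sorted(p for p, lag in lag_by_partition.items() if lag > threshold)
    return total, lagging


# Sample per-partition lag values, keyed by partition number.
total, lagging = summarize_lag({0: 10, 1: 2500, 2: 0}, threshold=1000)
print(total)    # 2510
print(lagging)  # [1]
```

A single partition with high lag while the others stay low can point to a partition-specific problem, such as the uneven key distribution described earlier.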