This document is useful if you're considering migrating from self-managed Apache Kafka to Pub/Sub Lite.
Overview of Pub/Sub Lite
Pub/Sub Lite is a high-volume messaging service built for low cost of operation. Pub/Sub Lite offers Zonal and Regional Storage along with pre-provisioned capacity. Within Pub/Sub Lite, you can choose zonal or regional Lite topics. Regional Lite topics offer the same availability SLA as Pub/Sub topics. However, there are reliability differences between Pub/Sub and Pub/Sub Lite in terms of message replication.
To learn more about Pub/Sub and Pub/Sub Lite, see What is Pub/Sub.
To learn more about Lite-supported regions and zones, see Pub/Sub Lite locations.
Terminology in Pub/Sub Lite
The following are some key terms for Pub/Sub Lite.
Message. Data that moves through the Pub/Sub Lite service.
Topic. A named resource that represents a feed of messages. Within Pub/Sub Lite, you can choose to create a zonal or regional Lite topic. Pub/Sub Lite regional topics store data in two zones of a single region. Pub/Sub Lite zonal topics replicate data within just one zone.
Reservation. A named pool of throughput capacity shared by multiple Lite topics in a region.
Subscription A named resource that represents an interest in receiving messages from a particular Lite topic. A subscription is similar to a consumer group in Kafka that only connects to a single topic.
Subscriber. A client of Pub/Sub Lite that receives messages from a Lite topic and on a specified subscription. A subscription can have multiple subscriber clients. In such a case, the messages are load-balanced across the subscriber clients. In Kafka, a subscriber is called a consumer.
Publisher. An application that creates messages and sends (publishes) them to a specific Lite topic. A topic can have multiple publishers. In Kafka, a publisher is called a producer.
Differences between Kafka and Pub/Sub Lite
While Pub/Sub Lite is conceptually similar to Kafka, it's a different system with a narrower API that is more focused on data ingestion. While the differences are immaterial for stream ingestion and processing, there are some specific use cases where these differences are important.
Kafka as a database
Unlike Kafka, Pub/Sub Lite doesn't currently support transactional publishing or log compaction. These Kafka features are more useful when you use Kafka as a database than as a messaging system. If you use Kafka primarily as a database, consider running your own Kafka cluster or using a managed Kafka solution such as Confluent Cloud. If neither of these solutions are an option, you can also consider using a horizontally scalable database such as Cloud Spanner.
Kafka streams is a data processing system built on top of Kafka. While it does allow injection of consumer clients, it requires access to all administrator operations. Kafka Streams also uses the transactional database properties of Kafka for storing internal metadata. So, Pub/Sub Lite cannot currently be used for Kafka Streams applications.
Apache Beam is a similar streaming data processing system which is integrated with Kafka, Pub/Sub, and Pub/Sub Lite. You can run Beam pipelines in a fully-managed way with Dataflow, or on your preexisting Apache Flink and Apache Spark clusters.
Kafka clients can read server-side metrics. In Pub/Sub Lite, metrics relevant to publisher and subscriber behavior are managed through Cloud Monitoring with no additional configuration.
The capacity of a Kafka topic is determined by the capacity of the cluster. Replication, key compaction, and batch settings determine the capacity required to service any given topic on the Kafka cluster. The throughput of a Kafka topic is limited by the capacity of the machines on which the brokers are running. By contrast, you must define both storage and throughput capacity for a Pub/Sub Lite topic. Pub/Sub Lite storage capacity is a configurable property of the topic. Throughput capacity is based on the capacity of the configured reservation, and inherent or configured per-partition limits.
Authentication and security
Apache Kafka supports several open authentication and encryption mechanisms. With Pub/Sub Lite, authentication is based on the IAM system. Security is assured through encryption at rest and in transit. Read more about Pub/Sub Lite authentication in the Migration workflow section, later in this document.
Map Kafka properties to Pub/Sub Lite properties
Kafka has many configuration options that control topic structure, limits, and broker properties. Some common ones useful for data ingestion are discussed in this section, with their equivalents in Pub/Sub Lite. As Pub/Sub Lite is a managed system, you don't have to consider many broker properties.
Topic configuration properties
|Kafka property||Pub/Sub Lite property||Description|
|retention.bytes||Storage per partition||All the partitions in a Lite topic have the same configured storage capacity. The total storage capacity of a Lite topic is the sum of the storage capacity of all the partitions in the topic.|
|retention.ms||Message retention period||The maximum amount of time for which a Lite topic stores messages. If you don't specify a message retention period, the Lite topic stores messages until you exceed the storage capacity.|
|flush.ms, acks||Not configurable in Pub/Sub Lite||Publishes are not acknowledged until they are guaranteed to be persisted to replicated storage.|
|max.message.bytes||Not configurable in Pub/Sub Lite||3.5 MiB is the maximum message size that can be sent to Pub/Sub Lite. Message sizes are calculated in a repeatable manner.|
|message.timestamp.type||Not configurable in Pub/Sub Lite||When using the consumer implementation, the event timestamp is chosen when present or the publish timestamp is used in its stead. Both publish and event timestamps are available when using Beam.|
To learn more about the Lite topic properties, see Properties of a Lite topic.
Producer configuration properties
Pub/Sub Lite supports the Producer wire protocol. Some properties change the behavior of the producer Cloud Client Libraries; some common ones are discussed in the following table.
|Kafka property||Pub/Sub Lite property||Description|
|auto.create.topics.enable||Not configurable in Pub/Sub Lite||Create a topic and a subscription that is roughly equivalent to a consumer group for a single topic in Pub/Sub Lite. You can use the console, gcloud CLI, API, or the Cloud Client Libraries.|
|key.serializer, value.serializer||Not configurable in Pub/Sub Lite||
Required when using the Kafka Producer or equivalent library communicating using the wire protocol.
Not supported for the Pub/Sub Lite Producer shim, which only implements Producer<byte, byte>.
|batch.size||Supported in Pub/Sub Lite||Batching is supported. The recommended value for this value is 10 MiB for best performance.|
|linger.ms||Supported in Pub/Sub Lite||Batching is supported. The recommended value for this value is 50 ms for best performance.|
|max.request.size||Supported in Pub/Sub Lite||The server imposes a limit of 20 MiB per batch. Set this value to lower than 20 MiB in your Kafka client.|
|enable.idempotence||Not supported in Pub/Sub Lite||You must explicitly set this value to
|compression.type||Not supported in Pub/Sub Lite||You must explicitly set this value to
Consumer configuration properties
While we don't currently support Consumer aspects of the Kafka wire protocol, we do offer a Shim which implements the Consumer interface of the Kafka client. The Pub/Sub Lite Kafka Shim is a Java library that makes it easy for users of the Kafka Java client library to work with Pub/Sub Lite.
|key.deserializer, value.deserializer||The Shim specifically implements Consumer<byte, byte>. Any deserialization (which can possibly fail) must be performed by the user code.|
|auto.offset.reset||This configuration is not supported or needed. Subscriptions are guaranteed to have a defined offset location after they are created.|
|message.timestamp.type||When using the Consumer Shim, the event timestamp is chosen when present, or the publish timestamp is used in its stead. Both publish and event timestamps are available when using Dataflow.|
|max.partition.fetch.bytes, max.poll.records||When using the Consumer Shim, flow control settings are configurable when you create the client.|
|enable.auto.commit||When using the Consumer Shim, autocommit is configurable when you create the client.|
Compare Kafka and Pub/Sub Lite features
The following table compares Apache Kafka features with Pub/Sub Lite features:
|Message deduplication||Yes||Yes using Dataflow|
|Push subscriptions||No||Yes using Pub/Sub export|
|Message storage||Limited by available machine storage||Unlimited|
|Logging and monitoring||Self-managed||Automated with Cloud Monitoring|
|Stream processing||Yes with Kafka Streams, Apache Beam, or Dataproc.||Yes with Beam or Dataproc.|
The following table compares what functionality is self-hosted with Kafka and what functionality is managed by Google by using Pub/Sub Lite:
|Availability||Manually deploy Kafka to additional locations.||Deployed across the world. See locations.|
|Disaster recovery||Design and maintain your own backup and replication.||Managed by Google.|
|Infrastructure management||Manually deploy and operate virtual machines (VMs) or machines. Maintain consistent versioning and patches.||Managed by Google.|
|Capacity planning||Manually plan storage and compute needs in advance.||Managed by Google. You can increase compute and storage at any time.|
|Support||None.||24-hour on-call staff and support available.|
Kafka and Pub/Sub Lite cost comparison
The way you estimate and manage costs in Pub/Sub Lite is different than in Kafka. The costs for a Kafka cluster on-premises or in cloud include the cost of machines, disk, networking, data ingress, and egress. It also includes overhead costs for managing and maintaining these systems and their related infrastructure. When managing a Kafka cluster, you must manually upgrade the machines, plan cluster capacity, and implement disaster recovery that includes extensive planning and testing. You must aggregate all these various costs to determine your true total cost of ownership (TCO).
Pub/Sub Lite pricing includes the reservation cost (published bytes, subscribed bytes, bytes handled by the Kafka proxy) and the cost of provisioned storage. You pay for exactly the resources that you reserve in addition to network egress charges. You can use the pricing calculator to provide an estimate of your costs.
To migrate a topic from a Kafka cluster to Pub/Sub Lite, use the following instructions.
Configure Pub/Sub Lite resources
Create a Pub/Sub Lite reservation for the expected throughput for all the topics that you're migrating.
Use the Pub/Sub Lite pricing calculator to calculate the aggregate throughput metrics of your existing Kafka topics. For more information about how to create reservations, see Create and manage Lite reservations.
Create one Pub/Sub Lite topic for each corresponding topic in Kafka.
For more information about how to create Lite topics, see Create and manage Lite topics.
Create one Pub/Sub Lite subscription for each corresponding consumer group and topic pair in the Kafka cluster.
For example, for a consumer group named
consumersthat consumes from
topic-b, you must create a subscription
topic-aand a subscription
topic-b. For more information about how to create subscriptions, see Create and manage Lite subscriptions.
Authenticate to Pub/Sub Lite
Based on the type of your Kafka client, choose one of the following methods:
Java-based Kafka clients running version 3.1.0 or later with rebuilding
Java-based Kafka clients running version 3.1.0 or later without rebuilding
Java-based Kafka clients version 3.1.0 or later with rebuilding
For Java-based Kafka clients of version 3.1.0 or later that can be rebuilt on the instance where you're running the Kafka client:
Obtain the necessary producer parameters for authenticating to Pub/Sub Lite with the help of
getProducerParams()method (see a code sample ) initializes the following JAAS and SASL configurations as producer parameters for authenticating to Pub/Sub Lite:
security.protocol=SASL_SSL sasl.mechanism=OAUTHBEARER sasl.oauthbearer.token.endpoint.url=http://localhost:14293 sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler
Java-based Kafka clients running version 3.1.0 or later without rebuilding
For Kafka clients that support KIP-768, we support configuration-only OAUTHBEARER authentication that uses a Python sidecar script. These versions include the January 2022 Java version 3.1.0 or later.
Perform the following steps on the instance where you're running your Kafka client:
Install Python 3.6 or higher.
See Installing Python.
Install the Google authentication package:
pip install google-auth
This library simplifies the various server-to-server authentication mechanisms to access Google APIs. See the
This script starts a local HTTP server and fetches the default Google Cloud credentials in the environment using
The principal in the fetched credentials must have the
pubsublite.locations.openKafkaStreampermission for the Google Cloud project you are using and the location to which you are connecting. Pub/Sub Lite Publisher (
roles/pubsublite.publisher) and Pub/Sub Lite Subscriber (
roles/pubsublite.subscriber) roles have this required permission. Add these roles to your principal.
The credentials are used in the SASL/OAUTHBEARER authentication for the Kafka client.
The following parameters are required in your
producer.propertiesto authenticate to Pub/Sub Lite from the Kafka client:
security.protocol=SASL_SSL sasl.mechanism=OAUTHBEARER sasl.oauthbearer.token.endpoint.url=localhost:14293 sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule \ required clientId="unused" clientSecret="unused" \ extension_pubsubProject="PROJECT_ID";
Replace PROJECT_ID with the ID of your project running Pub/Sub Lite.
All other clients without rebuilding
For all other clients, perform the following steps:
Download a service account key JSON file for the service account that you intend to use for your client.
Encode the service account file by using base64-encode to use as your authentication string.
On Linux or macOS systems, you can use the
base64command (often installed by default) as follows:
base64 < my_service_account.json > password.txt
You can use the contents of the password file for authentication with the following parameters.
security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="PROJECT_ID" \ password="contents of base64 encoded password file";
Replace PROJECT_ID with the ID of your project running Pub/Sub.
security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.username=PROJECT_ID sasl.password=contents of base64 encoded password file
Replace PROJECT_ID with the ID of your project running Pub/Sub..
Clone data using Kafka Connect
The Pub/Sub Lite team maintains an implementation of a Kafka Connect Sink. You can configure this implementation to copy data from a Kafka topic to a Pub/Sub Lite topic using a Kafka Connect cluster.
To configure the connector to perform the data copy, see Pub/Sub Group Kafka Connector.
Ensure that the
pubsublite.ordering.mode config property is set to
so that the partition affinity is unaffected by the producer
Rebuild your existing Consumer jobs to use the Pub/Sub Lite Consumer<byte, byte> Shim instead of a Kafka Consumer.
Initiate (but don't wait on) an admin seek operation to prepare to read messages from the end of the message backlog.
When you start your Consumer, they reconnect to the current offset in the message backlog. Run both the old and new clients in parallel as long as it takes to verify their behavior, then turn down the old consumer clients.
In addition to the SASL configurations for the Kafka client, the following are also required as producer params when using the Kafka Producer API to interact with Pub/Sub Lite.
bootstrap.servers=REGION-kafka-pubsub.googleapis.com:443 enable.idempotence=false max.in.flight.requests.per.connection=1
Replace REGION with the region where you are running Pub/Sub Lite.
After you migrate all the consumers of the topic to read from Pub/Sub Lite, move your producer traffic to write to Pub/Sub Lite directly.
Gradually migrate the producer clients to write to the Pub/Sub Lite topic instead of the Kafka topic.
Restart the producer clients to pick up new configurations.
Turn down Kafka Connect
All you migrate all the producers to write to Pub/Sub Lite directly, the connector does not copy data anymore.
You can turn down the Kafka Connect instance.
Troubleshoot Kafka connections
Since Kafka clients communicate through a bespoke wire protocol, we cannot provide error messages for failures in all requests. Rely on the error codes sent as part of the message.
You can see more details about errors that occur in the client by
setting the logging level for the
org.apache.kafka prefix to
Low throughput and increasing backlog
There are multiple reasons why you might be seeing low throughput and an increasing backlog. One reason might be insufficient capacity.
You can configure throughput capacity at the topic level or by using reservations. If insufficient throughput capacity for subscribe and publish is configured, the corresponding throughput for subscribe and publish is throttled.
This throughput error is signaled by the
for publishers, and the
metric for subscribers. The metric provides the following states:
NO_PARTITION_CAPACITY: This message indicates that the per-partition throughput limit is reached.
NO_RESERVATION_CAPACITY: This message indicates that the per-reservation throughput limit is reached.
You can view the utilization graphs for the topic or reservation publish and subscribe quota and check whether utilization is at or near 100%.
To resolve this issue, increase the throughput capacity of the topic or reservation.
Topic authorization failed error message
Publishing by using the Kafka API requires the Lite service agent to have the right permissions to publish to the Pub/Sub Lite topic.
You get the error
TOPIC_AUTHORIZATION_FAILED in your client in the event
that you don't have the correct permissions to publish to the
Pub/Sub Lite topic.
To resolve the issue, check if the Lite service agent for the project passed in the auth configuration.