Dataflow managed I/O for Apache Kafka

Managed I/O supports reading from and writing to Apache Kafka.

Requirements

The following SDKs support managed I/O for Apache Kafka:

  • Apache Beam SDK for Java version 2.58.0 or later
  • Apache Beam SDK for Python version 2.61.0 or later

Configuration

Managed I/O for Apache Kafka supports the following configuration parameters:

KAFKA Read

Configuration Type Description
bootstrap_servers str A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`
topic str The Kafka topic to read from.
confluent_schema_registry_subject str The Confluent Schema Registry subject to fetch the schema from.
confluent_schema_registry_url str The URL of the Confluent Schema Registry instance. If provided, the schema is fetched from the registry and the `schema` parameter is ignored.
consumer_config_updates map[str, str] A map of key-value pairs that act as configuration parameters for the Kafka consumer. Most pipelines don't need these, but use this parameter if you need to customize the Kafka consumer. For the full list, see https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
file_descriptor_path str The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
format str The encoding format for the data stored in Kafka. Valid options are: RAW, STRING, AVRO, JSON, PROTO
message_name str The name of the Protocol Buffer message to be used for schema extraction and data conversion.
schema str The schema in which the data is encoded in the Kafka topic. For AVRO data, this is a schema defined with AVRO schema syntax (https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL to Confluent Schema Registry is provided, then this field is ignored, and the schema is fetched from Confluent Schema Registry.
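As a sketch, the read parameters above can be assembled into a configuration dictionary for the Python SDK. The pipeline wiring is shown as comments because it requires a reachable Kafka cluster; the `beam.managed.Read` and `beam.managed.KAFKA` names reflect the Beam Python managed API (SDK 2.61.0 or later) and should be checked against your SDK version.

```python
# Sketch: a Managed I/O Kafka read configuration for AVRO data with an
# inline schema. Broker addresses, topic name, and the record schema are
# placeholder values for illustration.

avro_schema = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "amount", "type": "double"}
  ]
}
"""

read_config = {
    "bootstrap_servers": "broker1:9092,broker2:9092",
    "topic": "orders",
    "format": "AVRO",
    "schema": avro_schema,
    # Optional: pass settings through to the underlying Kafka consumer.
    "consumer_config_updates": {"auto.offset.reset": "earliest"},
}

# Hedged usage (requires apache_beam >= 2.61.0 and a live Kafka cluster):
# import apache_beam as beam
# with beam.Pipeline() as p:
#     rows = p | beam.managed.Read(beam.managed.KAFKA, config=read_config)
```

If a `confluent_schema_registry_url` is set instead, the inline `schema` entry can be dropped, since the schema is then fetched from the registry.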

KAFKA Write

Configuration Type Description
bootstrap_servers str A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`
format str The encoding format for the data stored in Kafka. Valid options are: RAW, JSON, AVRO, PROTO
topic str The Kafka topic to write to.
file_descriptor_path str The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
message_name str The name of the Protocol Buffer message to be used for schema extraction and data conversion.
producer_config_updates map[str, str] A map of key-value pairs that act as configuration parameters for the Kafka producer. Most pipelines don't need these, but use this parameter if you need to customize the Kafka producer. For the full list, see https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html
schema str n/a
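The write parameters follow the same pattern. Below is a minimal sketch of a write configuration for JSON-encoded records; as above, the `beam.managed.Write` and `beam.managed.KAFKA` names assume the Beam Python managed API, and the broker and topic values are placeholders.

```python
# Sketch: a Managed I/O Kafka write configuration for JSON-encoded records.

write_config = {
    "bootstrap_servers": "broker1:9092,broker2:9092",
    "topic": "orders-out",
    "format": "JSON",
    # Optional: pass settings through to the underlying Kafka producer.
    "producer_config_updates": {"acks": "all"},
}

# Hedged usage (requires apache_beam >= 2.61.0 and a live Kafka cluster):
# import apache_beam as beam
# with beam.Pipeline() as p:
#     rows = p | SomeSourceProducingRows()  # hypothetical upstream transform
#     rows | beam.managed.Write(beam.managed.KAFKA, config=write_config)
```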

What's next

For more information and code examples, see the following topics: