Managed I/O supports reading from and writing to Apache Kafka.
Requirements
The following SDKs support managed I/O for Apache Kafka:
- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration
Managed I/O for Apache Kafka supports the following configuration parameters:
Read

Configuration | Type | Description |
---|---|---|
bootstrap_servers | str | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client uses all servers in the cluster regardless of which servers are listed here; this list only affects the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...` |
topic | str | The Kafka topic to read from. |
confluent_schema_registry_subject | str | The subject in the Confluent Schema Registry from which to fetch the schema. |
confluent_schema_registry_url | str | The URL of the Confluent Schema Registry. |
consumer_config_updates | map[str, str] | Key-value pairs that act as configuration parameters for Kafka consumers. Most pipelines don't need these, but you can use them to customize your Kafka consumer. For the full list, see https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html |
file_descriptor_path | str | The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization. |
format | str | The encoding format for the data stored in Kafka. Valid options are: `RAW`, `STRING`, `AVRO`, `JSON`, `PROTO`. |
message_name | str | The name of the Protocol Buffer message to use for schema extraction and data conversion. |
schema | str | The schema in which the data is encoded in the Kafka topic. For `AVRO` data, this is a schema defined with Avro schema syntax (https://avro.apache.org/docs/1.10.2/spec.html#schemas). For `JSON` data, this is a schema defined with JSON Schema syntax (https://json-schema.org/). If a URL to Confluent Schema Registry is provided, this field is ignored and the schema is fetched from Confluent Schema Registry. |
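For example, the following is a minimal Python sketch of a read pipeline, assuming the `apache_beam.transforms.managed` module available in SDK 2.61.0 or later. The broker addresses, topic name, and schema are placeholder values.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Placeholder values: substitute your own brokers, topic, and schema.
read_config = {
    "bootstrap_servers": "broker-1:9092,broker-2:9092",
    "topic": "my-topic",
    "format": "JSON",
    # JSON-schema syntax describing the encoded records.
    "schema": '{"type": "object", "properties": '
              '{"id": {"type": "integer"}, "name": {"type": "string"}}}',
    # Optional: forwarded to the underlying Kafka consumer.
    "consumer_config_updates": {"auto.offset.reset": "earliest"},
}

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadFromKafka" >> managed.Read(managed.KAFKA, config=read_config)
        | "LogRecords" >> beam.Map(print)
    )
```

Each record arrives as a Beam Row whose fields mirror the configured schema. Because managed I/O is implemented as a cross-language transform, running the Python sketch requires a Java runtime for the expansion service.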
Write

Configuration | Type | Description |
---|---|---|
bootstrap_servers | str | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client uses all servers in the cluster regardless of which servers are listed here; this list only affects the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...` |
format | str | The encoding format for the data stored in Kafka. Valid options are: `RAW`, `JSON`, `AVRO`, `PROTO`. |
topic | str | The Kafka topic to write to. |
file_descriptor_path | str | The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization. |
message_name | str | The name of the Protocol Buffer message to use for schema extraction and data conversion. |
producer_config_updates | map[str, str] | Key-value pairs that act as configuration parameters for Kafka producers. Most pipelines don't need these, but you can use them to customize your Kafka producer. For the full list, see https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html |
schema | str | n/a |
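A corresponding write sketch, under the same assumptions. The single `payload` bytes field used with the `RAW` format is an assumption about the expected row shape, not something the table above confirms.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Placeholder values: substitute your own brokers and topic.
write_config = {
    "bootstrap_servers": "broker-1:9092,broker-2:9092",
    "topic": "my-output-topic",
    "format": "RAW",
    # Optional: forwarded to the underlying Kafka producer.
    "producer_config_updates": {"acks": "all"},
}

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "CreateBytes" >> beam.Create([b"hello", b"world"])
        # Assumed row shape: a single bytes field holding the raw message.
        | "ToRows" >> beam.Map(lambda msg: beam.Row(payload=msg))
        | "WriteToKafka" >> managed.Write(managed.KAFKA, config=write_config)
    )
```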
What's next
For more information and code examples, see the following topics: