To write from Dataflow to Apache Iceberg, use the managed I/O connector.
Dependencies
Add the following dependencies to your project:
Java
```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-managed</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-iceberg</artifactId>
  <version>${beam.version}</version>
</dependency>
```
Configuration
Managed I/O uses the following configuration parameters for Apache Iceberg:
Read and write configuration | Data type | Description |
---|---|---|
`table` | string | The identifier of the Apache Iceberg table. Example: `"db.table1"`. |
`catalog_name` | string | The name of the catalog. Example: `"local"`. |
`catalog_properties` | map | A map of configuration properties for the Apache Iceberg catalog. The required properties depend on the catalog. For more information, see CatalogUtil in the Apache Iceberg documentation. |
`config_properties` | map | An optional set of Hadoop configuration properties. For more information, see CatalogUtil in the Apache Iceberg documentation. |

Write configuration | Data type | Description |
---|---|---|
`triggering_frequency_seconds` | integer | For streaming write pipelines, the frequency at which the sink attempts to produce snapshots, in seconds. |
The `drop`, `keep`, and `only` parameters are mutually exclusive. You can set at most one of them.
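Taken together, these parameters form a configuration map that you pass to the connector. The following is a minimal sketch, assuming a Hadoop catalog; the warehouse path is a hypothetical example, and the properties your catalog requires may differ:

```java
import com.google.common.collect.ImmutableMap;
import java.util.Map;

// A minimal sketch of a Managed I/O configuration map for Apache Iceberg.
// The catalog type and warehouse path are illustrative assumptions; the
// required catalog_properties depend on the catalog implementation.
Map<String, Object> config = ImmutableMap.<String, Object>builder()
    .put("table", "db.table1")
    .put("catalog_name", "local")
    .put("catalog_properties",
        ImmutableMap.of(
            "type", "hadoop",
            "warehouse", "gs://my-bucket/warehouse")) // hypothetical path
    .build();
```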
Dynamic destinations
Managed I/O for Apache Iceberg supports dynamic destinations. Instead of writing to a single fixed table, the connector can dynamically select a destination table based on field values within the incoming records.
To use dynamic destinations, provide a template for the `table` configuration parameter. For more information, see Dynamic destinations.
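For example, the following sketch (using a hypothetical `airport` record field) writes each record to a table whose name is resolved from that field at runtime:

```java
// A sketch of a dynamic-destination configuration. The "airport" field name
// and table naming scheme are hypothetical; the {airport} placeholder is
// filled in from each incoming record, so records fan out across tables.
Map<String, Object> config = ImmutableMap.<String, Object>builder()
    .put("table", "flights.{airport}")
    .put("catalog_name", "local")
    .put("catalog_properties",
        ImmutableMap.of(
            "type", "hadoop",
            "warehouse", "gs://my-bucket/warehouse")) // hypothetical path
    .build();
```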
Example
The following example writes in-memory JSON data to an Apache Iceberg table.
Java
To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
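The following is a minimal sketch of such a pipeline. It assumes a Hadoop catalog; the warehouse location, catalog name, and table identifier are supplied as pipeline options, so adjust these for your environment.

```java
import com.google.common.collect.ImmutableMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;

public class ApacheIcebergWrite {

  // In-memory JSON records to write.
  static final List<String> TABLE_ROWS = Arrays.asList(
      "{\"id\":0, \"name\":\"Alice\"}",
      "{\"id\":1, \"name\":\"Bob\"}",
      "{\"id\":2, \"name\":\"Charles\"}");

  // The Beam schema for the destination table rows.
  static final Schema SCHEMA = new Schema.Builder()
      .addInt64Field("id")
      .addStringField("name")
      .build();

  public interface Options extends PipelineOptions {
    @Description("The URI of the Apache Iceberg warehouse location")
    String getWarehouseLocation();
    void setWarehouseLocation(String value);

    @Description("The name of the Apache Iceberg catalog")
    String getCatalogName();
    void setCatalogName(String value);

    @Description("The identifier of the table to write to")
    String getTableName();
    void setTableName(String value);
  }

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(Options.class);

    // Configuration for the Iceberg sink. A Hadoop catalog is assumed here;
    // the required catalog_properties depend on your catalog.
    Map<String, Object> config = ImmutableMap.<String, Object>builder()
        .put("table", options.getTableName())
        .put("catalog_name", options.getCatalogName())
        .put("catalog_properties",
            ImmutableMap.of(
                "type", "hadoop",
                "warehouse", options.getWarehouseLocation()))
        .build();

    // Create the in-memory JSON data, convert it to Beam rows, and write
    // the rows to the Apache Iceberg table with the managed I/O connector.
    Pipeline pipeline = Pipeline.create(options);
    pipeline
        .apply(Create.of(TABLE_ROWS))
        .apply(JsonToRow.withSchema(SCHEMA))
        .apply(Managed.write(Managed.ICEBERG).withConfig(config));

    pipeline.run().waitUntilFinish();
  }
}
```

You might run this sketch with options such as `--warehouseLocation=gs://my-bucket/warehouse --catalogName=local --tableName=db.table1` (hypothetical values).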