To write from Dataflow to Apache Iceberg, use the managed I/O connector.
Managed I/O supports the following capabilities for Apache Iceberg:
Catalogs | Hadoop, Hive, REST-based catalogs, BigQuery metastore
---|---
Read capabilities | Batch read
Write capabilities | Batch write, streaming write, dynamic destinations, dynamic table creation
For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
Dependencies
Add the following dependencies to your project:
Java
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-managed</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-iceberg</artifactId>
  <version>${beam.version}</version>
</dependency>
Dynamic destinations
Managed I/O for Apache Iceberg supports dynamic destinations. Instead of writing to a single fixed table, the connector can dynamically select a destination table based on field values within the incoming records.
To use dynamic destinations, provide a template for the table configuration parameter, as shown in the sketch that follows. For more information, see Dynamic destinations.
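For example, a write configuration along these lines routes each record to a table chosen by one of its fields. The field name, catalog settings, and warehouse path here are illustrative, not taken from this page:

// A minimal sketch of a templated table name. The "{airport}" placeholder is
// replaced, per record, with the value of that record's "airport" field.
Map<String, Object> config = Map.of(
    "table", "flights.{airport}",
    "catalog_name", "local",
    "catalog_properties", Map.of(
        "type", "hadoop",
        "warehouse", "gs://BUCKET/warehouse")); // hypothetical warehouse path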
Examples
The following examples show how to use Managed I/O to write to Apache Iceberg.
Write to an Apache Iceberg table
The following example writes in-memory JSON data to an Apache Iceberg table.
Java
To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
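A minimal sketch of such a pipeline follows. It assumes a Hadoop catalog; the catalog name, warehouse location, table name, and record schema are illustrative placeholders to adapt to your environment:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;

public class ApacheIcebergWriteExample {

  // Schema of the in-memory JSON records (illustrative).
  static final Schema SCHEMA =
      Schema.builder().addInt64Field("id").addStringField("name").build();

  // In-memory JSON data to write (illustrative).
  static final List<String> TABLE_ROWS = Arrays.asList(
      "{\"id\": 0, \"name\": \"Alice\"}",
      "{\"id\": 1, \"name\": \"Bob\"}",
      "{\"id\": 2, \"name\": \"Charles\"}");

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Catalog properties; this sketch assumes a Hadoop catalog and a
    // hypothetical warehouse location.
    Map<String, Object> catalogProperties = Map.of(
        "type", "hadoop",
        "warehouse", "gs://BUCKET/warehouse");

    // Configuration for the Managed I/O Apache Iceberg sink.
    Map<String, Object> config = Map.of(
        "table", "db.table1",
        "catalog_name", "local",
        "catalog_properties", catalogProperties);

    // Convert the JSON strings to Beam rows and write them to the table.
    pipeline
        .apply(Create.of(TABLE_ROWS))
        .apply(JsonToRow.withSchema(SCHEMA))
        .apply(Managed.write(Managed.ICEBERG).withConfig(config));

    pipeline.run().waitUntilFinish();
  }
}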
Write with dynamic destinations
The following example writes to different Apache Iceberg tables based on a field in the input data.
Java
To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
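A minimal sketch of such a pipeline follows. It reuses the write configuration shape from the previous example but templates the table name on an "airport" field; the field name, schema, and catalog settings are illustrative:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;

public class ApacheIcebergDynamicDestinationsExample {

  // Each record carries an "airport" field that selects its destination table
  // (schema and data are illustrative).
  static final Schema SCHEMA =
      Schema.builder().addInt64Field("id").addStringField("airport").build();

  static final List<String> TABLE_ROWS = Arrays.asList(
      "{\"id\": 0, \"airport\": \"ORD\"}",
      "{\"id\": 1, \"airport\": \"SYD\"}",
      "{\"id\": 2, \"airport\": \"ORD\"}");

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Hypothetical Hadoop catalog; adjust for your environment.
    Map<String, Object> catalogProperties = Map.of(
        "type", "hadoop",
        "warehouse", "gs://BUCKET/warehouse");

    // The "{airport}" placeholder in the table template is replaced, per
    // record, with the value of that record's "airport" field, so records
    // here are routed to flights.ORD and flights.SYD.
    Map<String, Object> config = Map.of(
        "table", "flights.{airport}",
        "catalog_name", "local",
        "catalog_properties", catalogProperties);

    pipeline
        .apply(Create.of(TABLE_ROWS))
        .apply(JsonToRow.withSchema(SCHEMA))
        .apply(Managed.write(Managed.ICEBERG).withConfig(config));

    pipeline.run().waitUntilFinish();
  }
}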
What's next
- Read from Apache Iceberg.
- Learn more about Managed I/O.