To read from Apache Iceberg to Dataflow, use the managed I/O connector.
Dependencies
Add the following dependencies to your project:
Java
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-managed</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-iceberg</artifactId>
<version>${beam.version}</version>
</dependency>
Configuration
Managed I/O uses the following configuration parameters for Apache Iceberg:
Name | Data type | Description |
---|---|---|
table |
string | The identifier of the Apache Iceberg table. Example:
"db.table1" . |
catalog_name |
string | The name of the catalog. Example: "local" . |
catalog_properties |
map | A map of configuration properties for the Apache Iceberg
catalog. The required properties depend on the catalog. For more
information, see
CatalogUtil in the Apache Iceberg documentation. |
config_properties |
map | An optional set of Hadoop configuration properties. For more
information, see
CatalogUtil in the Apache Iceberg documentation. |
Example
The following example reads from an Apache Iceberg table and writes the data to text files.
Java
To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.