Managed I/O supports the following capabilities for Apache Iceberg:
| Capability | Support |
|---|---|
| Catalogs | |
| Read capabilities | Batch read |
| Write capabilities | Batch write, streaming write, dynamic destinations, dynamic table creation |
For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
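As an illustrative sketch only, the following Python snippet shows one way to write to an existing BigQuery table for Apache Iceberg with the BigQueryIO connector and the Storage Write API. The project, dataset, table, and schema values are placeholder assumptions, not values from this page.

```python
import apache_beam as beam

# Sketch: append rows to an existing BigQuery table for Apache Iceberg using
# the BigQuery Storage Write API. The table reference and schema below are
# placeholders.
with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateRows" >> beam.Create([{"id": 1, "name": "example"}])
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:my_dataset.my_iceberg_table",
            schema="id:INTEGER,name:STRING",
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
            # The table must already exist; dynamic table creation is not
            # supported for BigQuery tables for Apache Iceberg.
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```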
Requirements
The following SDKs support managed I/O for Apache Iceberg:
- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration
Managed I/O for Apache Iceberg supports the following configuration parameters:
ICEBERG Read

| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A subset of column names to exclude from reading. If null or empty, all columns will be read. |
| filter | str | SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | list[str] | A subset of column names to read exclusively. If null or empty, all columns will be read. |
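As a hedged sketch of how these read parameters fit together, the following Python pipeline uses the `apache_beam.transforms.managed` module from the SDK versions listed above. The table identifier, catalog name, and Hadoop-style catalog properties are assumed placeholder values.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Sketch: batch read from an Iceberg table through Managed I/O.
# Table, catalog, and warehouse values are placeholders.
with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "ReadIceberg" >> managed.Read(
            managed.ICEBERG,
            config={
                "table": "my_db.my_table",
                "catalog_name": "my_catalog",
                "catalog_properties": {
                    "type": "hadoop",
                    "warehouse": "gs://my-bucket/warehouse",
                },
                # Optional: read only selected columns and filter rows at scan time.
                "keep": ["id", "status"],
                "filter": "id > 5 AND status = 'ACTIVE'",
            },
        )
        | "LogRows" >> beam.Map(print)
    )
```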
ICEBERG Write

| Configuration | Type | Description |
|---|---|---|
| table | str | A fully qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A list of field names to drop from the input record before writing. Mutually exclusive with 'keep' and 'only'. |
| keep | list[str] | A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with 'drop' and 'only'. |
| only | str | The name of a single record field that should be written. Mutually exclusive with 'keep' and 'drop'. |
| partition_fields | list[str] | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are identity, bucket, truncate, year, month, day, hour, and void. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms. |
| table_properties | map[str, str] | Iceberg table properties to be set on the table when it is created. For more information on table properties, see https://iceberg.apache.org/docs/latest/configuration/#table-properties. |
| triggering_frequency_seconds | int32 | For a streaming pipeline, sets the frequency at which snapshots are produced. |
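As a hedged sketch of the write configuration, the following Python pipeline writes a schema-aware PCollection of `beam.Row` elements to an Iceberg table through Managed I/O. The table, catalog, partition, and table property values are assumed placeholders; the partition spec and table properties only take effect when the transform creates the table.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Sketch: batch write to an Iceberg table through Managed I/O.
# Table, catalog, partition, and property values are placeholders.
with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateRows" >> beam.Create([
            beam.Row(id=1, status="ACTIVE"),
            beam.Row(id=2, status="INACTIVE"),
        ])
        | "WriteIceberg" >> managed.Write(
            managed.ICEBERG,
            config={
                "table": "my_db.my_table",
                "catalog_name": "my_catalog",
                "catalog_properties": {
                    "type": "hadoop",
                    "warehouse": "gs://my-bucket/warehouse",
                },
                # Applied only if the transform creates the table.
                "partition_fields": ["status"],
                "table_properties": {"write.format.default": "parquet"},
            },
        )
    )
```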
What's next
For more information and code examples, see the following topics: