The following SDKs support managed I/O for Apache Iceberg:

- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration

Managed I/O for Apache Iceberg supports the following configuration parameters:
`ICEBERG` Read

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | Identifier of the Iceberg table. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop Configuration. |
| drop | `list[str]` | A subset of column names to exclude from reading. If null or empty, all columns are read. |
| filter | `str` | SQL-like predicate to filter data at scan time, for example `"id > 5 AND status = 'ACTIVE'"`. Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | `list[str]` | A subset of column names to read exclusively. If null or empty, all columns are read. |
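The following Python sketch shows how these read parameters fit together with the Beam managed API. The table identifier, catalog name, and warehouse path are placeholder values, and the Hadoop catalog settings shown are only one possible configuration:

```python
import apache_beam as beam
from apache_beam import managed

# Minimal batch read from an Iceberg table through Managed I/O.
# All identifiers below are illustrative placeholders.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadIceberg"
        >> managed.Read(
            managed.ICEBERG,
            config={
                "table": "db.my_table",
                "catalog_name": "my_catalog",
                "catalog_properties": {
                    "type": "hadoop",
                    "warehouse": "gs://my-bucket/warehouse",
                },
                # Calcite-style predicate applied at scan time.
                "filter": "id > 5 AND status = 'ACTIVE'",
                # Read only these columns.
                "keep": ["id", "status"],
            },
        )
        | "Print" >> beam.Map(print)
    )
```

Because `filter` is applied at scan time, only matching rows are read, and `keep` further limits the columns returned.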
`ICEBERG` Write

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | A fully qualified table identifier. You can also provide a template to write to multiple dynamic destinations, for example `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop Configuration. |
| drop | `list[str]` | A list of field names to drop from the input record before writing. Mutually exclusive with `keep` and `only`. |
| keep | `list[str]` | A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with `drop` and `only`. |
| only | `str` | The name of a single record field that should be written. Mutually exclusive with `keep` and `drop`. |
| partition_fields | `list[str]` | Fields used to create a partition spec that is applied when tables are created. For a field `foo`, the available partition transforms are `foo`, `truncate(foo, N)`, `bucket(foo, N)`, `hour(foo)`, `day(foo)`, `month(foo)`, `year(foo)`, and `void(foo)`. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms. |
| table_properties | `map[str, str]` | Iceberg table properties to set on the table when it is created. For more information on table properties, see https://iceberg.apache.org/docs/latest/configuration/#table-properties. |
| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |
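Similarly, the following sketch writes schema'd rows to an Iceberg table through the managed API. All table and catalog names are placeholders, and `partition_fields` takes effect only if the write creates the table:

```python
import apache_beam as beam
from apache_beam import managed

# Minimal batch write to an Iceberg table through Managed I/O.
# All identifiers below are illustrative placeholders.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create"
        >> beam.Create([
            beam.Row(id=1, status="ACTIVE"),
            beam.Row(id=2, status="INACTIVE"),
        ])
        | "WriteIceberg"
        >> managed.Write(
            managed.ICEBERG,
            config={
                "table": "db.my_table",
                "catalog_name": "my_catalog",
                "catalog_properties": {
                    "type": "hadoop",
                    "warehouse": "gs://my-bucket/warehouse",
                },
                # Used only if this write creates the table.
                "partition_fields": ["bucket(id, 16)"],
            },
        )
    )
```

For a streaming pipeline, you would also set `triggering_frequency_seconds` to control how often snapshots are produced.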
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-10 UTC."],[[["\u003cp\u003eManaged I/O for Apache Iceberg supports various catalogs, including Hadoop, Hive, REST-based catalogs, and BigQuery metastore, enabling batch and streaming read and write operations.\u003c/p\u003e\n"],["\u003cp\u003eWrite capabilities include batch writes, streaming writes, dynamic destinations, and dynamic table creation, providing flexibility in data management.\u003c/p\u003e\n"],["\u003cp\u003eFor BigQuery tables, the \u003ccode\u003eBigQueryIO\u003c/code\u003e connector with the BigQuery Storage API is used, but dynamic table creation is not supported.\u003c/p\u003e\n"],["\u003cp\u003eConfiguration parameters like \u003ccode\u003etable\u003c/code\u003e, \u003ccode\u003ecatalog_name\u003c/code\u003e, \u003ccode\u003ecatalog_properties\u003c/code\u003e, \u003ccode\u003econfig_properties\u003c/code\u003e, and \u003ccode\u003etriggering_frequency_seconds\u003c/code\u003e allow for customization of Apache Iceberg operations.\u003c/p\u003e\n"],["\u003cp\u003eThe usage of this feature requires Apache Beam SDK for Java version 2.58.0 or later, while using the BigQuery Metastore requires 2.62.0 or later if not using Runner V2.\u003c/p\u003e\n"]]],[],null,["[Managed I/O](/dataflow/docs/guides/managed-io) supports the following\ncapabilities for Apache Iceberg:\n\n| Catalogs | - Hadoop - Hive - REST-based catalogs - [BigQuery metastore](/bigquery/docs/about-bqms) (requires Apache Beam SDK 2.62.0 or later if not using Runner v2) |\n| Read capabilities | Batch read |\n| Write capabilities | - Batch write - Streaming write - [Dynamic destinations](/dataflow/docs/guides/write-to-iceberg#dynamic-destinations) - Dynamic table creation |\n|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|\n\nFor [BigQuery tables for Apache Iceberg](/bigquery/docs/iceberg-tables),\nuse the\n[`BigQueryIO` connector](https://beam.apache.org/documentation/io/built-in/google-bigquery/)\nwith BigQuery Storage API. The table must already exist; dynamic table creation is\nnot supported.\n\nRequirements\n\nThe following SDKs support managed I/O for Apache Iceberg:\n\n- Apache Beam SDK for Java version 2.58.0 or later\n- Apache Beam SDK for Python version 2.61.0 or later\n\nConfiguration\n\nManaged I/O for Apache Iceberg supports the following configuration\nparameters:\n\n`ICEBERG` Read \n\n| Configuration | Type | Description |\n|--------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | Identifier of the Iceberg table. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. |\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. 
|\n| drop | `list[`str`]` | A subset of column names to exclude from reading. If null or empty, all columns will be read. |\n| filter | `str` | SQL-like predicate to filter data at scan time. Example: \"id \\\u003e 5 AND status = 'ACTIVE'\". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |\n| keep | `list[`str`]` | A subset of column names to read exclusively. If null or empty, all columns will be read. |\n\n\u003cbr /\u003e\n\n`ICEBERG` Write \n\u003cbr /\u003e\n\n| Configuration | Type | Description |\n|------------------------------|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: \\`dataset.my_{col1}_{col2.nested}_table\\`. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. |\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. |\n| drop | `list[`str`]` | A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'. |\n| keep | `list[`str`]` | A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'. |\n| only | `str` | The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'. |\n| partition_fields | `list[`str`]` | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are: - `foo` - `truncate(foo, N)` - `bucket(foo, N)` - `hour(foo)` - `day(foo)` - `month(foo)` - `year(foo)` - `void(foo)` For more information on partition transforms, please visit \u003chttps://iceberg.apache.org/spec/#partition-transforms\u003e. |\n| table_properties | `map[`str`, `str`]` | Iceberg table properties to be set on the table when it is created. For more information on table properties, please visit \u003chttps://iceberg.apache.org/docs/latest/configuration/#table-properties\u003e. |\n| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |\n\n\u003cbr /\u003e\n\nWhat's next\n\nFor more information and code examples, see the following topics:\n\n- [Read from Apache Iceberg](/dataflow/docs/guides/read-from-iceberg)\n- [Write to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)"]]