The following SDKs support managed I/O for Apache Iceberg:

- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration

Managed I/O for Apache Iceberg supports the following configuration parameters:
`ICEBERG` Read

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | Identifier of the Iceberg table. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop configuration. |
| drop | `list[str]` | A subset of column names to exclude from reading. If null or empty, all columns are read. |
| filter | `str` | SQL-like predicate to filter data at scan time. Example: `id > 5 AND status = 'ACTIVE'`. Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | `list[str]` | A subset of column names to read exclusively. If null or empty, all columns are read. |
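As an illustration, a batch read with the Apache Beam SDK for Python might look like the following minimal sketch. The table identifier, catalog name, and warehouse location are placeholder values, and the `keep` and `filter` entries are included only to show how the parameters above are passed:

```python
# Minimal sketch: batch read from an Iceberg table with managed I/O.
# All identifiers (table, catalog, warehouse path) are placeholders.
import apache_beam as beam
from apache_beam.transforms import managed

read_config = {
    "table": "db.my_table",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",                         # placeholder catalog setup
        "warehouse": "gs://my-bucket/warehouse",  # placeholder location
    },
    # Optional column pruning and predicate pushdown, per the table above.
    "keep": ["id", "status"],
    "filter": "id > 5 AND status = 'ACTIVE'",
}

with beam.Pipeline() as pipeline:
    rows = pipeline | "ReadIceberg" >> managed.Read(
        managed.ICEBERG, config=read_config)
    rows | beam.Map(print)
```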
`ICEBERG` Write

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | A fully qualified table identifier. You can also provide a template to write to multiple dynamic destinations, for example `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop configuration. |
| drop | `list[str]` | A list of field names to drop from the input record before writing. Mutually exclusive with `keep` and `only`. |
| keep | `list[str]` | A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with `drop` and `only`. |
| only | `str` | The name of a single record field that should be written. Mutually exclusive with `keep` and `drop`. |
| partition_fields | `list[str]` | Fields used to create a partition spec that is applied when tables are created. For a field `foo`, the available partition transforms are `foo`, `truncate(foo, N)`, `bucket(foo, N)`, `hour(foo)`, `day(foo)`, `month(foo)`, `year(foo)`, and `void(foo)`. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms. |
| table_properties | `map[str, str]` | Iceberg table properties to set on the table when it is created. For more information on table properties, see https://iceberg.apache.org/docs/latest/configuration/#table-properties. |
| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |
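Similarly, a write with a dynamic-destination template and a partition spec might look like the following sketch. The table template, catalog configuration, and partition transform are illustrative placeholders, not required values:

```python
# Minimal sketch: write to dynamic Iceberg destinations with managed I/O.
# The table template and catalog values are placeholders.
import apache_beam as beam
from apache_beam.transforms import managed

write_config = {
    # Records are routed to a table named after each record's 'airport' field.
    "table": "flights.{airport}_arrivals",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",                         # placeholder catalog setup
        "warehouse": "gs://my-bucket/warehouse",  # placeholder location
    },
    # Partition spec applied only when a destination table is created.
    "partition_fields": ["bucket(flight, 4)"],
    # For a streaming write, also set e.g. "triggering_frequency_seconds": 60.
}

with beam.Pipeline() as pipeline:
    rows = pipeline | beam.Create([
        beam.Row(airport="SFO", flight="UA123"),
        beam.Row(airport="LAX", flight="DL456"),
    ])
    rows | "WriteIceberg" >> managed.Write(
        managed.ICEBERG, config=write_config)
```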
[[["Leicht verständlich","easyToUnderstand","thumb-up"],["Mein Problem wurde gelöst","solvedMyProblem","thumb-up"],["Sonstiges","otherUp","thumb-up"]],[["Schwer verständlich","hardToUnderstand","thumb-down"],["Informationen oder Beispielcode falsch","incorrectInformationOrSampleCode","thumb-down"],["Benötigte Informationen/Beispiele nicht gefunden","missingTheInformationSamplesINeed","thumb-down"],["Problem mit der Übersetzung","translationIssue","thumb-down"],["Sonstiges","otherDown","thumb-down"]],["Zuletzt aktualisiert: 2025-09-10 (UTC)."],[[["\u003cp\u003eManaged I/O for Apache Iceberg supports various catalogs, including Hadoop, Hive, REST-based catalogs, and BigQuery metastore, enabling batch and streaming read and write operations.\u003c/p\u003e\n"],["\u003cp\u003eWrite capabilities include batch writes, streaming writes, dynamic destinations, and dynamic table creation, providing flexibility in data management.\u003c/p\u003e\n"],["\u003cp\u003eFor BigQuery tables, the \u003ccode\u003eBigQueryIO\u003c/code\u003e connector with the BigQuery Storage API is used, but dynamic table creation is not supported.\u003c/p\u003e\n"],["\u003cp\u003eConfiguration parameters like \u003ccode\u003etable\u003c/code\u003e, \u003ccode\u003ecatalog_name\u003c/code\u003e, \u003ccode\u003ecatalog_properties\u003c/code\u003e, \u003ccode\u003econfig_properties\u003c/code\u003e, and \u003ccode\u003etriggering_frequency_seconds\u003c/code\u003e allow for customization of Apache Iceberg operations.\u003c/p\u003e\n"],["\u003cp\u003eThe usage of this feature requires Apache Beam SDK for Java version 2.58.0 or later, while using the BigQuery Metastore requires 2.62.0 or later if not using Runner V2.\u003c/p\u003e\n"]]],[],null,["[Managed I/O](/dataflow/docs/guides/managed-io) supports the following\ncapabilities for Apache Iceberg:\n\n| Catalogs | - Hadoop - Hive - REST-based catalogs - [BigQuery metastore](/bigquery/docs/about-bqms) (requires Apache Beam SDK 2.62.0 or later if not using Runner v2) |\n| Read capabilities | Batch read |\n| Write capabilities | - Batch write - Streaming write - [Dynamic destinations](/dataflow/docs/guides/write-to-iceberg#dynamic-destinations) - Dynamic table creation |\n|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|\n\nFor [BigQuery tables for Apache Iceberg](/bigquery/docs/iceberg-tables),\nuse the\n[`BigQueryIO` connector](https://beam.apache.org/documentation/io/built-in/google-bigquery/)\nwith BigQuery Storage API. The table must already exist; dynamic table creation is\nnot supported.\n\nRequirements\n\nThe following SDKs support managed I/O for Apache Iceberg:\n\n- Apache Beam SDK for Java version 2.58.0 or later\n- Apache Beam SDK for Python version 2.61.0 or later\n\nConfiguration\n\nManaged I/O for Apache Iceberg supports the following configuration\nparameters:\n\n`ICEBERG` Read \n\n| Configuration | Type | Description |\n|--------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | Identifier of the Iceberg table. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. |\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. 
|\n| drop | `list[`str`]` | A subset of column names to exclude from reading. If null or empty, all columns will be read. |\n| filter | `str` | SQL-like predicate to filter data at scan time. Example: \"id \\\u003e 5 AND status = 'ACTIVE'\". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |\n| keep | `list[`str`]` | A subset of column names to read exclusively. If null or empty, all columns will be read. |\n\n\u003cbr /\u003e\n\n`ICEBERG` Write \n\u003cbr /\u003e\n\n| Configuration | Type | Description |\n|------------------------------|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: \\`dataset.my_{col1}_{col2.nested}_table\\`. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. |\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. |\n| drop | `list[`str`]` | A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'. |\n| keep | `list[`str`]` | A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'. |\n| only | `str` | The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'. |\n| partition_fields | `list[`str`]` | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are: - `foo` - `truncate(foo, N)` - `bucket(foo, N)` - `hour(foo)` - `day(foo)` - `month(foo)` - `year(foo)` - `void(foo)` For more information on partition transforms, please visit \u003chttps://iceberg.apache.org/spec/#partition-transforms\u003e. |\n| table_properties | `map[`str`, `str`]` | Iceberg table properties to be set on the table when it is created. For more information on table properties, please visit \u003chttps://iceberg.apache.org/docs/latest/configuration/#table-properties\u003e. |\n| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |\n\n\u003cbr /\u003e\n\nWhat's next\n\nFor more information and code examples, see the following topics:\n\n- [Read from Apache Iceberg](/dataflow/docs/guides/read-from-iceberg)\n- [Write to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)"]]