The following SDKs support managed I/O for Apache Iceberg:

- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration

Managed I/O for Apache Iceberg supports the following configuration parameters:
`ICEBERG` Read

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | Identifier of the Iceberg table. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop Configuration. |
| drop | `list[str]` | A subset of column names to exclude from reading. If null or empty, all columns are read. |
| filter | `str` | SQL-like predicate to filter data at scan time. Example: `"id > 5 AND status = 'ACTIVE'"`. Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | `list[str]` | A subset of column names to read exclusively. If null or empty, all columns are read. |
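For example, a batch read with the Python SDK's managed transform might look like the following sketch. The table identifier, catalog name, warehouse path, and column names are hypothetical placeholders; only the configuration keys come from the table above.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Hypothetical values: replace the table identifier, catalog name,
# and warehouse location with your own.
read_config = {
    "table": "db.users",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",
        "warehouse": "gs://my-bucket/warehouse",
    },
    # Project to two columns and push a Calcite predicate down to scan time.
    "keep": ["id", "status"],
    "filter": "id > 5 AND status = 'ACTIVE'",
}

with beam.Pipeline() as pipeline:
    rows = (
        pipeline
        | "ReadIceberg" >> managed.Read(managed.ICEBERG, config=read_config)
        | "Log" >> beam.Map(print)
    )
```

Because the `filter` predicate is evaluated at scan time, rows are pruned before they enter the pipeline rather than in a downstream transform.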
`ICEBERG` Write

| Configuration | Type | Description |
|---|---|---|
| **table** | `str` | A fully qualified table identifier. You can also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | `str` | Name of the catalog containing the table. |
| catalog_properties | `map[str, str]` | Properties used to set up the Iceberg catalog. |
| config_properties | `map[str, str]` | Properties passed to the Hadoop Configuration. |
| drop | `list[str]` | A list of field names to drop from the input record before writing. Mutually exclusive with `keep` and `only`. |
| keep | `list[str]` | A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with `drop` and `only`. |
| only | `str` | The name of a single record field that should be written. Mutually exclusive with `keep` and `drop`. |
| partition_fields | `list[str]` | Fields used to create a partition spec that is applied when tables are created. For a field `foo`, the available partition transforms are `foo`, `truncate(foo, N)`, `bucket(foo, N)`, `hour(foo)`, `day(foo)`, `month(foo)`, `year(foo)`, and `void(foo)`. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms. |
| table_properties | `map[str, str]` | Iceberg table properties to set on the table when it is created. For more information on table properties, see https://iceberg.apache.org/docs/latest/configuration/#table-properties. |
| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |
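A corresponding write sketch, again with the Python SDK and hypothetical values (`my_catalog`, `gs://my-bucket/warehouse`, `db.events`, and the record fields). Note that `partition_fields` only takes effect when the write creates the table.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# Hypothetical values: the catalog name and warehouse path are placeholders.
write_config = {
    "table": "db.events",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",
        "warehouse": "gs://my-bucket/warehouse",
    },
    # Applied only if this write creates the table: partition by the
    # identity of 'action' and a hash bucket of 'user_id'.
    "partition_fields": ["action", "bucket(user_id, 8)"],
}

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "Create" >> beam.Create([
            beam.Row(user_id=1, action="click"),
            beam.Row(user_id=2, action="view"),
        ])
        | "WriteIceberg" >> managed.Write(managed.ICEBERG, config=write_config)
    )
```

For a streaming pipeline, you would also set `triggering_frequency_seconds` to control how often the sink commits new snapshots.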
[[["Es fácil de entender","easyToUnderstand","thumb-up"],["Me ofreció una solución al problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Es difícil de entender","hardToUnderstand","thumb-down"],["La información o el código de muestra no son correctos","incorrectInformationOrSampleCode","thumb-down"],["Me faltan las muestras o la información que necesito","missingTheInformationSamplesINeed","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Otro","otherDown","thumb-down"]],["Última actualización: 2025-09-10 (UTC)."],[[["\u003cp\u003eManaged I/O for Apache Iceberg supports various catalogs, including Hadoop, Hive, REST-based catalogs, and BigQuery metastore, enabling batch and streaming read and write operations.\u003c/p\u003e\n"],["\u003cp\u003eWrite capabilities include batch writes, streaming writes, dynamic destinations, and dynamic table creation, providing flexibility in data management.\u003c/p\u003e\n"],["\u003cp\u003eFor BigQuery tables, the \u003ccode\u003eBigQueryIO\u003c/code\u003e connector with the BigQuery Storage API is used, but dynamic table creation is not supported.\u003c/p\u003e\n"],["\u003cp\u003eConfiguration parameters like \u003ccode\u003etable\u003c/code\u003e, \u003ccode\u003ecatalog_name\u003c/code\u003e, \u003ccode\u003ecatalog_properties\u003c/code\u003e, \u003ccode\u003econfig_properties\u003c/code\u003e, and \u003ccode\u003etriggering_frequency_seconds\u003c/code\u003e allow for customization of Apache Iceberg operations.\u003c/p\u003e\n"],["\u003cp\u003eThe usage of this feature requires Apache Beam SDK for Java version 2.58.0 or later, while using the BigQuery Metastore requires 2.62.0 or later if not using Runner V2.\u003c/p\u003e\n"]]],[],null,["[Managed I/O](/dataflow/docs/guides/managed-io) supports the following\ncapabilities for Apache Iceberg:\n\n| Catalogs | - Hadoop - Hive - REST-based catalogs - [BigQuery metastore](/bigquery/docs/about-bqms) (requires Apache Beam SDK 2.62.0 or later if not using Runner v2) |\n| Read capabilities | Batch read |\n| Write capabilities | - Batch write - Streaming write - [Dynamic destinations](/dataflow/docs/guides/write-to-iceberg#dynamic-destinations) - Dynamic table creation |\n|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|\n\nFor [BigQuery tables for Apache Iceberg](/bigquery/docs/iceberg-tables),\nuse the\n[`BigQueryIO` connector](https://beam.apache.org/documentation/io/built-in/google-bigquery/)\nwith BigQuery Storage API. The table must already exist; dynamic table creation is\nnot supported.\n\nRequirements\n\nThe following SDKs support managed I/O for Apache Iceberg:\n\n- Apache Beam SDK for Java version 2.58.0 or later\n- Apache Beam SDK for Python version 2.61.0 or later\n\nConfiguration\n\nManaged I/O for Apache Iceberg supports the following configuration\nparameters:\n\n`ICEBERG` Read \n\n| Configuration | Type | Description |\n|--------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | Identifier of the Iceberg table. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. 
|\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. |\n| drop | `list[`str`]` | A subset of column names to exclude from reading. If null or empty, all columns will be read. |\n| filter | `str` | SQL-like predicate to filter data at scan time. Example: \"id \\\u003e 5 AND status = 'ACTIVE'\". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |\n| keep | `list[`str`]` | A subset of column names to read exclusively. If null or empty, all columns will be read. |\n\n\u003cbr /\u003e\n\n`ICEBERG` Write \n\u003cbr /\u003e\n\n| Configuration | Type | Description |\n|------------------------------|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **table** | `str` | A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: \\`dataset.my_{col1}_{col2.nested}_table\\`. |\n| catalog_name | `str` | Name of the catalog containing the table. |\n| catalog_properties | `map[`str`, `str`]` | Properties used to set up the Iceberg catalog. |\n| config_properties | `map[`str`, `str`]` | Properties passed to the Hadoop Configuration. |\n| drop | `list[`str`]` | A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'. |\n| keep | `list[`str`]` | A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'. |\n| only | `str` | The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'. |\n| partition_fields | `list[`str`]` | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are: - `foo` - `truncate(foo, N)` - `bucket(foo, N)` - `hour(foo)` - `day(foo)` - `month(foo)` - `year(foo)` - `void(foo)` For more information on partition transforms, please visit \u003chttps://iceberg.apache.org/spec/#partition-transforms\u003e. |\n| table_properties | `map[`str`, `str`]` | Iceberg table properties to be set on the table when it is created. For more information on table properties, please visit \u003chttps://iceberg.apache.org/docs/latest/configuration/#table-properties\u003e. |\n| triggering_frequency_seconds | `int32` | For a streaming pipeline, sets the frequency at which snapshots are produced. |\n\n\u003cbr /\u003e\n\nWhat's next\n\nFor more information and code examples, see the following topics:\n\n- [Read from Apache Iceberg](/dataflow/docs/guides/read-from-iceberg)\n- [Write to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)"]]