Cómo escribir en Iceberg con destinos dinámicos
Organiza tus páginas con colecciones
Guarda y categoriza el contenido según tus preferencias.
Escribe desde Dataflow hasta Apache Iceberg con la función de destinos dinámicos para enrutar los registros entrantes a diferentes tablas de Iceberg.
(Ten en cuenta que, actualmente, esta función solo es compatible con Java).
Explora más
Para obtener documentación en la que se incluye esta muestra de código, consulta lo siguiente:
Muestra de código
Salvo que se indique lo contrario, el contenido de esta página está sujeto a la licencia Atribución 4.0 de Creative Commons, y los ejemplos de código están sujetos a la licencia Apache 2.0. Para obtener más información, consulta las políticas del sitio de Google Developers. Java es una marca registrada de Oracle o sus afiliados.
[[["Fácil de comprender","easyToUnderstand","thumb-up"],["Resolvió mi problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Información o código de muestra incorrectos","incorrectInformationOrSampleCode","thumb-down"],["Faltan la información o los ejemplos que necesito","missingTheInformationSamplesINeed","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Otro","otherDown","thumb-down"]],[],[[["\u003cp\u003eThis code sample demonstrates how to write data from Dataflow to Apache Iceberg using the dynamic destinations feature, routing records to different Iceberg tables based on the data.\u003c/p\u003e\n"],["\u003cp\u003eThe Java code provided showcases the creation of a Dataflow pipeline that reads JSON data, converts it to Row objects, and then writes it to Iceberg tables, using the "airport" field to determine the destination table name in the format "flights-{airport}".\u003c/p\u003e\n"],["\u003cp\u003eThe code sample includes setting up the Iceberg catalog configuration, including the warehouse location and catalog type, through the specified options at runtime.\u003c/p\u003e\n"],["\u003cp\u003eThe pipeline filters incoming data to only include the fields "name" and "id", as indicated by the "keep" configuration in the Iceberg I/O setup.\u003c/p\u003e\n"],["\u003cp\u003eThis functionality is currently limited to the Java programming language.\u003c/p\u003e\n"]]],[],null,["# Write to Iceberg using dynamic destinations\n\nWrite from Dataflow to Apache Iceberg, using the dynamic destinations feature to route incoming records to different Iceberg tables.\n\n(Note, currently this feature is only supported for Java)\n\nExplore further\n---------------\n\n\nFor detailed documentation that includes this code sample, see the following:\n\n- [Write from Dataflow to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)\n\nCode sample\n-----------\n\n### Java\n\n\nTo authenticate to Dataflow, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n import com.google.common.collect.ImmutableMap;\n import java.util.Arrays;\n import java.util.List;\n import java.util.Map;\n import org.apache.beam.sdk.Pipeline;\n import org.apache.beam.sdk.PipelineResult;\n import org.apache.beam.sdk.managed.Managed;\n import org.apache.beam.sdk.options.Description;\n import org.apache.beam.sdk.options.PipelineOptions;\n import org.apache.beam.sdk.options.PipelineOptionsFactory;\n import org.apache.beam.sdk.schemas.Schema;\n import org.apache.beam.sdk.transforms.Create;\n import org.apache.beam.sdk.transforms.JsonToRow;\n\n public class ApacheIcebergDynamicDestinations {\n\n // The schema for the table rows.\n public static final Schema SCHEMA = new Schema.Builder()\n .addInt64Field(\"id\")\n .addStringField(\"name\")\n .addStringField(\"airport\")\n .build();\n\n // The data to write to table, formatted as JSON strings.\n static final List\u003cString\u003e TABLE_ROWS = List.of(\n \"{\\\"id\\\":0, \\\"name\\\":\\\"Alice\\\", \\\"airport\\\": \\\"ORD\\\" }\",\n \"{\\\"id\\\":1, \\\"name\\\":\\\"Bob\\\", \\\"airport\\\": \\\"SYD\\\" }\",\n \"{\\\"id\\\":2, \\\"name\\\":\\\"Charles\\\", \\\"airport\\\": \\\"ORD\\\" }\"\n );\n\n public interface Options extends PipelineOptions {\n @Description(\"The URI of the Apache Iceberg warehouse location\")\n String getWarehouseLocation();\n\n void setWarehouseLocation(String value);\n\n @Description(\"The name of the Apache Iceberg catalog\")\n String getCatalogName();\n\n void setCatalogName(String value);\n }\n\n // Write JSON data to Apache Iceberg, using dynamic destinations to determine the Iceberg table\n // where Dataflow writes each record. The JSON data contains a field named \"airport\". The\n // Dataflow pipeline writes to Iceberg tables with the naming pattern \"flights-{airport}\".\n public static void main(String[] args) {\n // Parse the pipeline options passed into the application. Example:\n // --runner=DirectRunner --warehouseLocation=$LOCATION --catalogName=$CATALOG \\\n // For more information, see https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options\n Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);\n Pipeline pipeline = Pipeline.create(options);\n\n // Configure the Iceberg source I/O\n Map catalogConfig = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"warehouse\", options.getWarehouseLocation())\n .put(\"type\", \"hadoop\")\n .build();\n\n ImmutableMap\u003cString, Object\u003e config = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"catalog_name\", options.getCatalogName())\n .put(\"catalog_properties\", catalogConfig)\n // Route the incoming records based on the value of the \"airport\" field.\n .put(\"table\", \"flights-{airport}\")\n // Specify which fields to keep from the input data.\n .put(\"keep\", Arrays.asList(\"name\", \"id\"))\n .build();\n\n // Build the pipeline.\n pipeline\n // Read in-memory JSON data.\n .apply(Create.of(TABLE_ROWS))\n // Convert the JSON records to Row objects.\n .apply(JsonToRow.withSchema(SCHEMA))\n // Write each Row to Apache Iceberg.\n .apply(Managed.write(Managed.ICEBERG).withConfig(config));\n\n // Run the pipeline.\n pipeline.run().waitUntilFinish();\n }\n }\n\nWhat's next\n-----------\n\n\nTo search and filter code samples for other Google Cloud products, see the\n[Google Cloud sample browser](/docs/samples?product=dataflow)."]]