使用动态目标将数据写入 Iceberg
使用集合让一切井井有条
根据您的偏好保存内容并对其进行分类。
从 Dataflow 写入 Apache Iceberg,并使用动态目标功能将传入记录路由到不同的 Iceberg 表。
(请注意,目前此功能仅适用于 Java)
深入探索
如需查看包含此代码示例的详细文档,请参阅以下内容:
代码示例
如未另行说明,那么本页面中的内容已根据知识共享署名 4.0 许可获得了许可,并且代码示例已根据 Apache 2.0 许可获得了许可。有关详情,请参阅 Google 开发者网站政策。Java 是 Oracle 和/或其关联公司的注册商标。
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],[],[[["\u003cp\u003eThis code sample demonstrates how to write data from Dataflow to Apache Iceberg using the dynamic destinations feature, routing records to different Iceberg tables based on the data.\u003c/p\u003e\n"],["\u003cp\u003eThe Java code provided showcases the creation of a Dataflow pipeline that reads JSON data, converts it to Row objects, and then writes it to Iceberg tables, using the "airport" field to determine the destination table name in the format "flights-{airport}".\u003c/p\u003e\n"],["\u003cp\u003eThe code sample includes setting up the Iceberg catalog configuration, including the warehouse location and catalog type, through the specified options at runtime.\u003c/p\u003e\n"],["\u003cp\u003eThe pipeline filters incoming data to only include the fields "name" and "id", as indicated by the "keep" configuration in the Iceberg I/O setup.\u003c/p\u003e\n"],["\u003cp\u003eThis functionality is currently limited to the Java programming language.\u003c/p\u003e\n"]]],[],null,["# Write to Iceberg using dynamic destinations\n\nWrite from Dataflow to Apache Iceberg, using the dynamic destinations feature to route incoming records to different Iceberg tables.\n\n(Note, currently this feature is only supported for Java)\n\nExplore further\n---------------\n\n\nFor detailed documentation that includes this code sample, see the following:\n\n- [Write from Dataflow to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)\n\nCode sample\n-----------\n\n### Java\n\n\nTo authenticate to Dataflow, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n import com.google.common.collect.ImmutableMap;\n import java.util.Arrays;\n import java.util.List;\n import java.util.Map;\n import org.apache.beam.sdk.Pipeline;\n import org.apache.beam.sdk.PipelineResult;\n import org.apache.beam.sdk.managed.Managed;\n import org.apache.beam.sdk.options.Description;\n import org.apache.beam.sdk.options.PipelineOptions;\n import org.apache.beam.sdk.options.PipelineOptionsFactory;\n import org.apache.beam.sdk.schemas.Schema;\n import org.apache.beam.sdk.transforms.Create;\n import org.apache.beam.sdk.transforms.JsonToRow;\n\n public class ApacheIcebergDynamicDestinations {\n\n // The schema for the table rows.\n public static final Schema SCHEMA = new Schema.Builder()\n .addInt64Field(\"id\")\n .addStringField(\"name\")\n .addStringField(\"airport\")\n .build();\n\n // The data to write to table, formatted as JSON strings.\n static final List\u003cString\u003e TABLE_ROWS = List.of(\n \"{\\\"id\\\":0, \\\"name\\\":\\\"Alice\\\", \\\"airport\\\": \\\"ORD\\\" }\",\n \"{\\\"id\\\":1, \\\"name\\\":\\\"Bob\\\", \\\"airport\\\": \\\"SYD\\\" }\",\n \"{\\\"id\\\":2, \\\"name\\\":\\\"Charles\\\", \\\"airport\\\": \\\"ORD\\\" }\"\n );\n\n public interface Options extends PipelineOptions {\n @Description(\"The URI of the Apache Iceberg warehouse location\")\n String getWarehouseLocation();\n\n void setWarehouseLocation(String value);\n\n @Description(\"The name of the Apache Iceberg catalog\")\n String getCatalogName();\n\n void setCatalogName(String value);\n }\n\n // Write JSON data to Apache Iceberg, using dynamic destinations to determine the Iceberg table\n // where Dataflow writes each record. The JSON data contains a field named \"airport\". The\n // Dataflow pipeline writes to Iceberg tables with the naming pattern \"flights-{airport}\".\n public static void main(String[] args) {\n // Parse the pipeline options passed into the application. Example:\n // --runner=DirectRunner --warehouseLocation=$LOCATION --catalogName=$CATALOG \\\n // For more information, see https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options\n Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);\n Pipeline pipeline = Pipeline.create(options);\n\n // Configure the Iceberg source I/O\n Map catalogConfig = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"warehouse\", options.getWarehouseLocation())\n .put(\"type\", \"hadoop\")\n .build();\n\n ImmutableMap\u003cString, Object\u003e config = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"catalog_name\", options.getCatalogName())\n .put(\"catalog_properties\", catalogConfig)\n // Route the incoming records based on the value of the \"airport\" field.\n .put(\"table\", \"flights-{airport}\")\n // Specify which fields to keep from the input data.\n .put(\"keep\", Arrays.asList(\"name\", \"id\"))\n .build();\n\n // Build the pipeline.\n pipeline\n // Read in-memory JSON data.\n .apply(Create.of(TABLE_ROWS))\n // Convert the JSON records to Row objects.\n .apply(JsonToRow.withSchema(SCHEMA))\n // Write each Row to Apache Iceberg.\n .apply(Managed.write(Managed.ICEBERG).withConfig(config));\n\n // Run the pipeline.\n pipeline.run().waitUntilFinish();\n }\n }\n\nWhat's next\n-----------\n\n\nTo search and filter code samples for other Google Cloud products, see the\n[Google Cloud sample browser](/docs/samples?product=dataflow)."]]