Write to Apache Iceberg
Stay organized with collections
Save and categorize content based on your preferences.
Use the Dataflow managed I/O transform to write to Apache Iceberg
Explore further
For detailed documentation that includes this code sample, see the following:
Code sample
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eThe Dataflow managed I/O transform can be used to write data to Apache Iceberg.\u003c/p\u003e\n"],["\u003cp\u003eAuthentication to Dataflow requires setting up Application Default Credentials.\u003c/p\u003e\n"],["\u003cp\u003eThe code sample demonstrates writing to an Iceberg table using a specified schema, catalog, and warehouse location.\u003c/p\u003e\n"],["\u003cp\u003ePipeline options, such as warehouse location, catalog name, and table name, are configured during the pipeline setup.\u003c/p\u003e\n"],["\u003cp\u003eThe provided code sample creates an Apache Beam pipeline that writes JSON data to an Iceberg table using the \u003ccode\u003eManaged.write\u003c/code\u003e transform.\u003c/p\u003e\n"]]],[],null,["Use the Dataflow managed I/O transform to write to Apache Iceberg\n\nExplore further\n\n\nFor detailed documentation that includes this code sample, see the following:\n\n- [Write from Dataflow to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)\n\nCode sample \n\nJava\n\n\nTo authenticate to Dataflow, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n import com.google.common.collect.ImmutableMap;\n import java.util.Arrays;\n import java.util.List;\n import java.util.Map;\n import org.apache.beam.sdk.Pipeline;\n import org.apache.beam.sdk.managed.Managed;\n import org.apache.beam.sdk.options.Description;\n import org.apache.beam.sdk.options.PipelineOptions;\n import org.apache.beam.sdk.options.PipelineOptionsFactory;\n import org.apache.beam.sdk.schemas.Schema;\n import org.apache.beam.sdk.transforms.Create;\n import org.apache.beam.sdk.transforms.JsonToRow;\n import org.apache.beam.sdk.values.PCollectionRowTuple;\n\n public class ApacheIcebergWrite {\n static final List\u003cString\u003e TABLE_ROWS = Arrays.asList(\n \"{\\\"id\\\":0, \\\"name\\\":\\\"Alice\\\"}\",\n \"{\\\"id\\\":1, \\\"name\\\":\\\"Bob\\\"}\",\n \"{\\\"id\\\":2, \\\"name\\\":\\\"Charles\\\"}\"\n );\n\n static final String CATALOG_TYPE = \"hadoop\";\n\n // The schema for the table rows.\n public static final Schema SCHEMA = new Schema.Builder()\n .addStringField(\"name\")\n .addInt64Field(\"id\")\n .build();\n\n public interface Options extends PipelineOptions {\n @Description(\"The URI of the Apache Iceberg warehouse location\")\n String getWarehouseLocation();\n\n void setWarehouseLocation(String value);\n\n @Description(\"The name of the Apache Iceberg catalog\")\n String getCatalogName();\n\n void setCatalogName(String value);\n\n @Description(\"The name of the table to write to\")\n String getTableName();\n\n void setTableName(String value);\n }\n\n public static void main(String[] args) {\n\n // Parse the pipeline options passed into the application. Example:\n // --runner=DirectRunner --warehouseLocation=$LOCATION --catalogName=$CATALOG \\\n // --tableName= $TABLE_NAME\n // For more information, see https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options\n Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);\n Pipeline pipeline = Pipeline.create(options);\n\n // Configure the Iceberg source I/O\n Map catalogConfig = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"warehouse\", options.getWarehouseLocation())\n .put(\"type\", CATALOG_TYPE)\n .build();\n\n ImmutableMap\u003cString, Object\u003e config = ImmutableMap.\u003cString, Object\u003ebuilder()\n .put(\"table\", options.getTableName())\n .put(\"catalog_name\", options.getCatalogName())\n .put(\"catalog_properties\", catalogConfig)\n .build();\n\n // Build the pipeline.\n pipeline.apply(Create.of(TABLE_ROWS))\n .apply(JsonToRow.withSchema(SCHEMA))\n .apply(Managed.write(Managed.ICEBERG).withConfig(config));\n\n pipeline.run().waitUntilFinish();\n }\n }\n\nWhat's next\n\n\nTo search and filter code samples for other Google Cloud products, see the\n[Google Cloud sample browser](/docs/samples?product=dataflow)."]]