# Write to Apache Iceberg
Use the Dataflow managed I/O transform to write to Apache Iceberg.
Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Write from Dataflow to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)
Code sample
-----------

### Java

To authenticate to Dataflow, set up Application Default Credentials. For more information, see [Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).
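For local development, one common way to set up Application Default Credentials is with the gcloud CLI:

```sh
# Opens a browser sign-in flow and stores local credentials that
# Google Cloud client libraries pick up automatically.
gcloud auth application-default login
```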
```java
import com.google.common.collect.ImmutableMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.JsonToRow;

public class ApacheIcebergWrite {
  static final List<String> TABLE_ROWS = Arrays.asList(
      "{\"id\":0, \"name\":\"Alice\"}",
      "{\"id\":1, \"name\":\"Bob\"}",
      "{\"id\":2, \"name\":\"Charles\"}"
  );

  static final String CATALOG_TYPE = "hadoop";

  // The schema for the table rows.
  public static final Schema SCHEMA = new Schema.Builder()
      .addStringField("name")
      .addInt64Field("id")
      .build();

  public interface Options extends PipelineOptions {
    @Description("The URI of the Apache Iceberg warehouse location")
    String getWarehouseLocation();

    void setWarehouseLocation(String value);

    @Description("The name of the Apache Iceberg catalog")
    String getCatalogName();

    void setCatalogName(String value);

    @Description("The name of the table to write to")
    String getTableName();

    void setTableName(String value);
  }

  public static void main(String[] args) {

    // Parse the pipeline options passed into the application. Example:
    //   --runner=DirectRunner --warehouseLocation=$LOCATION \
    //   --catalogName=$CATALOG --tableName=$TABLE_NAME
    // For more information, see
    // https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);

    // Configure the Iceberg sink I/O: the catalog properties identify the
    // warehouse, and the top-level config names the catalog and target table.
    Map<String, Object> catalogConfig = ImmutableMap.<String, Object>builder()
        .put("warehouse", options.getWarehouseLocation())
        .put("type", CATALOG_TYPE)
        .build();

    ImmutableMap<String, Object> config = ImmutableMap.<String, Object>builder()
        .put("table", options.getTableName())
        .put("catalog_name", options.getCatalogName())
        .put("catalog_properties", catalogConfig)
        .build();

    // Build the pipeline: create the JSON strings, convert them to Beam rows
    // with the declared schema, and write them to the Iceberg table.
    pipeline.apply(Create.of(TABLE_ROWS))
        .apply(JsonToRow.withSchema(SCHEMA))
        .apply(Managed.write(Managed.ICEBERG).withConfig(config));

    pipeline.run().waitUntilFinish();
  }
}
```
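To try the pipeline locally, you can run the class with the direct runner, as the comment in `main` suggests. A hypothetical invocation, assuming the sample lives in a Maven project (the `mainClass` and the all-caps values are placeholders, not part of the original sample):

```sh
# Run the write pipeline on the direct runner; replace the placeholders
# with your warehouse URI, catalog name, and target table identifier.
mvn compile exec:java \
  -Dexec.mainClass=ApacheIcebergWrite \
  -Dexec.args="--runner=DirectRunner \
    --warehouseLocation=gs://BUCKET_NAME/warehouse \
    --catalogName=CATALOG_NAME \
    --tableName=TABLE_NAME"
```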
What's next
-----------

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=dataflow).