# Read from Apache Iceberg
Use the Dataflow managed I/O transform to read from Apache Iceberg.
Explore further
---------------
For detailed documentation that includes this code sample, see the following:

- [Read from Apache Iceberg to Dataflow](/dataflow/docs/guides/read-from-iceberg)
Code sample
-----------
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],[],[[["\u003cp\u003eThis page details how to use the Dataflow managed I/O transform to read data from Apache Iceberg.\u003c/p\u003e\n"],["\u003cp\u003eThe provided code sample demonstrates reading data from an Apache Iceberg table using Java and Dataflow.\u003c/p\u003e\n"],["\u003cp\u003eThe example requires setting up Application Default Credentials for authentication.\u003c/p\u003e\n"],["\u003cp\u003eThe code sample configures the Iceberg source I/O by specifying the warehouse location, catalog type, catalog name, and table name.\u003c/p\u003e\n"],["\u003cp\u003eThe code uses a Dataflow pipeline to read from Iceberg, format records, and write them to a text file.\u003c/p\u003e\n"]]],[],null,["# Read from Apache Iceberg\n\nUse the Dataflow managed I/O transform to read from Apache Iceberg\n\nExplore further\n---------------\n\n\nFor detailed documentation that includes this code sample, see the following:\n\n- [Read from Apache Iceberg to Dataflow](/dataflow/docs/guides/read-from-iceberg)\n\nCode sample\n-----------\n\n### Java\n\n\nTo authenticate to Dataflow, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n import com.google.common.collect.ImmutableMap;\n import java.util.Map;\n import org.apache.beam.sdk.Pipeline;\n import org.apache.beam.sdk.io.TextIO;\n import org.apache.beam.sdk.managed.Managed;\n import org.apache.beam.sdk.options.Description;\n import org.apache.beam.sdk.options.PipelineOptions;\n import org.apache.beam.sdk.options.PipelineOptionsFactory;\n import org.apache.beam.sdk.transforms.MapElements;\n import org.apache.beam.sdk.values.PCollectionRowTuple;\n import org.apache.beam.sdk.values.TypeDescriptors;\n\n public class ApacheIcebergRead {\n\n static final String CATALOG_TYPE = \"hadoop\";\n\n public interface Options extends PipelineOptions {\n @Description(\"The URI of the Apache Iceberg warehouse location\")\n String getWarehouseLocation();\n\n void setWarehouseLocation(String value);\n\n @Description(\"Path to write the output file\")\n String getOutputPath();\n\n void setOutputPath(String value);\n\n @Description(\"The name of the Apache Iceberg catalog\")\n String getCatalogName();\n\n void setCatalogName(String value);\n\n @Description(\"The name of the table to write to\")\n String getTableName();\n\n void setTableName(String value);\n }\n\n public static void main(String[] args) {\n\n // Parse the pipeline options passed into the application. 
    import com.google.common.collect.ImmutableMap;
    import java.util.Map;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.managed.Managed;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollectionRowTuple;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class ApacheIcebergRead {

      static final String CATALOG_TYPE = "hadoop";

      public interface Options extends PipelineOptions {
        @Description("The URI of the Apache Iceberg warehouse location")
        String getWarehouseLocation();

        void setWarehouseLocation(String value);

        @Description("Path to write the output file")
        String getOutputPath();

        void setOutputPath(String value);

        @Description("The name of the Apache Iceberg catalog")
        String getCatalogName();

        void setCatalogName(String value);

        @Description("The name of the table to read from")
        String getTableName();

        void setTableName(String value);
      }

      public static void main(String[] args) {

        // Parse the pipeline options passed into the application. Example:
        //   --runner=DirectRunner --warehouseLocation=$LOCATION --catalogName=$CATALOG \
        //   --tableName=$TABLE_NAME --outputPath=$OUTPUT_FILE
        // For more information, see
        // https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
        Pipeline pipeline = Pipeline.create(options);

        // Configure the Iceberg source I/O.
        Map<String, Object> catalogConfig = ImmutableMap.<String, Object>builder()
            .put("warehouse", options.getWarehouseLocation())
            .put("type", CATALOG_TYPE)
            .build();

        ImmutableMap<String, Object> config = ImmutableMap.<String, Object>builder()
            .put("table", options.getTableName())
            .put("catalog_name", options.getCatalogName())
            .put("catalog_properties", catalogConfig)
            .build();

        // Build the pipeline.
        pipeline.apply(Managed.read(Managed.ICEBERG).withConfig(config))
            // The managed read produces a PCollectionRowTuple with a single
            // output PCollection of Beam Rows.
            .getSinglePCollection()
            // Format each record as a string with the format 'id:name'.
            .apply(MapElements
                .into(TypeDescriptors.strings())
                .via(row -> String.format("%d:%s",
                    row.getInt64("id"),
                    row.getString("name"))))
            // Write the formatted records to a single text file.
            .apply(
                TextIO.write()
                    .to(options.getOutputPath())
                    .withNumShards(1)
                    .withSuffix(".txt"));

        pipeline.run().waitUntilFinish();
      }
    }
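To try the sample locally, you can pass the options shown in the comment above. This is a minimal sketch that assumes the class is packaged in a Maven project with the exec plugin and the Direct Runner on the classpath; all argument values are placeholders:

    # Run the pipeline on the Direct Runner (all values are placeholders).
    mvn compile exec:java \
      -Dexec.mainClass=ApacheIcebergRead \
      -Dexec.args="--runner=DirectRunner \
        --warehouseLocation=gs://MY_BUCKET/warehouse \
        --catalogName=local \
        --tableName=db.table1 \
        --outputPath=output/records"

What's next
-----------

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=dataflow).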