Write from Dataflow to Bigtable

To write data from Dataflow to Bigtable, use the
Apache Beam Bigtable I/O connector.

Note: Depending on your scenario, consider using one of the Google-provided Dataflow templates. Several of these write to Bigtable.
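A minimal write pipeline might look like the following sketch. It builds one `SetCell` mutation per input element and writes it through `BigtableIO.write()`; the project, instance, and table IDs, the `stats` column family, and the `greeting` column are placeholders, not values from this document.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;

import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;

public class WriteToBigtable {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply(Create.of("alice", "bob"))
        // BigtableIO.write() expects KV<row key, Iterable<Mutation>>.
        .apply(MapElements.via(new SimpleFunction<String, KV<ByteString, Iterable<Mutation>>>() {
          @Override
          public KV<ByteString, Iterable<Mutation>> apply(String name) {
            Mutation setCell = Mutation.newBuilder()
                .setSetCell(Mutation.SetCell.newBuilder()
                    .setFamilyName("stats")                              // placeholder column family
                    .setColumnQualifier(ByteString.copyFromUtf8("greeting"))
                    .setTimestampMicros(System.currentTimeMillis() * 1000)
                    .setValue(ByteString.copyFromUtf8("hello " + name)))
                .build();
            Iterable<Mutation> mutations = Arrays.asList(setCell);
            return KV.of(ByteString.copyFromUtf8(name), mutations);      // name doubles as row key
          }
        }))
        .apply(BigtableIO.write()
            .withProjectId("my-project")      // placeholder
            .withInstanceId("my-instance")    // placeholder
            .withTableId("my-table"));        // placeholder

    pipeline.run().waitUntilFinish();
  }
}
```

Running this requires the `beam-sdks-java-io-google-cloud-platform` dependency and credentials for an existing Bigtable table with the referenced column family.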
Parallelism
Parallelism is controlled by the number of nodes in the
Bigtable cluster. Each node manages one or more key ranges,
although key ranges can move between nodes as part of load balancing.
For more information, see Understand performance in the
Bigtable documentation.
You are charged for the number of nodes in your instance's clusters. See Bigtable pricing.
Performance
The following table shows performance metrics for Bigtable I/O
write operations. The workloads were run on one e2-standard2 worker, using
the Apache Beam SDK 2.48.0 for Java. They did not use Runner v2.

Throughput (bytes): 65 MBps
Throughput (elements): 60,000 elements per second

These metrics are based on simple batch pipelines. They are intended to compare performance
between I/O connectors, and are not necessarily representative of real-world pipelines.
Dataflow pipeline performance is complex, and is a function of VM type, the data
being processed, the performance of external sources and sinks, and user code. Metrics are
based on running the Java SDK, and aren't representative of the performance characteristics
of other language SDKs. For more information, see Beam IO
Performance.
Best practices
In general, avoid using transactions. Transactions aren't guaranteed to be
idempotent, and Dataflow might invoke them multiple times due to
retries, causing unexpected values.
A single Dataflow worker might process data for many key ranges, leading to inefficient writes to Bigtable. Using
GroupByKey to group data by Bigtable key can significantly improve write performance.
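This grouping step fits naturally in front of the sink, because `GroupByKey` on `KV<ByteString, Mutation>` yields exactly the `KV<ByteString, Iterable<Mutation>>` shape that `BigtableIO.write()` consumes. A sketch of the idea, assuming an upstream `PCollection<KV<ByteString, Mutation>>` named `perKeyMutations` and placeholder project, instance, and table IDs:

```java
// Fragment: collect all mutations for the same row key into one bundle
// before writing, so each write to Bigtable targets a single key.
PCollection<KV<ByteString, Iterable<Mutation>>> grouped =
    perKeyMutations.apply(GroupByKey.create());

grouped.apply(BigtableIO.write()
    .withProjectId("my-project")      // placeholder
    .withInstanceId("my-instance")    // placeholder
    .withTableId("my-table"));        // placeholder
```

`ByteString` has a deterministic coder, which `GroupByKey` requires of its key type, so no extra coder configuration is needed here.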
If you write large datasets to Bigtable, consider calling
withFlowControl. This setting automatically rate-limits traffic to
Bigtable, to ensure the Bigtable servers have
enough resources available to serve data.
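Enabling flow control is a one-line change on the sink transform; the IDs below are placeholders:

```java
// Fragment: BigtableIO.Write with batch-write flow control enabled, so the
// connector throttles itself when the Bigtable servers are under pressure.
BigtableIO.Write write = BigtableIO.write()
    .withProjectId("my-project")      // placeholder
    .withInstanceId("my-instance")    // placeholder
    .withTableId("my-table")          // placeholder
    .withFlowControl(true);
```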
What's next

Read the Bigtable I/O connector documentation.
See the list of Google-provided templates.

Last updated 2025-09-04 UTC.