To read data from Bigtable to Dataflow, use the Apache Beam Bigtable I/O connector.
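The following is a minimal sketch, not an official sample, of a batch pipeline that reads a table with the connector using the Java SDK. The project, instance, and table IDs are hypothetical placeholders.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadFromBigtableExample {
  public static void main(String[] args) {
    // Runner flags (for example --runner=DataflowRunner) are taken from the command line.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    // Read every row of the table; each element is a Bigtable Row proto.
    PCollection<Row> rows =
        pipeline.apply(
            "ReadFromBigtable",
            BigtableIO.read()
                .withProjectId("my-project")    // placeholder
                .withInstanceId("my-instance")  // placeholder
                .withTableId("my-table"));      // placeholder

    // ... apply downstream transforms to rows here ...

    pipeline.run().waitUntilFinish();
  }
}
```

To read only part of a table, the same builder also exposes options such as row filters and key ranges.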
Parallelism

Parallelism is controlled by the number of nodes in the Bigtable cluster. Each node manages one or more key ranges, although key ranges can move between nodes as part of load balancing. For more information, see Reads and performance in the Bigtable documentation.

You are charged for the number of nodes in your instance's clusters. See Bigtable pricing.
Performance

The following table shows performance metrics for Bigtable read operations. The workloads were run on one e2-standard2 worker, using the Apache Beam SDK 2.48.0 for Java. They did not use Runner v2.

Records | Record size | Columns | Throughput (bytes) | Throughput (elements)
100 M | 1 kB | 1 | 180 MBps | 170,000 elements per second
These metrics are based on simple batch pipelines. They are intended to compare performance between I/O connectors, and are not necessarily representative of real-world pipelines. Dataflow pipeline performance is complex, and is a function of the VM type, the data being processed, the performance of external sources and sinks, and user code. Metrics are based on running the Java SDK, and aren't representative of the performance characteristics of other language SDKs. For more information, see Beam IO Performance.
Best practices

- For new pipelines, use the BigtableIO connector, not CloudBigtableIO.
- Create separate app profiles for each type of pipeline. App profiles enable better metrics for differentiating traffic between pipelines, both for support and for tracking usage (the first sketch after this list shows how to set one on the read).
- Monitor the Bigtable nodes. If you experience performance bottlenecks, check whether resources such as CPU utilization are constrained within Bigtable. For more information, see Monitoring.
- In general, the default timeouts are well tuned for most pipelines. If a streaming pipeline appears to get stuck reading from Bigtable, try calling withAttemptTimeout to adjust the attempt timeout, as shown in the first sketch after this list.
- Consider enabling Bigtable autoscaling, or resize the Bigtable cluster to scale with the size of your Dataflow jobs.
- Consider setting maxNumWorkers on the Dataflow job to limit the load on the Bigtable cluster (see the second sketch after this list).
- If significant processing is done on a Bigtable element before a shuffle, calls to Bigtable might time out. In that case, you can call withMaxBufferElementCount to buffer elements, also shown in the first sketch after this list. This method converts the read operation from streaming to paginated, which avoids the issue.
- If you use a single Bigtable cluster for both streaming and batch pipelines, and performance degrades on the Bigtable side, consider setting up replication on the cluster. Then separate the batch and streaming pipelines so that they read from different replicas. For more information, see Replication overview.
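The app profile, attempt timeout, and buffering options called out above can all be chained onto the same read. The following is a minimal sketch with hypothetical IDs and values; tune them to your own workload rather than copying them as-is.

```java
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.joda.time.Duration;

public class TunedBigtableRead {
  // Builds a read with the tuning options discussed above; all IDs and values are placeholders.
  static BigtableIO.Read tunedRead() {
    return BigtableIO.read()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table")
        // A dedicated app profile per pipeline type keeps its traffic distinguishable in metrics.
        .withAppProfileId("dataflow-batch-profile")
        // Raise the attempt timeout if a streaming pipeline appears stuck while reading.
        .withAttemptTimeout(Duration.standardMinutes(5))
        // Buffer rows (paginated reads) when heavy per-element processing precedes a shuffle.
        .withMaxBufferElementCount(1000);
  }
}
```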
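Capping maxNumWorkers bounds how many workers can read from the cluster at once. The sketch below sets it programmatically on the Dataflow pipeline options; passing --maxNumWorkers=10 on the command line is equivalent, and the value of 10 is only illustrative.

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CappedWorkerOptions {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    // Limit autoscaling so the job cannot overwhelm the Bigtable cluster.
    options.setMaxNumWorkers(10); // illustrative cap; size it to your cluster
    // ... create the pipeline with these options, apply the Bigtable read, and run ...
  }
}
```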
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eUse the Apache Beam Bigtable I/O connector to read data from Bigtable to Dataflow, considering Google-provided Dataflow templates as an alternative depending on your specific use case.\u003c/p\u003e\n"],["\u003cp\u003eParallelism in reading Bigtable data is governed by the number of nodes in the Bigtable cluster, with each node managing key ranges.\u003c/p\u003e\n"],["\u003cp\u003ePerformance metrics for Bigtable read operations on one \u003ccode\u003ee2-standard2\u003c/code\u003e worker using Apache Beam SDK 2.48.0 for Java, show a throughput of 180 MBps or 170,000 elements per second for 100M records, 1 kB, and 1 column, noting that real-world pipeline performance may vary.\u003c/p\u003e\n"],["\u003cp\u003eFor new pipelines, use the \u003ccode\u003eBigtableIO\u003c/code\u003e connector instead of \u003ccode\u003eCloudBigtableIO\u003c/code\u003e, and create separate app profiles for each pipeline type for better traffic differentiation and tracking.\u003c/p\u003e\n"],["\u003cp\u003eBest practices for pipeline optimization include monitoring Bigtable node resources, adjusting timeouts as needed, considering Bigtable autoscaling or resizing, and potentially using replication to separate batch and streaming pipelines for improved performance.\u003c/p\u003e\n"]]],[],null,["# Read from Bigtable to Dataflow\n\nTo read data from Bigtable to Dataflow, use the\nApache Beam [Bigtable I/O connector](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigtable/package-summary.html).\n| **Note:** Depending on your scenario, consider using one of the [Google-provided Dataflow templates](/dataflow/docs/guides/templates/provided-templates). Several of these read from Bigtable.\n\nParallelism\n-----------\n\nParallelism is controlled by the number of\n[nodes](/bigtable/docs/instances-clusters-nodes#nodes) in the\nBigtable cluster. Each node manages one or more key ranges,\nalthough key ranges can move between nodes as part of\n[load balancing](/bigtable/docs/overview#load-balancing). For more information,\nsee [Reads and performance](/bigtable/docs/reads#performance) in the\nBigtable documentation.\n\nYou are charged for the number of nodes in your instance's clusters. See\n[Bigtable pricing](/bigtable/pricing).\n\nPerformance\n-----------\n\nThe following table shows performance metrics for Bigtable read\noperations. The workloads were run on one `e2-standard2` worker, using the\nApache Beam SDK 2.48.0 for Java. They did not use Runner v2.\n\n\nThese metrics are based on simple batch pipelines. They are intended to compare performance\nbetween I/O connectors, and are not necessarily representative of real-world pipelines.\nDataflow pipeline performance is complex, and is a function of VM type, the data\nbeing processed, the performance of external sources and sinks, and user code. Metrics are based\non running the Java SDK, and aren't representative of the performance characteristics of other\nlanguage SDKs. 
For more information, see [Beam IO\nPerformance](https://beam.apache.org/performance/).\n\n\u003cbr /\u003e\n\nBest practices\n--------------\n\n- For new pipelines, use the [`BigtableIO`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.html) connector, not\n `CloudBigtableIO`.\n\n- Create separate [app profiles](/bigtable/docs/app-profiles) for each type of\n pipeline. App profiles enable better metrics for differentiating traffic\n between pipelines, both for support and for tracking usage.\n\n- Monitor the Bigtable nodes. If you experience performance\n bottlenecks, check whether resources such as CPU utilization are constrained\n within Bigtable. For more information, see\n [Monitoring](/bigtable/docs/monitoring-instance).\n\n- In general, the default timeouts are well tuned for most pipelines. If a\n streaming pipeline appears to get stuck reading from Bigtable,\n try calling [`withAttemptTimeout`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.Read.html#withAttemptTimeout-org.joda.time.Duration-) to adjust the attempt\n timeout.\n\n- Consider enabling\n [Bigtable autoscaling](/bigtable/docs/autoscaling), or resize\n the Bigtable cluster to scale with the size of your\n Dataflow jobs.\n\n- Consider setting\n [`maxNumWorkers`](/dataflow/docs/reference/pipeline-options#resource_utilization)\n on the Dataflow job to limit load on the\n Bigtable cluster.\n\n- If significant processing is done on a Bigtable element before\n a shuffle, calls to Bigtable might time out. In that case, you\n can call [`withMaxBufferElementCount`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.Read.html#withMaxBufferElementCount-java.lang.Integer-) to buffer\n elements. This method converts the read operation from streaming to paginated,\n which avoids the issue.\n\n- If you use a single Bigtable cluster for both streaming and\n batch pipelines, and the performance degrades on the Bigtable\n side, consider setting up replication on the cluster. Then separate the batch\n and streaming pipelines, so that they read from different replicas. For more\n information, see [Replication overview](/bigtable/docs/replication-overview).\n\nWhat's next\n-----------\n\n- Read the [Bigtable I/O connector](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigtable/package-summary.html) documentation.\n- See the list of [Google-provided templates](/dataflow/docs/guides/templates/provided-templates)."]]