[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Pub/Sub to BigQuery best practices\n\nThis page outlines best practices for optimizing a Dataflow\npipeline that [reads from\nPub/Sub](/dataflow/docs/concepts/streaming-with-cloud-pubsub) and\n[writes to BigQuery](/dataflow/docs/guides/write-to-bigquery).\nDepending on your use case, the following suggestions might lead to better\nperformance.\n\nInitial solutions for pipeline backlogs\n---------------------------------------\n\nWhen a Pub/Sub to BigQuery pipeline experiences a\ngrowing backlog and can't keep up with incoming messages, you can take the\nfollowing immediate steps:\n\n- **Increase Pub/Sub acknowledgment deadline:** For the associated Pub/Sub subscription, [increase the acknowledgment\n deadline](/pubsub/docs/lease-management) to a value slightly longer than the maximum expected message processing time. This prevents messages from being redelivered prematurely while they are still being processed.\n- **Scale out workers:** If the unacknowledged message count and subscription backlog are growing rapidly, the pipeline's processing capacity is likely insufficient. Increase the [number of Dataflow\n workers](/dataflow/docs/reference/pipeline-options) to handle the message volume.\n- **Enable exponential backoff:** Enable [exponential backoff](/pubsub/docs/handling-failures#exponential_backoff) to improve how the pipeline handles retries for transient issues, making it more resilient.\n\nLong-term code and pipeline optimizations\n-----------------------------------------\n\nFor sustained performance and stability, the following architectural and code\nchanges are recommended:\n\n- **Reduce `getTable` calls to BigQuery:** Excessive `getTable` method calls can lead to rate-limiting and performance bottlenecks. To mitigate this:\n - Cache table existence information in worker memory to avoid repeated calls for the same table.\n - Batch `getTable` calls on a per-bundle basis instead of for each individual element.\n - Refactor the pipeline code to eliminate the need to check for table existence for every message.\n- **Use the BigQuery Storage Write API:** For streaming pipelines writing to BigQuery, migrate from standard streaming inserts to the [Storage Write API](/bigquery/docs/write-api). The Storage Write API offers better performance and significantly higher quotas.\n- **Use the legacy Dataflow runner for high-cardinality jobs:** For jobs that process a very large number of unique keys (*high-cardinality*), the legacy Dataflow runner might offer better performance than Runner v2, unless cross-language transforms are required.\n- **Optimize the key space:** Performance can degrade when pipelines operate on millions of active keys. Adjust the pipeline's logic to perform work on a smaller, more manageable key space.\n\nResource, quota, and configuration management\n---------------------------------------------\n\nProper resource allocation and configuration are critical for pipeline health:\n\n- **Proactively manage quotas:** Monitor quotas and request increases for any quotas that might be reached during scaling events. For example, consider the following scaling events:\n - A high rate of calls to the `TableService.getTable` or `tabledata.insertAll` methods might exceed the maximum queries per second (QPS). For more information on limits and how to request more quota, see [BigQuery quotas and\n limits](/bigquery/quotas).\n - Compute Engine quotas for in-use IP addresses and CPUs might exceed maximum limits. For more information on limits and how to request more quota, see the [Compute Engine quota and limits\n overview](/compute/quotas-limits).\n- **Optimize worker configuration:** To prevent out-of-memory (OOM) errors and improve stability:\n - Use worker [machine\n types](/dataflow/docs/guides/configure-worker-vm#machine-type) with more memory.\n - Reduce the [number of\n threads](/dataflow/docs/reference/pipeline-options#resource_utilization) per worker.\n - Set a higher [number of\n workers](/dataflow/docs/reference/pipeline-options#resource_utilization) to distribute the workload more evenly and reduce the performance impact of frequent autoscaling events.\n\nWhat's next\n-----------\n\n- [Develop and test Dataflow pipelines](/dataflow/docs/guides/develop-and-test-pipelines)\n- [Dataflow pipeline best practices](/dataflow/docs/guides/pipeline-best-practices)\n- [Dataflow job metrics](/dataflow/docs/guides/using-monitoring-intf)"]]