[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-04。"],[],[],null,["# Best practices for using Pub/Sub metrics as a scaling signal\n\nIf you use Pub/Sub metrics as a signal to autoscale your pipeline, here\nare some recommendations.\n\nUse more than one signal to autoscale your pipeline\n---------------------------------------------------\n\nDon't use only Pub/Sub metrics to autoscale your pipeline. It\nmight lead to scenarios where you have a single point of failure for your\nautoscaling decisions. Instead, use a combination of signals to trigger\nautoscaling. An example of an additional signal is the client's CPU\nutilization level. This signal can indicate whether the client tasks are\nhandling work and if scaling up can let the client tasks handle more work.\nSome examples of signals from other\nCloud products that you can use for your pipeline are as follows:\n\n- Compute Engine supports autoscaling based on signals such as\n CPU utilization and Monitoring metrics.\n Compute Engine also supports multiple metrics and multiple signals\n for better reliability.\n\n For more information about scaling with Monitoring metrics, see\n [Scale based on Monitoring metrics](/compute/docs/autoscaler/scaling-cloud-monitoring-metrics).\n For more information about scaling with CPU utilization, see [Scale based on CPU utilization](/compute/docs/autoscaler/scaling-cpu).\n- Google Kubernetes Engine Horizontal Pod autoscaling (HPA) supports\n autoscaling based on resource usage such as CPU and memory usage,\n custom Kubernetes metrics, and external metrics such as\n Monitoring metrics for Pub/Sub.\n It also supports multiple signals.\n\n For more information, see [Horizontal Pod autoscaling](/kubernetes-engine/docs/concepts/horizontalpodautoscaler).\n\nUse the regional version of the metrics instead of global versions\n------------------------------------------------------------------\n\nPub/Sub offers two versions of each metric typically used with\nautoscaling. Make sure you use the versions with the `by_region` suffix:\n\n- [`subscription/num_unacked_messages_by_region`](/monitoring/api/metrics_gcp_p_z#pubsub/subscription/num_unacked_messages_by_region)\n- [`subscription/oldest_unacked_message_age_by_region`](/monitoring/api/metrics_gcp_p_z#pubsub/subscription/oldest_unacked_message_age_by_region)\n\nDon't use the global versions of these metrics if you want your autoscaling to\nbe resilient to single-region outages. The global version of these metrics\nrequire the calculation of the backlog across all regions known to have\nmessages, which means unavailability in a single region region results in a data\ngap. In contrast, the `by_region` versions of the metrics calculate and report\nthe backlog on a per-region basis. If the backlog cannot be computed for a\nsingle region, the metric still reports values for the other regions.\n\nAvoid using subscriber-side throughput metrics to autoscale subscribers\n-----------------------------------------------------------------------\n\nAvoid using subscriber-side throughput metrics like\n`subscription/ack_message_count` to autoscale subscriber clients. 
Avoid using subscriber-side throughput metrics to autoscale subscribers
------------------------------------------------------------------------

Avoid using subscriber-side throughput metrics like
`subscription/ack_message_count` to autoscale subscriber clients. Instead,
use metrics that directly reflect the backlog of messages waiting to be
processed, such as `subscription/num_unacked_messages` or
`subscription/oldest_unacked_message_age` (preferably their `by_region`
versions, as discussed in the previous section).

**Issues with using subscriber-side throughput metrics for autoscaling**

Throughput metrics can cause problems because they represent the amount of
traffic flowing between Pub/Sub and subscribers. Scaling based on such
metrics can create a self-referential loop in which a decrease in delivered or
acknowledged messages leads to scaling down your clients. For example, this
might occur if there is a temporary dip in traffic or an issue with one of
your subscribers.

If your clients scale down to zero or near zero, all ongoing subscribe traffic
can stop, and subscribers might not be able to process messages, even if new
messages arrive. This can result in significant ingestion lag and lead to an
unrecoverable state for your subscriber clients.

Deal with metrics gaps when they occur
--------------------------------------

Don't assume that the absence of metrics means that there are no messages to
process. For example, if you scale processing tasks down to zero in response
to missing metrics, messages that are already in the backlog or that get
published during this time might not be consumed, which increases end-to-end
latency. To minimize latency, set a minimum task count greater than zero so
that you are always prepared to handle published messages, even if recent
Pub/Sub metrics indicate an empty queue.

Both Compute Engine autoscalers and Google Kubernetes Engine HPAs are designed
to maintain the current replica count when metrics are unavailable, which
provides a safety net when no metrics arrive.

You can also implement
[Pub/Sub flow control](/pubsub/docs/flow-control) mechanisms
to help prevent tasks from being overwhelmed if they are unintentionally
scaled down because of missing metrics.
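As one possible safeguard, the following sketch shows flow control with the
Pub/Sub client library for Python (`google-cloud-pubsub`). The project and
subscription IDs, the limits, and the 60-second run window are placeholders;
choose limits that match how much work a single task can safely buffer.

```python
from concurrent import futures

from google.cloud import pubsub_v1

# Placeholders: replace with your own project and subscription IDs.
PROJECT_ID = "your-project-id"
SUBSCRIPTION_ID = "your-subscription"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)


def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Process the message, then acknowledge it so it is not redelivered.
    message.ack()


# Cap how many messages and bytes each subscriber task holds in memory at once,
# so an under-provisioned task slows its pull rate instead of being overwhelmed.
flow_control = pubsub_v1.types.FlowControl(
    max_messages=100,
    max_bytes=50 * 1024 * 1024,  # 50 MiB
)

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)

with subscriber:
    try:
        # Run for 60 seconds in this sketch; a real worker would block indefinitely.
        streaming_pull_future.result(timeout=60)
    except futures.TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()
```

With limits like these in place, a task that temporarily becomes the only
replica does not pull an unbounded amount of backlog while the autoscaler
recovers.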