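After you deploy an application to production in Google Cloud, you might need to modify the infrastructure that it uses. For example, you might need to change the machine types of your VMs or change the storage class of your Cloud Storage buckets. This part of the Google Cloud infrastructure reliability guide summarizes change-management guidelines that you can follow to reduce the reliability risk of your infrastructure resources. It also describes how you can monitor the availability of Google Cloud infrastructure.
Deploy infrastructure changes progressively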
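When you need to change your Google Cloud infrastructure, deploy the changes to production progressively wherever possible. For example, if you need to change the machine types of your VMs, deploy the change to a few VMs in one zone and monitor its effects. If you observe any issues, quickly revert the infrastructure to the previous stable state. Diagnose and resolve the issues, and then restart the progressive deployment process. After you verify that your workload runs as expected, gradually deploy the change across the rest of your infrastructure.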
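The progressive process described above can be sketched as a small wave-based rollout loop. This is a minimal illustration, not a Google Cloud API: `plan_waves`, `progressive_rollout`, and the `apply_change`, `revert_change`, and `is_healthy` callbacks are hypothetical names, and in practice the callbacks would wrap whatever tooling you use to change and monitor resources.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RolloutResult:
    completed: bool
    changed: List[str] = field(default_factory=list)      # resources left on the new configuration
    rolled_back: List[str] = field(default_factory=list)  # resources reverted after a failed check


def plan_waves(resources_by_zone: Dict[str, List[str]], canary_size: int = 2) -> List[List[str]]:
    """Plan waves: a few resources in one zone first, then the rest, zone by zone."""
    zones = sorted(resources_by_zone)
    if not zones:
        return []
    first = resources_by_zone[zones[0]]
    waves = [first[:canary_size], first[canary_size:]]
    waves += [list(resources_by_zone[zone]) for zone in zones[1:]]
    return [wave for wave in waves if wave]  # drop empty waves


def progressive_rollout(
    waves: List[List[str]],
    apply_change: Callable[[str], None],   # hypothetical: applies the change to one resource
    revert_change: Callable[[str], None],  # hypothetical: restores the previous stable state
    is_healthy: Callable[[str], bool],     # hypothetical: monitoring check for one resource
) -> RolloutResult:
    result = RolloutResult(completed=False)
    for wave in waves:
        for name in wave:
            apply_change(name)
            result.changed.append(name)
        if not all(is_healthy(name) for name in wave):
            # Revert everything changed so far, most recent first, then stop.
            for name in reversed(result.changed):
                revert_change(name)
                result.rolled_back.append(name)
            result.changed = []
            return result
    result.completed = True
    return result
```

A failed health check in any wave stops the rollout and reverts every resource changed so far, so later zones are never touched until the issue is diagnosed and the process is restarted.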
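Monitor availability of Google Cloud infrastructure
You can monitor the current status of Google Cloud services across all regions by using the Google Cloud Service Health Dashboard. You can also view a history of the infrastructure failures (called incidents) for each service. The history page provides the details of each incident, such as the incident duration, the affected zones and regions, the affected services, and any recommended workarounds.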
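If you export or collect incident records, the details listed above (duration, affected locations, affected service) can be used to pick out the incidents that matter to your deployment. The following sketch works on in-memory sample data only. The field names (`service_name`, `begin`, `end`, `affected_locations`) and the sample records are assumptions made for illustration, not a documented schema, so check them against the actual data source that you use.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Set


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def incident_duration_minutes(incident: dict, now: datetime) -> float:
    """Duration in minutes; an incident without an end time is measured up to `now`."""
    begin = parse_ts(incident["begin"])
    end = parse_ts(incident.get("end")) or now
    return (end - begin).total_seconds() / 60


def relevant_incidents(incidents: Iterable[dict], services: Set[str], locations: Set[str]) -> List[dict]:
    """Keep incidents for the given services that touch at least one given location."""
    matches = []
    for incident in incidents:
        if incident.get("service_name") not in services:
            continue
        affected = set(incident.get("affected_locations", []))
        # An incident that lists no locations is kept, because it might be global.
        if affected and not (affected & locations):
            continue
        matches.append(incident)
    return matches


# Made-up sample records; the field names are hypothetical.
SAMPLE_INCIDENTS = [
    {"id": "example-1", "service_name": "Compute Engine",
     "begin": "2024-01-01T10:00:00Z", "end": "2024-01-01T11:30:00Z",
     "affected_locations": ["us-central1"]},
    {"id": "example-2", "service_name": "Cloud Storage",
     "begin": "2024-01-02T08:00:00Z", "end": None,
     "affected_locations": ["europe-west1"]},
    {"id": "example-3", "service_name": "Compute Engine",
     "begin": "2024-01-03T00:00:00Z", "end": "2024-01-03T00:20:00Z",
     "affected_locations": ["asia-east1"]},
]
```

For example, `relevant_incidents(SAMPLE_INCIDENTS, {"Compute Engine"}, {"us-central1"})` keeps only the first sample record.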
You can also view incidents relevant to your project by using [Personalized Service Health](https://console.cloud.google.com/servicehealth/incidents). Service Health also lets you request incident information through an API on a per-project or per-organization basis, and lets you configure alerts.
Google provides regular updates about the status of each incident, including an estimated time for the next update. You can programmatically get status updates for incidents by using an RSS feed. For more information, see [Incidents and the Google Cloud Service Health Dashboard](/support/docs/dashboard).
Note: Even when there's no infrastructure outage, your application might be unavailable due to errors in the application or configuration issues. For example, a software update might have caused the app servers to crash, or an administrator might have inadvertently deleted the load balancer forwarding rules. For help with troubleshooting issues with specific Google Cloud resources, see the documentation for the appropriate service.
Control changes to global resources
When you modify global resources such as VPC networks and global load balancers, take extra care to verify the changes before deploying them to production.
Because global resources are resilient to zone and [region](/docs/geography-and-regions#regions_and_zones) outages, you might decide to use single instances of certain global resources in your architecture. In such deployments, the global resources can become single points of failure. For example, if you inadvertently misconfigure a forwarding rule of your global load balancer, the frontend can stop receiving or processing user requests. In this case, the application is effectively unavailable to users even though the backend is intact. To avoid such situations, exercise rigorous control over changes to global resources. For example, in your change-review process, you can classify any modifications to global resources as high-risk changes that additional reviewers must verify and approve.
Last updated 2024-11-20 UTC.