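- You must enable system and workload logging on an existing GKE cluster before you enable log collection for Ray clusters.
- If you enable log collection for Ray clusters on an existing GKE cluster, GKE collects logs only from newly created Ray Pods, not from existing Ray Pods.
- For Standard GKE clusters, you must enable Google Cloud Managed Service for Prometheus to enable metrics collection for Ray clusters. For Autopilot clusters, Google Cloud Managed Service for Prometheus is enabled by default.
- You must **not** specify a volume named `ray-logs` in any Ray container in the Ray cluster. Otherwise, GKE doesn't collect logs.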
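You can check and meet the requirements above from the command line. What follows is a minimal POSIX shell sketch under stated assumptions, not an official tool. It assumes an existing cluster and uses the `--logging=SYSTEM,WORKLOAD` and `--enable-managed-prometheus` flags of `gcloud container clusters update`. The helper names `run`, `prepare_cluster`, and `has_ray_logs_volume`, the `DRY_RUN` switch, and the `RAY_CLUSTER_NAME` placeholder are all hypothetical. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: prepare an existing GKE cluster for Ray log and metrics collection.
# run, prepare_cluster, has_ray_logs_volume, and DRY_RUN are hypothetical
# helpers for this example; they are not part of gcloud or kubectl.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = cluster name, $2 = location (for example, us-central1).
prepare_cluster() {
  # Requirement: system and workload logging must be enabled before
  # Ray log collection is enabled.
  run gcloud container clusters update "$1" \
    --location="$2" --logging=SYSTEM,WORKLOAD || return
  # Standard clusters only: Managed Service for Prometheus is required for
  # Ray metrics collection (Autopilot enables it by default).
  run gcloud container clusters update "$1" \
    --location="$2" --enable-managed-prometheus
}

# Read whitespace-separated volume or volume-mount names on stdin and
# succeed (exit status 0) if any of them is exactly "ray-logs".
has_ray_logs_volume() {
  tr ' \t' '\n\n' | grep -qx 'ray-logs'
}

# Example against a live cluster (RAY_CLUSTER_NAME is a placeholder):
#   kubectl get raycluster RAY_CLUSTER_NAME \
#     -o jsonpath='{..volumes[*].name} {..volumeMounts[*].name}' \
#     | has_ray_logs_volume && echo "Rename the ray-logs volume."
```

For example, `DRY_RUN=1 prepare_cluster my-cluster us-central1` prints the two `gcloud` commands without changing the cluster. On Autopilot clusters, you don't need the Managed Service for Prometheus step because it is enabled by default.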

Enable log collection for a Ray cluster
---------------------------------------
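
You can enable log collection for Ray clusters with new or existing Autopilot or Standard GKE clusters. The Ray logs that GKE collects from Ray clusters are classified as container logs. This includes all logs produced by the Ray cluster's head and worker nodes.

You can enable log collection for Ray clusters by using the Google Cloud console or the gcloud CLI.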
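For an existing cluster, you can wrap the gcloud route in a small script. The following POSIX shell sketch assumes that the Ray operator add-on is already enabled on the cluster, so it passes only the `--enable-ray-cluster-logging` flag to `gcloud container clusters update`. The `enable_ray_logging` function and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch: enable Ray log collection on an existing GKE cluster.
# Assumes the Ray operator add-on is already enabled on the cluster, so only
# --enable-ray-cluster-logging is passed. enable_ray_logging and DRY_RUN are
# hypothetical helpers for this example, not gcloud features.

# $1 = cluster name, $2 = location (for example, us-central1).
enable_ray_logging() {
  if [ "$#" -ne 2 ]; then
    echo "usage: enable_ray_logging CLUSTER_NAME LOCATION" >&2
    return 2
  fi
  # Replace the function's positional parameters with the full command;
  # "$1" and "$2" are expanded before set replaces them.
  set -- gcloud container clusters update "$1" \
    --location="$2" --enable-ray-cluster-logging
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"            # print the command instead of running it
  else
    "$@" || return       # stop and propagate gcloud's exit status on failure
  fi
  # Requirement: GKE collects logs only from Ray Pods created after this change.
  echo "Note: logs are collected only from Ray Pods created after this change." >&2
}
```

Run `DRY_RUN=1 enable_ray_logging my-cluster us-central1` to preview the command. The reminder written to standard error reflects the requirement that GKE doesn't collect logs from Ray Pods that existed before log collection was enabled.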

### Console

1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.

   [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)

2. Click **Create**, then in the Standard or Autopilot section, click **Configure**.

3. From the navigation pane, under **Cluster**, click **Features**.

4. In the **Operations** section, ensure that the **System and Workloads**
   checkbox is selected.

5. In the **AI and Machine Learning** section, select
   **Enable Ray Operator** and then select **Enable log collection for
   Ray clusters**.

6. Click **Create**.

For Standard clusters, you must also enable
Google Cloud Managed Service for Prometheus.

### gcloud

Create a cluster using the `--addons=RayOperator` option and the
`--enable-ray-cluster-logging` option:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --addons=RayOperator \
        --enable-ray-cluster-logging

Replace the following:

- `CLUSTER_NAME`: the name of the new cluster.
- `LOCATION`: the location of the new cluster, for example, `us-central1`.

You can enable log collection for Ray clusters on an existing cluster by
using the
[`gcloud container clusters update`](/sdk/gcloud/reference/container/clusters/update)
command with the `--addons=RayOperator` option and the
`--enable-ray-cluster-logging` option.

**Note:** You might notice that GKE collects logs (standard output and standard error) from Ray head and worker Pods even when the **Enable log collection for Ray clusters** option is not selected. This behavior is expected: by default, GKE automatically collects all workload logs written to standard output or standard error. The **Enable log collection for Ray clusters** checkbox controls only the collection of additional Ray-specific logs, separate from these default workload logs. To manage which logs are sent to Cloud Logging by default and to reduce logging volume, see the [About GKE logs](/kubernetes-engine/docs/concepts/about-logs#what_logs) page.

View Ray logs
-------------

You can view logs collected from Ray clusters running on GKE
by using Cloud Logging.

1. Go to the **Cloud Logging** page in the Google Cloud console.

   [Go to Cloud Logging](https://console.cloud.google.com/logs)

2. Open the query editor and paste your query expression into it.

3. Click **Run query**.

You can use the following example queries in the Logs Explorer:

Enable metrics collection for a Ray cluster
-------------------------------------------

You can enable metrics collection for Ray clusters with new or existing
Autopilot or Standard GKE clusters.

After you enable metrics collection for Ray clusters, GKE
collects metrics from both existing and new Ray clusters.
GKE collects all system metrics that Ray exports in Prometheus
format.

You can enable metrics collection for Ray clusters by using the
Google Cloud console or the gcloud CLI.

### Console

1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.

   [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)

2. Click **Create**, then in the Standard or Autopilot section, click **Configure**.

3. From the navigation pane, under **Cluster**, click **Features**.

4. In the **Operations** section, ensure that the **System and Workloads**
   checkbox is selected.

5. In the **AI and Machine Learning** section, select
   **Enable Ray Operator** and then select **Enable metrics collection for
   Ray clusters**.

6. Click **Create**.

For Standard clusters, you must also enable
Google Cloud Managed Service for Prometheus.

### gcloud

Create a cluster using the `--addons=RayOperator` option and the
`--enable-ray-cluster-monitoring` option:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --addons=RayOperator \
        --enable-ray-cluster-monitoring

Replace the following:

- `CLUSTER_NAME`: the name of the new cluster.
- `LOCATION`: the location of the new cluster, for example, `us-central1`.

You can enable metrics collection for Ray clusters on an existing cluster by
using the
[`gcloud container clusters update`](/sdk/gcloud/reference/container/clusters/update)
command with the `--addons=RayOperator` option and the
`--enable-ray-cluster-monitoring` option.

View Ray metrics
----------------

Google Cloud Managed Service for Prometheus provides a pre-configured
**Ray on GKE Overview** dashboard that offers a centralized view
of key Ray metrics. This dashboard is the recommended way
to quickly get started with monitoring your Ray clusters on GKE.

[Go to Ray on GKE Overview dashboard](https://console.cloud.google.com/monitoring/dashboards/integration/kuberay.ray-overview)

The dashboard is populated automatically when you [enable
metrics collection](/kubernetes-engine/docs/add-on/ray-on-gke/how-to/collect-view-logs-metrics#enable-metrics-collection) for your Ray cluster.

Alternatively, to explore individual metrics collected from Ray
clusters running on GKE, follow these steps:

1. Go to the **Metrics Explorer** page in the Google Cloud console.

   [Go to Metrics Explorer](https://console.cloud.google.com/monitoring/metrics-explorer)

2. In the **Select a metric** field, search for Ray-specific metrics.
   These metrics are typically prefixed with `prometheus/ray_`, for example,
   `prometheus/ray_worker_cpu_seconds_total` or `prometheus/ray_memory_bytes_max`.

3. To refine your search, select the appropriate resource type
   (for example, `k8s_pod` or `k8s_container`) and filter by labels relevant to
   your Ray cluster (for example, `ray.io/cluster`).

What's next
-----------

- Learn about [Ray on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).
- Explore the [KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started.html).

Last updated 2025-09-04 UTC.