[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[],[],null,["# Collect and view logs and metrics for Ray clusters on Google Kubernetes Engine (GKE)\n\n[Autopilot](/kubernetes-engine/docs/concepts/autopilot-overview) [Standard](/kubernetes-engine/docs/concepts/choose-cluster-mode)\n\n*** ** * ** ***\n\nThis page shows how to configure Google Kubernetes Engine (GKE) to collect logs\nand metrics for Ray clusters running on GKE, and how to\nview Ray logs and metrics in Cloud Logging and Cloud Monitoring.\n\nFor more\ninformation on Ray and KubeRay, see\n[Ray on Google Kubernetes Engine (GKE) overview](/kubernetes-engine/docs/add-on/ray-on-gke/concepts/overview).\n\nBefore you begin\n----------------\n\nBefore you start, make sure that you have performed the following tasks:\n\n- Enable the Google Kubernetes Engine API.\n[Enable Google Kubernetes Engine API](https://console.cloud.google.com/flows/enableapi?apiid=container.googleapis.com)\n- If you want to use the Google Cloud CLI for this task, [install](/sdk/docs/install) and then [initialize](/sdk/docs/initializing) the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running `gcloud components update`. **Note:** For existing gcloud CLI installations, make sure to set the `compute/region` [property](/sdk/docs/properties#setting_properties). If you primarily use zonal clusters, set the `compute/zone` property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: `One of [--zone, --region] must be supplied: Please specify location`. 
You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.\n\n- [Enable the Ray operator for Google Kubernetes Engine (GKE)](/kubernetes-engine/docs/add-on/ray-on-gke/how-to/enable-ray-on-gke).\n\n### Requirements and limitations\n\n- You must enable system and workload logging on an existing GKE cluster before you enable log collection for Ray clusters.\n- If you enable log collection for Ray clusters on an existing GKE cluster, GKE only collects logs from newly created Ray Pods, not from existing Ray Pods.\n- For Standard GKE clusters, you must enable Google Cloud Managed Service for Prometheus to enable metrics collection for Ray clusters. For Autopilot clusters, Google Cloud Managed Service for Prometheus is enabled by default.\n- You must **not** specify a volume named `ray-logs` in any Ray container in the Ray cluster. Otherwise, GKE won't collect logs.\n\nEnable log collection for a Ray cluster\n---------------------------------------\n\nYou can enable log collection for Ray clusters with new or existing\nAutopilot or Standard GKE clusters. The Ray\nlogs that GKE collects from Ray clusters are classified as\ncontainer logs. This includes all logs produced by the Ray cluster head and\nworker nodes.\n\nYou can enable log collection for Ray clusters using the Google Cloud console\nor the gcloud CLI.\n\n### Console\n\n1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.\n\n [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)\n2. Click **Create**, then in the Standard or Autopilot section, click **Configure**.\n\n3. From the navigation pane, under **Cluster**, click **Features**.\n\n4. In the **Operations** section, ensure the **System and Workloads**\n checkbox is selected.\n\n5. 
In the **AI and Machine Learning** section, select\n **Enable Ray Operator** and then select **Enable log collection for\n Ray clusters**.\n\n6. Click **Create**.\n\nFor Standard clusters, you must also enable\nGoogle Cloud Managed Service for Prometheus.\n\n### gcloud\n\nCreate a cluster using the `--addons=RayOperator` option and the\n`--enable-ray-cluster-logging` option: \n\n gcloud container clusters create \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e \\\n --location=\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e \\\n --addons=RayOperator \\\n --enable-ray-cluster-logging\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e: the name of the new cluster.\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: the location of the new cluster, for example, us-central1.\n\nYou can enable log collection for Ray clusters on an existing cluster by\nusing the\n[`gcloud container clusters update`](/sdk/gcloud/reference/container/clusters/update)\ncommand with the `--addons=RayOperator` option and the\n`--enable-ray-cluster-logging` option.\n| **Note:** You might observe that, in GKE, the Ray Operator collects logs from Ray head and worker Pods (standard output and standard error) even when the **Enable log collection for Ray clusters** option is not selected. This behavior is expected because GKE, by default, automatically collects all workload logs written to standard output or standard error. The **Enable log collection for Ray clusters** checkbox specifically controls the collection of additional Ray-specific logs, separate from these default workload logs. To manage which logs are sent to Cloud Logging by default and to reduce logging volume, refer to the [About GKE logs](/kubernetes-engine/docs/concepts/about-logs#what_logs) page.\n\nView Ray logs\n-------------\n\nYou can view logs collected from Ray clusters running on GKE\nusing Logging.\n\n1. 
Go to the **Cloud Logging** page in the Google Cloud console.\n\n [Go to Cloud Logging](https://console.cloud.google.com/logs)\n2. Open the query editor and paste your query expression.\n\n3. Click **Run query**.\n\nYou can use the following example queries in the Logs Explorer:\n\nEnable metrics collection for a Ray cluster\n-------------------------------------------\n\nYou can enable metrics collection for Ray clusters with new or existing\nAutopilot or Standard GKE clusters.\n\nAfter you enable metrics collection for Ray clusters, GKE\ncollects metrics from existing Ray clusters and new Ray clusters.\nGKE collects all system metrics exported by Ray in Prometheus\nformat.\n\nYou can enable metrics collection for Ray clusters using the\nGoogle Cloud console or the gcloud CLI.\n\n### Console\n\n1. Go to the **Google Kubernetes Engine** page in the Google Cloud console.\n\n [Go to Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list)\n2. Click **Create**, then in the Standard or Autopilot section, click **Configure**.\n\n3. From the navigation pane, under **Cluster**, click **Features**.\n\n4. In the **Operations** section, ensure the **System and Workloads**\n checkbox is selected.\n\n5. In the **AI and Machine Learning** section, select\n **Enable Ray Operator** and then select **Enable metrics collection for\n Ray clusters**.\n\n6. 
Click **Create**.\n\nFor Standard clusters, you must also enable\nGoogle Cloud Managed Service for Prometheus.\n\n### gcloud\n\nCreate a cluster using the `--addons=RayOperator` option and the\n`--enable-ray-cluster-monitoring` option: \n\n gcloud container clusters create \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e \\\n --location=\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e \\\n --addons=RayOperator \\\n --enable-ray-cluster-monitoring\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e: the name of the new cluster.\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: the location of the new cluster, for example, us-central1.\n\nYou can enable metrics collection for Ray clusters on an existing cluster by\nusing the\n[`gcloud container clusters update`](/sdk/gcloud/reference/container/clusters/update)\ncommand with the `--addons=RayOperator` option and the\n`--enable-ray-cluster-monitoring` option.\n\nView Ray metrics\n----------------\n\nGoogle Cloud Managed Service for Prometheus provides a pre-configured\n**Ray on GKE Overview** dashboard that offers a centralized view\nof key Ray metrics. This is the recommended way\nto quickly get started with monitoring your Ray clusters on GKE.\n\n[Go to Ray on GKE Overview dashboard](https://console.cloud.google.com/monitoring/dashboards/integration/kuberay.ray-overview)\n\nThe dashboard is automatically populated when you [enable\nmetrics collection](/kubernetes-engine/docs/add-on/ray-on-gke/how-to/collect-view-logs-metrics#enable-metrics-collection) for your Ray cluster.\n\nAlternatively, if you want to explore individual metrics collected from Ray\nclusters running on GKE, follow these steps:\n\n1. Go to the **Metrics Explorer** page in the Google Cloud console.\n\n [Go to Metrics Explorer](https://console.cloud.google.com/monitoring/metrics-explorer)\n2. 
In the **Select a metric** field, you can search for Ray-specific metrics.\n These metrics are typically prefixed with `prometheus/ray_`. Examples include\n `prometheus/ray_worker_cpu_seconds_total` or `prometheus/ray_memory_bytes_max`.\n\n3. You can further refine your search by selecting the appropriate resource type\n (for example, `k8s_pod`, `k8s_container`) and filtering by labels relevant to\n your Ray cluster (for example, `ray.io/cluster`).\n\nWhat's next\n-----------\n\n- Learn about [Ray on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/index.html).\n- Explore the [KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started.html)."]]