This page shows how to configure Google Kubernetes Engine (GKE) to collect logs and metrics for Ray clusters running on GKE, and how to view Ray logs and metrics in Cloud Logging and Cloud Monitoring.
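Before you begin
Before you start, make sure that you have performed the following tasks:
Enable the Google Kubernetes Engine API.
If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Enable the Ray operator for Google Kubernetes Engine (GKE).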
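The gcloud CLI preparation described above can be sketched as a short script. Setting a default compute/region is optional and avoids having to pass a location to every command. The REGION value is only an example, and the run helper is a hypothetical convenience that prints each command and executes it only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: bring an existing gcloud CLI installation up to date and set a
# default region. REGION is an example value, not a requirement.
set -eu

REGION="${REGION:-us-central1}"

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run gcloud components update
run gcloud config set compute/region "$REGION"
```

Run the script once as-is to review the commands, then again with RUN=1 to apply them.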
Requirements and limitations
You must enable system and workload logging on an existing GKE cluster before you enable log collection for Ray clusters.
If you enable log collection for Ray clusters on an existing GKE cluster, GKE collects logs only from newly created Ray Pods, not from existing Ray Pods.
For Standard GKE clusters, you must enable Google Cloud Managed Service for Prometheus before you can enable metrics collection for Ray clusters. For Autopilot clusters, Google Cloud Managed Service for Prometheus is enabled by default.
You must not specify a volume named ray-logs in any Ray container in the Ray cluster. If you do, GKE won't collect logs.
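On an existing Standard cluster, the logging and Managed Service for Prometheus requirements above could be met with commands along these lines. This is a sketch under assumptions: the --logging and --enable-managed-prometheus flags are not named on this page and come from the gcloud container clusters update reference, so confirm them with --help for your gcloud version. The cluster name is a placeholder, and the run helper is a hypothetical convenience that only prints unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: prepare an existing Standard cluster for Ray log and metrics collection.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-my-gke-cluster}"  # placeholder cluster name
LOCATION="${LOCATION:-us-central1}"             # example location

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Enable system and workload logging (assumed flag, see lead-in).
run gcloud container clusters update "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --logging=SYSTEM,WORKLOAD

# Enable Google Cloud Managed Service for Prometheus (assumed flag, see lead-in).
run gcloud container clusters update "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --enable-managed-prometheus
```

The two updates are issued as separate commands so that each change can be reviewed and applied on its own.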
Enable log collection for a Ray cluster
You can enable log collection for Ray clusters on new or existing Autopilot or Standard GKE clusters. The Ray logs that GKE collects from Ray clusters are classified as container logs. This includes all logs produced by the Ray cluster head and worker nodes.
You can enable log collection for Ray clusters by using the Google Cloud console or the gcloud CLI.
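Console
1. Go to the Google Kubernetes Engine page in the Google Cloud console.
2. Click Create, then in the Standard or Autopilot section, click Configure.
3. From the navigation pane, under Cluster, click Features.
4. In the Operations section, ensure that the System and Workloads checkbox is selected.
5. In the AI and Machine Learning section, select Enable Ray Operator, and then select Enable log collection for Ray clusters.
6. Click Create.
gcloud
Create a cluster by running gcloud container clusters create CLUSTER_NAME with the --location=LOCATION, --addons=RayOperator, and --enable-ray-cluster-logging options.
Replace the following:
CLUSTER_NAME: the name of the new cluster.
LOCATION: the location of the new cluster, for example, us-central1.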
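As a sketch, the cluster creation described in this section can be scripted as follows. The command and its flags are the ones this page names. The default cluster name is a placeholder, and the run helper is a hypothetical convenience that prints the command and executes it only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: create a GKE cluster with the Ray operator add-on and Ray log collection.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"  # placeholder cluster name
LOCATION="${LOCATION:-us-central1}"             # example location

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run gcloud container clusters create "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --addons=RayOperator \
  --enable-ray-cluster-logging
```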
You can enable log collection for Ray clusters on an existing cluster by using the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-logging option.
View Ray logs
You can view logs collected from Ray clusters running on GKE by using Cloud Logging.
1. Go to the Cloud Logging page in the Google Cloud console.
2. Open the query editor and paste your query expression into it.
3. Click Run query.
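The same kind of query can also be run from a terminal. The sketch below is an assumption-laden starting point rather than a query taken from this page: it filters on the k8s_container resource type and the GKE cluster name, and reads the most recent entries with gcloud logging read. The cluster name is a placeholder, and the run helper is a hypothetical convenience that only prints unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: read recent container logs for one GKE cluster from Cloud Logging.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"  # placeholder cluster name

# Assumed filter: container logs from the named GKE cluster. Add further
# conditions (namespace, Pod name) to narrow the result to your Ray Pods.
FILTER="resource.type=\"k8s_container\" AND resource.labels.cluster_name=\"$CLUSTER_NAME\""

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run gcloud logging read "$FILTER" --limit=20 --format=json
```

The value of FILTER can also be pasted into the query editor in the console.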
Enable metrics collection for a Ray cluster
You can enable metrics collection for Ray clusters on new or existing Autopilot or Standard GKE clusters.
After you enable metrics collection for Ray clusters, GKE collects metrics from both existing and new Ray clusters. GKE collects all system metrics that Ray exports in Prometheus format.
You can enable metrics collection for Ray clusters by using the Google Cloud console or the gcloud CLI.
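Console
1. Go to the Google Kubernetes Engine page in the Google Cloud console.
2. Click Create, then in the Standard or Autopilot section, click Configure.
3. From the navigation pane, under Cluster, click Features.
4. In the Operations section, ensure that the System and Workloads checkbox is selected.
5. In the AI and Machine Learning section, select Enable Ray Operator, and then select Enable metrics collection for Ray clusters.
6. Click Create.
For Standard clusters, you must also enable Google Cloud Managed Service for Prometheus.
gcloud
Create a cluster by running gcloud container clusters create CLUSTER_NAME with the --location=LOCATION, --addons=RayOperator, and --enable-ray-cluster-monitoring options.
Replace the following:
CLUSTER_NAME: the name of the new cluster.
LOCATION: the location of the new cluster, for example, us-central1.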
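A sketch of the creation command for a Standard cluster follows. The first three flags are the ones this page names. The additional --enable-managed-prometheus flag is an assumption taken from the gcloud container clusters create reference, included because Standard clusters need Managed Service for Prometheus for metrics collection. The cluster name is a placeholder, and the run helper is a hypothetical convenience that only prints unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: create a Standard GKE cluster with the Ray operator add-on and
# Ray metrics collection.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"  # placeholder cluster name
LOCATION="${LOCATION:-us-central1}"             # example location

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# --enable-managed-prometheus is an assumed addition for Standard clusters.
run gcloud container clusters create "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --addons=RayOperator \
  --enable-ray-cluster-monitoring \
  --enable-managed-prometheus
```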
You can enable metrics collection for Ray clusters on an existing cluster by using the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-monitoring option.
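After you create or update the cluster, one way to check the result is to inspect the cluster's add-on configuration. The sketch below assumes that the Ray operator settings appear under the addonsConfig.rayOperatorConfig field of the cluster resource. That field path is not stated on this page, so treat it as an assumption and fall back to a plain describe if the output is empty. The cluster name is a placeholder, and the run helper is a hypothetical convenience that only prints unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: show the Ray operator add-on settings of a GKE cluster.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"  # placeholder cluster name
LOCATION="${LOCATION:-us-central1}"             # example location

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Assumed field path: addonsConfig.rayOperatorConfig (see lead-in).
run gcloud container clusters describe "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --format="yaml(addonsConfig.rayOperatorConfig)"
```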
View Ray metrics
Google Cloud Managed Service for Prometheus provides a preconfigured Ray on GKE Overview dashboard that offers a centralized view of key Ray metrics. This is the recommended way to quickly get started with monitoring your Ray clusters on GKE.
The dashboard is automatically populated when you enable metrics collection for your Ray cluster.
Alternatively, to explore individual metrics collected from Ray clusters running on GKE, follow these steps:
1. Go to the Metrics Explorer page in the Google Cloud console.
2. In the Select a metric field, search for Ray-specific metrics. These metrics are typically prefixed with prometheus/ray_. Examples include prometheus/ray_worker_cpu_seconds_total and prometheus/ray_memory_bytes_max.
3. Refine your search by selecting the appropriate resource type (for example, k8s_pod or k8s_container) and filtering by labels relevant to your Ray cluster (for example, ray.io/cluster).