This page shows how to configure Google Kubernetes Engine (GKE) to collect logs and metrics for Ray clusters running on GKE, and how to view those logs and metrics in Cloud Logging and Cloud Monitoring.
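Before you enable log collection for Ray clusters on an existing GKE cluster, you must enable system and workload logging on that cluster.
If you enable log collection for Ray clusters on an existing GKE cluster, GKE collects logs only from newly created Ray Pods, not from Ray Pods that already exist.
For Standard GKE clusters, you must enable Google Cloud Managed Service for Prometheus before you can enable metrics collection for Ray clusters. For Autopilot clusters, Google Cloud Managed Service for Prometheus is enabled by default.
Do not specify a volume named ray-logs in any Ray container in the Ray cluster. If you do, GKE does not collect logs.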
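The restriction on the ray-logs volume name can be checked before a manifest is applied. The following is a minimal sketch, assuming the volumes are declared as `- name: ...` entries in a YAML file; the manifest fragment and the `scratch` volume are hypothetical and exist only for the demo.

```shell
# Hypothetical RayCluster fragment, written to a temp file for the demo.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
volumes:
  - name: ray-logs
    emptyDir: {}
  - name: scratch
    emptyDir: {}
EOF

# Flag any entry whose name is exactly "ray-logs".
if grep -Eq '^[[:space:]]*-?[[:space:]]*name:[[:space:]]*ray-logs[[:space:]]*$' "$manifest"; then
  RESULT="conflict: remove or rename the ray-logs volume"
else
  RESULT="ok: no ray-logs volume found"
fi
echo "$RESULT"
rm -f "$manifest"
```

A text match like this is only a quick pre-flight check; it does not parse the YAML, so it also flags a volume mount with the same name.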
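Enable log collection for a Ray cluster
You can enable log collection for Ray clusters on new or existing Autopilot or Standard GKE clusters. GKE classifies the Ray logs that it collects from Ray clusters as container logs. This includes all logs produced by the Ray cluster head and worker nodes.
You can enable log collection for Ray clusters by using the Google Cloud console or the gcloud CLI.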
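Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
Click Create, then in the Standard or Autopilot section, click Configure.
From the navigation pane, under Cluster, click Features.
In the Operations section, make sure that the System and Workloads checkbox is selected.
In the AI and Machine Learning section, select Enable Ray Operator, and then select Enable log collection for Ray clusters.
Click Create.
gcloud
To create a cluster with log collection enabled, use the gcloud container clusters create command with the --addons=RayOperator option and the --enable-ray-cluster-logging option.
To enable log collection for Ray clusters on an existing cluster, use the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-logging option.
Note: GKE collects standard output and standard error from Ray head and worker Pods even when log collection for Ray clusters is not enabled, because GKE collects all workload logs written to those streams by default. The Ray log collection option controls the collection of additional Ray-specific logs.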
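As a sketch of the gcloud path for a new cluster, the snippet below composes the create command with the two options named on this page and prints it for review instead of running it. The cluster name `ray-demo` and the location `us-central1` are placeholder assumptions.

```shell
# Placeholder values (assumptions for this sketch); replace with your own.
CLUSTER_NAME="ray-demo"
LOCATION="us-central1"

# Compose the create command with the Ray operator add-on and Ray log collection.
CREATE_CMD="gcloud container clusters create ${CLUSTER_NAME} \
--location=${LOCATION} \
--addons=RayOperator \
--enable-ray-cluster-logging"

# Print the command for review instead of running it.
echo "${CREATE_CMD}"
```

After you check the printed command, paste it into a terminal where the gcloud CLI is installed and initialized.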
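View Ray logs
You can use Cloud Logging to view logs collected from Ray clusters running on GKE. In the Google Cloud console, go to the Cloud Logging page, paste a query expression into the query editor, and click Run query.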
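A query for the Logs Explorer might look like the sketch below. It relies on two assumptions that this page does not confirm: that Ray Pods carry the `ray.io/cluster` label, and that GKE exposes Pod labels on log entries under the `k8s-pod/` prefix with dots replaced by underscores. The GKE cluster name `ray-demo` and the Ray cluster name `raycluster-sample` are placeholders.

```shell
# Placeholder names (assumptions for this sketch).
GKE_CLUSTER="ray-demo"
RAY_CLUSTER="raycluster-sample"

# Container logs from one GKE cluster, narrowed to one Ray cluster.
# The k8s-pod/ray_io/cluster label key is an assumption; verify it on a real log entry.
FILTER="resource.type=\"k8s_container\" AND resource.labels.cluster_name=\"${GKE_CLUSTER}\" AND labels.\"k8s-pod/ray_io/cluster\"=\"${RAY_CLUSTER}\""

echo "${FILTER}"
```

Paste the printed expression into the query editor, or pass it to `gcloud logging read "${FILTER}" --limit=20` to read matching entries from the command line.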
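Enable metrics collection for a Ray cluster
You can enable metrics collection for Ray clusters on new or existing Autopilot or Standard GKE clusters. After you enable it, GKE collects metrics from both existing and new Ray clusters, including all system metrics that Ray exports in Prometheus format.
In the Google Cloud console, follow the same steps as for log collection, but select Enable metrics collection for Ray clusters in the AI and Machine Learning section.
To create a cluster with metrics collection enabled, use the gcloud container clusters create command with the --addons=RayOperator option and the --enable-ray-cluster-monitoring option.
To enable metrics collection for Ray clusters on an existing cluster, use the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-monitoring option.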
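For a new cluster, the metrics options can be combined as in the following sketch, which prints the command rather than running it. The cluster name and location are placeholders. The `--enable-managed-prometheus` flag is included on the assumption that the target is a Standard cluster, where Google Cloud Managed Service for Prometheus must be enabled separately.

```shell
# Placeholder values (assumptions for this sketch); replace with your own.
CLUSTER_NAME="ray-demo"
LOCATION="us-central1"

# Ray operator add-on plus Ray metrics collection.
# --enable-managed-prometheus is assumed to be needed for Standard clusters only.
METRICS_CMD="gcloud container clusters create ${CLUSTER_NAME} \
--location=${LOCATION} \
--addons=RayOperator \
--enable-ray-cluster-monitoring \
--enable-managed-prometheus"

# Print the command for review instead of running it.
echo "${METRICS_CMD}"
```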
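View Ray metrics
Google Cloud Managed Service for Prometheus provides a preconfigured Ray on GKE Overview dashboard that gives a centralized view of key Ray metrics. This is the recommended way to get started quickly with monitoring Ray clusters on GKE. The dashboard is populated automatically when you enable metrics collection for your Ray cluster.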
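To explore individual metrics instead, go to the Metrics Explorer page in the Google Cloud console. In the Select a metric field, search for Ray-specific metrics.
These metrics are typically prefixed with prometheus/ray_. Examples include prometheus/ray_worker_cpu_seconds_total and prometheus/ray_memory_bytes_max.
To refine the search further, select the appropriate resource type (for example, k8s_pod or k8s_container) and filter by labels relevant to your Ray cluster (for example, ray.io/cluster).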
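The prefix convention above can also be applied to a list of metric names, for example when you sort through names copied from Metrics Explorer. The helper `is_ray_metric` is a hypothetical name for this sketch. The two Ray metric names come from this page, and the third name is a non-Ray metric included for contrast.

```shell
# Hypothetical helper: succeeds when a metric name carries the prometheus/ray_ prefix.
is_ray_metric() {
  case "$1" in
    prometheus/ray_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Classify a few metric names by prefix.
for m in \
  prometheus/ray_worker_cpu_seconds_total \
  prometheus/ray_memory_bytes_max \
  kubernetes.io/container/cpu/core_usage_time
do
  if is_ray_metric "$m"; then
    echo "ray: $m"
  else
    echo "other: $m"
  fi
done
```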
Last updated: 2025-07-01 (UTC)