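Host maintenance events typically occur once every two weeks, but might occasionally run more frequently.
This document describes how to minimize disruptions to your workloads during a maintenance event.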
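Receive advance notice before maintenance events
You can monitor the maintenance schedule for your virtual machine (VM) instance and prepare your workloads to transition through the system restart.
To receive advance notice of host events, monitor the /computeMetadata/v1/instance/maintenance-event metadata value.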
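If the request to the metadata server returns NONE, then the VM isn't scheduled to stop. For example, run the following command from within a VM:
    curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
    NONE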
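The same check can be sketched in Python for applications that prefer not to shell out. This is a minimal sketch, not an official client: it assumes the code runs inside a Compute Engine VM where metadata.google.internal resolves, and the function names (normalize_event, fetch_maintenance_event, no_stop_scheduled) are hypothetical helpers introduced here for illustration.

```python
import urllib.request

# Metadata path taken from this document; the host name and header are the
# ones used when querying the Compute Engine metadata server from a VM.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)
METADATA_HEADERS = {"Metadata-Flavor": "Google"}


def normalize_event(raw: bytes) -> str:
    """Decode the response body and strip surrounding whitespace."""
    return raw.decode("utf-8").strip()


def fetch_maintenance_event(timeout: float = 5.0) -> str:
    """Return the current maintenance-event value, for example "NONE".

    Only works from inside a VM, where the metadata server is reachable.
    """
    request = urllib.request.Request(METADATA_URL, headers=METADATA_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return normalize_event(response.read())


def no_stop_scheduled(event: str) -> bool:
    """Hypothetical helper: True when the value is NONE (no stop scheduled)."""
    return event == "NONE"
```

Calling no_stop_scheduled(fetch_maintenance_event()) from within a VM mirrors the manual check. The timeout is an assumption so that a misconfigured environment fails fast instead of hanging.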
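If the metadata server returns TERMINATE_ON_HOST_MAINTENANCE, then your VM is scheduled to stop. Compute Engine gives GPU VMs a 1-hour stopping notice, while normal VMs receive only a 60-second notice. Configure your application to transition through the maintenance event. For example, you might use one of the following techniques: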
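Configure your application to temporarily move work in progress to a Cloud Storage bucket, then retrieve that data after the VM restarts.
Write data to a secondary Persistent Disk. When the VM automatically restarts, the Persistent Disk can be reattached and your application can resume work.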
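The second technique could look roughly like the following Python sketch, which polls the metadata value and provides helpers to write and reload a checkpoint file. Everything beyond the metadata path and the TERMINATE_ON_HOST_MAINTENANCE value is an assumption made for illustration: the function names, the checkpoint.json file name, the 10-second poll interval, and the idea that the directory passed in is a mount point on a secondary Persistent Disk. Saving to a Cloud Storage bucket instead would replace save_checkpoint with an upload step, which is not shown here.

```python
import json
import os
import tempfile
import time
import urllib.request
from typing import Callable, Optional

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)
STOP_EVENT = "TERMINATE_ON_HOST_MAINTENANCE"
CHECKPOINT_NAME = "checkpoint.json"  # hypothetical file name


def fetch_maintenance_event(timeout: float = 5.0) -> str:
    """Query the metadata server; only works from inside a VM."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def wait_for_stop_notice(
    fetch_event: Callable[[], str] = fetch_maintenance_event,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll until a stop is scheduled; return False if max_polls runs out."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if fetch_event() == STOP_EVENT:
            return True
        polls += 1
        sleep(poll_seconds)
    return False


def save_checkpoint(state: dict, directory: str) -> str:
    """Write state as JSON via a temp file and rename, so a stop that
    arrives mid-write does not leave a truncated checkpoint behind."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, CHECKPOINT_NAME)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def load_checkpoint(directory: str) -> Optional[dict]:
    """Return the saved state after a restart, or None if none exists."""
    path = os.path.join(directory, CHECKPOINT_NAME)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

An application might run wait_for_stop_notice() in a background thread, call save_checkpoint(state, directory) when it returns True, and call load_checkpoint(directory) at startup to resume. The poll interval should stay well below the shortest notice period that applies to the VM, since the 60-second notice leaves little room for a slow poll loop.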
Last updated: 2025-09-04 (UTC)