Troubleshooting vCPU soft lockups
Linux
This document describes how to troubleshoot vCPU soft lockups. A soft lockup
occurs when a virtual machine (VM) instance's vCPU is unable to run a new task
for more than 20 seconds. Most soft lockups are caused by bugs in application software.
Soft lockups can cause VMs to become unresponsive for short periods, disrupt SSH access to VMs, and trigger application timeouts or failover. VMs
that are experiencing a soft lockup might also have unusually high or
unusually low CPU utilization, depending on the exact cause of the soft lockup.
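On Linux guests, the 20-second figure corresponds to the kernel's soft lockup detector, which reports a soft lockup when the watchdog task on a CPU hasn't been able to run for longer than twice the `kernel.watchdog_thresh` sysctl (10 seconds by default; a value of 0 disables lockup detection). The following Python sketch, assuming a Linux guest that exposes `/proc/sys/kernel/watchdog_thresh`, reads that value and computes the resulting threshold. The helper names are hypothetical, and the code falls back to the 10-second default if the file can't be read.

```python
from pathlib import Path
from typing import Optional

# Sysctl that controls the kernel lockup detectors on Linux.
WATCHDOG_THRESH_PATH = Path("/proc/sys/kernel/watchdog_thresh")
DEFAULT_WATCHDOG_THRESH = 10  # kernel default, in seconds


def soft_lockup_threshold(watchdog_thresh: int) -> Optional[int]:
    """Return the soft lockup threshold in seconds, or None if detection is off."""
    if watchdog_thresh <= 0:
        # A watchdog_thresh of 0 disables lockup detection entirely.
        return None
    return 2 * watchdog_thresh


def read_watchdog_thresh(path: Path = WATCHDOG_THRESH_PATH) -> int:
    """Read watchdog_thresh, falling back to the kernel default if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_WATCHDOG_THRESH


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    limit = soft_lockup_threshold(thresh)
    if limit is None:
        print("Lockup detection is disabled (watchdog_thresh=0).")
    else:
        print(f"watchdog_thresh={thresh}s, soft lockup reported after {limit}s")
```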
Identify soft lockups
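To identify whether your VM is experiencing a soft lockup, do one of the following:
If you previously enabled serial port output logging for your VM, review the serial port output for a soft lockup stack trace.
Review your VM's operating system logs (/var/log/messages) for a soft lockup stack trace.
Example soft lockup stack trace:
watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
To detect future soft lockups, you can do the following:
Enable serial port output logging.
Create a log-based alerting policy for the following log:
resource.type="gce_instance" log_id("serialconsole.googleapis.com/serial_port_1_output") textPayload=~"watchdog.*lockup"
Note: When you test the query, it is likely that no logs appear. This is expected behavior.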
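Reviewing `/var/log/messages` by hand can be tedious on a busy VM. The following Python sketch scans log lines for the pattern `watchdog.*lockup`, then extracts the CPU number and stuck duration from messages shaped like `watchdog: BUG: soft lockup - CPU#3 stuck for 22s!`. It assumes that message format; lines without the `watchdog` prefix (as printed by some older kernels) are skipped, and the log path may differ on your distribution. The names `find_soft_lockups` and `SoftLockup` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Loose match, equivalent to a textPayload filter of "watchdog.*lockup".
LOCKUP_LINE = re.compile(r"watchdog.*lockup")
# Extracts the CPU index and duration from "soft lockup - CPU#3 stuck for 22s!".
LOCKUP_DETAIL = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


@dataclass
class SoftLockup:
    line: str
    cpu: int
    seconds: int


def find_soft_lockups(lines: Iterable[str]) -> List[SoftLockup]:
    """Return the soft lockup reports found in the given log lines."""
    found = []
    for line in lines:
        if not LOCKUP_LINE.search(line):
            continue
        detail = LOCKUP_DETAIL.search(line)
        if detail:
            found.append(
                SoftLockup(line.rstrip("\n"), int(detail.group(1)), int(detail.group(2)))
            )
    return found


if __name__ == "__main__":
    try:
        # Reading the system log usually requires root privileges.
        with open("/var/log/messages", errors="replace") as f:
            hits = find_soft_lockups(f)
    except OSError as err:
        print(f"Could not read log: {err}")
    else:
        for hit in hits:
            print(f"CPU {hit.cpu} stuck for {hit.seconds}s: {hit.line}")
```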
Troubleshoot soft lockups
After you've identified that a soft lockup is occurring, try the following troubleshooting steps to resolve the issue:
Check your OS vendor's site for known errors with your OS version. Sometimes you might find references to specific kernel modules in the stack trace that suggest a particular function or operation is involved.
Identify whether the soft lockup recurs with any regularity, such as coinciding with high load or certain activities. If the soft lockups correlate with high load, you might need to reconfigure your workload, for example by using a larger VM or by splitting the load across more VMs.
Check whether the soft lockups correlate with any changes to your runtime environment, such as new software deployments or OS image updates.
Evaluate whether any maintenance events took place around the time of the soft lockup by reviewing the system event audit logs.
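To check these correlations systematically, you can line up soft lockup timestamps against the times of known events. The following Python sketch assumes you have already collected lockup timestamps from your VM's logs and event timestamps (such as deployments, OS image updates, or maintenance events from the system event audit logs). It counts lockups per hour of day to reveal recurring patterns and lists lockups that fall within a chosen window of an event. The sample data, the 15-minute default window, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def lockups_by_hour(lockups: List[datetime]) -> Dict[int, int]:
    """Count lockups per hour of day (0-23) to spot recurring patterns."""
    return dict(sorted(Counter(ts.hour for ts in lockups).items()))


def correlate(
    lockups: List[datetime],
    events: List[Tuple[datetime, str]],
    window: timedelta = timedelta(minutes=15),
) -> List[Tuple[datetime, str]]:
    """Return (lockup time, event label) pairs where the event occurred
    within `window` before or after the lockup.

    Both lists must use the same time zone (for example, UTC).
    """
    matches = []
    for ts in lockups:
        for event_ts, label in events:
            if abs(ts - event_ts) <= window:
                matches.append((ts, label))
    return matches


if __name__ == "__main__":
    # Hypothetical sample data; replace with timestamps from your own logs.
    lockups = [
        datetime(2025, 9, 1, 2, 5),
        datetime(2025, 9, 2, 2, 10),
        datetime(2025, 9, 2, 14, 0),
    ]
    events = [
        (datetime(2025, 9, 2, 2, 0), "deployment"),
        (datetime(2025, 9, 3, 9, 0), "maintenance event"),
    ]
    print(lockups_by_hour(lockups))
    for ts, label in correlate(lockups, events):
        print(f"{ts.isoformat()} is near {label}")
```

With the sample data, two of the three lockups happen during the 02:00 hour, and only the lockup at 02:10 on September 2 falls within 15 minutes of an event.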
If the preceding troubleshooting steps didn't resolve the issue, file a support case and include all of the information you gathered while troubleshooting.
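Details such as the kernel version, the OS release, and the exact soft lockup messages help narrow down the cause in a support case. The following Python sketch gathers those items into a plain-text summary. It assumes a Linux guest that provides `/etc/os-release` and a readable system log at `/var/log/messages`; the function names and summary layout are hypothetical, so adapt them to what your case needs.

```python
import platform
import re
from pathlib import Path
from typing import Dict, List


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release (simplified: no shell escapes)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def lockup_lines(log_text: str) -> List[str]:
    """Return log lines that look like soft lockup reports."""
    return [line for line in log_text.splitlines() if re.search(r"watchdog.*lockup", line)]


def build_summary(os_release: Dict[str, str], kernel: str, lockups: List[str]) -> str:
    """Format the collected details as plain text for a support case."""
    lines = [
        f"OS: {os_release.get('PRETTY_NAME', 'unknown')}",
        f"Kernel: {kernel}",
        f"Soft lockup messages: {len(lockups)}",
    ]
    lines.extend(f"  {entry}" for entry in lockups)
    return "\n".join(lines)


if __name__ == "__main__":
    try:
        os_release = parse_os_release(Path("/etc/os-release").read_text())
    except OSError:
        os_release = {}
    try:
        log_text = Path("/var/log/messages").read_text(errors="replace")
    except OSError:
        log_text = ""  # Log missing or unreadable; root is often required.
    print(build_summary(os_release, platform.release(), lockup_lines(log_text)))
```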
Best practices to avoid soft lockups
To help prevent your VMs from experiencing soft lockups, we recommend implementing the following best practices:
Ensure that you have appropriate redundant components configured for your system, such as high-availability clusters, to provide failover capability if a particular VM experiences a prolonged soft lockup. For more information, see Designing resilient systems.
For compute-intensive workloads, consider using compute-optimized machine families.
Test your workload with simulated maintenance events to learn how it performs during live migration (if enabled), particularly under load testing.
If you're running a custom Linux kernel or custom modules in your VM, test new changes under load before deploying them to your production environment. Confirm that your custom changes don't disqualify you from receiving support from your OS vendor.
Keep your operating system up to date. For more information, see Operating system details.
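One way to run the simulated maintenance test is the `gcloud compute instances simulate-maintenance-event` command. The following Python sketch wraps that command so that a load-test script can trigger the event while your load generator keeps running. It assumes the gcloud CLI is installed and authenticated; the VM name and zone are placeholders, and the wrapper functions are hypothetical. By default it only prints the command (`dry_run=True`).

```python
import shlex
import subprocess
from typing import List


def simulate_maintenance_command(vm_name: str, zone: str) -> List[str]:
    """Build the gcloud command that simulates a maintenance event on one VM."""
    return [
        "gcloud", "compute", "instances", "simulate-maintenance-event",
        vm_name, f"--zone={zone}",
    ]


def simulate_maintenance(vm_name: str, zone: str, dry_run: bool = True) -> List[str]:
    """Print the command and, unless dry_run is True, run it."""
    cmd = simulate_maintenance_command(vm_name, zone)
    print("Running:", shlex.join(cmd))
    if not dry_run:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder values; start your load generator before using dry_run=False.
    simulate_maintenance("example-vm", "us-central1-a", dry_run=True)
```

While the event runs, watch the VM's serial port output or system logs for soft lockup messages, and compare your workload's latency and error rates with a baseline run.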
Last updated on 2025-09-04 UTC.