This page shows you how to resolve issues with the Kubernetes API server (kube-apiserver) for Google Distributed Cloud.
This page is for IT administrators and Operators who manage the lifecycle of the underlying tech infrastructure, and who respond to alerts and pages when service level objectives (SLOs) aren't met or applications fail. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
Webhook timeouts and failed webhook calls
These errors can surface in a few different ways. If you experience any of the following symptoms, webhook calls might be failing:
Connection refused: If kube-apiserver can't connect to the webhook endpoint, an error like the following is reported in the logs:
    failed calling webhook "server.system.private.gdc.goog":
    failed to call webhook: Post "https://root-admin-webhook.gpc-system.svc:443/mutate-system-private-gdc-goog-v1alpha1-server?timeout=10s":
    dial tcp 10.202.1.18:443: connect: connection refused
Context deadline exceeded: If the call to the webhook times out, you might also see an error like the following reported in the logs:
    failed calling webhook "namespaces.hnc.x-k8s.io": failed to call webhook: Post
    "https://hnc-webhook-service.hnc-system.svc:443/validate-v1-namespace?timeout=10s":
    context deadline exceeded
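To sort a large API server log by symptom, a small helper such as the following might be useful. It is a minimal sketch, not part of any Google Distributed Cloud tooling: the function name `classify_webhook_error` and the category labels are made up for illustration, and it assumes each log entry sits on a single line (the samples above are wrapped for display).

```shell
# Classify one kube-apiserver log line by webhook failure symptom.
# The function name and the printed labels are hypothetical.
classify_webhook_error() {
  case "$1" in
    *"failed calling webhook"*"connection refused"*) echo "connection-refused" ;;
    *"failed calling webhook"*"context deadline exceeded"*) echo "deadline-exceeded" ;;
    *"failed calling webhook"*) echo "other-webhook-failure" ;;
    *) echo "not-a-webhook-failure" ;;
  esac
}

# Example usage: count symptoms in a saved log file (apiserver.log is a hypothetical path).
# while IFS= read -r line; do classify_webhook_error "$line"; done < apiserver.log | sort | uniq -c
```

The patterns match only the message fragments shown in the samples, so other webhook failures fall into the generic category rather than being misreported.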
If you think that you're experiencing webhook timeouts or failed webhook calls, use one of the following methods to confirm the issue:
Check the API server log for network issues:
  Check the log for network-related errors like TLS handshake error.
  Check whether the IP address and port match what the API server is configured to respond on.
Monitor webhook latency with the following steps:
  1. In the console, go to the Cloud Monitoring page.
  2. Select Metrics explorer.
  3. Select the apiserver_admission_webhook_admission_duration_seconds metric.
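Webhook latency can also be estimated from the command line. The sketch below assumes that your credentials allow reading the API server's /metrics endpoint and that apiserver_admission_webhook_admission_duration_seconds is exposed there as a Prometheus histogram with _sum and _count series. The function name `webhook_mean_latency` is hypothetical.

```shell
# Read Prometheus text-format metrics on stdin and print the mean admission
# webhook latency (seconds) for each label set, as "<mean> <labels>".
webhook_mean_latency() {
  awk '
    /^apiserver_admission_webhook_admission_duration_seconds_(sum|count)[{]/ {
      kind = ($0 ~ /_seconds_sum[{]/) ? "sum" : "count"
      labels = $0
      sub(/^[^{]*[{]/, "", labels)   # drop the metric name and opening brace
      sub(/[}][^}]*$/, "", labels)   # drop the closing brace and the value
      if (kind == "sum") sum[labels] = $NF; else count[labels] = $NF
    }
    END {
      for (l in count)
        if (count[l] + 0 > 0) printf "%.3f %s\n", sum[l] / count[l], l
    }
  '
}

# Example usage, slowest webhooks first:
# kubectl get --raw /metrics | webhook_mean_latency | sort -rn
```

The mean covers the whole lifetime of the API server process, so it hides recent spikes. A time-series view of the same metric is better for spotting a latency change.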
To resolve this issue, review the following suggestions:
Additional firewall rules might be required for the webhook. For more information, see how to add firewall rules for specific use cases.
If the webhook requires more time to complete, you can configure a custom timeout value. Webhook latency adds to API request latency, so webhooks should complete as quickly as possible.
If the webhook error blocks cluster availability, or if the webhook is harmless to remove and removing it mitigates the situation, check whether you can temporarily set the failurePolicy to Ignore or remove the offending webhook.
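One possible way to switch failurePolicy temporarily is a JSON Patch applied with kubectl patch. The sketch below only builds the patch document. The helper name `failure_policy_patch`, the configuration name `my-webhook-config`, and the webhook index 0 are assumptions for illustration, so check the real names and index in your cluster first.

```shell
# Print a JSON Patch that sets failurePolicy on the webhook at index $1
# (its position in the configuration's "webhooks" list) to the value $2.
failure_policy_patch() {
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"}]' "$1" "$2"
}

# Example usage (my-webhook-config is a hypothetical name):
# kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
# kubectl patch validatingwebhookconfiguration my-webhook-config \
#   --type=json -p "$(failure_policy_patch 0 Ignore)"
```

With failurePolicy set to Ignore, requests are admitted even when the webhook can't be reached, so revert the change once the webhook is healthy again.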
API server dial failure or latency
This error can surface in a few different ways:
External name resolution errors: An external client might return errors that contain lookup in the message, such as:
    dial tcp: lookup kubernetes.example.com on 127.0.0.1:53: no such host
This error doesn't apply to a client running within the cluster. The Kubernetes Service IP is injected, so no resolution is required.
Network errors: The client might print a generic network error when trying to dial the API server, like the following examples:
    dial tcp 10.96.0.1:443: connect: no route to host
    dial tcp 10.96.0.1:443: connect: connection refused
    dial tcp 10.96.0.1:443: connect: i/o timeout
High latency connecting to the API server: The connection to the API server might succeed, but requests time out on the client side. In this scenario, the client usually prints error messages that contain context deadline exceeded.
If the connection to the API server fails completely, try the connection from within the same environment where the client reports the error. You can use Kubernetes ephemeral containers to inject a debugging container into the namespaces of an existing Pod as follows:
1. From where the problematic client runs, use kubectl to perform a request with high verbosity. For example, a GET request to /healthz usually requires no authentication:
    kubectl get -v999 --raw /healthz
2. If the request fails or kubectl is unavailable, you can obtain the URL from the output and manually perform the request with curl. For example, if the service host obtained from the previous output was https://192.0.2.1:36917/, you can send a similar request as follows:
    # Replace "--cacert /path/to/ca.pem" with "--insecure" if you are accessing
    # a local cluster and you trust that the connection can't be tampered with.
    # The output is always "ok" and thus contains no sensitive information.
    curl -v --cacert /path/to/ca.pem https://192.0.2.1:36917/healthz
   The output from this command usually indicates the root cause of a failed connection.
   Note: You can't use the ping or traceroute commands to the IP address. A Kubernetes Service IP doesn't accept ICMP or protocols outside the list defined in the Service resource.
   If the connection is successful but is slow or times out, it indicates an overloaded API server. To confirm, in the console, look at the API Server Request Rate and request latency metrics in Cloud Kubernetes > Anthos > Cluster > K8s Control Plane.
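The debugging steps above could be wrapped in two small helpers, sketched here under stated assumptions. The function names, the curlimages/curl image, and all argument values are illustrative and not prescribed by the product. The first helper attaches an ephemeral container to the Pod where the client runs. The second repeats the /healthz request and prints the HTTP status and total time, which helps separate a failed connection from a slow one.

```shell
# Attach an ephemeral debugging container that shares the target container's
# process namespace. $1 = namespace, $2 = Pod name, $3 = target container.
# The image is an assumption; use any image that ships the tools you need.
debug_client_pod() {
  kubectl debug -it -n "$1" "$2" --image=curlimages/curl --target="$3" -- sh
}

# Request /healthz and print "<http_code> <seconds>".
# $1 = API server base URL without a trailing slash, $2 = path to the CA certificate.
time_healthz() {
  curl -sS -o /dev/null --cacert "$2" --max-time 10 \
    -w '%{http_code} %{time_total}\n' "$1/healthz"
}

# Example usage (all values are placeholders):
# debug_client_pod my-namespace my-client-pod my-container
# time_healthz https://192.0.2.1:36917 /path/to/ca.pem
```

The 10-second limit in time_healthz is an arbitrary choice. A request that reaches the limit exits with an error from curl instead of printing a misleadingly short time.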
To resolve these connection failures or latency problems, review the following remediation options:
If a network error occurs within the cluster, there might be a problem with the Container Network Interface (CNI) plugin. This problem is usually transient and resolves itself after the Pod is recreated or rescheduled.
If the network error is from outside the cluster, check whether the client is properly configured to access the cluster, or generate the client configuration again. If the connection goes through a proxy or gateway, check whether another connection that goes through the same mechanism works.
If the API server is overloaded, it usually means that many clients are accessing the API server at the same time. A single client can't overload the API server because of throttling and the Priority and Fairness feature. Review your workloads for the following patterns:
  Workloads that operate at the Pod level. It's more common to mistakenly create and forget Pods than higher-level resources.
  Workloads that adjust the number of replicas based on an erroneous calculation.
  A webhook that loops the request back to itself, or that amplifies the load by creating more requests than it handles.
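To look for Pods that were created directly rather than by a higher-level resource, one option is to list Pods that have no owner reference. This is a rough sketch under that assumption, and the function name `list_unowned_pods` is hypothetical. It relies on kubectl custom-columns output printing <none> for a missing field.

```shell
# Read "NAMESPACE NAME OWNER_KIND" rows on stdin and print namespace/name
# for every Pod whose owner kind is "<none>" (no controller owns it).
list_unowned_pods() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Example usage:
# kubectl get pods --all-namespaces --no-headers \
#   -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' \
#   | list_unowned_pods
```

A Pod without an owner isn't necessarily a mistake, but a large or growing number of them is a reasonable place to start the review.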
What's next
If you need additional assistance, reach out to Cloud Customer Care.
You can also see Getting support for more information about support resources, including the following:
  Requirements for opening a support case.
  Tools to help you troubleshoot, such as your environment configuration, logs, and metrics.
  Supported components.