Known issues in Knative serving

This page lists known issues for Knative serving. For known security vulnerabilities, see Security best practices.

You can also check for existing issues or open new issues in the public issue trackers.

Also see the troubleshooting page for troubleshooting strategies and solutions to common errors.

Services stuck in RevisionMissing due to missing MutatingWebhookConfiguration

Creation of a new service or a new service revision can become stuck in the RevisionMissing state due to a missing webhook configuration. You can confirm this by running the following command:

kubectl get mutatingwebhookconfiguration webhook.serving.knative.dev

which returns the following error:

mutatingwebhookconfigurations.admissionregistration.k8s.io "webhook.serving.knative.dev" not found
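
You can also list the services in the affected namespace to see which ones are stuck. The following command is a sketch, assuming the Knative Service resources are available through the ksvc short name; replace NAMESPACE with your namespace:

kubectl get ksvc -n NAMESPACE

Affected services show False in the READY column and RevisionMissing in the REASON column.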

Temporary workaround

Until this is fixed in an upcoming version, you can work around the issue as follows:

  1. Restart the webhook Pod to recreate the MutatingWebhookConfiguration:

    kubectl delete pod -n knative-serving -l app=webhook
    kubectl get mutatingwebhookconfiguration --watch
  2. Restart the controllers:

    kubectl delete pod -n gke-system -l istio=pilot
    kubectl delete pod -n knative-serving -l app=controller
  3. Deploy a new revision for each service that has the RevisionMissing issue:

    gcloud run services update SERVICE --update-labels client.knative.dev/nonce=""

    replacing SERVICE with the name of the service.

  4. Repeat the above steps as needed if you experience the same issue when you deploy new revisions of the service.
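
After deploying the new revision, you can confirm that the service has left the RevisionMissing state. The following command is a sketch, assuming the Knative Service resource is available through the ksvc short name; replace SERVICE and NAMESPACE with your own values:

kubectl get ksvc SERVICE -n NAMESPACE -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

The output should be True once the new revision is ready.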

Zonal clusters

If you use a zonal cluster with Knative serving, access to the control plane is unavailable during cluster maintenance.

During this period, Knative serving might not work as expected. Services deployed in that cluster:

  • Are not shown in the Google Cloud console or by the gcloud CLI
  • Cannot be deleted or updated
  • Do not automatically scale instances, although existing instances continue to serve new requests

To avoid these issues, you can use a regional cluster, which ensures a highly available control plane.
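
If you are creating a new cluster, you can make it regional by specifying a region instead of a zone at creation time. The following command is a minimal sketch; CLUSTER and REGION are placeholders, and any other flags you normally pass still apply:

gcloud container clusters create CLUSTER --region REGION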

Default memory limit is not enforced through command line

If you use the command line to deploy your services, you must include the --memory flag to set a memory limit for each service. If you omit the --memory flag, a service can consume up to the total amount of memory available on the node where its pods run, which might have unexpected side effects.
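
For example, a command-line deployment that sets an explicit memory limit might look like the following sketch, where SERVICE and IMAGE_URL are placeholders, 512Mi is an illustrative value, and you might also need the platform and cluster flags that you normally use for Knative serving deployments:

gcloud run deploy SERVICE --image IMAGE_URL --memory 512Mi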

When deploying through the Google Cloud console, the default value of 256M is used unless a different value is specified.

To avoid having to set a memory limit for each service individually, you can define a default memory limit for the namespace where you deploy those services. For more information, see Configuring default memory limits in the Kubernetes documentation.
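
As a sketch of that namespace-level approach, you can create a LimitRange that supplies a default memory limit for containers in the namespace; NAMESPACE and the 512Mi and 256Mi values are placeholders:

kubectl apply -n NAMESPACE -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limit
spec:
  limits:
  # Applied to containers that do not set their own memory limit or request
  - type: Container
    default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
EOF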

Default CPU limit is not enabled

When you deploy by using the command line or the Google Cloud console, no CPU limit is defined for the service. This allows the service to consume all of the available CPU on the node where it is running, which might have unexpected side effects.

You can work around this by defining a default CPU limit for the namespace where you deploy services with Knative serving. For more information, see Configuring default CPU limits in the Kubernetes documentation.
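
A namespace-wide default CPU limit can be set the same way, again as a sketch with placeholder values; the limit applies only to containers that do not set their own:

kubectl apply -n NAMESPACE -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-cpu-limit
spec:
  limits:
  # Default CPU limit for containers in this namespace
  - type: Container
    default:
      cpu: "1"
EOF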

Note: By default, services deployed with Knative serving request 400m CPU, which is used to schedule instances of a service on the cluster nodes.

Deploying private container images in Artifact Registry

There is a known deployment issue caused by an authentication failure between Knative serving and Artifact Registry when private container images are deployed. To avoid issues when deploying private images in Artifact Registry, you can either:

Configuration errors on clusters upgraded to version 0.20.0-gke.6

Clusters that are upgraded to version 0.20.0-gke.6 might receive one of the following errors.

When you update the cluster's configmap, you might receive the following error:

Error from server (InternalError): error when replacing "/tmp/file.yaml":
Internal error occurred: failed calling webhook "config.webhook.istio.networking.internal.knative.dev":
the server rejected our request for an unknown reason

If pods fail to start because of a queue-proxy failure, you might receive the following error:

Startup probe failed: flag provided but not defined: -probe-timeout

To resolve these errors, you must run the following command to remove the validating webhook configuration that is no longer supported in version 0.20.0:

kubectl delete validatingwebhookconfiguration config.webhook.istio.networking.internal.knative.dev

After removing the unsupported configuration, you can proceed with updating your cluster's configmap.
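
Before you retry the configmap update, you can confirm that the configuration was removed; the following command should report that the resource is not found:

kubectl get validatingwebhookconfiguration config.webhook.istio.networking.internal.knative.dev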

Missing metrics after upgrading to Knative serving 0.23.0-gke.9

Issue: The following metrics are missing after upgrading your cluster version to 0.23.0-gke.9: Request count, Request latencies, and Container instance count.

Possible cause: The Metric Collector is disabled.

To determine if the Metric Collector is preventing your metrics from being collected:

  1. Ensure that your version of Knative serving is 0.23.0-gke.9 by running the following command:

    kubectl get deployment controller -n knative-serving -o jsonpath='{.metadata.labels.serving\.knative\.dev/release}'
    
  2. Check whether the Metric Collector is disabled by running the following command:

    kubectl get cloudrun cloud-run -n cloud-run-system -o jsonpath='{.spec.metricscollector}'
    

    Your Metric Collector is disabled if the result is not {enabled: true}.

  3. To enable the Metric Collector, run one of the following sets of commands:

    • If the result is empty, run:

      kubectl patch cloudrun cloud-run -n cloud-run-system --type='json' -p='[{"op": "test", "path": "/spec", "value": NULL}, {"op": "add", "path": "/spec", "value": {}}]'
      kubectl patch cloudrun cloud-run -n cloud-run-system --type='json' -p='[{"op": "test", "path": "/spec/metricscollector", "value": NULL}, {"op": "add", "path": "/spec/metricscollector", "value": {}}]'
      kubectl patch cloudrun cloud-run -n cloud-run-system --type='json' -p='[{"op": "add", "path": "/spec/metricscollector/enabled", "value": true}]'
      
    • If the result is {enabled: false}, run:

      kubectl patch cloudrun cloud-run -n cloud-run-system --type='json' -p='[{"op": "replace", "path": "/spec/metricscollector/enabled", "value": true}]'
      
  4. Verify that the Metric Collector is enabled by running the following command:

    kubectl get cloudrun cloud-run -n cloud-run-system -o jsonpath='{.spec.metricscollector}'
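
    As in step 2, the Metric Collector is enabled if the result is {enabled: true}.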