Pods unable to initialize because of a metadata server issue

Problem

Pods fail to start because they cannot connect to the metadata server. The container logs show connection errors to the metadata server such as the following:

Authentication failed using Compute Engine authentication due to unavailable metadata server

Compute Engine Metadata server unavailable on attempt 3 of 3. Reason:[Errno 99] Cannot assign requested address

Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out

Environment

  • Google Kubernetes Engine
  • Workload Identity enabled

Solution

  1. Increase the GCE_METADATA_TIMEOUT environment variable to a larger value in the Deployment's YAML file. Currently, the default value is 3 seconds. For example, the following helloworld Deployment sets GCE_METADATA_TIMEOUT=60:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: helloworld-deployment
      labels:
        app: helloworld
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: helloworld
      template:
        metadata:
          labels:
            app: helloworld
        spec:
          containers:
          - name: helloworld
            image: myrepo/helloworld:latest
            ports:
            - containerPort: 80
            env:
            - name: GCE_METADATA_TIMEOUT
              value: "60"
    
  2. Check whether too many concurrent requests to the metadata server are stuck in the CLOSE_WAIT state. If so, review the application code to ensure that it closes its connections to the metadata server properly. To check for CLOSE_WAIT connections, log in to the Pod and run the netstat command.
  3. Check the STATE column for connections to the metadata server, IP address 169.254.169.254, as in the following example:
    $ netstat -nap | grep 169.254.169.254
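
The checks in steps 2 and 3 can be sketched as a small helper that tallies connections to the metadata server by state, so a pile-up of CLOSE_WAIT entries is easy to spot. This is a minimal sketch, not part of the original procedure: the function name summarize_metadata_states is hypothetical, and it assumes the Linux net-tools netstat column layout, where the foreign address is the fifth field and the state is the sixth.

```shell
# Tally TCP connections to the metadata server by state.
# Reads netstat-style output on stdin. Assumes the net-tools layout:
# field 5 = foreign address, field 6 = connection state.
summarize_metadata_states() {
  awk -v ip="169.254.169.254" '
    # Keep only rows whose foreign address is the metadata server.
    index($5, ip ":") == 1 { count[$6]++ }
    END { for (s in count) print s, count[s] }
  ' | sort
}

# Usage inside the Pod (netstat must be installed in the image):
#   netstat -nat | summarize_metadata_states
```

A CLOSE_WAIT count that keeps growing between runs suggests that the application is not closing its connections to the metadata server.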
    

Cause

Either the request timeout for the metadata server is too aggressive, or too many concurrent application connections to the metadata server are stuck in the CLOSE_WAIT state.
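
To judge whether the timeout is too aggressive, it can help to measure how long the metadata server takes to answer from inside the Pod. The following is a rough sketch under stated assumptions: the helper name probe_metadata_latency is hypothetical, curl must be available in the container image, and the project/project-id path is used here only as a lightweight endpoint to query.

```shell
# Print the total request time, in seconds, for one metadata server call.
# $1: maximum time to wait, in seconds (defaults to 5 in this sketch).
probe_metadata_latency() {
  curl -s -o /dev/null \
    -H "Metadata-Flavor: Google" \
    --max-time "${1:-5}" \
    -w '%{time_total}\n' \
    "http://169.254.169.254/computeMetadata/v1/project/project-id"
}

# Usage inside the Pod:
#   probe_metadata_latency 10
```

If the reported time is regularly close to or above the configured GCE_METADATA_TIMEOUT, a larger timeout value is likely to help.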