Creating a cluster using Windows Server node pools

In this article, you learn how to create a Google Kubernetes Engine (GKE) cluster with node pools running Microsoft Windows Server. With this cluster, you can use Windows Server containers. Hyper-V containers are not currently supported. Similar to Linux containers, Windows Server containers provide process and namespace isolation.

To run on GKE, Windows Server container images need to be built on Windows Server version 2019 (LTSC) or Windows Server version 1909 (SAC). See Windows Server servicing channels: LTSC and SAC for more information.

Windows Server nodes require more resources than a typical Linux node because of the overhead of running the Windows OS and the Windows Server components that can't run in containers. As a result, the allocatable resources on a Windows Server node are lower than they would be on a comparable Linux node.

Limitations

There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.

In addition, some GKE features aren't supported.

For clusters, the following features aren't supported with Windows Server node pools:

For Windows Server node pools, the following features aren't supported:

Version mapping

The following table shows how GKE versions map to Windows Server versions.

GKE version      Windows SAC version                         Windows LTSC version
1.14.x           10.0.17763 (Windows Server version 1809)    N/A
1.15.x           10.0.17763 (Windows Server version 1809)    N/A
1.16.4-gke.25    10.0.18363 (Windows Server version 1909)    10.0.17763 (Windows Server 2019)

Create a cluster with Windows Server nodes

In this section, you create a cluster that can run Windows Server containers.

To run Windows Server containers, your cluster must have at least one Windows and one Linux node pool. You cannot create a cluster using only a Windows Server node pool.

The Linux node pool is required to run critical cluster add-on Pods and to support features such as kubectl exec and kubectl logs. Because of its importance, don't resize your Linux node pool down to zero nodes, and ensure that it has sufficient capacity.

Step 1: Update and configure gcloud

Update and configure the gcloud command-line tool with your project ID and zone.


Make sure your installation of gcloud is up to date:

gcloud components update --quiet

Set your default Google Cloud project:

gcloud config set project [PROJECT-ID]

Set the default zone, so you don't have to use the --zone flag in the following gcloud commands:

gcloud config set compute/zone [ZONE]

Step 2: Create a cluster

Clusters using Windows Server node pools don't support all Kubernetes and GKE features. See the limitations section for more information.


Create a GKE cluster with the following fields:

gcloud beta container clusters create [CLUSTER_NAME] \
  --enable-ip-alias \
  --num-nodes=2 \
  --release-channel=rapid

Where:

  • [CLUSTER_NAME] is the name you choose for your cluster.
  • --enable-ip-alias turns on alias IP. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  • --num-nodes specifies the number of Linux nodes you create. You should provide sufficient compute resources to run cluster add-ons. This is an optional field and if you omit it, your cluster is created with 3 Linux nodes.
  • --release-channel=rapid is required as this feature is currently only available in the rapid release channel.

Step 3: Create a Windows Server node pool

Windows Server node pools don't support all Kubernetes and GKE features. See the limitations section for more information.


gcloud container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --image-type=[IMAGE_TYPE] \
  --enable-autoupgrade \
  --machine-type=n1-standard-2

Where:

  • [NODE_POOL_NAME] is the name you choose for your Windows Server node pool.
  • [CLUSTER_NAME] is the name of the cluster you created in step 2.
  • [IMAGE_TYPE] is either WINDOWS_LTSC or WINDOWS_SAC. Use WINDOWS_LTSC for a node pool running Windows Server 2019 (LTSC) and WINDOWS_SAC for a node pool running Windows Server version 1909 (SAC). To run on GKE, your Windows Server container images must be built on the Windows Server version that corresponds to the image type you choose.
  • --enable-autoupgrade enables node autoupgrade. To use this, you must build your Windows Server container images using multi-arch. This avoids any version mismatch issues between the node OS version and the base container image. If you aren't using multi-arch images, use --no-enable-autoupgrade instead. To learn more about container compatibility, see Windows container version compatibility.
  • --machine-type=n1-standard-2 is the minimum recommended size because Windows Server nodes require additional resources. The f1-micro and g1-small machine types are not supported.

After you create a Windows Server node pool, the cluster goes into RECONCILE state for several minutes as the master is updated.
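
One way to check whether the update has finished is to query the cluster status; the --format expression shown here is just one example:

gcloud container clusters describe [CLUSTER_NAME] --format="value(status)"

When the status returns to RUNNING, the cluster is ready.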

Step 4: Get kubectl credentials

Use the get-credentials command to enable kubectl to work with the cluster you created.


gcloud container clusters get-credentials [CLUSTER_NAME]

For more information on the get-credentials command, see the SDK get-credentials documentation.
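
As a quick check that kubectl is now configured for the cluster, you can list its nodes:

kubectl get nodes -o wide

You should see your Linux nodes and, once the Windows Server node pool is ready, your Windows Server nodes as well (the OS-IMAGE column distinguishes them).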

Step 5: Wait for cluster initialization

Before using the cluster, wait for several seconds until windows.config.common-webhooks.networking.gke.io is created. This webhook adds scheduling tolerations to Pods created with the kubernetes.io/os: windows (or beta.kubernetes.io/os: windows) node selector to ensure they are allowed to run on Windows Server nodes. It also validates the Pod to ensure that it only uses features supported on Windows.

To ensure the webhook is created, run the following command:

kubectl get mutatingwebhookconfigurations

The output should show that the webhook has been created:

NAME                                              CREATED AT
windows.config.common-webhooks.networking.gke.io  2019-12-12T16:55:47Z

Now that you have a cluster with two node pools (one Linux and one Windows), you can deploy a Windows application.
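
For example, a minimal Deployment manifest that targets the Windows Server node pool might look like the following. The IIS image, names, and labels are illustrative; use an image built on the Windows Server version that matches your node pool's image type:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iis-site                  # illustrative name
  labels:
    app: iis-site
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iis-site
  template:
    metadata:
      labels:
        app: iis-site
    spec:
      nodeSelector:
        kubernetes.io/os: windows   # schedule the Pods onto Windows Server nodes
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019   # example LTSC 2019 image
        ports:
        - containerPort: 80

The nodeSelector places the Pods on Windows Server nodes, and the webhook described above adds the matching tolerations automatically.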

View and query logs using Stackdriver

Logging is enabled automatically in GKE clusters. You can view container logs, as well as logs from other services running on the Windows Server nodes, in Stackdriver. See the Stackdriver Logging page for query examples, and substitute the resource types based on the migration guide.

Following is an example of a filter to get the container log:

resource.type="k8s_container"
resource.labels.cluster_name="your_cluster_name"
resource.labels.namespace_name="your_namespace_id"
resource.labels.container_name="your_container_name"
resource.labels.pod_name="your_pod_name"
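
You can also run such a filter from the command line with gcloud logging read, for example (the limit shown is arbitrary):

gcloud logging read 'resource.type="k8s_container" AND resource.labels.cluster_name="your_cluster_name"' --limit=10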

Accessing a Windows Server node using Remote Desktop Protocol (RDP)

You can connect to a Windows Server node in your cluster using RDP. For instructions on how to connect, see Connecting to Windows instances.
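
For example, before connecting you typically need Windows credentials for the node VM; one way to generate them is with the following command, then connect to the node's address with an RDP client using the generated username and password:

gcloud compute reset-windows-password [NODE_INSTANCE_NAME]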

Delete Windows Server node pools

Delete a Windows Server node pool by using the following command:


gcloud container node-pools delete --cluster=[CLUSTER_NAME] [NODE_POOL_NAME]

After deleting all Windows Server node pools, wait until windows.config.common-webhooks.networking.gke.io is deleted. Confirm it's been deleted by using the following command:

kubectl get mutatingwebhookconfigurations

You should not see windows.config.common-webhooks.networking.gke.io in the output.

Troubleshooting

See the Kubernetes documentation for general guidance on debugging Pods and services.

Building multi-arch images

If you are having difficulty creating multi-arch images, read the following.

Docker supports multi-arch (or multi-platform) images. With a multi-arch image, the container runtime on the node can pull a compatible image for the platform. Building a multi-arch image manifest involves building the image for each platform, then building the manifest that references those images for each platform.

To build a multi-arch image manifest:

  1. Create an LTSC 2019 Docker image. For example, gcr.io/my-project/foo:1.0-2019.
  2. Create a SAC 1909 Docker image. For example, gcr.io/my-project/foo:1.0-1909.
  3. Create a Windows Server version 1909 VM.
  4. Use RDP to connect to the VM.
  5. Run PowerShell.
  6. Enable the docker manifest experimental feature.
    PS C:\> $env:DOCKER_CLI_EXPERIMENTAL = 'enabled'
    
  7. Create the multi-arch manifest.
    docker manifest create gcr.io/my-project/foo:1.0 gcr.io/my-project/foo:1.0-2019 gcr.io/my-project/foo:1.0-1909
    
  8. Push the newly created manifest.
    docker manifest push gcr.io/my-project/foo:1.0
    

Using Group Managed Service Accounts (gMSA)

If you have difficulty using gMSA, it might be because of the computer name limit in Active Directory. Active Directory limits computer names to 15 characters. On GKE, the computer name is derived from the GKE node name. Node names come from the name of the node pool, followed by a unique string. When a node's name exceeds 15 characters, it's truncated to 15 characters.

For example, if a node is named gke-cluster-windows--node-pool-window-3d3afc34-wnnn, it's truncated to GKE-CLUSTER-WIN.

If another node is named gke-cluster-windows--node-pool-window-123gtj12-aabb, it's also truncated to GKE-CLUSTER-WIN.

Active Directory has a one-to-one relationship between computers and names, so if two computers share a name an error will occur.

To avoid this issue, rename the computer when joining Active Directory. After you rename the computer, you also need to update the node's kubelet and kube-proxy configuration to prevent an issue that stops the node from connecting to the cluster. You can do this by appending the --hostname-override flag to the binPath of the kubelet and kube-proxy services. Set the flag to your node's instance name and restart the services.

To update the config, run the following command:

sc.exe config [SERVICE_NAME] binPath="[EXISTING_SERVICE_BINPATH] --hostname-override=[NODE_INSTANCE_NAME]"

Where:

  • [SERVICE_NAME] is kubelet or kube-proxy. Run this command once per service.
  • [EXISTING_SERVICE_BINPATH] is the binPath of the service. You may retrieve this using sc.exe qc [SERVICE_NAME].
  • [NODE_INSTANCE_NAME] is the instance name of the node VM.
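
For example, run from an elevated PowerShell session on the node, the full sequence for both services might look like this, using the same placeholders as above:

# Look up the existing binPath of each service.
sc.exe qc kubelet
sc.exe qc kube-proxy

# Append the override to each binPath, then restart so the change takes effect.
sc.exe config kubelet binPath="[EXISTING_SERVICE_BINPATH] --hostname-override=[NODE_INSTANCE_NAME]"
sc.exe config kube-proxy binPath="[EXISTING_SERVICE_BINPATH] --hostname-override=[NODE_INSTANCE_NAME]"
Restart-Service kubelet
Restart-Service kube-proxy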

This resolves the issue and allows the node and host Pods to come up. It also avoids name conflicts in Active Directory.

Network connectivity

If you experience network connectivity problems from your Windows Server containers, it might be because Windows Server container networking often assumes a network MTU of 1500, which is incompatible with Google Cloud's MTU of 1460.

Check that the MTU of the network interface in the container as well as the network interfaces of the Windows Server node itself are all 1460 or less. For information on how to set the MTU, see known issues for Windows containers.
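
For example, you can list interface MTUs from a PowerShell or Command Prompt session on the node or inside the container:

netsh interface ipv4 show subinterfaces

The MTU column should show 1460 or less for every interface the container uses.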

TerminationGracePeriod

The Windows system timeout for the container might differ from the grace period you configure. This difference can cause Windows to force-terminate the container before the end of the grace period passed to the runtime.

You can modify the Windows timeout by editing container-local registry keys at image-build time. If you modify the Windows timeout, you might also need to adjust TerminationGracePeriodSeconds to match.
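
For reference, the grace period is the terminationGracePeriodSeconds field in the Pod spec. The following is a minimal illustrative snippet; the container name and image are placeholders:

spec:
  terminationGracePeriodSeconds: 30   # keep this in line with the Windows timeout configured in the image
  containers:
  - name: windows-app                 # placeholder name
    image: [YOUR_WINDOWS_IMAGE]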

Image pull errors

Windows Server container images, and the individual layers they are composed of, can be quite large. This can cause the kubelet to time out and fail when downloading and extracting the container layers.

If you see the "Failed to pull image" or "Image pull context cancelled" error messages, or an "ErrImagePull" status for your Pods, you can try the following options to successfully pull your Windows Server container images:

  • Break the application layers of the Windows Server container image into smaller layers that can each be pulled and extracted more quickly. This can make Docker's layer caching more effective and make image pull retries more likely to succeed. To learn more about layers, see the Docker article About images, containers, and storage drivers.

  • Connect to your Windows Server nodes and manually use the docker pull command on your container images before creating your Pods.

  • Set the --image-pull-progress-deadline flag for the kubelet service to increase the timeout for pulling container images.

    You can do this by connecting to your Windows nodes and running the following PowerShell commands.

    1. Get the existing command line for the Kubelet service from the Windows registry.

      PS C:\> $regkey = "HKLM\SYSTEM\CurrentControlSet\Services\kubelet"
      
      PS C:\> $name = "ImagePath"
      
      PS C:\> $(reg query ${regkey} /v ${name} | Out-String) -match `
      "${name}.*(C:.*kubelet\.exe.*)\r"
      
      PS C:\> $kubelet_cmd = $Matches[1]
      
    2. Set a new command line for the Kubelet service, with an additional flag to increase the timeout.

      PS C:\> reg add ${regkey} /f /v ${name} /t REG_EXPAND_SZ /d "${kubelet_cmd} `
      --image-pull-progress-deadline=15m"
      
    3. Confirm that the change was successful.

      PS C:\> reg query ${regkey} /v ${name}
      
    4. Restart the kubelet service so the new flag takes effect.

      PS C:\> Restart-Service kubelet
      
    5. Confirm that the kubelet service restarted successfully.

      PS C:\> Get-Service kubelet # ensure state is Running
      

What's next