Creating a cluster using Windows Server node pools

On this page, you learn how to create a Google Kubernetes Engine (GKE) cluster with node pools running Microsoft Windows Server. With this cluster, you can use Windows Server containers. Microsoft Hyper-V containers are not currently supported. Like Linux containers, Windows Server containers provide process and namespace isolation.

A Windows Server node requires more resources than a typical Linux node because of the Windows OS and the Windows Server components that cannot run in containers. As a result, a Windows Server node has fewer allocatable resources than a comparable Linux node.

Creating a cluster using Windows Server node pools

In this section, you create a cluster that can run Windows Server containers.

To create this cluster, complete the following tasks:

  1. Update and configure gcloud.
  2. Choose your node image.
  3. Create a cluster and node pools.
  4. Get kubectl credentials.
  5. Wait for cluster initialization.

Update and configure gcloud

Before you start, make sure you have performed the following tasks:

Set up default gcloud settings using one of the following methods:

  • Using gcloud init, if you want to be walked through setting defaults.
  • Using gcloud config, to individually set your project ID, zone, and region.

Using gcloud init

If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.

  1. Run gcloud init and follow the directions:

    gcloud init

    If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

    gcloud init --console-only
  2. Follow the instructions to authorize gcloud to use your Google Cloud account.
  3. Create a new configuration or select an existing one.
  4. Choose a Google Cloud project.
  5. Choose a default Compute Engine zone.

Using gcloud config

  • Set your default project ID:
    gcloud config set project project-id
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone compute-zone
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region compute-region
  • Update gcloud to the latest version:
    gcloud components update
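For example, with hypothetical project and zone values substituted for the placeholders, the commands above look like the following:

```shell
# Hypothetical project ID and zone; substitute your own values.
gcloud config set project my-gke-project
gcloud config set compute/zone us-central1-a
gcloud components update
```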

Choose your Windows Server node image

To run on GKE, Windows Server container node images need to be built on Windows Server version 2019 (LTSC) or Windows Server version 1909 (SAC). A single cluster can have multiple Windows Server node pools using different Windows Server versions, but each individual node pool can only use one Windows Server version.

Consider the following when choosing your image type:

  • SAC versions are only supported by Microsoft for 18 months after their initial release. If you choose SAC for the image type for your node pool, but do not upgrade your node pool to newer GKE versions that target newer SAC versions, you cannot create new nodes in your node pool when the support lifecycle for the SAC version ends.

  • Only choose SAC if you can upgrade your node pool and the containers running in it regularly. GKE periodically updates the SAC version used for Windows node pools in new GKE releases, so choosing SAC for your node pool image type requires you to rebuild your containers more often.

  • New Windows Server features are typically introduced into SAC versions first. Because of this, new GKE Windows functionality might be introduced in SAC node pools first.

If you are unsure which Windows Server image type to use, we recommend Windows Server LTSC to avoid version incompatibility problems when upgrading your node pool. For more information, see Windows Server servicing channels: LTSC and SAC in Microsoft's documentation.

Version compatibility

Both Windows Server Core and Nano Server can be used as a base image for your containers.

Windows Server containers have important version compatibility requirements:

  • Windows Server containers built for LTSC do not run on SAC nodes, and vice-versa.
  • Windows Server containers built for a specific LTSC or SAC version do not run on other LTSC or SAC versions without being rebuilt to target the other version.

Building your Windows Server container images as multi-arch images that can target multiple Windows Server versions can help you manage this versioning complexity.
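As a sketch, a manifest list (multi-arch image) can be assembled with the docker manifest commands; the image names, tags, and Windows Server build numbers below are hypothetical:

```shell
# Hypothetical image names: one image built per targeted Windows Server version.
docker manifest create myregistry/myapp:1.0 \
  myregistry/myapp:1.0-ltsc2019 \
  myregistry/myapp:1.0-1909

# Annotate each entry with the Windows Server build it targets.
docker manifest annotate --os windows --os-version 10.0.17763.1457 \
  myregistry/myapp:1.0 myregistry/myapp:1.0-ltsc2019
docker manifest annotate --os windows --os-version 10.0.18363.1082 \
  myregistry/myapp:1.0 myregistry/myapp:1.0-1909

docker manifest push myregistry/myapp:1.0
```

With a manifest list, nodes pull the entry that matches their own Windows Server version, so a single image tag can serve node pools on different versions.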

Version mapping

Microsoft releases new SAC versions approximately every six months and new LTSC versions every two to three years. These new versions are typically available in new GKE minor versions. Within a GKE minor version the LTSC and SAC versions usually remain fixed.

The following table shows how GKE versions map to Windows Server Core versions:

1.15

GKE version SAC version LTSC version
1.15.x (Early Access only) 10.0.17763 (Windows Server version 1809 Core) N/A

1.16

GKE version SAC version LTSC version
1.16.4-gke.25 - 1.16.7.x (Beta only) 10.0.18363.592 (Windows Server version 1909 Core) 10.0.17763.973 (Windows Server 2019 Core)
1.16.8-gke.8 - 1.16.15-gke.1300 10.0.18363.720 (Windows Server version 1909 Core) 10.0.17763.1098 (Windows Server 2019 Core)
1.16.15-gke.1400+ 10.0.18363.1082 (Windows Server version 1909 Core) 10.0.17763.1457 (Windows Server 2019 Core)

1.17

GKE version SAC version LTSC version
1.17.x - 1.17.4-gke.5 10.0.18363.592 (Windows Server version 1909 Core) 10.0.17763.973 (Windows Server 2019 Core)
1.17.4-gke.6 - 1.17.8-gke.2 10.0.18363.720 (Windows Server version 1909 Core) 10.0.17763.1098 (Windows Server 2019 Core)
1.17.8-gke.16 - 1.17.12-gke.1200 10.0.18363.900 (Windows Server version 1909 Core) 10.0.17763.1282 (Windows Server 2019 Core)
1.17.12-gke.1500+ 10.0.18363.1082 (Windows Server version 1909 Core) 10.0.17763.1457 (Windows Server 2019 Core)

1.18

GKE version SAC version LTSC version
1.18.x - 1.18.9-gke.1200 10.0.18363.900 (Windows Server version 1909 Core) 10.0.17763.1282 (Windows Server 2019 Core)
1.18.9-gke.1300+ 10.0.18363.1082 (Windows Server version 1909 Core) 10.0.17763.1457 (Windows Server 2019 Core)

Create a cluster and node pools

To run Windows Server containers, your cluster must have at least one Windows Server node pool and one Linux node pool. You cannot create a cluster using only a Windows Server node pool; the Linux node pool is required to run critical cluster add-ons.

Because the Linux node pool is critical, do not resize it down to zero nodes, and ensure that it has sufficient capacity to run cluster add-ons.

gcloud

Create a cluster with the following fields:

  gcloud container clusters create cluster-name \
    --enable-ip-alias \
    --num-nodes=number-of-nodes \
    --cluster-version=version-number

Where:

  • cluster-name is the name you choose for your cluster.
  • --enable-ip-alias turns on alias IP. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  • number-of-nodes specifies the number of Linux nodes to create. Provide sufficient compute resources to run cluster add-ons. This flag is optional; if omitted, the default value of 3 is used.
  • version-number must be 1.16.8-gke.9 or higher. You can also use the --release-channel flag to select a release channel with a default version of 1.16.8-gke.9 or higher.

Create the Windows Server node pool with the following fields:

  gcloud container node-pools create node-pool-name \
    --cluster=cluster-name \
    --image-type=image-name \
    --no-enable-autoupgrade \
    --machine-type=machine-type-name

Where:

  • node-pool-name is the name you choose for your Windows Server node pool.
  • cluster-name is the name of the cluster you created above.
  • image-name is WINDOWS_SAC or WINDOWS_LTSC. For more information about these node images, see the Choose your Windows node image section.
  • --no-enable-autoupgrade disables node autoupgrade. Review Upgrading Windows Server node pools before enabling.
  • machine-type-name defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.
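Putting the two commands together with hypothetical values, a minimal cluster with one Linux and one Windows Server node pool might be created like this:

```shell
# Hypothetical cluster and node pool names; substitute your own values.
gcloud container clusters create my-windows-cluster \
  --enable-ip-alias \
  --num-nodes=2 \
  --cluster-version=1.16.8-gke.9

gcloud container node-pools create windows-pool \
  --cluster=my-windows-cluster \
  --image-type=WINDOWS_LTSC \
  --no-enable-autoupgrade \
  --machine-type=n1-standard-2
```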

Console

  1. Visit the Google Kubernetes Engine menu in Cloud Console.

    Visit the Google Kubernetes Engine menu

  2. Click the Create cluster button.

  3. In the Cluster basics section, complete the following:

    1. Enter the Name for your cluster.
    2. For the Location type, select the desired region or zone for your cluster.
    3. Under Master Version, select a Static version of 1.16.8-gke.9 or higher.
      Or,
      Under Master Version, select a Release channel with a default version of 1.16.8-gke.9 or higher.
  4. From the navigation pane, under Node Pools, click default-pool to create your Linux node pool. When configuring this node pool, you should provide sufficient compute resources to run cluster add-ons. You must also have available resource quota for the nodes and their resources (such as firewall routes).

  5. At the top of the page, click Add node pool to create your Windows Server node pool.

  6. In the Node pool details section, complete the following:

    1. Enter a Name for the Node pool.
    2. Choose the Node version for your nodes.
    3. Enter the Number of nodes to create in the node pool.
  7. From the navigation pane, under Node Pools, click Nodes.

    1. From the Image type drop-down list, select Windows Server Semi-Annual Channel or Windows Server Long-Term Servicing Channel. For more information about these node images, see the Choose your Windows node image section.
    2. Choose the default Machine configuration to use for the instances. n1-standard-2 is the minimum recommended size as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.
  8. From the navigation pane, select the name of your Windows Server node pool. This returns you to the Node pool details page.

    1. Under Automation, deselect Enable node auto-upgrade. Review the Upgrading Windows Server node pools section before enabling.
  9. From the navigation pane, select Networking.

    1. Under Advanced networking options, ensure Enable VPC-native traffic routing (uses alias IP) is selected. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  10. Click Create.

Terraform

You can use the Google Terraform provider to create a GKE cluster with a Windows Server node pool.

Add this block to your Terraform configuration:

resource "google_container_cluster" "cluster" {
  project  = "project-id"
  name     = "cluster-name"
  location = "location"

  min_master_version = "version-number"

  # Enable Alias IPs to allow Windows Server networking.
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/14"
    services_ipv4_cidr_block = "/20"
  }

  # Removes the implicit default node pool, recommended when using
  # google_container_node_pool.
  remove_default_node_pool = true
  initial_node_count       = 1
}

# Small Linux node pool to run some Linux-only Kubernetes Pods.
resource "google_container_node_pool" "linux_pool" {
  name     = "linux-pool"
  project  = google_container_cluster.cluster.project
  cluster  = google_container_cluster.cluster.name
  location = google_container_cluster.cluster.location

  node_config {
    image_type = "COS_CONTAINERD"
  }
}

# Node pool of Windows Server machines.
resource "google_container_node_pool" "windows_pool" {
  name     = "node-pool-name"
  project  = google_container_cluster.cluster.project
  cluster  = google_container_cluster.cluster.name
  location = google_container_cluster.cluster.location

  node_config {
    image_type   = "image-name"
    machine_type = "machine-type-name"
  }

  # The Linux node pool must be created before the Windows Server node pool.
  depends_on = [google_container_node_pool.linux_pool]
}

Replace the following:

  • project-id is the project ID in which the cluster is created.
  • cluster-name is the name of the GKE cluster.
  • location is the location (region or zone) in which the cluster is created.
  • version-number must be 1.16.8-gke.9 or higher.
  • node-pool-name is the name you choose for your Windows Server node pool.
  • image-name is WINDOWS_SAC or WINDOWS_LTSC. For more information about these node images, see the Choose your Windows node image section.
  • machine-type-name defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.

After you create a Windows Server node pool, the cluster goes into a RECONCILE state for several minutes as the control plane (master) is updated.

Get kubectl credentials

Use the get-credentials command to enable kubectl to work with the cluster you created.

gcloud container clusters get-credentials cluster-name

For more information on the get-credentials command, see the SDK get-credentials documentation.
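For a zonal cluster, you typically also pass the location; the cluster name and zone below are hypothetical:

```shell
# Hypothetical cluster name and zone; substitute your own values.
gcloud container clusters get-credentials my-windows-cluster --zone=us-central1-a
```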

Wait for cluster initialization

Before using the cluster, wait for several seconds until windows.config.common-webhooks.networking.gke.io is created. This webhook adds scheduling tolerations to Pods created with the kubernetes.io/os: windows node selector to ensure they are allowed to run on Windows Server nodes. It also validates the Pod to ensure that it only uses features supported on Windows.

To ensure the webhook is created, run the following command:

kubectl get mutatingwebhookconfigurations

The output should show the webhook running:

NAME                                              CREATED AT
windows.config.common-webhooks.networking.gke.io  2019-12-12T16:55:47Z
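If you script cluster creation, you can poll for the webhook instead of waiting a fixed interval; this sketch assumes a five-minute upper bound:

```shell
# Poll until the Windows webhook exists (up to roughly 5 minutes).
for i in $(seq 1 60); do
  if kubectl get mutatingwebhookconfigurations \
      windows.config.common-webhooks.networking.gke.io >/dev/null 2>&1; then
    echo "Webhook is ready."
    break
  fi
  sleep 5
done
```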

Now that you have a cluster with two node pools (one Linux and one Windows), you can deploy a Windows application.

Upgrading Windows Server node pools

The Windows Server container version compatibility requirements mean that your container images might need to be rebuilt to match the Windows Server version for a new GKE version before upgrading your node pools.

To ensure that your container images remain compatible with your nodes, we recommend that you check the version mapping table and build your Windows Server container images as multi-arch images that can target multiple Windows Server versions. You can then update your container deployments to target the multi-arch images that will work on both the current and the next GKE version before manually invoking a GKE node pool upgrade. Manual node pool upgrades must be performed regularly because nodes cannot be more than two minor versions behind the control plane version.

We recommend enabling node auto-upgrades only if you continuously build multi-arch Windows Server container images that target the latest Windows Server versions, especially if you are using Windows Server SAC as the node image type. Node auto-upgrades are less likely to cause problems with the Windows Server LTSC node image type but there is still a risk of encountering version incompatibility issues.

Windows Updates

Windows Updates are disabled for Windows Server nodes. Automatic updates can cause node restarts at unpredictable times, and any Windows Updates installed after a node starts would be lost when the node is recreated by GKE. GKE makes Windows Updates available by periodically updating the Windows Server node images used in new GKE releases. There can be a delay between when Windows Updates are released by Microsoft and when they are available in GKE. When critical security updates are released, GKE updates the Windows Server node images as quickly as possible.

Viewing and querying logs

Logging is enabled automatically in GKE clusters. You can view the logs of the containers and the logs from other services on the Windows Server nodes using Kubernetes Engine monitoring.

The following is an example of a filter to get the container log:

resource.type="k8s_container"
resource.labels.cluster_name="your_cluster_name"
resource.labels.namespace_name="your_namespace_id"
resource.labels.container_name="your_container_name"
resource.labels.pod_name="your_pod_name"
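You can also run the same filter from the command line with gcloud logging read; the cluster, namespace, Pod, and container names below are hypothetical:

```shell
# Hypothetical names; substitute your own values.
gcloud logging read '
  resource.type="k8s_container"
  resource.labels.cluster_name="my-windows-cluster"
  resource.labels.namespace_name="default"
  resource.labels.pod_name="my-windows-pod"
  resource.labels.container_name="my-app"
' --limit=10
```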

Accessing a Windows Server node using Remote Desktop Protocol (RDP)

You can connect to a Windows Server node in your cluster using RDP. For instructions on how to connect, see Connecting to Windows instances in the Compute Engine documentation.

Building multi-arch images

You can build the multi-arch images manually or use a Cloud Build builder. For instructions, see Building Windows multi-arch images.

Deleting Windows Server node pools

Delete a Windows Server node pool by using gcloud or the Google Cloud Console.

gcloud

gcloud container node-pools delete --cluster=cluster-name node-pool-name

Console

To delete a Windows Server node pool using the Cloud Console, perform the following steps:

  1. Visit the Google Kubernetes Engine menu in Cloud Console.

    Visit the Google Kubernetes Engine menu

  2. Click the Edit icon (it looks like a pencil) next to the cluster that contains the node pool you want to delete.

  3. Under the Node Pools section, click the Delete icon (it looks like a trash can) next to the node pool you want to delete.

  4. When prompted to confirm, click Delete again.

Limitations

There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.

In addition, some GKE features are not supported with Windows Server: certain cluster-level features are unavailable for clusters that contain Windows Server node pools, and certain node pool features are unavailable for the Windows Server node pools themselves.

Troubleshooting

See the Kubernetes documentation for general guidance on debugging Pods and Services.

Windows Pods fail to start

A version mismatch between the Windows Server container and the Windows node that is trying to run the container can result in your Windows Pods failing to start.

If the version for your Windows node pool is 1.16.8-gke.8 or greater, review Microsoft's documentation for the February 2020 Windows Server container incompatibility issue and build your container images with base Windows images that include Windows Updates from March 2020. Container images built on earlier base Windows images might fail to run on these Windows nodes and can also cause the node to fail with status NotReady.

Image pull errors

Windows Server container images, and the individual layers they are composed of, can be quite large. Their size can cause the kubelet to time out and fail while downloading and extracting the container layers.

You might be encountering this problem if you see "Failed to pull image" or "Image pull context cancelled" error messages, or an ErrImagePull status for your Pods.

Try the following options to successfully pull your Windows Server containers:

  • Break the application layers of the Windows Server container image into smaller layers that can each be pulled and extracted more quickly. This can make Docker's layer caching more effective and make image pull retries more likely to succeed. To learn more about layers, see the Docker article About images, containers, and storage drivers.

  • Connect to your Windows Server nodes and manually use the docker pull command on your container images before creating your Pods.

  • Set the image-pull-progress-deadline flag for the kubelet service to increase the timeout for pulling container images.

    Set the flag by connecting to your Windows nodes and running the following PowerShell commands.

    1. Get the existing command line for the Kubelet service from the Windows registry.

      PS C:\> $regkey = "HKLM\SYSTEM\CurrentControlSet\Services\kubelet"
      
      PS C:\> $name = "ImagePath"
      
      PS C:\> $(reg query ${regkey} /v ${name} | Out-String) -match `
      "${name}.*(C:.*kubelet\.exe.*)\r"
      
      PS C:\> $kubelet_cmd = $Matches[1]
      
    2. Set a new command line for the Kubelet service, with an additional flag to increase the timeout.

      PS C:\> reg add ${regkey} /f /v ${name} /t REG_EXPAND_SZ /d "${kubelet_cmd} `
      --image-pull-progress-deadline=15m"
      
    3. Confirm that the change was successful.

      PS C:\> reg query ${regkey} /v ${name}
      
    4. Restart the kubelet service so the new flag takes effect.

      PS C:\> Restart-Service kubelet
      
    5. Confirm that the kubelet service restarted successfully.

      PS C:\> Get-Service kubelet # ensure state is Running
      

Timeout during node pool creation

Node pool creation can time out if you are creating a large number of nodes (for example, 500) and it's the first node pool in the cluster using a Windows Server image.

To resolve this issue, reduce the number of nodes you are creating. You can increase the number of nodes later.

Using gMSA

If you have difficulty using a Group Managed Service Account (gMSA), it might be because of the computer name limit in Active Directory, which restricts computer names to 15 characters. On GKE, this name is derived from the node name. Node names come from the name of the node pool, followed by a unique string, and when a node's name exceeds 15 characters, it is truncated to 15 characters.

For example, if a node is named gke-cluster-windows--node-pool-window-3d3afc34-wnnn, it is truncated to GKE-CLUSTER-WIN.

If another node is named gke-cluster-windows--node-pool-window-123gtj12-aabb, it is also truncated to GKE-CLUSTER-WIN.

Active Directory has a one-to-one relationship between computers and names, so if two computers share a name an error occurs.
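The truncation behavior can be sketched in shell, using the node names from the examples above:

```shell
# Sketch of the Active Directory behavior: keep the first 15
# characters and upper-case them, as NetBIOS computer names are.
truncate_ad_name() {
  printf '%s\n' "$1" | cut -c1-15 | tr '[:lower:]' '[:upper:]'
}

truncate_ad_name "gke-cluster-windows--node-pool-window-3d3afc34-wnnn"
truncate_ad_name "gke-cluster-windows--node-pool-window-123gtj12-aabb"
# Both print GKE-CLUSTER-WIN, so the two nodes collide.
```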

To avoid this issue, rename the computer when joining Active Directory. After renaming the computer, you also need to update the node's kubelet and kube-proxy configuration to prevent an issue that stops the node from connecting to the cluster. You can do this by appending the --hostname-override flag to the binPath of the kubelet and kube-proxy services. Set the flag to your node's instance name, then restart the services.

To update the config, run the following command:

sc.exe config service-name binPath="existing-service-binpath --hostname-override=node-instance-name"

Where:

  • service-name is kubelet or kube-proxy. Run this script once per service.
  • existing-service-binpath is the binPath of the service. You can retrieve this using sc.exe qc service-name.
  • node-instance-name is the instance name of the node VM.

This resolves the issue and allows the node and host Pods to come up. It also avoids name conflicts in Active Directory.

Note on Managed Service for Microsoft Active Directory

In Managed Service for Microsoft Active Directory, setting up a gMSA requires a few extra steps. The process is slightly different because of some divergences between the default objects and permissions in Managed Microsoft AD and standard AD. Learn how to create a gMSA in Managed Microsoft AD.

Windows nodes become NotReady with error: "PLEG is not healthy"

This is a known Kubernetes issue that happens when multiple Pods are started very rapidly on a single Windows node. To recover from this situation, restart the Windows Server node. A recommended workaround to avoid this issue is to limit the rate at which Windows Pods are created to one Pod every 30 seconds.
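The 30-second pacing can be scripted; the manifest file names here are hypothetical:

```shell
# Hypothetical manifests: apply Windows Pods one at a time,
# 30 seconds apart, to avoid overloading the node's PLEG.
for manifest in windows-pod-1.yaml windows-pod-2.yaml windows-pod-3.yaml; do
  kubectl apply -f "$manifest"
  sleep 30
done
```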

Inconsistent TerminationGracePeriod

The Windows system timeout for the container might differ from the grace period you configure. This difference can cause Windows to force-terminate the container before the end of the grace period passed to the runtime.

You can modify the Windows timeout by editing container-local registry keys at image-build time. If you modify the Windows timeout, you might also need to adjust TerminationGracePeriodSeconds to match.

Network connectivity problems

If you experience network connectivity problems from your Windows Server containers, it might be because Windows Server container networking often assumes a network MTU of 1500, which is larger than Google Cloud's network MTU of 1460.

Check that the MTU of the network interface in the container and the network interfaces of the Windows Server node itself are all 1460 or less. For information on how to set the MTU, see known issues for Windows containers.

What's next