Creating a cluster using Windows Server node pools


On this page, you learn how to create a Google Kubernetes Engine (GKE) cluster with node pools running Microsoft Windows Server. With this cluster, you can use Windows Server containers. Microsoft Hyper-V containers are not currently supported. Similar to Linux containers, Windows Server containers provide process and namespace isolation.

A Windows Server node requires more resources than a typical Linux node. Windows Server nodes need the extra resources to run the Windows OS and for the Windows Server components that cannot run in containers. Since Windows Server nodes require more resources, your allocatable resources are lower than they would be with Linux nodes.

Creating a cluster using Windows Server node pools

In this section, you create a cluster that uses a Windows Server container.

To create this cluster you need to complete the following tasks:

  1. Choose your Windows Server node image.
  2. Update and configure gcloud.
  3. Create a cluster and node pools.
  4. Get kubectl credentials.
  5. Wait for cluster initialization.

Choose your Windows Server node image

To run on GKE, Windows Server container node images need to be built on Windows Server version 2019 (LTSC), Windows Server version 20H2 (SAC), or Windows Server version 2022 (LTSC). A single cluster can have multiple Windows Server node pools using different Windows Server versions, but each individual node pool can only use one Windows Server version.

Consider the following when choosing your node image:

  • Support timing:
    • The support timing for a Windows Server node image is subject to the support timing provided by Microsoft, as described in Support policy for OS images. You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.
    • SAC versions are only supported by Microsoft for 18 months after their initial release. If you choose SAC for the image type for your node pool, but do not upgrade your node pool to newer GKE versions that target newer SAC versions, you cannot create new nodes in your node pool when the support lifecycle for the SAC version ends. Learn more about Google's support for the Windows Server operating system. We recommend using LTSC because of its longer support lifecycle.
    • Do not choose SAC if you enroll your GKE cluster in the stable release channel. Since SAC versions are only supported by Microsoft for 18 months, there is a risk of the SAC node pool image becoming unsupported while the stable GKE version is still available.
  • Version compatibility and complexity:
    • Only choose SAC if you can upgrade your node pool and the containers running in it regularly. GKE periodically updates the SAC version used for Windows node pools in new GKE releases, so choosing SAC for your node pool image type requires you to rebuild your containers more often.
    • If you are unsure of which Windows Server image type to use, we recommend choosing Windows Server LTSC to avoid version incompatibility problems when upgrading your node pool. For additional information, see Windows Server servicing channels: LTSC and SAC in Microsoft's documentation.
    • Both Windows Server Core and Nano Server can be used as a base image for your containers.
    • Windows Server containers have important version compatibility requirements:
      • Windows Server containers built for LTSC do not run on SAC nodes, and vice-versa.
      • Windows Server containers built for a specific LTSC or SAC version do not run on other LTSC or SAC versions without being rebuilt to target the other version.
    • Building your Windows Server container images as multi-arch images that can target multiple Windows Server versions can help you manage this versioning complexity.
  • New features:
    • New Windows Server features are typically introduced into SAC versions first. Because of this, new GKE Windows functionality might be introduced in SAC node pools first.
    • Consider SAC if you depend on features not yet available in the LTSC release.
  • Container runtime:

    • For both the Windows Server LTSC and SAC node images, the container runtime can be Docker or containerd. For GKE node version 1.21.1-gke.2200 and later, we recommend using the containerd runtime. For more information, see Node images.

Update and configure gcloud

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Create a cluster and node pools

To run Windows Server containers, your cluster must have at least one Windows and one Linux node pool. You cannot create a cluster using only a Windows Server node pool. The Linux node pool is required to run critical cluster add-ons.

Because of its importance, we recommend turning on autoscaling to ensure your Linux node pool has sufficient capacity to run cluster add-ons.

gcloud

Create a cluster with the following fields:

gcloud container clusters create CLUSTER_NAME \
    --enable-ip-alias \
    --num-nodes=NUMBER_OF_NODES \
    --cluster-version=VERSION_NUMBER \
    --release-channel CHANNEL

Replace the following:

  • CLUSTER_NAME: the name you choose for your cluster.
  • --enable-ip-alias turns on alias IP. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  • NUMBER_OF_NODES: the number of Linux nodes you create. You should provide sufficient compute resources to run cluster add-ons. This is an optional field and if omitted, uses the default value of 3.
  • VERSION_NUMBER: the specific cluster version you want to use, which must be 1.16.8-gke.9 or higher. If you do not specify a release channel, GKE enrolls your cluster in the most mature release channel where that version is available.
  • CHANNEL: the release channel to enroll the cluster in, which can be one of rapid, regular, stable, or None. By default, the cluster is enrolled in the regular release channel unless at least one of the following flags is specified: --cluster-version, --release-channel, --no-enable-autoupgrade, and --no-enable-autorepair. You must specify None if you choose a cluster version and do not want your cluster to be enrolled in a release channel.
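The minimum version requirement above can be checked before you call gcloud. The following is a minimal sketch, not a gcloud feature: version_at_least is a hypothetical helper that assumes GKE version strings of the form MAJOR.MINOR.PATCH-gke.BUILD and compares them field by field.

```shell
# Hypothetical helper (not part of gcloud): succeeds when GKE version $1
# is equal to or newer than $2. Assumes the MAJOR.MINOR.PATCH-gke.BUILD form.
version_at_least() {
  # Normalize "1.16.8-gke.9" to the four fields "1 16 8 9".
  have=$(printf '%s\n' "$1" | sed 's/-gke\./ /; s/\./ /g')
  want=$(printf '%s\n' "$2" | sed 's/-gke\./ /; s/\./ /g')
  for i in 1 2 3 4; do
    h=$(printf '%s\n' "$have" | cut -d' ' -f"$i")
    w=$(printf '%s\n' "$want" | cut -d' ' -f"$i")
    # Treat a missing field as 0.
    h=${h:-0}
    w=${w:-0}
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
  done
  # All four fields are equal.
  return 0
}
```

For example, version_at_least "$VERSION_NUMBER" 1.16.8-gke.9 succeeds for any version that meets the minimum, so a script can stop early instead of waiting for cluster creation to fail.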

Create the Windows Server node pool with the following fields:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --image-type=IMAGE_NAME \
    --no-enable-autoupgrade \
    --machine-type=MACHINE_TYPE_NAME \
    --windows-os-version=WINDOWS_OS_VERSION

Replace the following:

  • NODE_POOL_NAME: the name you choose for your Windows Server node pool.
  • CLUSTER_NAME: the name of the cluster you created above.
  • IMAGE_NAME: You can specify one of the following values:

    • WINDOWS_LTSC_CONTAINERD: Windows Server LTSC with containerd. This is the image type for both the Windows Server 2022 and Windows Server 2019 OS images.
    • WINDOWS_SAC_CONTAINERD: Windows Server SAC with containerd (Unsupported after August 9, 2022)
    • WINDOWS_LTSC: Windows Server LTSC with Docker
    • WINDOWS_SAC: Windows Server SAC with Docker (Unsupported after August 9, 2022)

    For more information about these node images, see the Choose your Windows node image section.

  • --no-enable-autoupgrade disables node auto-upgrade. Review Upgrading Windows Server node pools before enabling.

  • MACHINE_TYPE_NAME: defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.

  • WINDOWS_OS_VERSION: defines the Windows OS version to use for image type WINDOWS_LTSC_CONTAINERD. This is an optional flag. When not specified, the default OS version used will be LTSC2019. Set the value to ltsc2022 to create a Windows Server 2022 node pool. Set the value to ltsc2019 to create a Windows Server 2019 node pool.
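The rules in the list above can be combined into a small dry-run helper. The following is a sketch under stated assumptions: windows_node_pool_cmd is a hypothetical function that only prints the gcloud command, and it treats --windows-os-version as valid only with WINDOWS_LTSC_CONTAINERD, which is our reading of the description above rather than documented gcloud behavior.

```shell
# Hypothetical dry-run helper: validates its arguments against the rules
# described above and prints the gcloud command instead of running it.
# Usage: windows_node_pool_cmd POOL CLUSTER IMAGE_TYPE MACHINE_TYPE [OS_VERSION]
windows_node_pool_cmd() {
  pool=$1
  cluster=$2
  image=$3
  machine=$4
  os_version=${5:-}

  # f1-micro and g1-small are not supported for Windows Server nodes.
  case "$machine" in
    f1-micro|g1-small)
      echo "machine type $machine is not supported for Windows Server nodes" >&2
      return 1 ;;
  esac

  case "$image" in
    WINDOWS_LTSC_CONTAINERD|WINDOWS_LTSC|WINDOWS_SAC_CONTAINERD|WINDOWS_SAC) ;;
    *) echo "unknown Windows image type: $image" >&2; return 1 ;;
  esac

  cmd="gcloud container node-pools create $pool --cluster=$cluster"
  cmd="$cmd --image-type=$image --no-enable-autoupgrade --machine-type=$machine"

  if [ -n "$os_version" ]; then
    # Assumption: the OS version flag is meaningful only for the LTSC containerd image.
    if [ "$image" != "WINDOWS_LTSC_CONTAINERD" ]; then
      echo "--windows-os-version applies to WINDOWS_LTSC_CONTAINERD only" >&2
      return 1
    fi
    case "$os_version" in
      ltsc2019|ltsc2022) cmd="$cmd --windows-os-version=$os_version" ;;
      *) echo "unknown Windows OS version: $os_version" >&2; return 1 ;;
    esac
  fi

  printf '%s\n' "$cmd"
}
```

Printing the command rather than running it lets you review the flags first. After reviewing the output, you can paste it into your shell.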

The following example shows how you can create a Windows Server 2022 node pool:

gcloud container node-pools create node_pool_name \
    --cluster=cluster_name \
    --image-type=WINDOWS_LTSC_CONTAINERD \
    --windows-os-version=ltsc2022

The following example shows how you can update an existing Windows node pool to use Windows Server 2022 OS image:

gcloud container node-pools update node_pool_name \
    --cluster=cluster_name \
    --windows-os-version=ltsc2022

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click Create.

  3. In the Cluster basics section, complete the following:

    1. Enter the Name for your cluster.
    2. For the Location type, select the desired region or zone for your cluster.
    3. Under Control plane Version, select a Release channel or choose to specify a Static version. The static version must be 1.16.8-gke.9 or higher.
  4. From the navigation pane, under Node Pools, click default-pool to create your Linux node pool. When configuring this node pool, you should provide sufficient compute resources to run cluster add-ons. You must also have available resource quota for the nodes and their resources (such as firewall routes).

  5. At the top of the page, click Add Node Pool to create your Windows Server node pool.

  6. In the Node pool details section, complete the following:

    1. Enter a Name for the node pool.
    2. For static version nodes, choose the Node version.
    3. Enter the Number of nodes to create in the node pool.
  7. From the navigation pane, under Node Pools, click Nodes.

    1. From the Image type drop-down list, select one of the following node images:

      • Windows Long Term Servicing Channel with Docker
      • Windows Long Term Servicing Channel with containerd
      • Windows Semi-Annual Channel with Docker
      • Windows Semi-Annual Channel with containerd

      For more information, see the Choose your Windows node image section.

    2. Choose the default Machine configuration to use for the instances. n1-standard-2 is the minimum recommended size as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.

  8. From the navigation pane, select the name of your Windows Server node pool. This returns you to the Node pool details page.

    1. Under Automation, clear the Enable node auto-upgrade checkbox. Review the Upgrading Windows Server node pools section before enabling auto-upgrade.
  9. From the navigation pane, under Cluster, select Networking.

    1. Under Advanced networking options, ensure Enable VPC-native traffic routing (uses alias IP) is selected. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  10. Click Create.

Terraform

To create a GKE Standard cluster and a Windows Server node pool using Terraform, refer to the following example:

resource "google_container_cluster" "default" {
  name     = "gke-standard-regional-cluster"
  location = "us-west1"

  initial_node_count = 1

  # Setting `deletion_protection` to `true` ensures that this cluster
  # cannot be deleted accidentally through Terraform.
  deletion_protection = false
}

resource "google_container_node_pool" "default" {
  name     = "windows-node-pool"
  cluster  = google_container_cluster.default.name
  location = google_container_cluster.default.location

  node_config {
    image_type = "WINDOWS_LTSC_CONTAINERD"
  }
}

This example uses Windows Server LTSC with containerd. This is the image type for both the Windows Server 2022 and Windows Server 2019 OS images. For more information about node images, see Choose your Windows node image.

To learn more about using Terraform, see Terraform support for GKE.

After you create a Windows Server node pool, the cluster goes into a RECONCILING state for several minutes as the control plane is updated.

Get kubectl credentials

Use the get-credentials command to enable kubectl to work with the cluster you created.

gcloud container clusters get-credentials CLUSTER_NAME

For more information on the get-credentials command, see the SDK get-credentials documentation.

Wait for cluster initialization

Before using the cluster, wait for several seconds until windows.config.common-webhooks.networking.gke.io is created. This webhook adds scheduling tolerations to Pods created with the kubernetes.io/os: windows node selector to ensure they are allowed to run on Windows Server nodes. It also validates the Pod to ensure that it only uses features supported on Windows.

To ensure the webhook is created, run the following command:

kubectl get mutatingwebhookconfigurations

The output should show the webhook running:

NAME                                              CREATED AT
windows.config.common-webhooks.networking.gke.io  2019-12-12T16:55:47Z
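Instead of rerunning the command by hand, you can poll for the webhook. The following is a minimal sketch: wait_for_windows_webhook is a hypothetical helper, and its defaults of 30 attempts at 2-second intervals are arbitrary values chosen for illustration.

```shell
# Hypothetical helper: polls until the Windows webhook configuration exists.
# Usage: wait_for_windows_webhook [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_windows_webhook() {
  attempts=${1:-30}
  interval=${2:-2}
  name=windows.config.common-webhooks.networking.gke.io
  n=0
  while [ "$n" -lt "$attempts" ]; do
    # kubectl exits with a non-zero status while the object does not exist yet.
    if kubectl get mutatingwebhookconfigurations "$name" >/dev/null 2>&1; then
      echo "webhook $name is present"
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  echo "webhook $name not found after $attempts attempts" >&2
  return 1
}
```

Run wait_for_windows_webhook after you get credentials for the cluster. A non-zero exit status means the webhook did not appear within the allowed attempts.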

Now that you have a cluster with two node pools (one Linux and one Windows), you can deploy a Windows application.

Mapping GKE and Windows versions

Microsoft releases new SAC versions approximately every six months and new LTSC versions every two to three years. These new versions are typically available in new GKE minor versions. Within a GKE minor version the LTSC and SAC versions usually remain fixed.

To see the version mapping between GKE versions and Windows Server versions, use the gcloud beta container get-server-config command:

gcloud beta container get-server-config

The version mapping is returned in the windowsVersionMaps field of the response. To filter the response to see the version mapping for specific GKE versions in your cluster, perform the following steps in a Linux shell or in Cloud Shell.

  1. Set the following variables:

    CLUSTER_NAME=CLUSTER_NAME
    NODE_POOL_NAME=NODE_POOL_NAME
    ZONE=COMPUTE_ZONE
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • NODE_POOL_NAME: the name of the Windows Server node pool.
    • COMPUTE_ZONE: the compute zone for the cluster.
  2. Obtain the node pool version and store it in the NODE_POOL_VERSION variable:

    NODE_POOL_VERSION=$(gcloud container node-pools describe "$NODE_POOL_NAME" \
        --cluster "$CLUSTER_NAME" --zone "$ZONE" --format="value(version)")
    
  3. Obtain the Windows Server versions for NODE_POOL_VERSION:

    gcloud beta container get-server-config \
        --format="yaml(windowsVersionMaps.\"$NODE_POOL_VERSION\")"
    

    The output is similar to the following:

    windowsVersionMaps:
      1.18.6-gke.6601:
        windowsVersions:
        - imageType: WINDOWS_SAC
          osVersion: 10.0.18363.1198
          supportEndDate:
            day: 10
            month: 5
            year: 2022
        - imageType: WINDOWS_LTSC
          osVersion: 10.0.17763.1577
          supportEndDate:
            day: 9
            month: 1
            year: 2024
    
  4. Obtain the Windows Server version for the WINDOWS_SAC image type:

    gcloud beta container get-server-config \
      --flatten=windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions \
      --filter="windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.imageType=WINDOWS_SAC" \
      --format="value(windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.osVersion)"
    

    The output is similar to the following:

    10.0.18363.1198
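The supportEndDate fields in the version mapping can be compared with the current date to see whether an image type is still supported. The following is a minimal sketch: support_ended is a hypothetical helper, and it assumes that the image is still supported on the end date itself.

```shell
# Hypothetical helper: succeeds when the support end date has passed.
# Pass year, month, and day as plain numbers without leading zeros,
# as they appear in the supportEndDate fields.
# Usage: support_ended YEAR MONTH DAY [TODAY_AS_YYYYMMDD]
support_ended() {
  # Zero-pad month and day so the dates compare correctly as integers.
  end=$(printf '%04d%02d%02d' "$1" "$2" "$3")
  today=${4:-$(date +%Y%m%d)}
  [ "$today" -gt "$end" ]
}
```

With the sample output above, support_ended 2022 5 10 checks the WINDOWS_SAC image type against today's date. The optional fourth argument lets you test against a fixed date instead.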
    

Upgrading Windows Server node pools

The Windows Server container version compatibility requirements mean that your container images might need to be rebuilt to match the Windows Server version for a new GKE version before upgrading your node pools.

To ensure that your container images remain compatible with your nodes, we recommend that you check the version mapping and build your Windows Server container images as multi-arch images that can target multiple Windows Server versions. You can then update your container deployments to target the multi-arch images that will work on both the current and the next GKE version before manually invoking a GKE node pool upgrade. Manual node pool upgrades must be performed regularly because nodes cannot be more than two minor versions behind the control plane version.
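To see how close a node pool is to the two-minor-version limit, you can compare the minor versions directly. The following is a minimal sketch: minor_versions_behind and within_supported_skew are hypothetical helpers that assume both versions share the same major version.

```shell
# Hypothetical helper: prints how many minor versions the node pool is behind
# the control plane. Assumes both versions share the same major version.
# Usage: minor_versions_behind CONTROL_PLANE_VERSION NODE_POOL_VERSION
minor_versions_behind() {
  control_minor=$(printf '%s\n' "$1" | cut -d. -f2)
  node_minor=$(printf '%s\n' "$2" | cut -d. -f2)
  echo $((control_minor - node_minor))
}

# Succeeds when the node pool is within the two-minor-version limit.
within_supported_skew() {
  [ "$(minor_versions_behind "$1" "$2")" -le 2 ]
}
```

You can obtain the control plane version with gcloud container clusters describe CLUSTER_NAME --format="value(currentMasterVersion)" and the node pool version with the node-pools describe command shown in the Mapping GKE and Windows versions section.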

We recommend that you subscribe to upgrade notifications using Pub/Sub to proactively receive updates about new GKE versions and the Windows OS versions they use.

We recommend enabling node auto-upgrades only if you continuously build multi-arch Windows Server container images that target the latest Windows Server versions, especially if you are using Windows Server SAC as the node image type. Node auto-upgrades are less likely to cause problems with the Windows Server LTSC node image type but there is still a risk of encountering version incompatibility issues.

Windows Updates

Windows Updates are disabled for Windows Server nodes. Automatic updates can cause node restarts at unpredictable times, and any Windows Updates installed after a node starts would be lost when the node is recreated by GKE. GKE makes Windows Updates available by periodically updating the Windows Server node images used in new GKE releases. There can be a delay between when Windows Updates are released by Microsoft and when they are available in GKE. When critical security updates are released, GKE updates the Windows Server node images as quickly as possible.

Control how Windows Pods and Services communicate

You can control how Windows Pods and Services communicate using network policies.

You can have a Windows Server container on clusters that have network policy enabled in GKE versions 1.22.2 and later. This feature is available for clusters that use the WINDOWS_LTSC or WINDOWS_LTSC_CONTAINERD node image types.

If your control planes or nodes are running earlier versions, you can migrate your node pools to a version that supports network policy by upgrading your node pools and your control plane to GKE version 1.22.2 or later. This option is only available if you created your cluster with the --enable-dataplane-v2 flag.

After you enable network policy, all previously configured policies, including policies that did not work on Windows Server containers before you enabled the feature, become active.

Some clusters that have network policy enabled cannot be used with Windows Server containers. See the limitations section for more details.

Viewing and querying logs

Logging is enabled automatically in GKE clusters. You can view the logs of the containers and the logs from other services on the Windows Server nodes using Kubernetes Engine monitoring.

The following is an example of a filter to get the container log:

resource.type="k8s_container"
resource.labels.cluster_name="your_cluster_name"
resource.labels.namespace_name="your_namespace_id"
resource.labels.container_name="your_container_name"
resource.labels.pod_name="your_pod_name"
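The same filter can be used from the command line. The following is a minimal sketch: container_log_filter is a hypothetical helper that joins the conditions with AND so the filter fits on one line, and the resource names in the usage example are placeholders.

```shell
# Hypothetical helper: prints a single-line Cloud Logging filter for one container.
# Usage: container_log_filter CLUSTER NAMESPACE CONTAINER POD
container_log_filter() {
  printf 'resource.type="k8s_container"'
  printf ' AND resource.labels.cluster_name="%s"' "$1"
  printf ' AND resource.labels.namespace_name="%s"' "$2"
  printf ' AND resource.labels.container_name="%s"' "$3"
  printf ' AND resource.labels.pod_name="%s"\n' "$4"
}
```

For example, gcloud logging read "$(container_log_filter my-cluster default iis iis-0)" --limit=10 reads the ten most recent matching entries.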

Accessing a Windows Server node using Remote Desktop Protocol (RDP)

You can connect to a Windows Server node in your cluster using RDP. For instructions on how to connect, see Connecting to Windows instances in the Compute Engine documentation.

Building multi-arch images

You can build the multi-arch images manually or use a Cloud Build builder. For instructions, see Building Windows multi-arch images.

Using gMSA

The following steps show you how to use a Group Managed Service Account (gMSA) with your Windows Server node pools.

  1. Configure Windows Server nodes in your cluster to automatically join your AD domain. For instructions, see Configure Windows Server nodes to automatically join an Active Directory domain.

  2. Create a gMSA and grant the security group that the domain join service created automatically permission to retrieve its password. Perform this step on a machine with administrative access to your AD domain.

    $instanceGroupUri = gcloud container node-pools describe NODE_POOL_NAME --cluster CLUSTER_NAME --format="value(instanceGroupUrls)"
    $securityGroupName = ([System.Uri]$instanceGroupUri).Segments[-1]
    $securityGroup = dsquery group -name $securityGroupName
    $gmsaName = "GMSA_NAME"
    $dnsHostName = "DNS_HOST_NAME"
    
    New-ADServiceAccount -Name $gmsaName -DNSHostName $dnsHostName -PrincipalsAllowedToRetrieveManagedPassword $securityGroup
    
    Get-ADServiceAccount $gmsaName
    Test-ADServiceAccount $gmsaName
    

    Replace the following:

    • NODE_POOL_NAME: the name of your Windows Server node pool. The automatically created security group has the same name as your Windows Server node pool.
    • CLUSTER_NAME: the name of your cluster.
    • GMSA_NAME: the name you choose for the new gMSA.
    • DNS_HOST_NAME: the Fully Qualified Domain Name (FQDN) of the service account you created. For example, if GMSA_NAME is webapp01 and the domain is example.com, then DNS_HOST_NAME is webapp01.example.com.
  3. Configure your gMSA by following the instructions in the Configure GMSA for Windows Pods and containers tutorial.

Deleting Windows Server node pools

Delete a Windows Server node pool by using gcloud or the Google Cloud console.

gcloud

gcloud container node-pools delete NODE_POOL_NAME \
    --cluster=CLUSTER_NAME

Console

To delete a Windows Server node pool using the Google Cloud console, perform the following steps:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Beside the cluster you want to edit, click Actions, then click Edit.

  3. Select the Nodes tab.

  4. Under the Node Pools section, click Delete next to the node pool you want to delete.

  5. When prompted to confirm, click Delete again.

Limitations

There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.

In addition to the unsupported Kubernetes features, there are some GKE features that are not supported.

For GKE clusters, the following features are not supported with Windows Server node pools:

Local External Traffic Policy on Windows node pool is only supported with GKE version v1.23.4-gke.400 or later.

Other Google Cloud products that you want to use with GKE clusters might not support Windows Server node pools. For specific limitations, refer to the documentation of that product.

Troubleshooting

See the Kubernetes documentation for general guidance on debugging Pods and Services.

Containerd node issues

For known issues using a containerd node image, see Known issues.

Windows Pods fail to start

A version mismatch between the Windows Server container and the Windows node that is trying to run the container can result in your Windows Pods failing to start.

If the version for your Windows node pool is 1.16.8-gke.8 or later, review Microsoft's documentation for the February 2020 Windows Server container incompatibility issue and build your container images with base Windows images that include Windows Updates from March 2020. Container images built on earlier base Windows images might fail to run on these Windows nodes and can also cause the node to fail with status NotReady.
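When you investigate a version mismatch, it can help to translate the node's Windows OS version into the release name used in container base image tags. The following is a minimal sketch: windows_release_for_build is a hypothetical helper, and its table covers only a few well-known Windows Server build numbers.

```shell
# Hypothetical helper: maps a Windows OS version string such as
# 10.0.17763.1577 to the matching release name. The table is illustrative
# and covers only a few well-known builds.
windows_release_for_build() {
  # The third dot-separated field is the Windows build number.
  build=$(printf '%s\n' "$1" | cut -d. -f3)
  case "$build" in
    17763) echo ltsc2019 ;;
    18363) echo 1909 ;;
    19041) echo 2004 ;;
    19042) echo 20H2 ;;
    20348) echo ltsc2022 ;;
    *) echo "unknown build: $build" >&2; return 1 ;;
  esac
}
```

For the osVersion values shown in the Mapping GKE and Windows versions section, the helper prints ltsc2019 for 10.0.17763.1577 and 1909 for 10.0.18363.1198. A container image built from a base image for a different release needs to be rebuilt before it can run on that node.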

Image pull errors

Windows Server container images, and the individual layers they are composed of, can be quite large. Their size can cause the kubelet to time out and fail when downloading and extracting the container layers.

You might have encountered this problem if you see the "Failed to pull image" or "Image pull context cancelled" error messages or an ErrImagePull status for your Pods.

If image pull errors occur frequently, use node pools with a higher CPU specification. Container extraction runs in parallel across cores, so machine types with more cores reduce the overall pull time.

Try the following options to successfully pull your Windows Server containers:

  • Break the application layers of the Windows Server container image into smaller layers that can each be pulled and extracted more quickly. This can make Docker's layer caching more effective and make image pull retries more likely to succeed. To learn more about layers, see the Docker article About images, containers, and storage drivers.

  • Connect to your Windows Server nodes and manually use the docker pull command on your container images before creating your Pods.

  • Set the image-pull-progress-deadline flag for the kubelet service to increase the timeout for pulling container images.

    Set the flag by connecting to your Windows nodes and running the following PowerShell commands.

    1. Get the existing command line for the Kubelet service from the Windows registry.

      PS C:\> $regkey = "HKLM\SYSTEM\CurrentControlSet\Services\kubelet"
      PS C:\> $name = "ImagePath"
      PS C:\> $(reg query ${regkey} /v ${name} | Out-String) -match `
      "(?s)${name}.*(C:.*kubelet\.exe.*)"
      PS C:\> $kubelet_cmd = $Matches[1] -replace `
      "--image-pull-progress-deadline=.* ","" -replace "\r\n"," "
    2. Set a new command line for the Kubelet service, with an additional flag to increase the timeout.

      PS C:\> reg add ${regkey} /f /v ${name} /t REG_EXPAND_SZ /d "${kubelet_cmd} `
      --image-pull-progress-deadline=40m "
    3. Confirm that the change was successful.

      PS C:\> reg query ${regkey} /v ${name}
    4. Restart the kubelet service so the new flag takes effect.

      PS C:\> Restart-Service kubelet
    5. Confirm that the kubelet service restarted successfully.

      PS C:\> Get-Service kubelet # ensure state is Running

Image family reached end of life

When creating a node pool with a Windows image, you receive an error similar to the following:

WINDOWS_SAC image family for 1.18.20-gke.501 has reached end of life, newer versions are still available.

To resolve this error, choose a Windows image that is available and supported. You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.

Timeout during node pool creation

Node pool creation can time out if you are creating a large number of nodes (for example, 500) and it's the first node pool in the cluster using a Windows Server image.

To resolve this issue, reduce the number of nodes you are creating. You can increase the number of nodes later.

Windows nodes become NotReady with error: "PLEG is not healthy"

This is a known Kubernetes issue that happens when multiple Pods are started very rapidly on a single Windows node. To recover from this situation, restart the Windows Server node. A recommended workaround to avoid this issue is to limit the rate at which Windows Pods are created to one Pod every 30 seconds.
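One way to apply this workaround is to scale a Deployment up one replica at a time. The following is a minimal sketch: scale_gradually is a hypothetical helper, and it conservatively assumes that every new Pod could land on the same Windows node.

```shell
# Hypothetical helper: raises a Deployment's replica count one Pod at a time.
# Usage: scale_gradually DEPLOYMENT TARGET_REPLICAS [INTERVAL_SECONDS] [START_REPLICAS]
scale_gradually() {
  deployment=$1
  target=$2
  interval=${3:-30}
  replicas=${4:-0}
  while [ "$replicas" -lt "$target" ]; do
    replicas=$((replicas + 1))
    kubectl scale deployment "$deployment" --replicas="$replicas" || return 1
    # Pause between Pods, but not after the final one.
    if [ "$replicas" -lt "$target" ]; then sleep "$interval"; fi
  done
}
```

For example, scale_gradually my-windows-app 5 grows a Deployment from zero to five replicas over about two minutes. The Deployment name is a placeholder.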

Inconsistent TerminationGracePeriod

The Windows system timeout for the container might differ from the grace period you configure. This difference can cause Windows to force-terminate the container before the end of the grace period passed to the runtime.

You can modify the Windows timeout by editing container-local registry keys at image-build time. If you modify the Windows timeout, you might also need to adjust TerminationGracePeriodSeconds to match.

Network connectivity problems

If you experience network connectivity problems from your Windows Server containers, it might be because Windows Server container networking often assumes a network MTU of 1500, which is incompatible with Google Cloud's MTU of 1460.

Check that both the MTU of the network interface in the container and the network interfaces of the Windows Server node itself are set to the same value (that is, 1460 or less). For information on how to set the MTU, see known issues for Windows containers.

Node startup issues

If nodes fail to start in the cluster or fail to join the cluster successfully, review the diagnostic information provided in the node's serial port output.

Run the following command to see the serial port output:

gcloud compute instances get-serial-port-output NODE_NAME --zone=COMPUTE_ZONE

Replace the following:

  • NODE_NAME: the name of the node.
  • COMPUTE_ZONE: the compute zone for the specific node.

Intermittently unreachable Services in Windows nodes with cluster running 1.24 or earlier

When starting Windows nodes in Kubernetes clusters with a high number of Host Network Service Load Balancer rules, there is a delay in processing the rules. Services are intermittently unreachable during the delay, which lasts around 30 seconds per rule, and the total delay can be significant if there are enough rules. To learn more, see the original issue in GitHub.

For GKE clusters running version 1.24 or earlier, if an event such as node startup, a node upgrade, or a manual restart restarts kube-proxy on a Windows node, any Services reached by Pods running on that node are unreachable until kube-proxy has synced all rules.

For GKE clusters running version 1.25 or later, this behavior is substantially improved. For details on this improvement, see the pull request in GitHub. If you are experiencing this issue, we recommend upgrading your cluster's control plane to 1.25 or later.

What's next