Creating a cluster using Windows Server node pools


On this page, you learn how to create a Google Kubernetes Engine (GKE) cluster with node pools running Microsoft Windows Server. With this cluster, you can use Windows Server containers. Microsoft Hyper-V containers are not currently supported. Similar to Linux containers, Windows Server containers provide process and namespace isolation.

A Windows Server node requires more resources than a typical Linux node. Windows Server nodes need the extra resources to run the Windows OS and for the Windows Server components that cannot run in containers. Since Windows Server nodes require more resources, your allocatable resources are lower than they would be with Linux nodes.

Creating a cluster using Windows Server node pools

In this section, you create a cluster that uses a Windows Server container.

To create this cluster you need to complete the following tasks:

  1. Choose your Windows Server node image.
  2. Update and configure gcloud.
  3. Create a cluster and node pools.
  4. Get kubectl credentials.
  5. Wait for cluster initialization.

Choose your Windows Server node image

To run on GKE, Windows Server container node images need to be built on Windows Server version 2019 (LTSC) or Windows Server version 1909 (SAC). A single cluster can have multiple Windows Server node pools using different Windows Server versions, but each individual node pool can only use one Windows Server version.

Consider the following when choosing your node image:

  • Support timing:
    • The support timing for a Windows Server node image is subject to the support timing provided by Microsoft, as described in Support policy for OS images. You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.
    • SAC versions are only supported by Microsoft for 18 months after their initial release. If you choose SAC for the image type for your node pool, but do not upgrade your node pool to newer GKE versions that target newer SAC versions, you cannot create new nodes in your node pool when the support lifecycle for the SAC version ends. Learn more about Google's support for the Windows Server operating system. We recommend using LTSC because of its longer support lifecycle.
    • Do not choose SAC if you enroll your GKE cluster in the stable release channel. Since SAC versions are only supported by Microsoft for 18 months, there is a risk of the SAC node pool image becoming unsupported while the stable GKE version is still available.
  • Version compatibility and complexity:
    • Only choose SAC if you can upgrade your node pool and the containers running in it regularly. GKE periodically updates the SAC version used for Windows node pools in new GKE releases, so choosing SAC for your node pool image type requires you to rebuild your containers more often.
    • If you are unsure of which Windows Server image type to use, we recommend choosing Windows Server LTSC to avoid version incompatibility problems when upgrading your node pool. For additional information, see Windows Server servicing channels: LTSC and SAC in Microsoft's documentation.
    • Both Windows Server Core and Nano Server can be used as a base image for your containers.
    • Windows Server containers have important version compatibility requirements:
      • Windows Server containers built for LTSC do not run on SAC nodes, and vice-versa.
      • Windows Server containers built for a specific LTSC or SAC version do not run on other LTSC or SAC versions without being rebuilt to target the other version.
    • Building your Windows Server container images as multi-arch images that can target multiple Windows Server versions can help you manage this versioning complexity.
  • New features:
    • New Windows Server features are typically introduced into SAC versions first. Because of this, new GKE Windows functionality might be introduced in SAC node pools first.
    • Consider SAC if you depend on features not yet available in the LTSC release.
  • Container runtime:
    • For both the Windows Server LTSC and SAC node images, the container runtime can be Docker or containerd. For GKE node version 1.21.1-gke.2200 and later, we recommend using the containerd runtime. For more information, see Node images.
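The two choices discussed above, servicing channel and container runtime, together determine the image type name that you pass when you create the node pool. As a minimal sketch, the helper below maps the two choices to the image type names listed in the Create a cluster and node pools section. The function name windows_image_type is hypothetical and exists only for illustration; it is not part of gcloud.

```shell
# Hypothetical helper: print the GKE image type for a servicing channel
# (ltsc or sac) and a container runtime (containerd or docker).
windows_image_type() {
  channel=$1
  runtime=$2
  case "$channel" in
    ltsc|sac) ;;
    *) echo "unknown channel: $channel" >&2; return 1 ;;
  esac
  # WINDOWS_LTSC or WINDOWS_SAC, depending on the channel.
  prefix="WINDOWS_$(printf '%s' "$channel" | tr '[:lower:]' '[:upper:]')"
  case "$runtime" in
    containerd) echo "${prefix}_CONTAINERD" ;;
    docker)     echo "$prefix" ;;
    *) echo "unknown runtime: $runtime" >&2; return 1 ;;
  esac
}

# Example: LTSC with containerd.
windows_image_type ltsc containerd
```

The example call prints WINDOWS_LTSC_CONTAINERD, which combines the LTSC recommendation with the containerd recommendation for recent GKE node versions.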

Update and configure gcloud

Before you start, make sure you have performed the following tasks:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set up default gcloud command-line tool settings for your project by using one of the following methods:
    • Use gcloud init, if you want to be walked through setting project defaults.
    • Use gcloud config, to individually set your project ID, zone, and region.

    gcloud init

    1. Run gcloud init and follow the directions:

      gcloud init

      If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

      gcloud init --console-only
    2. Follow the instructions to authorize the gcloud tool to use your Google Cloud account.
    3. Create a new configuration or select an existing one.
    4. Choose a Google Cloud project.
    5. Choose a default Compute Engine zone.
    6. Choose a default Compute Engine region.

    gcloud config

    1. Set your default project ID:
      gcloud config set project PROJECT_ID
    2. Set your default Compute Engine region (for example, us-central1):
      gcloud config set compute/region COMPUTE_REGION
    3. Set your default Compute Engine zone (for example, us-central1-c):
      gcloud config set compute/zone COMPUTE_ZONE
    4. Update gcloud to the latest version:
      gcloud components update

    By setting default locations, you can avoid errors in the gcloud tool like the following: One of [--zone, --region] must be supplied: Please specify location.
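Whichever method you use, it can help to confirm that the defaults were actually stored before you continue. The following is a minimal sketch that reads each property back with gcloud config get-value; the function name check_gcloud_defaults is hypothetical, and the sketch assumes that an unset property produces empty standard output.

```shell
# Report which gcloud defaults are set. Returns non-zero if any is missing.
check_gcloud_defaults() {
  missing=0
  for prop in project compute/region compute/zone; do
    # Assumption: an unset property yields no output on stdout.
    value=$(gcloud config get-value "$prop" 2>/dev/null)
    if [ -z "$value" ]; then
      echo "unset: $prop"
      missing=1
    else
      echo "$prop = $value"
    fi
  done
  return "$missing"
}
```

Calling check_gcloud_defaults prints one line per property, so a missing zone or region is visible before a later command fails.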

Create a cluster and node pools

To run Windows Server containers, your cluster must have at least one Windows and one Linux node pool. You cannot create a cluster using only a Windows Server node pool. The Linux node pool is required to run critical cluster add-ons.

Because the Linux node pool is essential to the cluster, we recommend turning on autoscaling to ensure that it has sufficient capacity to run cluster add-ons.

gcloud

Create a cluster with the following fields:

gcloud container clusters create CLUSTER_NAME \
    --enable-ip-alias \
    --num-nodes=NUMBER_OF_NODES \
    --cluster-version=VERSION_NUMBER \
    --release-channel CHANNEL

Replace the following:

  • CLUSTER_NAME: the name you choose for your cluster.
  • --enable-ip-alias turns on alias IP. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  • NUMBER_OF_NODES: the number of Linux nodes you create. You should provide sufficient compute resources to run cluster add-ons. This is an optional field and if omitted, uses the default value of 3.
  • VERSION_NUMBER: the specific cluster version you want to use, which must be 1.16.8-gke.9 or higher. You can also choose to use the --release-channel flag to select a release channel.
  • CHANNEL: the release channel to enroll the cluster in, which can be one of rapid, regular, stable, or None. By default, the cluster is enrolled in the regular release channel if the following flags aren't specified: --cluster-version, --release-channel, --no-enable-autoupgrade, and --no-enable-autorepair.
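If you script cluster creation, you might want to check a candidate VERSION_NUMBER against the 1.16.8-gke.9 minimum before calling gcloud. The following is a minimal sketch of such a check. The function name version_at_least is hypothetical, and the sketch assumes that both versions have the form MAJOR.MINOR.PATCH-gke.BUILD.

```shell
# Return 0 when version $1 is greater than or equal to version $2.
# Assumes both look like MAJOR.MINOR.PATCH-gke.BUILD.
version_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # Split on runs of non-digits: "1.16.8-gke.9" becomes 1, 16, 8, 9.
    split(a, x, /[^0-9]+/)
    split(b, y, /[^0-9]+/)
    for (i = 1; i <= 4; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0   # all four fields are equal
  }'
}

# Example: compare a candidate version with the documented minimum.
version_at_least 1.21.1-gke.2200 1.16.8-gke.9 && echo "version is new enough"
```

Comparing the fields numerically rather than as strings matters because, for example, build 10 is newer than build 9.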

Create the Windows Server node pool with the following fields:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --image-type=IMAGE_NAME \
    --no-enable-autoupgrade \
    --machine-type=MACHINE_TYPE_NAME

Replace the following:

  • NODE_POOL_NAME: the name you choose for your Windows Server node pool.
  • CLUSTER_NAME: the name of the cluster you created above.
  • IMAGE_NAME: You can specify one of the following values:

    • WINDOWS_LTSC_CONTAINERD: Windows Server LTSC with containerd
    • WINDOWS_SAC_CONTAINERD: Windows Server SAC with containerd
    • WINDOWS_LTSC: Windows Server LTSC with Docker
    • WINDOWS_SAC: Windows Server SAC with Docker

    For more information about these node images, see the Choose your Windows Server node image section.

  • --no-enable-autoupgrade disables node auto-upgrade. Review Upgrading Windows Server node pools before enabling.

  • MACHINE_TYPE_NAME: defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.
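To see how the two commands fit together, the following sketch fills in the placeholders with example values. Everything specific here is an assumption for illustration: the names windows-demo and windows-pool, the regular release channel, the node count, and the DRY_RUN switch, which belongs to this sketch and is not a gcloud feature. By default the sketch only prints the commands; it runs them when DRY_RUN=0 is set.

```shell
# Print each command; run it only when DRY_RUN=0 is set (hypothetical switch).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Example values, all assumptions: substitute your own names and sizes.
CLUSTER_NAME=windows-demo
NODE_POOL_NAME=windows-pool

# Create the cluster first. Its Linux node pool hosts the cluster add-ons.
run gcloud container clusters create "$CLUSTER_NAME" \
    --enable-ip-alias \
    --num-nodes=3 \
    --release-channel regular

# Then add the Windows Server node pool, with auto-upgrade disabled.
run gcloud container node-pools create "$NODE_POOL_NAME" \
    --cluster="$CLUSTER_NAME" \
    --image-type=WINDOWS_LTSC_CONTAINERD \
    --no-enable-autoupgrade \
    --machine-type=n1-standard-2
```

Printing the commands first makes it easy to review the flags, in particular --enable-ip-alias and --image-type, before anything is created.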

Console

  1. Go to the Google Kubernetes Engine page in the Cloud Console.


  2. Click Create.

  3. In the Cluster basics section, complete the following:

    1. Enter the Name for your cluster.
    2. For the Location type, select the desired region or zone for your cluster.
    3. Under Control plane Version, select a Release channel or choose to specify a Static version. The static version must be 1.16.8-gke.9 or higher.
  4. From the navigation pane, under Node Pools, click default-pool to create your Linux node pool. When configuring this node pool, you should provide sufficient compute resources to run cluster add-ons. You must also have available resource quota for the nodes and their resources (such as firewall routes).

  5. At the top of the page, click Add Node Pool to create your Windows Server node pool.

  6. In the Node pool details section, complete the following:

    1. Enter a Name for the node pool.
    2. For static version nodes, choose the Node version.
    3. Enter the Number of nodes to create in the node pool.
  7. From the navigation pane, under Node Pools, click Nodes.

    1. From the Image type drop-down list, select one of the following node images:

      • Windows Long Term Servicing Channel with Docker
      • Windows Long Term Servicing Channel with containerd
      • Windows Semi-Annual Channel with Docker
      • Windows Semi-Annual Channel with containerd

      For more information, see the Choose your Windows Server node image section.

    2. Choose the default Machine configuration to use for the instances. n1-standard-2 is the minimum recommended size as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.

  8. From the navigation pane, select the name of your Windows Server node pool. This returns you to the Node pool details page.

    1. Under Automation, clear the Enable node auto-upgrade checkbox. Review the Upgrading Windows Server node pools section before enabling auto-upgrade.
  9. From the navigation pane, under Cluster, select Networking.

    1. Under Advanced networking options, ensure Enable VPC-native traffic routing (uses alias IP) is selected. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
  10. Click Create.

Terraform

You can use the Google Terraform provider to create a GKE cluster with a Windows Server node pool.

Add this block to your Terraform configuration:

resource "google_container_cluster" "cluster" {
  project  = "PROJECT_ID"
  name     = "CLUSTER_NAME"
  location = "LOCATION"

  min_master_version = "VERSION_NUMBER"

  # Enable Alias IPs to allow Windows Server networking.
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/14"
    services_ipv4_cidr_block = "/20"
  }

  # Removes the implicit default node pool, recommended when using
  # google_container_node_pool.
  remove_default_node_pool = true
  initial_node_count       = 1
}

# Small Linux node pool to run some Linux-only Kubernetes Pods.
resource "google_container_node_pool" "linux_pool" {
  name       = "linux-pool"
  project    = google_container_cluster.cluster.project
  cluster    = google_container_cluster.cluster.name
  location   = google_container_cluster.cluster.location
  node_count = 1

  node_config {
    image_type = "COS_CONTAINERD"
  }
}

# Node pool of Windows Server machines.
resource "google_container_node_pool" "windows_pool" {
  name       = "NODE_POOL_NAME"
  project    = google_container_cluster.cluster.project
  cluster    = google_container_cluster.cluster.name
  location   = google_container_cluster.cluster.location
  node_count = 1

  node_config {
    image_type   = "IMAGE_NAME"
    machine_type = "MACHINE_TYPE_NAME"
  }

  # The Linux node pool must be created before the Windows Server node pool.
  depends_on = [google_container_node_pool.linux_pool]
}

Replace the following:

  • PROJECT_ID: the project ID in which the cluster is created.
  • CLUSTER_NAME: the name of the GKE cluster.
  • LOCATION: the location (region or zone) in which the cluster is created.
  • VERSION_NUMBER: must be 1.16.8-gke.9 or higher.
  • NODE_POOL_NAME: the name you choose for your Windows Server node pool.
  • IMAGE_NAME: You can specify one of the following values:

    • WINDOWS_LTSC_CONTAINERD: Windows Server LTSC with containerd
    • WINDOWS_SAC_CONTAINERD: Windows Server SAC with containerd
    • WINDOWS_LTSC: Windows Server LTSC with Docker
    • WINDOWS_SAC: Windows Server SAC with Docker

    For more information about these node images, see the Choose your Windows Server node image section.

  • MACHINE_TYPE_NAME: defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.

After you create a Windows Server node pool, the cluster goes into a RECONCILE state for several minutes as the control plane (master) is updated.
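If you automate the steps on this page, you can wait for the control plane update to finish before continuing. The following is a minimal sketch that polls the cluster status until it reports RUNNING. The function name wait_for_cluster_running, the 10-second interval, and the default of 60 attempts are assumptions chosen for illustration, and the sketch relies on the default zone or region configured earlier.

```shell
# Poll the cluster status until it reports RUNNING, or give up after
# max_tries polls (default 60, an arbitrary choice).
wait_for_cluster_running() {
  cluster=$1
  max_tries=${2:-60}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    status=$(gcloud container clusters describe "$cluster" \
        --format="value(status)")
    echo "status: $status"
    if [ "$status" = "RUNNING" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep 10
  done
  echo "cluster $cluster not RUNNING after $max_tries checks" >&2
  return 1
}
```

For example, wait_for_cluster_running CLUSTER_NAME returns once the cluster is ready, and returns a non-zero status if the cluster is still updating after the last attempt.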

Get kubectl credentials

Use the get-credentials command to enable kubectl to work with the cluster you created.

gcloud container clusters get-credentials CLUSTER_NAME

For more information on the get-credentials command, see the SDK get-credentials documentation.

Wait for cluster initialization

Before using the cluster, wait for several seconds until windows.config.common-webhooks.networking.gke.io is created. This webhook adds scheduling tolerations to Pods created with the kubernetes.io/os: windows node selector to ensure they are allowed to run on Windows Server nodes. It also validates the Pod to ensure that it only uses features supported on Windows.

To ensure the webhook is created, run the following command:

kubectl get mutatingwebhookconfigurations

The output should show the webhook running:

NAME                                              CREATED AT
windows.config.common-webhooks.networking.gke.io  2019-12-12T16:55:47Z
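Rather than rerunning the command by hand, you can poll for the webhook in a script. The following is a minimal sketch; the function name wait_for_windows_webhook, the 5-second interval, and the default of 24 attempts are assumptions chosen for illustration.

```shell
WEBHOOK=windows.config.common-webhooks.networking.gke.io

# Poll until the webhook configuration exists, checking every 5 seconds
# for up to max_tries attempts (both values are arbitrary choices).
wait_for_windows_webhook() {
  max_tries=${1:-24}
  tries=0
  until kubectl get mutatingwebhookconfigurations "$WEBHOOK" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "webhook $WEBHOOK not found" >&2
      return 1
    fi
    sleep 5
  done
  echo "webhook $WEBHOOK is present"
}
```

The sketch relies on kubectl get returning a non-zero exit status while the named resource does not exist.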

Now that you have a cluster with two node pools (one Linux and one Windows), you can deploy a Windows application.

Mapping GKE and Windows versions

Microsoft releases new SAC versions approximately every six months and new LTSC versions every two to three years. These new versions are typically available in new GKE minor versions. Within a GKE minor version the LTSC and SAC versions usually remain fixed.

To see the version mapping between GKE versions and Windows Server versions, use the gcloud beta container get-server-config command:

gcloud beta container get-server-config

The version mapping is returned in the windowsVersionMaps field of the response. To filter the response to see the version mapping for specific GKE versions in your cluster, perform the following steps in a Linux shell or in Cloud Shell.

  1. Set the following variables:

    CLUSTER_NAME=CLUSTER_NAME
    NODE_POOL_NAME=NODE_POOL_NAME
    ZONE=COMPUTE_ZONE
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster.
    • NODE_POOL_NAME: the name of the Windows Server node pool.
    • COMPUTE_ZONE: the compute zone for the cluster.
  2. Obtain the node pool version and store it in the NODE_POOL_VERSION variable:

    NODE_POOL_VERSION=$(gcloud container node-pools describe "$NODE_POOL_NAME" \
    --cluster "$CLUSTER_NAME" --zone "$ZONE" --format="value(version)")
    
  3. Obtain the Windows Server versions for NODE_POOL_VERSION:

    gcloud beta container get-server-config \
        --format="yaml(windowsVersionMaps.\"$NODE_POOL_VERSION\")"
    

    The output is similar to the following:

    windowsVersionMaps:
      1.18.6-gke.6601:
        windowsVersions:
        - imageType: WINDOWS_SAC
          osVersion: 10.0.18363.1198
          supportEndDate:
            day: 10
            month: 5
            year: 2022
        - imageType: WINDOWS_LTSC
          osVersion: 10.0.17763.1577
          supportEndDate:
            day: 9
            month: 1
            year: 2024
    
  4. Obtain the Windows Server version for the WINDOWS_SAC image type:

    gcloud beta container get-server-config \
      --flatten=windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions \
      --filter="windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.imageType=WINDOWS_SAC" \
      --format="value(windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.osVersion)"
    

    The output is similar to the following:

    10.0.18363.1198
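The supportEndDate fields in the version mapping can be compared with the current date to see whether an image type is still within its support window. The following is a minimal sketch; the function name is_still_supported is hypothetical, and the sketch assumes that year, month, and day are passed as printed in the output, without leading zeros.

```shell
# Return 0 while the support end date has not yet passed.
# Arguments: year, month, day as printed in supportEndDate (no leading zeros).
# The optional fourth argument overrides today's date as YYYYMMDD.
is_still_supported() {
  # Encode the date as the integer YYYYMMDD so dates compare numerically.
  end=$(( $1 * 10000 + $2 * 100 + $3 ))
  today=${4:-$(date +%Y%m%d)}
  [ "$today" -le "$end" ]
}

# Example using the WINDOWS_LTSC entry from the sample output above.
if is_still_supported 2024 1 9; then
  echo "WINDOWS_LTSC image is within its support window"
else
  echo "WINDOWS_LTSC image is past its support end date"
fi
```

The optional fourth argument makes the check reproducible, for example when you want to know whether an image will still be supported on a planned upgrade date.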
    

Upgrading Windows Server node pools

The Windows Server container version compatibility requirements mean that your container images might need to be rebuilt to match the Windows Server version for a new GKE version before upgrading your node pools.

To ensure that your container images remain compatible with your nodes, we recommend that you check the version mapping and build your Windows Server container images as multi-arch images that can target multiple Windows Server versions. You can then update your container deployments to target the multi-arch images that will work on both the current and the next GKE version before manually invoking a GKE node pool upgrade. Manual node pool upgrades must be performed regularly because nodes cannot be more than two minor versions behind the control plane version.

We recommend that you subscribe to upgrade notifications using Pub/Sub to proactively receive updates about new GKE versions and the Windows OS versions they use.

We recommend enabling node auto-upgrades only if you continuously build multi-arch Windows Server container images that target the latest Windows Server versions, especially if you are using Windows Server SAC as the node image type. Node auto-upgrades are less likely to cause problems with the Windows Server LTSC node image type but there is still a risk of encountering version incompatibility issues.
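The manual upgrade flow described in this section can be sketched as a small script: look up the Windows Server versions behind the target GKE version, confirm that your container images have been rebuilt for them, and only then upgrade the node pool. The function name upgrade_windows_pool and the confirmation prompt are assumptions of this sketch, not part of gcloud.

```shell
# Show the Windows Server versions behind a target GKE version, then
# upgrade one node pool to that version after confirmation.
upgrade_windows_pool() {
  cluster=$1
  pool=$2
  target=$3
  gcloud beta container get-server-config \
      --format="yaml(windowsVersionMaps.\"$target\")"
  printf 'Container images rebuilt for these versions? [y/N] '
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "upgrade skipped"
    return 1
  fi
  gcloud container clusters upgrade "$cluster" \
      --node-pool="$pool" --cluster-version="$target"
}
```

For example, upgrade_windows_pool CLUSTER_NAME NODE_POOL_NAME VERSION_NUMBER prints the version mapping first, so an incompatible Windows Server version is noticed before the nodes are replaced.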

Windows Updates

Windows Updates are disabled for Windows Server nodes. Automatic updates can cause node restarts at unpredictable times, and any Windows Updates installed after a node starts would be lost when the node is recreated by GKE. GKE makes Windows Updates available by periodically updating the Windows Server node images used in new GKE releases. There can be a delay between when Windows Updates are released by Microsoft and when they are available in GKE. When critical security updates are released, GKE updates the Windows Server node images as quickly as possible.

Viewing and querying logs

Logging is enabled automatically in GKE clusters. You can view the logs of the containers and the logs from other services on the Windows Server nodes using Kubernetes Engine monitoring.

The following is an example of a filter to get the container log:

resource.type="k8s_container"
resource.labels.cluster_name="your_cluster_name"
resource.labels.namespace_name="your_namespace_id"
resource.labels.container_name="your_container_name"
resource.labels.pod_name="your_pod_name"
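The same filter can be used from the command line with gcloud logging read. The following is a minimal sketch that builds the filter as a single expression joined with AND; the function name container_log_filter is hypothetical, and the label names are those of the k8s_container monitored resource.

```shell
# Build a container log filter for the given cluster, namespace,
# container, and Pod names (in that order).
container_log_filter() {
  printf 'resource.type="k8s_container"'
  printf ' AND resource.labels.cluster_name="%s"' "$1"
  printf ' AND resource.labels.namespace_name="%s"' "$2"
  printf ' AND resource.labels.container_name="%s"' "$3"
  printf ' AND resource.labels.pod_name="%s"\n' "$4"
}

# Example (all names are placeholders): print the filter. To query logs,
# pass it to gcloud logging read, for instance:
#   gcloud logging read "$(container_log_filter ...)" --limit=20
container_log_filter your_cluster_name your_namespace_id your_container_name your_pod_name
```

Building the filter in one place keeps the quoting manageable, because each label value must be wrapped in double quotes inside a shell string.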

Accessing a Windows Server node using Remote Desktop Protocol (RDP)

You can connect to a Windows Server node in your cluster using RDP. For instructions on how to connect, see Connecting to Windows instances in the Compute Engine documentation.

Building multi-arch images

You can build the multi-arch images manually or use a Cloud Build builder. For instructions, see Building Windows multi-arch images.

Using gMSA

The following steps show you how to use a Group Managed Service Account (gMSA) with your Windows Server node pools.

  1. Configure Windows Server nodes in your cluster to automatically join your AD domain. For instructions, see Configuring Windows Server nodes to automatically join an AD domain.

  2. Create and grant a gMSA access to the security group automatically created by the domain join service. This step needs to be done in a machine with administrative access to your AD domain.

    $instanceGroupUri = gcloud container node-pools describe NODE_POOL_NAME --cluster CLUSTER_NAME --format="value(instanceGroupUrls)"
    $securityGroupName = ([System.Uri]$instanceGroupUri).Segments[-1]
    $securityGroup = dsquery group -name $securityGroupName
    $gmsaName = "GMSA_NAME"
    $dnsHostName = "DNS_HOST_NAME"
    
    New-ADServiceAccount -Name $gmsaName -DNSHostName $dnsHostName -PrincipalsAllowedToRetrieveManagedPassword $securityGroup
    
    Get-ADServiceAccount $gmsaName
    Test-ADServiceAccount $gmsaName
    

    Replace the following:

    • NODE_POOL_NAME: the name of your Windows Server node pool. The automatically created security group has the same name as your Windows Server node pool.
    • CLUSTER_NAME: the name of your cluster.
    • GMSA_NAME: the name you choose for the new gMSA.
    • DNS_HOST_NAME: the Fully Qualified Domain Name (FQDN) of the service account you created. For example, if GMSA_NAME is webapp01 and the domain is example.com, then DNS_HOST_NAME is webapp01.example.com.
  3. Configure your gMSA by following the instructions in the Configure GMSA for Windows Pods and containers tutorial.

Deleting Windows Server node pools

Delete a Windows Server node pool by using gcloud or the Google Cloud Console.

gcloud

gcloud container node-pools delete NODE_POOL_NAME \
    --cluster=CLUSTER_NAME

Console

To delete a Windows Server node pool using Cloud Console, perform the following steps:

  1. Go to the Google Kubernetes Engine page in Cloud Console.


  2. Beside the cluster you want to edit, click Actions, then click Edit.

  3. Select the Nodes tab.

  4. Under the Node Pools section, click Delete next to the node pool you want to delete.

  5. When prompted to confirm, click Delete again.

Limitations

There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.

In addition to the unsupported Kubernetes features, there are some GKE features that are not supported.

For GKE clusters, the following features are not supported with Windows Server node pools:

For Windows Server node pools, the following feature is not supported:

  • GPUs (--accelerator)

Troubleshooting

See the Kubernetes documentation for general guidance on debugging Pods and Services.

Containerd node issues

For known issues using a Containerd node image, see Known issues.

Windows Pods fail to start

A version mismatch between the Windows Server container and the Windows node that is trying to run the container can result in your Windows Pods failing to start.

If the version for your Windows node pool is 1.16.8-gke.8 or later, review Microsoft's documentation for the February 2020 Windows Server container incompatibility issue and build your container images with base Windows images that include Windows Updates from March 2020. Container images built on earlier base Windows images might fail to run on these Windows nodes and can also cause the node to fail with status NotReady.

Image pull errors

Windows Server container images, and the individual layers they are composed of, can be quite large. Their size can cause the kubelet to time out and fail when downloading and extracting the container layers.

You might have encountered this problem if you see the "Failed to pull image" or "Image pull context cancelled" error messages or an ErrImagePull status for your Pods.

If image pull errors occur frequently, use node pools with a higher CPU specification. Container extraction is executed in parallel across cores, so machine types with more cores reduce the overall pull time.

Try the following options to successfully pull your Windows Server containers:

  • Break the application layers of the Windows Server container image into smaller layers that can each be pulled and extracted more quickly. This can make Docker's layer caching more effective and make image pull retries more likely to succeed. To learn more about layers, see the Docker article About images, containers, and storage drivers.

  • Connect to your Windows Server nodes and manually use the docker pull command on your container images before creating your Pods.

  • Set the image-pull-progress-deadline flag for the kubelet service to increase the timeout for pulling container images.

    Set the flag by connecting to your Windows nodes and running the following PowerShell commands.

    1. Get the existing command line for the Kubelet service from the Windows registry.

      PS C:\> $regkey = "HKLM\SYSTEM\CurrentControlSet\Services\kubelet"
      
      PS C:\> $name = "ImagePath"
      
      PS C:\> $(reg query ${regkey} /v ${name} | Out-String) -match `
      "(?s)${name}.*(C:.*kubelet\.exe.*)"
      
      PS C:\> $kubelet_cmd = $Matches[1] -replace `
      "--image-pull-progress-deadline=.* ","" -replace "\r\n"," "
      
    2. Set a new command line for the Kubelet service, with an additional flag to increase the timeout.

      PS C:\> reg add ${regkey} /f /v ${name} /t REG_EXPAND_SZ /d "${kubelet_cmd} `
      --image-pull-progress-deadline=40m "
      
    3. Confirm that the change was successful.

      PS C:\> reg query ${regkey} /v ${name}
      
    4. Restart the kubelet service so the new flag takes effect.

      PS C:\> Restart-Service kubelet
      
    5. Confirm that the kubelet service restarted successfully.

      PS C:\> Get-Service kubelet # ensure state is Running
      

Image family reached end of life

When creating a node pool with a Windows image, you receive an error similar to the following:

WINDOWS_SAC image family for 1.18.20-gke.501 has reached end of life, newer versions are still available.

To resolve this error, choose a Windows image that is available and supported. You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.

Timeout during node pool creation

Node pool creation can time out if you are creating a large number of nodes (for example, 500) and it's the first node pool in the cluster using a Windows Server image.

To resolve this issue, reduce the number of nodes you are creating. You can increase the number of nodes later.

Windows nodes become NotReady with error: "PLEG is not healthy"

This is a known Kubernetes issue that happens when multiple Pods are started very rapidly on a single Windows node. To recover from this situation, restart the Windows Server node. A recommended workaround to avoid this issue is to limit the rate at which Windows Pods are created to one Pod every 30 seconds.
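One way to apply the rate limit is to create Windows Pods from a script that pauses between manifests. The following is a minimal sketch; the function name apply_throttled is hypothetical, and the sketch assumes that each manifest file creates a single Windows Pod.

```shell
# Apply each manifest in turn, pausing between them so that Pods are not
# created faster than one every $interval seconds.
# Assumption: each manifest creates a single Windows Pod.
apply_throttled() {
  interval=$1
  shift
  first=1
  for manifest in "$@"; do
    # Pause before every manifest except the first.
    if [ "$first" -eq 0 ]; then
      sleep "$interval"
    fi
    first=0
    kubectl apply -f "$manifest" || return 1
  done
}
```

For example, apply_throttled 30 pod-a.yaml pod-b.yaml applies the two (hypothetical) manifests 30 seconds apart. Workloads that create many Pods at once, such as a Deployment with several replicas, are not throttled by this approach.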

Inconsistent TerminationGracePeriod

The Windows system timeout for the container might differ from the grace period you configure. This difference can cause Windows to force-terminate the container before the end of the grace period passed to the runtime.

You can modify the Windows timeout by editing container-local registry keys at image-build time. If you modify the Windows timeout, you might also need to adjust TerminationGracePeriodSeconds to match.

Network connectivity problems

If you experience network connectivity problems from your Windows Server containers, it might be because Windows Server container networking often assumes a network MTU of 1500, which is incompatible with Google Cloud's MTU of 1460.

Check that the MTU of the network interface in the container and the network interfaces of the Windows Server node itself are all 1460 or less. For information on how to set the MTU, see known issues for Windows containers.

Node startup issues

If nodes fail to start in the cluster or fail to join the cluster successfully, review the diagnostic information provided in the node's serial port output.

Run the following command to see the serial port output:

gcloud compute instances get-serial-port-output NODE_NAME --zone=COMPUTE_ZONE

Replace the following:

  • NODE_NAME: the name of the node.
  • COMPUTE_ZONE: the compute zone for the specific node.
