On this page, you learn how to create a Google Kubernetes Engine (GKE) cluster with node pools running Microsoft Windows Server. With this cluster, you can use Windows Server containers. Microsoft Hyper-V containers are not currently supported. Similar to Linux containers, Windows Server containers provide process and namespace isolation.
A Windows Server node requires more resources than a typical Linux node. Windows Server nodes need the extra resources to run the Windows OS and for the Windows Server components that cannot run in containers. Since Windows Server nodes require more resources, your allocatable resources are lower than they would be with Linux nodes.
Creating a cluster using Windows Server node pools
In this section, you create a cluster that uses a Windows Server container.
To create this cluster you need to complete the following tasks:
- Choose your Windows Server node image.
- Update and configure gcloud.
- Create a cluster and node pools.
- Get kubectl credentials.
- Wait for cluster initialization.
Choose your Windows Server node image
To run on GKE, Windows Server container node images need to be built on Windows Server version 2019 (LTSC), Windows Server version 20H2 (SAC), or Windows Server version 2022 (LTSC). A single cluster can have multiple Windows Server node pools using different Windows Server versions, but each individual node pool can only use one Windows Server version.
Consider the following when choosing your node image:
- Support timing:
- The support timing for a Windows Server node image is subject to the support timing provided by Microsoft, as described in Support policy for OS images. You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.
- SAC versions are only supported by Microsoft for 18 months after their initial release. If you choose SAC for the image type for your node pool, but do not upgrade your node pool to newer GKE versions that target newer SAC versions, you cannot create new nodes in your node pool when the support lifecycle for the SAC version ends. Learn more about Google's support for the Windows Server operating system. We recommend using LTSC because of its longer support lifecycle.
- Do not choose SAC if you enroll your GKE cluster in the stable release channel. Since SAC versions are only supported by Microsoft for 18 months, there is a risk of the SAC node pool image becoming unsupported while the stable GKE version is still available.
- Version compatibility and complexity:
- Only choose SAC if you can upgrade your node pool and the containers running in it regularly. GKE periodically updates the SAC version used for Windows node pools in new GKE releases, so choosing SAC for your node pool image type requires you to rebuild your containers more often.
- If you are unsure of which Windows Server image type to use, we recommend choosing Windows Server LTSC to avoid version incompatibility problems when upgrading your node pool. For additional information, see Windows Server servicing channels: LTSC and SAC in Microsoft's documentation.
- Both Windows Server Core and Nano Server can be used as a base image for your containers.
- Windows Server containers have important version compatibility requirements:
- Windows Server containers built for LTSC do not run on SAC nodes, and vice-versa.
- Windows Server containers built for a specific LTSC or SAC version do not run on other LTSC or SAC versions without being rebuilt to target the other version.
- Building your Windows Server container images as multi-arch images that can target multiple Windows Server versions can help you manage this versioning complexity.
- New features:
- New Windows Server features are typically introduced into SAC versions first. Because of this, new GKE Windows functionality might be introduced in SAC node pools first.
- Consider SAC if you depend on features not yet available in the LTSC release.
- Container runtime:
- For both the Windows Server LTSC and SAC node images, the container runtime can be Docker or containerd. For GKE node version 1.21.1-gke.2200 and later, we recommend using the containerd runtime. For more information, see Node images.
Update and configure gcloud
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Ensure you have the correct permission to create clusters. At minimum, you should be a Kubernetes Engine Cluster Admin.
Create a cluster and node pools
To run Windows Server containers, your cluster must have at least one Windows and one Linux node pool. You cannot create a cluster using only a Windows Server node pool. The Linux node pool is required to run critical cluster add-ons.
Because the Linux node pool is essential, we recommend turning on autoscaling so that it has sufficient capacity to run cluster add-ons.
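The autoscaling recommendation above can be sketched as a small shell function that enables autoscaling on the Linux node pool of an existing cluster. This is a minimal sketch: the function name and the node count bounds are illustrative, and it assumes the Linux pool is named default-pool and that a default zone or region is already configured for gcloud.

```shell
# Hypothetical helper: enable autoscaling on the Linux node pool.
# Assumes the Linux pool is named "default-pool"; pass another name as $4.
enable_linux_pool_autoscaling() {
  cluster="$1"      # cluster name
  min_nodes="$2"    # lower bound for the autoscaler
  max_nodes="$3"    # upper bound for the autoscaler
  pool="${4:-default-pool}"

  gcloud container clusters update "$cluster" \
    --enable-autoscaling \
    --node-pool="$pool" \
    --min-nodes="$min_nodes" \
    --max-nodes="$max_nodes"
}
```

For example, enable_linux_pool_autoscaling CLUSTER_NAME 1 5 lets the Linux pool scale between one and five nodes. Choose bounds that fit your quota and the add-ons you run.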
gcloud
Create a cluster with the following fields:
gcloud container clusters create CLUSTER_NAME \
--enable-ip-alias \
--num-nodes=NUMBER_OF_NODES \
--cluster-version=VERSION_NUMBER \
--release-channel CHANNEL
Replace the following:
- CLUSTER_NAME: the name you choose for your cluster.
- --enable-ip-alias: turns on alias IP. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
- NUMBER_OF_NODES: the number of Linux nodes you create. You should provide sufficient compute resources to run cluster add-ons. This is an optional field and if omitted, uses the default value of 3.
- VERSION_NUMBER: the specific cluster version you want to use, which must be 1.16.8-gke.9 or higher. If you do not specify a release channel, GKE enrolls your cluster in the most mature release channel where that version is available.
- CHANNEL: the release channel to enroll the cluster in, which can be one of rapid, regular, stable, or None. By default, the cluster is enrolled in the regular release channel unless at least one of the following flags is specified: --cluster-version, --release-channel, --no-enable-autoupgrade, and --no-enable-autorepair. You must specify None if you choose a cluster version and do not want your cluster to be enrolled in a release channel.
Create the Windows Server node pool with the following fields:
gcloud container node-pools create NODE_POOL_NAME \
--cluster=CLUSTER_NAME \
--image-type=IMAGE_NAME \
--no-enable-autoupgrade \
--machine-type=MACHINE_TYPE_NAME \
--windows-os-version=WINDOWS_OS_VERSION
Replace the following:
- NODE_POOL_NAME: the name you choose for your Windows Server node pool.
- CLUSTER_NAME: the name of the cluster you created above.
- IMAGE_NAME: You can specify one of the following values:
  - WINDOWS_LTSC_CONTAINERD: Windows Server LTSC with containerd. This is the image type for both Windows Server 2022 and Windows Server 2019 OS images.
  - WINDOWS_SAC_CONTAINERD: Windows Server SAC with containerd (Unsupported after August 9, 2022)
  - WINDOWS_LTSC: Windows Server LTSC with Docker
  - WINDOWS_SAC: Windows Server SAC with Docker (Unsupported after August 9, 2022)
  For more information about these node images, see the Choose your Windows node image section.
- --no-enable-autoupgrade: disables node auto-upgrade. Review Upgrading Windows Server node pools before enabling auto-upgrade.
- MACHINE_TYPE_NAME: defines the machine type. n1-standard-2 is the minimum recommended machine type as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.
- WINDOWS_OS_VERSION: defines the Windows OS version to use for image type WINDOWS_LTSC_CONTAINERD. This is an optional flag. When not specified, the default OS version is LTSC2019. Set the value to ltsc2022 to create a Windows Server 2022 node pool. Set the value to ltsc2019 to create a Windows Server 2019 node pool.
The following example shows how you can create a Windows Server 2022 node pool:
gcloud container node-pools create node_pool_name \
--cluster=cluster_name \
--image-type=WINDOWS_LTSC_CONTAINERD \
--windows-os-version=ltsc2022
The following example shows how you can update an existing Windows node pool to use the Windows Server 2022 OS image:
gcloud container node-pools update node_pool_name \
--cluster=cluster_name \
--windows-os-version=ltsc2022
Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
Click add_box Create.
In the Cluster basics section, complete the following:
- Enter the Name for your cluster.
- For the Location type, select the desired region or zone for your cluster.
- Under Control plane Version, select a Release channel or choose to specify a Static version. The static version must be 1.16.8-gke.9 or higher.
From the navigation pane, under Node Pools, click default-pool to create your Linux node pool. When configuring this node pool, you should provide sufficient compute resources to run cluster add-ons. You must also have available resource quota for the nodes and their resources (such as firewall routes).
At the top of the page, click add_box Add Node Pool to create your Windows Server node pool.
In the Node pool details section, complete the following:
- Enter a Name for the node pool.
- For static version nodes, choose the Node version.
- Enter the Number of nodes to create in the node pool.
From the navigation pane, under Node Pools, click Nodes.
From the Image type drop-down list, select one of the following node images:
- Windows Long Term Servicing Channel with Docker
- Windows Long Term Servicing Channel with containerd
- Windows Semi-Annual Channel with Docker
- Windows Semi-Annual Channel with containerd
For more information, see the Choose your Windows node image section.
Choose the default Machine configuration to use for the instances. n1-standard-2 is the minimum recommended size as Windows Server nodes require additional resources. Machine types f1-micro and g1-small are not supported. Each machine type is billed differently. For more information, refer to the machine type price sheet.
From the navigation pane, select the name of your Windows Server node pool. This returns you to the Node pool details page.
- Under Automation, clear the Enable node auto-upgrade checkbox. Review the Upgrading Windows Server node pools section before enabling auto-upgrade.
From the navigation pane, under Cluster, select Networking.
- Under Advanced networking options, ensure Enable VPC-native traffic routing (uses alias IP) is selected. Alias IP is required for Windows Server nodes. To read more about its benefits, see Understanding native container routing with Alias IPs.
Click Create.
Terraform
To create a GKE Standard cluster and a Windows Server node pool using Terraform, refer to the following example:
This example uses Windows Server LTSC with containerd. This is the image type for both Windows Server 2022 and Windows Server 2019 OS image. For more information about node images, see Choose your Windows node image.
To learn more about using Terraform, see Terraform support for GKE.
After you create a Windows Server node pool, the cluster goes into a RECONCILE state for several minutes as the control plane is updated.
Get kubectl credentials
Use the get-credentials command to enable kubectl to work with the cluster you created.
gcloud container clusters get-credentials CLUSTER_NAME
For more information on the get-credentials command, see the SDK get-credentials documentation.
Wait for cluster initialization
Before using the cluster, wait for several seconds until windows.config.common-webhooks.networking.gke.io is created. This webhook adds scheduling tolerations to Pods created with the kubernetes.io/os: windows node selector to ensure they are allowed to run on Windows Server nodes. It also validates the Pod to ensure that it only uses features supported on Windows.
To ensure the webhook is created, run the following command:
kubectl get mutatingwebhookconfigurations
The output should show the webhook running:
NAME CREATED AT
windows.config.common-webhooks.networking.gke.io 2019-12-12T16:55:47Z
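Instead of rerunning the command by hand, you could poll for the webhook in a script. The following is a minimal sketch, not an official GKE tool: the function name, the default of 30 attempts, and the 2-second delay are all illustrative choices.

```shell
# Hypothetical helper: poll until the Windows webhook exists or attempts run out.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 2).
wait_for_windows_webhook() {
  webhook="windows.config.common-webhooks.networking.gke.io"
  attempts="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # kubectl exits with a non-zero status while the object does not exist.
    if kubectl get mutatingwebhookconfigurations "$webhook" >/dev/null 2>&1; then
      echo "webhook $webhook is present"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "webhook $webhook not found after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_windows_webhook with no arguments waits up to about a minute. The function returns a non-zero status if the webhook never appears, so a deployment script can stop early.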
Now that you have a cluster with two node pools (one Linux and one Windows), you can deploy a Windows application.
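As an illustration of the kubernetes.io/os: windows node selector mentioned above, the following sketch prints a small Deployment manifest that is pinned to Windows Server nodes. The function name and the single-replica layout are assumptions made for this example. The image you pass in must be built for the Windows Server version of your node pool.

```shell
# Hypothetical helper: print a Deployment manifest pinned to Windows nodes.
# $1 = Deployment name, $2 = container image built for your node's Windows version.
windows_deployment_manifest() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: $1
        image: $2
EOF
}
```

You could pipe the output to kubectl, for example windows_deployment_manifest my-app IMAGE_NAME | kubectl apply -f -, where my-app and IMAGE_NAME are placeholders.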
Mapping GKE and Windows versions
Microsoft releases new SAC versions approximately every six months and new LTSC versions every two to three years. These new versions are typically available in new GKE minor versions. Within a GKE minor version the LTSC and SAC versions usually remain fixed.
To see the version mapping between GKE versions and Windows Server versions, use the gcloud beta container get-server-config command:
gcloud beta container get-server-config
The version mapping is returned in the windowsVersionMaps field of the response. To filter the response to see the version mapping for specific GKE versions in your cluster, perform the following steps in a Linux shell or in Cloud Shell.
Set the following variables:
CLUSTER_NAME=CLUSTER_NAME
NODE_POOL_NAME=NODE_POOL_NAME
ZONE=COMPUTE_ZONE
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- NODE_POOL_NAME: the name of the Windows Server node pool.
- COMPUTE_ZONE: the compute zone for the cluster.
Obtain the node pool version and store it in the NODE_POOL_VERSION variable:
NODE_POOL_VERSION=`gcloud container node-pools describe $NODE_POOL_NAME \
--cluster $CLUSTER_NAME --zone $ZONE --format="value(version)"`
Obtain the Windows Server versions for NODE_POOL_VERSION:
gcloud beta container get-server-config \
--format="yaml(windowsVersionMaps.\"$NODE_POOL_VERSION\")"
The output is similar to the following:
windowsVersionMaps:
  1.18.6-gke.6601:
    windowsVersions:
    - imageType: WINDOWS_SAC
      osVersion: 10.0.18363.1198
      supportEndDate:
        day: 10
        month: 5
        year: 2022
    - imageType: WINDOWS_LTSC
      osVersion: 10.0.17763.1577
      supportEndDate:
        day: 9
        month: 1
        year: 2024
Obtain the Windows Server version for the WINDOWS_SAC image type:
gcloud beta container get-server-config \
--flatten=windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions \
--filter="windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.imageType=WINDOWS_SAC" \
--format="value(windowsVersionMaps.\"$NODE_POOL_VERSION\".windowsVersions.osVersion)"
The output is similar to the following:
10.0.18363.1198
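The supportEndDate values in the version mapping can be compared against the current date to spot image types that are no longer supported. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the year, month, and day are passed as plain integers without leading zeros (as they appear in the output above), and support is treated as ended only when the end date is strictly before today.

```shell
# Hypothetical helper: succeed when the support end date is before "today".
# $1 = year, $2 = month, $3 = day (as printed in supportEndDate),
# $4 = today as YYYYMMDD (defaults to the current date).
support_has_ended() {
  # Build a sortable YYYYMMDD key, for example 2022 5 10 -> 20220510.
  end_key=$(printf '%04d%02d%02d' "$1" "$2" "$3")
  today_key="${4:-$(date +%Y%m%d)}"
  [ "$end_key" -lt "$today_key" ]
}
```

For example, support_has_ended 2022 5 10 checks the WINDOWS_SAC entry shown above against the current date.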
Upgrading Windows Server node pools
The Windows Server container version compatibility requirements mean that your container images might need to be rebuilt to match the Windows Server version for a new GKE version before upgrading your node pools.
To ensure that your container images remain compatible with your nodes, we recommend that you check the version mapping and build your Windows Server container images as multi-arch images that can target multiple Windows Server versions. You can then update your container deployments to target the multi-arch images that will work on both the current and the next GKE version before manually invoking a GKE node pool upgrade. Manual node pool upgrades must be performed regularly because nodes cannot be more than two minor versions behind the control plane version.
We recommend that you subscribe to upgrade notifications using Pub/Sub to proactively receive updates about new GKE versions and the Windows OS versions they use.
We recommend enabling node auto-upgrades only if you continuously build multi-arch Windows Server container images that target the latest Windows Server versions, especially if you are using Windows Server SAC as the node image type. Node auto-upgrades are less likely to cause problems with the Windows Server LTSC node image type but there is still a risk of encountering version incompatibility issues.
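A manual node pool upgrade, as described above, can be sketched with the gcloud container clusters upgrade command. The function name is hypothetical. The sketch assumes that a default zone or region is configured and that your container images already target the Windows Server version used by the new GKE version.

```shell
# Hypothetical helper: manually upgrade one Windows Server node pool.
# $1 = cluster name, $2 = node pool name, $3 = target GKE version.
upgrade_windows_node_pool() {
  gcloud container clusters upgrade "$1" \
    --node-pool="$2" \
    --cluster-version="$3"
}
```

For example, upgrade_windows_node_pool CLUSTER_NAME NODE_POOL_NAME VERSION_NUMBER upgrades only the named node pool and leaves the other pools unchanged.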
Windows Updates
Windows Updates are disabled for Windows Server nodes. Automatic updates can cause node restarts at unpredictable times, and any Windows Updates installed after a node starts would be lost when the node is recreated by GKE. GKE makes Windows Updates available by periodically updating the Windows Server node images used in new GKE releases. There can be a delay between when Windows Updates are released by Microsoft and when they are available in GKE. When critical security updates are released, GKE updates the Windows Server node images as quickly as possible.
Control how Windows Pods and Services communicate
You can control how Windows Pods and Services communicate using network policies.
You can have a Windows Server container on clusters that have network policy enabled in GKE versions 1.22.2 and later. This feature is available for clusters that use the WINDOWS_LTSC or WINDOWS_LTSC_CONTAINERD node image types.
If your control planes or nodes are running earlier versions, you can migrate your node pools to a version that supports network policy by upgrading your node pools and your control plane to GKE version 1.22.2 or later. This option is only available if you created your cluster with the --enable-dataplane-v2 flag.
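To check a control plane or node version against a minimum such as 1.22.2, you could compare the numeric fields of the two version strings. The following is a minimal sketch with a hypothetical function name. It assumes versions look like 1.22.2 or 1.22.2-gke.100, compares up to four numeric fields, and treats missing fields as 0.

```shell
# Hypothetical helper: succeed when GKE version $1 is at least version $2.
# Compares the numeric fields of versions such as 1.22.2 or 1.22.2-gke.100;
# missing fields count as 0.
gke_version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    # Replace every run of non-digits with a space: 1.22.2-gke.100 -> 1 22 2 100.
    gsub(/[^0-9]+/, " ", have)
    gsub(/[^0-9]+/, " ", want)
    nh = split(have, h, " ")
    nw = split(want, w, " ")
    for (i = 1; i <= 4; i++) {
      a = (i <= nh) ? h[i] + 0 : 0
      b = (i <= nw) ? w[i] + 0 : 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}
```

For example, gke_version_at_least "$NODE_POOL_VERSION" 1.22.2 succeeds when the node pool version obtained with gcloud container node-pools describe meets the minimum for network policy.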
After you enable network policy, all previously configured policies, including policies that did not work on Windows Server containers before you enabled the feature, become active.
Some clusters cannot run Windows Server containers with network policy enabled. See the Limitations section for more details.
Viewing and querying logs
Logging is enabled automatically in GKE clusters. You can view the logs of the containers and the logs from other services on the Windows Server nodes using Kubernetes Engine monitoring.
The following is an example of a filter to get the container log:
resource.type="k8s_container"
resource.labels.cluster_name="your_cluster_name"
resource.labels.namespace_name="your_namespace_id"
resource.labels.container_name="your_container_name"
resource.labels.pod_name="your_Pod_name"
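To run a filter like this from the command line, you could build it as a single string and pass it to gcloud logging read. The following is a minimal sketch: the function name is hypothetical, the clauses are joined with AND, and the label keys are written in lowercase.

```shell
# Hypothetical helper: build a Cloud Logging filter for one container.
# $1 = cluster name, $2 = namespace, $3 = container name, $4 = Pod name.
container_log_filter() {
  printf 'resource.type="k8s_container"'
  printf ' AND resource.labels.cluster_name="%s"' "$1"
  printf ' AND resource.labels.namespace_name="%s"' "$2"
  printf ' AND resource.labels.container_name="%s"' "$3"
  printf ' AND resource.labels.pod_name="%s"\n' "$4"
}
```

For example, gcloud logging read "$(container_log_filter CLUSTER_NAME NAMESPACE CONTAINER_NAME POD_NAME)" --limit=20 prints the most recent matching entries, where the uppercase names are placeholders.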
Accessing a Windows Server node using Remote Desktop Protocol (RDP)
You can connect to a Windows Server node in your cluster using RDP. For instructions on how to connect, see Connecting to Windows instances in the Compute Engine documentation.
Building multi-arch images
You can build the multi-arch images manually or use a Cloud Build builder. For instructions, see Building Windows multi-arch images.
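As a rough sketch of the manual approach, the docker manifest commands can combine images built for different Windows Server versions into one multi-arch tag. The function name and the tag names in the usage note are hypothetical. The sketch assumes that each per-version image has already been built and pushed, and that your Docker CLI supports the manifest subcommand. The linked instructions remain the authoritative procedure.

```shell
# Hypothetical helper: combine per-version images into one manifest list.
# $1 = target tag; remaining arguments = images already pushed to the registry.
push_multiarch_manifest() {
  target="$1"
  shift
  # Create the manifest list locally, then push it only if creation succeeded.
  docker manifest create "$target" "$@" &&
    docker manifest push "$target"
}
```

For example, push_multiarch_manifest REGISTRY/app:1.0 REGISTRY/app:1.0-ltsc2019 REGISTRY/app:1.0-ltsc2022 publishes a single tag that nodes running either Windows Server version can pull.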
Using gMSA
The following steps show you how to use a Group Managed Service Account (gMSA) with your Windows Server node pools.
Configure Windows Server nodes in your cluster to automatically join your AD domain. For instructions, see Configure Windows Server nodes to automatically join an Active Directory domain.
Create and grant a gMSA access to the security group automatically created by the domain join service. This step needs to be done in a machine with administrative access to your AD domain.
$instanceGroupUri = gcloud container node-pools describe NODE_POOL_NAME --cluster CLUSTER_NAME --format="value(instanceGroupUrls)"
$securityGroupName = ([System.Uri]$instanceGroupUri).Segments[-1]
$securityGroup = dsquery group -name $securityGroupName
$gmsaName = "GMSA_NAME"
$dnsHostName = "DNS_HOST_NAME"
New-ADServiceAccount -Name $gmsaName -DNSHostName $dnsHostName -PrincipalsAllowedToRetrieveManagedPassword $securityGroup
Get-ADServiceAccount $gmsaName
Test-ADServiceAccount $gmsaName
Replace the following:
- NODE_POOL_NAME: the name of your Windows Server node pool. The automatically created security group has the same name as your Windows Server node pool.
- CLUSTER_NAME: the name of your cluster.
- GMSA_NAME: the name you choose for the new gMSA.
- DNS_HOST_NAME: the Fully Qualified Domain Name (FQDN) of the service account you created. For example, if GMSA_NAME is webapp01 and the domain is example.com, then DNS_HOST_NAME is webapp01.example.com.
Configure your gMSA by following the instructions in the Configure GMSA for Windows Pods and containers tutorial.
Deleting Windows Server node pools
Delete a Windows Server node pool by using gcloud or the Google Cloud console.
gcloud
gcloud container node-pools delete NODE_POOL_NAME \
--cluster=CLUSTER_NAME
Console
To delete a Windows Server node pool using the Google Cloud console, perform the following steps:
Go to the Google Kubernetes Engine page in the Google Cloud console.
Beside the cluster you want to edit, click more_vert Actions, then click edit Edit.
Select the Nodes tab.
Under the Node Pools section, click delete Delete next to the node pool you want to delete.
When prompted to confirm, click Delete again.
Limitations
There are some Kubernetes features that are not yet supported for Windows Server containers. In addition, some features are Linux-specific and do not work for Windows. For the complete list of supported and unsupported Kubernetes features, see the Kubernetes documentation.
In addition to the unsupported Kubernetes features, there are some GKE features that are not supported.
For GKE clusters, the following features are not supported with Windows Server node pools:
- Cloud TPUs (--enable-tpu)
- Image streaming
- Intranode visibility (--enable-intra-node-visibility)
- IP masquerade agent
- Kubernetes alpha cluster (--enable-kubernetes-alpha)
- Node Local DNS cache
- Private use of Class E IP addresses
- Private use of public IP addresses
- Network policy logging
- Kubernetes service.spec.sessionAffinity
- GPUs (--accelerator)
- Setting the maximum Pods per node greater than the default limit of 110
- Filestore CSI driver
- Docker-based CloudSQL Auth proxy
- IPv4/IPv6 dual-stack networking. IPv6 is not supported on Windows nodes.
Local External Traffic Policy on Windows node pool is only supported with GKE version v1.23.4-gke.400 or later.
Other Google Cloud products that you want to use with GKE clusters might not support Windows Server node pools. For specific limitations, refer to the documentation of that product.
Troubleshooting
See the Kubernetes documentation for general guidance on debugging Pods and Services.
Containerd node issues
For known issues using a Containerd node image, see Known issues.
Windows Pods fail to start
A version mismatch between the Windows Server container and the Windows node that is trying to run the container can result in your Windows Pods failing to start.
If the version for your Windows node pool is 1.16.8-gke.8 or later, review Microsoft's documentation for the February 2020 Windows Server container incompatibility issue and build your container images with base Windows images that include Windows Updates from March 2020. Container images built on earlier base Windows images might fail to run on these Windows nodes and can also cause the node to fail with status NotReady.
Image pull errors
Windows Server container images, and the individual layers they are composed of, can be quite large. Their size can cause Kubelet to time out and fail when downloading and extracting the container layers.
You might have encountered this problem if you see the "Failed to pull image" or "Image pull context cancelled" error messages or an ErrImagePull status for your Pods.
If image pull failures occur frequently, you should use node pools with a higher CPU specification. Container extraction is executed in parallel across cores, so machine types with more cores reduce the overall pull time.
Try the following options to successfully pull your Windows Server containers:
Break the application layers of the Windows Server container image into smaller layers that can each be pulled and extracted more quickly. This can make Docker's layer caching more effective and make image pull retries more likely to succeed. To learn more about layers, see the Docker article About images, containers, and storage drivers.
Connect to your Windows Server nodes and manually use the docker pull command on your container images before creating your Pods.
Set the image-pull-progress-deadline flag for the kubelet service to increase the timeout for pulling container images.
Set the flag by connecting to your Windows nodes and running the following PowerShell commands.
Get the existing command line for the Kubelet service from the Windows registry.
PS C:\> $regkey = "HKLM\SYSTEM\CurrentControlSet\Services\kubelet"
PS C:\> $name = "ImagePath"
PS C:\> $(reg query ${regkey} /v ${name} | Out-String) -match `
"(?s)${name}.*(C:.*kubelet\.exe.*)"
PS C:\> $kubelet_cmd = $Matches[1] -replace `
"--image-pull-progress-deadline=.* ","" -replace "\r\n"," "
Set a new command line for the Kubelet service, with an additional flag to increase the timeout.
PS C:\> reg add ${regkey} /f /v ${name} /t REG_EXPAND_SZ /d "${kubelet_cmd} ` --image-pull-progress-deadline=40m "
Confirm that the change was successful.
PS C:\> reg query ${regkey} /v ${name}
Restart the kubelet service so the new flag takes effect.
PS C:\> Restart-Service kubelet
Confirm that the kubelet service restarted successfully.
PS C:\> Get-Service kubelet # ensure state is Running
Image family reached end of life
When creating a node pool with a Windows image, you receive an error similar to the following:
WINDOWS_SAC image family for 1.18.20-gke.501 has reached end of life, newer versions are still available.
To resolve this error, choose a Windows image that is available and supported.
You can find the support end date for GKE Windows node images by using the gcloud container get-server-config command as described in the Mapping GKE and Windows versions section.
Timeout during node pool creation
Node pool creation can time out if you are creating a large number of nodes (for example, 500) and it's the first node pool in the cluster using a Windows Server image.
To resolve this issue, reduce the number of nodes you are creating. You can increase the number of nodes later.
Windows nodes become NotReady with error: "PLEG is not healthy"
This is a known Kubernetes issue that happens when multiple Pods are started very rapidly on a single Windows node. To recover from this situation, restart the Windows Server node. A recommended workaround to avoid this issue is to limit the rate at which Windows Pods are created to one Pod every 30 seconds.
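The rate-limiting workaround can be sketched as a loop that applies one manifest at a time and pauses between them. This is a minimal sketch with a hypothetical function name. It assumes each manifest file defines a single Windows Pod, and it does not throttle Pods that a Deployment or other controller creates on its own.

```shell
# Hypothetical helper: apply manifests one at a time with a pause in between.
# $1 = seconds to wait between Pods (30 matches the workaround above);
# remaining arguments = manifest files, each defining one Windows Pod.
apply_pods_slowly() {
  interval="$1"
  shift
  for manifest in "$@"; do
    # Stop at the first failure so a broken manifest is not silently skipped.
    kubectl apply -f "$manifest" || return 1
    sleep "$interval"
  done
}
```

For example, apply_pods_slowly 30 pod-a.yaml pod-b.yaml pod-c.yaml creates the three Pods roughly 30 seconds apart. The file names are placeholders.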
Inconsistent TerminationGracePeriod
The Windows system timeout for the container might differ from the grace period you configure. This difference can cause Windows to force-terminate the container before the end of the grace period passed to the runtime.
You can modify the Windows timeout by editing container-local registry keys at image-build time. If you modify the Windows timeout, you might also need to adjust TerminationGracePeriodSeconds to match.
Network connectivity problems
If you experience network connectivity problems from your Windows Server containers, it might be because Windows Server container networking often assumes a network MTU of 1500, which is incompatible with Google Cloud's MTU of 1460.
Check that both the MTU of the network interface in the container and the network interfaces of the Windows Server node itself are set to the same value (that is, 1460 or less). For information on how to set the MTU, see known issues for Windows containers.
Node startup issues
If nodes fail to start in the cluster or fail to join the cluster successfully, review the diagnostic information provided in the node's serial port output.
Run the following command to see the serial port output:
gcloud compute instances get-serial-port-output NODE_NAME --zone=COMPUTE_ZONE
Replace the following:
- NODE_NAME: the name of the node.
- COMPUTE_ZONE: the compute zone for the specific node.
Intermittently unreachable Services in Windows nodes with cluster running 1.24 or earlier
When starting Windows nodes in Kubernetes clusters with a high number of Host Network Service Load Balancer rules, there is a delay in processing the rules. Services are intermittently unreachable during the delay, which lasts around 30 seconds per rule, and the total delay can be significant if there are enough rules. To learn more, see the original issue in GitHub.
For GKE clusters running version 1.24 or earlier, any event that restarts kube-proxy on a Windows node (for example, node startup, node upgrade, or manual restart) makes the Services reached by Pods running on that node unreachable until kube-proxy has synced all rules.
For GKE clusters running version 1.25 or later, this behavior is substantially improved. For details on this improvement, see the pull request in GitHub. If you are experiencing this issue, we recommend upgrading your cluster's control plane to 1.25 or later.
What's next
- Learn how to deploy a Windows application.
- Read Microsoft's short introduction on Windows containers.
- Read Microsoft's guidance on choosing the container base images.
- Read Microsoft's documentation on Windows container version compatibility.