This page shows you how to increase network bandwidth for GPU nodes on Google Kubernetes Engine (GKE) clusters by using Google Virtual NIC (gVNIC).
In Autopilot clusters, nodes that run GKE version 1.30.2-gke.1023000 and later have gVNIC installed automatically. The instructions on this page apply only to Standard clusters.
To increase bandwidth on CPU nodes, consider enabling Tier-1 bandwidth.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Limitations
- Compute Engine limitations apply.
Requirements
- GKE nodes must use a Container-Optimized OS node image.
Enable gVNIC
You can create a cluster that has node pools that use gVNIC, create a node pool with gVNIC enabled, or update a node pool to use gVNIC.
Create a cluster
Create a cluster with node pools that use gVNIC:
gcloud container clusters create CLUSTER_NAME \
--accelerator type=GPU_TYPE,count=AMOUNT \
--machine-type=MACHINE_TYPE \
--enable-gvnic
Replace the following:
- CLUSTER_NAME: the name of the new cluster.
- GPU_TYPE: the type of GPU accelerator that you use. For example, nvidia-tesla-t4.
- AMOUNT: the number of GPUs to attach to nodes in the node pool.
- MACHINE_TYPE: the type of machine you want to use. gVNIC is not supported on memory-optimized machine types.
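Because creating a GPU cluster provisions billable resources, it can help to review the fully substituted command before running it. The following is a minimal sketch, not an official tool: the function name create_gvnic_gpu_cluster and the DRY_RUN variable are hypothetical, and only the flags from the command above are used. By default the function prints the command; it runs the command only when DRY_RUN=0.

```shell
# Sketch: compose the cluster-creation command shown above and print it for
# review. The function name and the DRY_RUN variable are hypothetical.
create_gvnic_gpu_cluster() {
  if [ "$#" -ne 4 ]; then
    echo "usage: create_gvnic_gpu_cluster CLUSTER_NAME GPU_TYPE AMOUNT MACHINE_TYPE" >&2
    return 2
  fi
  cc_name="$1"
  cc_gpu_type="$2"
  cc_gpu_count="$3"
  cc_machine_type="$4"

  # Reject a GPU count that is empty, non-numeric, or zero.
  case "$cc_gpu_count" in
    ''|*[!0-9]*|0)
      echo "AMOUNT must be a positive integer" >&2
      return 2
      ;;
  esac

  # Rebuild the positional parameters as the full gcloud command line.
  set -- gcloud container clusters create "$cc_name" \
    --accelerator "type=${cc_gpu_type},count=${cc_gpu_count}" \
    --machine-type="$cc_machine_type" \
    --enable-gvnic

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print only; nothing is created
  else
    "$@"        # run the command for real
  fi
}
```

For example, create_gvnic_gpu_cluster my-cluster nvidia-tesla-t4 1 MACHINE_TYPE prints the command with your values filled in, and prefixing the call with DRY_RUN=0 runs it.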
Create a node pool
Create a node pool that uses gVNIC:
gcloud container node-pools create NODEPOOL_NAME \
--cluster=CLUSTER_NAME \
--enable-gvnic
Replace the following:
- NODEPOOL_NAME: the name of a new node pool.
- CLUSTER_NAME: the name of the existing cluster.
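After the node pool exists, you might want to confirm that the setting took effect. The sketch below assumes that the node pool resource exposes the setting as config.gvnic.enabled (the gvnic field of the GKE API NodeConfig) and that gcloud prints True for it. The function name nodepool_gvnic_status is hypothetical.

```shell
# Print "enabled" or "disabled" for a node pool's gVNIC setting.
# Assumption: the setting is reported as config.gvnic.enabled.
# The function name is hypothetical.
nodepool_gvnic_status() {
  if [ "$#" -lt 2 ]; then
    echo "usage: nodepool_gvnic_status NODEPOOL_NAME CLUSTER_NAME [GCLOUD_FLAGS...]" >&2
    return 2
  fi
  np_name="$1"
  np_cluster="$2"
  shift 2   # remaining arguments (for example --zone=...) are passed through

  np_value="$(gcloud container node-pools describe "$np_name" \
    --cluster="$np_cluster" \
    --format="value(config.gvnic.enabled)" "$@")" || return 1

  # An empty value means the field is not set on the node pool.
  case "$np_value" in
    [Tt]rue) echo "enabled" ;;
    *)       echo "disabled" ;;
  esac
}
```

Calling nodepool_gvnic_status NODEPOOL_NAME CLUSTER_NAME prints enabled for a pool created with --enable-gvnic. The function returns a non-zero status if the describe call itself fails.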
Update a node pool
Update a node pool to use gVNIC:
gcloud container node-pools update NODEPOOL_NAME \
--cluster=CLUSTER_NAME \
--enable-gvnic
Replace the following:
- NODEPOOL_NAME: the name of the node pool that you want to update.
- CLUSTER_NAME: the name of the existing cluster.
This change requires recreating the nodes, which can disrupt your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies. To learn more about node updates, see Planning for node update disruptions.
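Once the nodes have been recreated, one way to confirm that each node came back with gVNIC is to inspect the underlying Compute Engine VMs. This sketch assumes that GKE node names match Compute Engine instance names and that the instance resource reports the interface type as networkInterfaces[0].nicType with the value GVNIC. The function name list_non_gvnic_nodes is hypothetical.

```shell
# Print every listed VM whose first network interface is not gVNIC.
# Assumptions: node names equal Compute Engine instance names, and the NIC
# type is reported as networkInterfaces[0].nicType.
# The function name is hypothetical.
list_non_gvnic_nodes() {
  if [ "$#" -lt 2 ]; then
    echo "usage: list_non_gvnic_nodes ZONE NODE_NAME..." >&2
    return 2
  fi
  nic_zone="$1"
  shift

  for nic_node in "$@"; do
    nic_type="$(gcloud compute instances describe "$nic_node" \
      --zone="$nic_zone" \
      --format="value(networkInterfaces[0].nicType)")" || return 1
    # Report nodes that show another NIC type or no explicit NIC type.
    if [ "$nic_type" != "GVNIC" ]; then
      echo "$nic_node: ${nic_type:-unset}"
    fi
  done
}
```

If the function prints nothing, every listed node reported GVNIC. Any line it prints names a node that is still using another interface type.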
Disable gVNIC
Update the node pool using the --no-enable-gvnic flag:
gcloud container node-pools update NODEPOOL_NAME \
--cluster=CLUSTER_NAME \
--no-enable-gvnic
This change requires recreating the nodes, which can disrupt your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies. To learn more about node updates, see Planning for node update disruptions.
Troubleshooting
To troubleshoot gVNIC, see Troubleshooting Google Virtual NIC.
What's next
- Use network policy logging to record when connections to Pods are allowed or denied by your cluster's network policies.