Increase network traffic speed for GPU nodes


This page shows you how to increase network bandwidth for GPU nodes on Google Kubernetes Engine (GKE) clusters by using Google Virtual NIC (gVNIC).

In Autopilot clusters, nodes that run GKE version 1.30.2-gke.1023000 and later have gVNIC installed automatically. The instructions on this page apply only to Standard clusters.

To increase bandwidth on CPU nodes, consider enabling Tier-1 bandwidth.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Limitations

Requirements

  • GKE nodes must use a Container-Optimized OS node image.
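One way to check this requirement on an existing node pool is to read the pool's image type with gcloud. The following is a minimal sketch, not an official procedure: the helper name uses_cos_image is hypothetical, it assumes that Container-Optimized OS image types begin with COS (for example, COS_CONTAINERD), and it assumes that a default location is already configured for gcloud.

```shell
# Hypothetical helper (not part of gcloud): succeeds when the node pool's
# image type looks like Container-Optimized OS.
# Assumption: Container-Optimized OS image types start with "COS".
# Usage: uses_cos_image NODEPOOL_NAME CLUSTER_NAME
uses_cos_image() {
  # Read the image type recorded in the node pool configuration.
  image_type="$(gcloud container node-pools describe "$1" \
      --cluster="$2" \
      --format="value(config.imageType)")"
  case "$image_type" in
    COS*|cos*)
      echo "OK: $1 uses $image_type" ;;
    *)
      echo "Not Container-Optimized OS: $1 uses ${image_type:-unknown}" >&2
      return 1 ;;
  esac
}
```

For example, run uses_cos_image NODEPOOL_NAME CLUSTER_NAME before you enable gVNIC on that node pool.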

Enable gVNIC

You can create a cluster that has node pools that use gVNIC, create a node pool with gVNIC enabled, or update a node pool to use gVNIC.

Create a cluster

Create a cluster with node pools that use gVNIC:

gcloud container clusters create CLUSTER_NAME \
    --accelerator type=GPU_TYPE,count=AMOUNT \
    --machine-type=MACHINE_TYPE \
    --enable-gvnic

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • GPU_TYPE: the type of GPU accelerator that you use. For example, nvidia-tesla-t4.
  • AMOUNT: the number of GPUs to attach to nodes in the node pool.
  • MACHINE_TYPE: the type of machine you want to use. gVNIC is not supported on memory-optimized machine types.

Create a node pool

Create a node pool that uses gVNIC:

gcloud container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --enable-gvnic

Replace the following:

  • NODEPOOL_NAME: the name of a new node pool.
  • CLUSTER_NAME: the name of the existing cluster.
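Because this page targets GPU nodes, you might want the new node pool to attach GPUs as well. The following sketch combines the --accelerator and --machine-type flags from the cluster creation command with --enable-gvnic. The wrapper function create_gvnic_gpu_pool and its argument order are hypothetical and exist only for illustration; the gcloud flags themselves are the ones shown on this page.

```shell
# Hypothetical wrapper (not part of gcloud): creates a GPU node pool
# that uses gVNIC.
# Usage: create_gvnic_gpu_pool NODEPOOL_NAME CLUSTER_NAME GPU_TYPE AMOUNT MACHINE_TYPE
create_gvnic_gpu_pool() {
  gcloud container node-pools create "$1" \
      --cluster="$2" \
      --accelerator "type=$3,count=$4" \
      --machine-type="$5" \
      --enable-gvnic
}
```

For example, create_gvnic_gpu_pool NODEPOOL_NAME CLUSTER_NAME nvidia-tesla-t4 1 MACHINE_TYPE requests one nvidia-tesla-t4 GPU per node. Make sure that MACHINE_TYPE supports the GPU type that you choose.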

Update a node pool

Update a node pool to use gVNIC:

gcloud container node-pools update NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --enable-gvnic

Replace the following:

  • NODEPOOL_NAME: the name of the node pool that you want to update.
  • CLUSTER_NAME: the name of the existing cluster.

This change requires recreating the nodes, which can disrupt your running workloads. For details about this specific change, find the corresponding row in the table "Manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies". To learn more about node updates, see Planning for node update disruptions.
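After the nodes are recreated, you can confirm the setting by reading it back from the node pool configuration. The following is a minimal sketch under stated assumptions: the helper name gvnic_enabled is hypothetical, it reads the config.gvnic.enabled field from the node pool description, and it assumes that gcloud prints a true value as True and prints nothing when the field is unset.

```shell
# Hypothetical helper (not part of gcloud): reports whether gVNIC is
# enabled in the node pool configuration.
# Usage: gvnic_enabled NODEPOOL_NAME CLUSTER_NAME
gvnic_enabled() {
  # Assumption: the field is empty when gVNIC was never enabled.
  enabled="$(gcloud container node-pools describe "$1" \
      --cluster="$2" \
      --format="value(config.gvnic.enabled)")"
  case "$enabled" in
    True|true)
      echo "gVNIC is enabled on $1" ;;
    *)
      echo "gVNIC is not enabled on $1"
      return 1 ;;
  esac
}
```

The same check applies after you disable gVNIC, in which case the helper is expected to report that gVNIC is not enabled.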

Disable gVNIC

To disable gVNIC, update the node pool with the --no-enable-gvnic flag:

gcloud container node-pools update NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --no-enable-gvnic

This change requires recreating the nodes, which can disrupt your running workloads. For details about this specific change, find the corresponding row in the table "Manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies". To learn more about node updates, see Planning for node update disruptions.

Troubleshooting

To troubleshoot gVNIC, see Troubleshooting Google Virtual NIC.
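As a first check before deeper troubleshooting, you can inspect the NIC type of the Compute Engine instance that backs a node. The following sketch assumes that the node name matches the instance name, that the first network interface is the one of interest, and that an instance using gVNIC reports GVNIC in the nicType field. The helper name node_nic_type is hypothetical.

```shell
# Hypothetical helper (not part of gcloud): prints the NIC type of the
# first network interface of a Compute Engine instance.
# Usage: node_nic_type INSTANCE_NAME ZONE
node_nic_type() {
  gcloud compute instances describe "$1" \
      --zone="$2" \
      --format="value(networkInterfaces[0].nicType)"
}
```

An empty result might mean that the instance uses the default NIC type for its machine type, so treat it as inconclusive rather than as proof that gVNIC is not in use.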

What's next