Increase network traffic speed for GPU nodes


This page shows you how to increase network bandwidth for GPU nodes on Google Kubernetes Engine (GKE) clusters by using Google Virtual NIC (gVNIC).

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Limitations

Requirements

  • GKE nodes must use a Container-Optimized OS node image.

Enable gVNIC

You can create a cluster that has node pools that use gVNIC, create a node pool with gVNIC enabled, or update a node pool to use gVNIC.

Create a cluster

Create a cluster with node pools that use gVNIC:

gcloud container clusters create CLUSTER_NAME \
    --accelerator type=GPU_TYPE,count=AMOUNT \
    --machine-type=MACHINE_TYPE \
    --enable-gvnic

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • GPU_TYPE: the GPU type, which must be one of the following:
    • nvidia-tesla-k80
    • nvidia-tesla-p100
    • nvidia-tesla-p4
    • nvidia-tesla-v100
    • nvidia-tesla-t4
    • nvidia-tesla-a100
    • nvidia-a100-80gb
    • nvidia-l4
  • AMOUNT: the number of GPUs to attach to nodes in the node pool.
  • MACHINE_TYPE: the type of machine you want to use. gVNIC is not supported on memory-optimized machine types.
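
As a rough illustration of filling in the placeholders above, the following sketch checks a GPU type against the values listed on this page and then prints the resulting command instead of running it. The helper name build_create_cmd and the sample values (my-cluster, nvidia-tesla-t4, n1-standard-4) are hypothetical and chosen only for the example; check that the GPU and machine type you pick are available in your zone before running the printed command.

```shell
#!/bin/sh
# Sketch only: build_create_cmd is a hypothetical helper, not part of gcloud.
# It prints the command so that you can review it before running it yourself.

# GPU types listed on this page.
SUPPORTED_GPU_TYPES="nvidia-tesla-k80 nvidia-tesla-p100 nvidia-tesla-p4 nvidia-tesla-v100 nvidia-tesla-t4 nvidia-tesla-a100 nvidia-a100-80gb nvidia-l4"

# Usage: build_create_cmd CLUSTER_NAME GPU_TYPE AMOUNT MACHINE_TYPE
build_create_cmd() {
  for bcc_type in $SUPPORTED_GPU_TYPES; do
    if [ "$bcc_type" = "$2" ]; then
      printf 'gcloud container clusters create %s --accelerator type=%s,count=%s --machine-type=%s --enable-gvnic\n' \
        "$1" "$2" "$3" "$4"
      return 0
    fi
  done
  # Reject anything that is not in the list above.
  echo "unsupported GPU type: $2" >&2
  return 1
}

# Placeholder values, not recommendations.
build_create_cmd my-cluster nvidia-tesla-t4 1 n1-standard-4
```

Printing rather than executing keeps the sketch free of side effects; to create the cluster, copy the printed line into a shell where the gcloud CLI is initialized.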

Create a node pool

Create a node pool that uses gVNIC:

gcloud container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --enable-gvnic

Replace the following:

  • NODEPOOL_NAME: the name of a new node pool.
  • CLUSTER_NAME: the name of the existing cluster.

Update a node pool

Update a node pool to use gVNIC:

gcloud container node-pools update NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --enable-gvnic

Replace the following:

  • NODEPOOL_NAME: the name of the node pool that you want to update.
  • CLUSTER_NAME: the name of the existing cluster.
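
After an update, you might want to confirm the setting on the node pool. The sketch below assumes that gcloud container node-pools describe reports the setting under config.gvnic.enabled (as the GKE API's node configuration does) and that an unset field prints as an empty value; both are assumptions to verify against your gcloud version. The helper gvnic_status is hypothetical and only interprets the text that it is given.

```shell
#!/bin/sh
# Sketch only: gvnic_status is a hypothetical helper. It parses a value and
# does not call gcloud itself.
gvnic_status() {
  case "$1" in
    [Tt]rue) echo "enabled" ;;
    *)       echo "disabled" ;;
  esac
}

# Assumed query, left commented out because it needs an initialized gcloud CLI:
# value="$(gcloud container node-pools describe NODEPOOL_NAME \
#     --cluster=CLUSTER_NAME \
#     --format='value(config.gvnic.enabled)')"
# gvnic_status "$value"

# Example with literal inputs:
gvnic_status "True"
gvnic_status ""
```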

Disable gVNIC

Update the node pool using the --no-enable-gvnic flag:

gcloud container node-pools update NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --no-enable-gvnic

Troubleshooting

To troubleshoot gVNIC, see Troubleshooting Google Virtual NIC.

What's next