Create a cluster to run container workloads

Create a user cluster to allow for container workload deployment.

Before you begin

To get the permissions needed to create a user cluster, ask your Organization IAM Admin to grant you the User Cluster Admin role (user-cluster-admin).

Google Distributed Cloud (GDC) air-gapped has the following limits for user clusters:

  • 16 user clusters per organization
  • 42 worker nodes per user cluster, and a minimum of three worker nodes
  • 4620 pods per user cluster
  • 110 pods per node
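
To check how many user clusters already exist in your organization against the 16-cluster limit, you can list the Cluster resources in the org admin cluster. This is a sketch that assumes the Cluster custom resources described later on this page are listable as clusters.cluster.gdc.goog in the platform namespace:

kubectl get clusters.cluster.gdc.goog -n platform \
    --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG

Replace ORG_ADMIN_CLUSTER_KUBECONFIG with the org admin cluster's kubeconfig file path.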

Create a user cluster

Complete the following steps to create a user cluster:

Console

  1. In the navigation menu, select Clusters.

  2. Click Create Cluster.

  3. In the Name field, specify a name for the user cluster.

  4. Select the Kubernetes version for the cluster.

  5. Click Attach Project and select an existing project to attach to your cluster. Then click Save. You can attach or detach projects after creating the cluster from the project details page. You must have a project attached to your cluster before deploying container workloads to it.

  6. Click Next.

  7. Configure the network settings for your cluster. You can't change these network settings after you create the cluster. The default and only supported Internet Protocol for user clusters is Internet Protocol version 4 (IPv4).

    1. If you want to create dedicated load balancer nodes, enter the number of nodes to create. By default, no dedicated load balancer nodes are created, and load balancer traffic runs through the control plane nodes.

    2. Select the Service CIDR (Classless Inter-Domain Routing) to use. Your deployed services, such as load balancers, are allocated IP addresses from this range.

    3. Select the Pod CIDR to use. The cluster allocates IP addresses from this range to your pods and VMs.

    4. Click Next.

  8. Review the details of the auto-generated default node pool for the user cluster. Click Edit to modify the default node pool.

  9. To create additional node pools, select Add node pool. When editing the default node pool or adding a new node pool, you can customize it with the following options:

    1. Assign a name for the node pool. You cannot modify the name after you create the node pool.
    2. Specify the number of worker nodes to create in the node pool.
    3. Select the machine class that best suits your workload requirements. Each machine class lists the following settings:

      • Machine type
      • CPU
      • Memory
    4. Click Save.

  10. Click Create to create the user cluster.

API

To create a new user cluster using the API directly, apply a Cluster custom resource to your GDC instance; a filled-in example sketch follows these steps:

  1. Create a Cluster custom resource and save it as a YAML file, such as cluster.yaml:

    apiVersion: cluster.gdc.goog/v1
    kind: Cluster
    metadata:
      name: CLUSTER_NAME
      namespace: platform
    spec:
      clusterNetwork:
        podCIDRSize: POD_CIDR
        serviceCIDRSize: SERVICE_CIDR
      initialVersion:
        kubernetesVersion: KUBERNETES_VERSION
      loadBalancer:
        ingressServiceIPSize: LOAD_BALANCER_POOL_SIZE
      nodePools:
      - machineTypeName: MACHINE_TYPE
        name: NODE_POOL_NAME
        nodeCount: NUMBER_OF_WORKER_NODES
        taints: TAINTS
        labels: LABELS
      releaseChannel:
        channel: UNSPECIFIED
    

    Replace the following:

    • CLUSTER_NAME: The name of the cluster. The cluster name must not end with -system. The -system suffix is reserved for clusters created by GDC.
    • POD_CIDR: The size of the network range from which pod virtual IP addresses are allocated. If unset, a default value of 21 is used.
    • SERVICE_CIDR: The size of the network range from which service virtual IP addresses are allocated. If unset, a default value of 23 is used.
    • KUBERNETES_VERSION: The Kubernetes version of the cluster, such as 1.26.5-gke.2100. To list the available Kubernetes versions to configure, see List available Kubernetes versions for a cluster.
    • LOAD_BALANCER_POOL_SIZE: The size of non-overlapping IP address pools used by load balancer services. If unset, a default value of 20 is used.
    • MACHINE_TYPE: The machine type for the worker nodes of the node pool. View the available machine types for what you can configure.
    • NODE_POOL_NAME: The name of the node pool.
    • NUMBER_OF_WORKER_NODES: The number of worker nodes to provision in the node pool.
    • TAINTS: The taints to apply to the nodes of this node pool. This is an optional field.
    • LABELS: The labels to apply to the nodes of this node pool. It contains a list of key-value pairs. This is an optional field.
  2. Apply the custom resource to your GDC instance:

    kubectl apply -f cluster.yaml --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG
    

    Replace ORG_ADMIN_CLUSTER_KUBECONFIG with the org admin cluster's kubeconfig file path.
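
For reference, the following is a minimal sketch of a filled-in cluster.yaml. The cluster name, node pool name, and node count are hypothetical example values; the machine type is one of the machine types listed in the GPU section later on this page, and the Kubernetes version must be one returned by the command in the next section. The optional taints and labels fields are omitted:

apiVersion: cluster.gdc.goog/v1
kind: Cluster
metadata:
  name: example-user-cluster        # hypothetical name; must not end with -system
  namespace: platform
spec:
  clusterNetwork:
    podCIDRSize: 21                 # default pod CIDR size
    serviceCIDRSize: 23             # default service CIDR size
  initialVersion:
    kubernetesVersion: 1.26.5-gke.2100   # choose a version from the list in the next section
  loadBalancer:
    ingressServiceIPSize: 20        # default load balancer pool size
  nodePools:
  - machineTypeName: a2-ultragpu-1g-gdc  # a machine type available in your GDC instance
    name: worker-pool               # hypothetical node pool name
    nodeCount: 3                    # minimum of three worker nodes per user cluster
  releaseChannel:
    channel: UNSPECIFIED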

List available Kubernetes versions for a cluster

You can list the available Kubernetes versions in your GDC instance using the kubectl CLI:

kubectl get userclustermetadata.upgrade.private.gdc.goog \
    -o=custom-columns=K8S-VERSION:.spec.kubernetesVersion \
    --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG

Replace ORG_ADMIN_CLUSTER_KUBECONFIG with the org admin cluster's kubeconfig file path.

The output looks similar to the following:

K8S-VERSION
1.25.10-gke.2100
1.26.5-gke.2100
1.27.4-gke.500

Support GPU workloads in a user cluster

Distributed Cloud provides NVIDIA GPU support for user clusters, which run your GPU devices as user workloads. For example, you might prefer to run artificial intelligence (AI) and machine learning (ML) notebooks in a GPU environment. Ensure that your user cluster supports GPU devices before using AI and ML notebooks. GPU support is enabled by default for clusters that have GPU machines provisioned for them.

You can create user clusters using the GDC console or the API directly. Ensure that you provision GPU machines for your user cluster so that it can support GPU workloads in its containers. For more information, see Create a user cluster.

GPUs are statically allocated. The first three GPUs are always dedicated to pod workloads such as pretrained AI and ML APIs. These GPUs do not run on a user cluster. The remaining GPUs are allocated to user clusters. AI and ML notebooks run on user clusters.

Be sure to allocate GPU machines to the correct cluster types so that components such as AI and ML APIs and notebooks can be used.

Supported NVIDIA GPU cards

Refer to the following table for the supported NVIDIA GPU cards and the machine types you must provision in your user cluster to use them; a node pool sketch that uses one of these machine types follows the table:

GPU card           Required machine type
A100 PCIe 40 GB    a2-highgpu-1g-gdc
A100 PCIe 80 GB    a2-ultragpu-1g-gdc
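
For example, to support GPU workloads, you might declare a node pool that uses one of the machine types above in the Cluster custom resource shown in the API section. This is a sketch with a hypothetical node pool name and node count:

nodePools:
- machineTypeName: a2-highgpu-1g-gdc   # provisions A100 PCIe 40 GB GPU machines
  name: gpu-worker-pool                # hypothetical node pool name
  nodeCount: 3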