Last updated 2025-08-25 UTC.

# Create a GKE Cluster with Pathways

| **Preview**
|
|
| This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
| of the [Service Specific Terms](/terms/service-terms#1).
|
| Pre-GA products and features are available "as is" and might have limited support.
|
| For more information, see the
| [launch stage descriptions](/products#product-launch-stages).
| **Important:** To get access to Pathways, contact your Google Cloud Account representative.

You can use the [Accelerated Processing Kit (XPK)](https://github.com/AI-Hypercomputer/xpk)
to create pre-configured Google Kubernetes Engine (GKE) clusters for
Pathways-based workloads. You can also use `gcloud` to manually create
GKE clusters for Pathways-based workloads.

Before you begin
----------------

Make sure you have:

- [Installed Kubernetes tools](https://kubernetes.io/docs/tasks/tools/)
- [Installed XPK](/ai-hypercomputer/docs/create/gke-ai-hypercompute#use-xpk)
- [Enabled the TPU API](/tpu/docs/setup-gcp-account#set-up-env)
- [Enabled the Google Kubernetes Engine API](/endpoints/docs/openapi/enable-api)
- Confirmed that your Google Cloud project is allowlisted for Pathways

Set up your local environment
-----------------------------

Log in with your Google Cloud credentials.

    gcloud auth application-default login

Define the following environment variables with values appropriate to your
workload.

### Required variables

| **Note:** For more information about how to create a VPC network and subnet for XPK, see [xpk-large-scale-guide.sh](https://github.com/AI-Hypercomputer/xpk/blob/main/xpk-large-scale-guide.sh).

Create a GKE cluster
--------------------

In the following example, you create a cluster with two v5e 2x4 node pools.
You can create a cluster using XPK or the `gcloud` command.

### XPK

1. Set the following environment variables:

   ```bash
   CLUSTER_NODEPOOL_COUNT=CLUSTER_NODEPOOL_COUNT
   PROJECT=PROJECT_ID
   ZONE=ZONE
   CLUSTER=GKE_CLUSTER_NAME
   TPU_TYPE="v5litepod-8"
   PW_CPU_MACHINE_TYPE="n2-standard-64"
   NETWORK=NETWORK
   SUBNETWORK=SUB_NETWORK
   ```

   Replace the following:
   - `CLUSTER_NODEPOOL_COUNT`: the maximum number of node pools a workload can use
   - `PROJECT_ID`: your Google Cloud project ID
   - `ZONE`: the zone where you are creating resources
   - `GKE_CLUSTER_NAME`: the GKE cluster name
   - `TPU_TYPE`: the TPU type. For more information, see [supported types in XPK](https://github.com/AI-Hypercomputer/xpk/blob/c8f20956107c8a2b57064415d548555b11ec1413/src/xpk/core/system_characteristics.py#L100)
   - `PW_CPU_MACHINE_TYPE`: the CPU node type for the Pathways controller
   - `NETWORK`: \[Optional\] the name of a Virtual Private Cloud (VPC) network to use with XPK. The network must exist before you create the cluster.
   - `SUB_NETWORK`: \[Optional\] the name of a subnetwork to use with XPK. The subnetwork must exist before you create the cluster.
2. Use XPK to create a GKE Pathways cluster. This command can take several
   minutes to provision the capacity. Once the command completes, the capacity
   is allocated and you start incurring charges.

   ```bash
   xpk cluster create-pathways \
     --num-slices=${CLUSTER_NODEPOOL_COUNT} \
     --tpu-type=${TPU_TYPE} \
     --pathways-gce-machine-type=${PW_CPU_MACHINE_TYPE} \
     --on-demand \
     --project=${PROJECT} \
     --zone=${ZONE} \
     --cluster=${CLUSTER} \
     --custom-cluster-arguments="--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"
   ```

Once the cluster is created, you can create and delete workloads as needed. You
don't need to re-provision the TPU capacity.

### gcloud

1. Set the following environment variables:

   ```bash
   CLUSTER=GKE_CLUSTER_NAME
   PROJECT=PROJECT_ID
   ZONE=ZONE
   REGION=REGION
   CLUSTER_VERSION=GKE_CLUSTER_VERSION
   PW_CPU_MACHINE_TYPE="n2-standard-64"
   NETWORK=NETWORK
   SUBNETWORK=SUB_NETWORK
   CLUSTER_NODEPOOL_COUNT=3
   TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
   WORKERS_PER_SLICE=2
   TOPOLOGY="2x4"
   NUM_CPU_NODES=1
   ```

   Replace the following:
   - `GKE_CLUSTER_NAME`: the GKE cluster name
   - `PROJECT_ID`: your Google Cloud project ID
   - `ZONE`: the zone where you are creating resources
   - `REGION`: the region where you are creating resources
   - `GKE_CLUSTER_VERSION`: \[Optional\] the GKE cluster version. Use 1.32.2-gke.1475000 or later.
   - `PW_CPU_MACHINE_TYPE`: the CPU node type for the Pathways controller
   - `NETWORK`: \[Optional\] the name of a Virtual Private Cloud (VPC) network. The network must exist before you create the cluster.
   - `SUB_NETWORK`: \[Optional\] the name of a subnetwork. The subnetwork must exist before you create the cluster.
   - `CLUSTER_NODEPOOL_COUNT`: the maximum number of node pools a workload can use
   - `TPU_MACHINE_TYPE`: the [TPU machine type](/kubernetes-engine/docs/concepts/plan-tpus#choose-tpu-version) you want to use
   - `WORKERS_PER_SLICE`: the number of nodes per node pool
   - `GKE_ACCELERATOR_TYPE`: the Google Kubernetes Engine accelerator type, see [Choose a TPU version](/kubernetes-engine/docs/concepts/plan-tpus#choose-tpu-version)
   - `TOPOLOGY`: the TPU topology
   - `NUM_CPU_NODES`: the Pathways CPU node pool size

The following steps explain how to create a GKE cluster and
set it up for running Pathways workloads.

1. Create a GKE cluster:

       gcloud beta container clusters create ${CLUSTER} \
         --project=${PROJECT} \
         --zone=${ZONE} \
         --cluster-version=${CLUSTER_VERSION} \
         --scopes=storage-full,gke-default,cloud-platform \
         --machine-type=${PW_CPU_MACHINE_TYPE} \
         --network=${NETWORK} \
         --subnetwork=${SUBNETWORK}

2. Create TPU node pools:

       for i in $(seq 1 ${CLUSTER_NODEPOOL_COUNT}); do
         gcloud container node-pools create "tpu-np-${i}" \
           --project=${PROJECT} \
           --zone=${ZONE} \
           --cluster=${CLUSTER} \
           --machine-type=${TPU_MACHINE_TYPE} \
           --num-nodes=${WORKERS_PER_SLICE} \
           --placement-type=COMPACT \
           --tpu-topology=${TOPOLOGY} \
           --scopes=storage-full,gke-default,cloud-platform \
           --workload-metadata=GCE_METADATA
       done

3. Create a CPU node pool:

       gcloud container node-pools create "cpu-pathways-np" \
         --project=${PROJECT} \
         --zone=${ZONE} \
         --cluster=${CLUSTER} \
         --machine-type=${PW_CPU_MACHINE_TYPE} \
         --num-nodes=${NUM_CPU_NODES} \
         --scopes=storage-full,gke-default,cloud-platform \
         --workload-metadata=GCE_METADATA

4. Install the `JobSet` and `PathwaysJob` APIs.

   Get credentials for the cluster and add them to your local kubectl context.
   **Note:** The following command uses `--zone` for a zonal cluster. For a regional cluster, replace `--zone=${ZONE}` with `--region=${REGION}`.

       gcloud container clusters get-credentials ${CLUSTER} \
         --zone=${ZONE} \
         --project=${PROJECT} \
         && kubectl config set-context --current --namespace=default

   To use the Pathways architecture on your GKE cluster, you need to install the
   `JobSet` API and the `PathwaysJob` API.

       kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
       kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.2/install.yaml

What's next
-----------

- [Run a batch workload with Pathways](/ai-hypercomputer/docs/workloads/pathways-on-cloud/batch-workload)
- [Pathways interactive mode](/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-interactive-mode)
- [Multihost inference with Pathways](/ai-hypercomputer/docs/workloads/pathways-on-cloud/multihost-inference)
- [Resilient training with Pathways](/ai-hypercomputer/docs/workloads/pathways-on-cloud/resilient-training)
- [Porting JAX workloads to Pathways](/ai-hypercomputer/docs/workloads/pathways-on-cloud/porting-jax-workloads)
- [Troubleshoot Pathways](/ai-hypercomputer/docs/workloads/pathways-on-cloud/troubleshooting-pathways)
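
Because the cluster-creation commands on this page start allocating billable capacity, it can help to confirm that the environment variables they rely on are set before you run them. The following is a minimal sketch of such a check. The function name `check_required_vars` is hypothetical and is not part of XPK or `gcloud`; the sketch only assumes a POSIX-compatible shell such as bash.

```bash
# Hypothetical helper (not part of XPK or gcloud): report which of the named
# shell variables are unset or empty. Returns 0 if all are set, 1 otherwise.
# Note: the helper variables _missing, _name, and _value are left set in the
# calling shell after the function returns.
check_required_vars() {
  _missing=0
  for _name in "$@"; do
    # Look up the value of the variable whose name is stored in _name.
    # Only pass trusted, literal variable names, because this uses eval.
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "missing: ${_name}" >&2
      _missing=1
    fi
  done
  return "${_missing}"
}
```

For example, before the XPK command you might run `check_required_vars PROJECT ZONE CLUSTER TPU_TYPE PW_CPU_MACHINE_TYPE CLUSTER_NODEPOOL_COUNT` and continue only if it succeeds. The variable names passed here are the ones defined in the steps above; adjust the list for the `gcloud` path.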