[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-01 (世界標準時間)。"],[],[],null,["# Request TPUs with future reservation in calendar mode\n\n[Autopilot](/kubernetes-engine/docs/concepts/autopilot-overview) [Standard](/kubernetes-engine/docs/concepts/choose-cluster-mode)\n\n*** ** * ** ***\n\n|\n| **Preview\n| --- Future reservation in calendar mode**\n|\n|\n| This feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nThis guide shows you how to optimize Tensor Processing Unit (TPU) provisioning\nby using future reservation in calendar mode. Future reservation in calendar mode is a built-in calendar\nadvisor and recommender that can help\nyou locate TPU capacity and plan ahead. You can request capacity for a\nspecified start time and duration, between 1 and 90 days, and the recommender\nwill provide suggested dates.\n\nThis guide is intended for Machine learning (ML) engineers,\nPlatform admins and operators, and for Data and AI specialists who are interested\nin using Kubernetes container orchestration capabilities for running batch\nworkloads. For more information about common roles and example tasks that we\nreference in Google Cloud content, see\n[Common GKE user roles and tasks](/anthos/docs/concepts/roles-tasks).\n\nFor more information, see [About future reservation in calendar mode](/compute/docs/instances/future-reservations-calendar-mode-overview).\n\nUse cases\n---------\n\nFuture reservation in calendar mode works best for\nworkloads with scheduled, short-term, high-demand requests, like training, or\nbatch inference models that require high availability at the requested start\ntime.\n\nIf your workload requires dynamically provisioned resources as needed, for up to 7\ndays without long-term reservations or complex quota management, consider using\n*flex-start* . For more information, see\n[About GPU and TPU provisioning with flex-start](/kubernetes-engine/docs/concepts/dws).\n\nBefore you begin\n----------------\n\nBefore you start, make sure that you have performed the following tasks:\n\n- Enable the Google Kubernetes Engine API.\n[Enable Google Kubernetes Engine API](https://console.cloud.google.com/flows/enableapi?apiid=container.googleapis.com)\n- If you want to use the Google Cloud CLI for this task, [install](/sdk/docs/install) and then [initialize](/sdk/docs/initializing) the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running `gcloud components update`. **Note:** For existing gcloud CLI installations, make sure to set the `compute/region` [property](/sdk/docs/properties#setting_properties). If you use primarily zonal clusters, set the `compute/zone` instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: `One of [--zone, --region] must be supplied: Please specify location`. 
Request future reservation in calendar mode for TPUs
----------------------------------------------------

The process to request TPUs with future reservation in calendar mode involves the following steps:

1. Ensure that you have sufficient quota for any resources that aren't part of a reservation when VMs are created, such as disks or IP addresses. Future reservation requests in calendar mode don't require Compute Engine quota.
2. Complete the steps in [Create a request in calendar mode](/compute/docs/instances/future-reservations-calendar-mode-overview#create-request). These steps include the following:
   1. View TPU future availability.
   2. Create and submit a future reservation request in calendar mode for TPUs.
   3. Wait for Google Cloud to approve your request.
3. Create a TPU node pool that uses your reservation.

Create a node pool
------------------

This section applies to Standard clusters only.

You can use your reservation when you create single-host or multi-host TPU slice node pools. For example, you can create a [single-host TPU slice node pool](/kubernetes-engine/docs/concepts/tpus#single-host) by using the Google Cloud CLI:

```sh
gcloud container node-pools create NODE_POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    --reservation-affinity=specific \
    --reservation=RESERVATION
```

The `--reservation-affinity=specific` flag is required.

Replace the following:

- `NODE_POOL_NAME`: the name of the new node pool.
- `LOCATION`: the name of the zone based on the TPU version that you want to use. To identify an available location, see [TPU availability in GKE](/kubernetes-engine/docs/concepts/plan-tpus#availability).
- `CLUSTER_NAME`: the name of the cluster.
- `NODE_ZONES`: the comma-separated list of one or more zones where GKE creates the node pool.
- `MACHINE_TYPE`: the type of machine to use for nodes. For more information about TPU-compatible machine types, see the table in [Choose the TPU version](/kubernetes-engine/docs/concepts/plan-tpus#choose-tpu-version).
- `RESERVATION`: the name of the calendar reservation to consume.

For a full list of the flags that you can specify, see the [`gcloud container node-pools create`](/sdk/gcloud/reference/container/node-pools/create) reference.

After you create a node pool with the calendar reservation, you can deploy your workload on it as you would on any other TPU node pool. For example, you can create a Job that specifies the TPU node pool that consumes the reserved TPUs.
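The following is a minimal sketch of such a Job, assuming the node pool was created as a single-host TPU v5e slice with the `ct5lp-hightpu-8t` machine type (8 TPU chips, 2x4 topology). `NODE_POOL_NAME` and `TRAINING_IMAGE` are placeholders, and you would adjust the accelerator type, topology, and chip count to match the TPU version and machine type of your node pool:

```sh
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-reservation-job
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        # Schedule onto the node pool that consumes the calendar reservation.
        cloud.google.com/gke-nodepool: NODE_POOL_NAME
        # Match the TPU type and topology of that node pool (v5e, 2x4 assumed here).
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
        cloud.google.com/gke-tpu-topology: 2x4
      containers:
      - name: tpu-workload
        # Placeholder image and command; replace with your training or batch inference workload.
        image: TRAINING_IMAGE
        command: ["bash", "-c", "python3 train.py"]
        resources:
          requests:
            # Request all 8 chips of the single-host slice.
            google.com/tpu: 8
          limits:
            google.com/tpu: 8
EOF
```

The `cloud.google.com/gke-nodepool` selector pins the Job to the reserved node pool, and the TPU selectors and `google.com/tpu` request must match the machine type and topology that you used when you created the node pool.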
What's next
-----------

- Try GKE deployment examples for generative AI models that use the TPU resources that you reserved:

  - [Serve an LLM using TPU Trillium on GKE with vLLM](/kubernetes-engine/docs/tutorials/serve-vllm-tpu)
  - [Serve an LLM using TPUs on GKE with KubeRay](/kubernetes-engine/docs/tutorials/serve-lllm-tpu-ray)
  - [Serve an LLM using TPUs on GKE with JetStream and PyTorch](/kubernetes-engine/docs/tutorials/serve-llm-tpu-jetstream-pytorch)
  - [Serve Gemma using TPUs on GKE with JetStream](/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)
  - [Serve Stable Diffusion XL (SDXL) using TPUs on GKE with MaxDiffusion](/kubernetes-engine/docs/tutorials/serve-sdxl-tpu)
  - [Serve open source models using TPUs on GKE with Optimum TPU](/kubernetes-engine/docs/tutorials/optimum-tpu)
- Explore experimental samples for leveraging GKE to accelerate your AI/ML initiatives in [GKE AI Labs](https://gke-ai-labs.dev/).