[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["| **Preview**\n|\n|\n| This feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nThis guide describes the benefits and limitations of using\nFlex-start VMs with Vertex AI inference. This guide also\ndescribes how to deploy a model that uses Flex-start VMs.\n\nOverview\n\nYou can reduce the cost of running your inference jobs by using\nFlex-start VMs, which are powered by\n[Dynamic Workload Scheduler](/blog/products/compute/introducing-dynamic-workload-scheduler).\nFlex-start VMs offer significant discounts and are well-suited for\nshort-duration workloads.\n\nYou can specify how long you need a Flex-start VM, for\nany duration up to seven days. After the requested time expires, your\ndeployed model is automatically undeployed. You can also manually undeploy\nthe model before the time expires.\n\nAutomatic undeployment\n\nIf you request a Flex-start VM for a specific duration,\nyour model is automatically undeployed after that time period. For example,\nif you request a Flex-start VM for five hours, the model\nis automatically undeployed five hours after submission. You are only charged\nfor the amount of time your workload is running.\n\nLimitations and requirements\n\nConsider the following limitations and requirements when you use\nFlex-start VMs:\n\n- **Maximum duration**: Flex-start VMs have a maximum usage duration of seven days. Any deployment request for a longer duration will be rejected.\n- **TPU support**: Using Flex-start VMs with TPU Pods isn't supported.\n- **Quota** : Make sure you have sufficient Vertex AI preemptible quota before launching your job. To learn more, see [Rate quotas](/vertex-ai/docs/quotas#serving).\n- **Queued provisioning**: Using Flex-start VMs with queued provisioning isn't supported.\n- **Node recycling**: Node recycling isn't supported.\n\nBilling\n\nIf your workload runs for less than seven days, using Flex-start VMs\ncan reduce your costs.\n\nWhen you use Flex-start VMs, you're billed based on the duration\nof your job and the machine type that you select. You are only charged for\nthe time your workload is actively running. You don't pay for the time that\nthe job is in a queue or for any time after the requested duration has expired.\n\nBilling is distributed across two SKUs:\n\n- The Compute Engine SKU, with the label `vertex-ai-online-prediction`. See\n [Dynamic Workload Scheduler pricing](https://cloud.google.com/products/dws/pricing).\n\n- The Vertex AI management fee SKU. 
## Get inferences by using Flex-start VMs

To use Flex-start VMs when you deploy a model to get inferences, use the REST API.

Before using any of the request data, make the following replacements:

- `LOCATION`: The region where you are using Vertex AI.
- `PROJECT_ID`: Your [project ID](/resource-manager/docs/creating-managing-projects#identifiers).
- `ENDPOINT_ID`: The ID for the endpoint.
- `MODEL_ID`: The ID for the model to be deployed.
- `DEPLOYED_MODEL_NAME`: A name for the `DeployedModel`. You can also use the display name of the `Model` for the `DeployedModel`.
- `MACHINE_TYPE`: Optional. The machine resources used for each node of this deployment. Its default setting is `n1-standard-2`. [Learn more about machine types.](/vertex-ai/docs/predictions/configure-compute)
- `ACCELERATOR_TYPE`: Optional. The type of accelerator to attach to the machine. [Learn more](/vertex-ai/docs/predictions/configure-compute#gpus).
- `ACCELERATOR_COUNT`: Optional. The number of accelerators for each replica to use.
- `MAX_RUNTIME_DURATION`: The maximum duration for the flex-start deployment. The deployed model is automatically undeployed after this duration. Specify the duration in seconds, ending with an `s`. For example, `3600s` for one hour. The maximum value is `604800s` (7 days).
- `PROJECT_NUMBER`: Your project's automatically generated [project number](/resource-manager/docs/creating-managing-projects#identifiers).

HTTP method and URL:

```
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel
```

Request JSON body:

```
{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "enableContainerLogging": true,
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": ACCELERATOR_COUNT
      },
      "flexStart": {
        "maxRuntimeDuration": "MAX_RUNTIME_DURATION"
      },
      "minReplicaCount": 2,
      "maxReplicaCount": 2
    }
  }
}
```
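As an alternative to the curl and PowerShell commands that follow, you can post the same JSON body from Python. The following is a minimal sketch, assuming that `google-auth` and `requests` are installed and that Application Default Credentials are configured (for example, with `gcloud auth application-default login`); the resource IDs and the machine/accelerator pair are placeholder examples, not requirements.

```python
# Minimal Python sketch of the deployModel call shown above.
# Assumes `pip install google-auth requests` and Application Default Credentials.
import google.auth
from google.auth.transport.requests import AuthorizedSession

LOCATION = "us-central1"    # Replace with your region.
PROJECT_ID = "my-project"   # Placeholder: replace with your project ID.
ENDPOINT_ID = "1234567890"  # Placeholder: replace with your endpoint ID.
MODEL_ID = "9876543210"     # Placeholder: replace with your model ID.

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:deployModel"
)
body = {
    "deployedModel": {
        "model": f"projects/{PROJECT_ID}/locations/{LOCATION}/models/{MODEL_ID}",
        "displayName": "flex-start-deployment",
        "enableContainerLogging": True,
        "dedicatedResources": {
            # Example machine/accelerator pair; choose your own from the
            # machine type documentation linked above.
            "machineSpec": {
                "machineType": "g2-standard-12",
                "acceleratorType": "NVIDIA_L4",
                "acceleratorCount": 1,
            },
            "flexStart": {"maxRuntimeDuration": "3600s"},  # Undeploy after 1 hour.
            "minReplicaCount": 2,
            "maxReplicaCount": 2,
        },
    }
}

response = session.post(url, json=body)
response.raise_for_status()
print(response.json()["name"])  # Name of the long-running deploy operation.
```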
To send your request, use one of the following options:

### curl (Linux, macOS, or Cloud Shell)

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login), or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI. You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel"
```

### PowerShell (Windows)

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login). You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```
{
  "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}
```

The `name` field identifies a long-running operation; the deployment is finished when the operation's `done` field is `true`. A minimal polling sketch appears after the *What's next* links.

## What's next

- [Use Spot VMs with Vertex AI inference](/vertex-ai/docs/predictions/use-spot-vms).
- [Use reservations with Vertex AI inference](/vertex-ai/docs/predictions/use-reservations).
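As noted after the sample response, deployment completes asynchronously. The following minimal sketch polls the returned long-running operation until it finishes. It reuses the same authentication approach as the Python deploy sketch above; the operation name is a placeholder to replace with the `name` field from your own response.

```python
# Minimal sketch: poll the deployModel long-running operation until done.
import time

import google.auth
from google.auth.transport.requests import AuthorizedSession

LOCATION = "us-central1"  # Replace with your region.
# Placeholder: replace with the `name` field from the deployModel response.
OPERATION_NAME = (
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID"
    "/operations/OPERATION_ID"
)

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

url = f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{OPERATION_NAME}"
while True:
    op = session.get(url).json()
    if op.get("done"):
        if "error" in op:
            raise RuntimeError(op["error"])  # Deployment failed.
        print("Deployment finished:", op.get("response", {}))
        break
    time.sleep(30)  # Poll every 30 seconds.
```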