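Get batch predictions from a self-deployed Model Garden model

Some models in Model Garden can be self-deployed in your Google Cloud project and used to serve batch predictions. Batch predictions let you use a model efficiently to process multiple text-only prompts that don't require low latency.

Prepare input

Before you begin, prepare your input in a BigQuery table or as a JSONL file in Cloud Storage. Input from either source must follow the OpenAI API schema JSON format, as shown in the following example: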

{"body": {"messages": [{"role": "user", "content": "Give me a recipe for banana bread"}], "max_tokens": 1000}}
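As an illustration of the format above, the following minimal sketch serializes prompts into JSONL, one request per line. The helper names to_request_line and write_jsonl are hypothetical and not part of any Google Cloud library; the default max_tokens of 1000 simply mirrors the example.

```python
import json


def to_request_line(prompt: str, max_tokens: int = 1000) -> str:
    """Serialize one user prompt as a single JSONL line in the format shown above."""
    record = {
        "body": {
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }
    }
    # json.dumps never emits raw newlines, so each record stays on one line.
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(prompts, path: str) -> None:
    """Write one request line per prompt to a local file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(to_request_line(prompt) + "\n")


print(to_request_line("Give me a recipe for banana bread"))
```

The resulting file can then be uploaded to a Cloud Storage bucket with whichever tool you normally use.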

BigQuery

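Your BigQuery input table must conform to the following schema:

Column name	Description
custom_id	An ID for each request, used to match the input with the output.
method	The request method.
url	The request endpoint.
body(JSON)	The input prompt.

Your input table can have other columns. The batch job ignores them and passes them directly to the output table.
Batch prediction jobs reserve two column names for the batch prediction output: response(JSON) and id. Don't use these columns in the input table.
The method and url columns are dropped and aren't included in the output table.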
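The column rules above can be checked before a job is submitted. The following is a minimal sketch that works on a plain list of column names; check_input_columns is a hypothetical helper, and treating all four schema columns as required is an assumption based on the schema table.

```python
# Columns from the schema table (assumed here to all be required).
REQUIRED_COLUMNS = {"custom_id", "method", "url", "body"}
# Column names reserved for the batch prediction output.
RESERVED_COLUMNS = {"response", "id"}


def check_input_columns(columns):
    """Return a list of problems found in an input table's column names."""
    names = set(columns)
    problems = []
    for missing in sorted(REQUIRED_COLUMNS - names):
        problems.append(f"missing required column: {missing}")
    for reserved in sorted(RESERVED_COLUMNS & names):
        problems.append(f"reserved column name in input: {reserved}")
    return problems


# Extra columns such as "notes" are allowed, so this prints an empty list.
print(check_input_columns(["custom_id", "method", "url", "body", "notes"]))
```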

Cloud Storage

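For Cloud Storage, the input file must be a JSONL file located in a Cloud Storage bucket.

Get required resources for a model

Choose a model and query its resource requirements. The required resources appear in the dedicatedResources field of the response. You specify these resources in the configuration of your batch prediction job.

REST

Before using any of the request data, make the following replacements:

PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
PUBLISHER_MODEL_ID: The publisher's model ID, for example, llama3_1.
VERSION_ID: The publisher's version ID for the model, for example, llama-3.1-8b-instruct.
PROJECT_ID: Your project ID.

HTTP method and URL:

GET https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID

The commands that follow pipe the response through jq to keep only the supportedActions.multiDeployVertex field.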

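To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: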

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     "https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID" | jq '.supportedActions.multiDeployVertex'

PowerShell

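Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: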

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID" | Select-Object -Expand Content | jq '.supportedActions.multiDeployVertex'

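You should receive a successful status code (2xx). The dedicatedResources field in the response shows the resources that the model requires.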
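Because the exact nesting of the response isn't shown on this page, one cautious way to read it is to search the parsed JSON for every dedicatedResources key, wherever it appears. The following sketch does that; find_dedicated_resources is a hypothetical helper, and the sample document is invented for illustration and may not match a real response.

```python
import json


def find_dedicated_resources(node):
    """Collect every value stored under a 'dedicatedResources' key, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "dedicatedResources":
                found.append(value)
            else:
                found.extend(find_dedicated_resources(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_dedicated_resources(item))
    return found


# Hypothetical response fragment, for illustration only; real responses may differ.
sample = json.loads("""
{"multiDeployVertex": [
  {"dedicatedResources": {"machineSpec": {"machineType": "g2-standard-4",
                                          "acceleratorType": "NVIDIA_L4",
                                          "acceleratorCount": 1}}}
]}
""")

for resources in find_dedicated_resources(sample):
    print(json.dumps(resources))
```

The values found this way are what you would copy into MACHINE_TYPE, ACC_TYPE, and ACC_COUNT when you configure the batch prediction job.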

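Make a batch prediction request

Make a batch prediction with a self-deployed Model Garden model by using input from BigQuery or Cloud Storage. You can choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.

BigQuery

Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Model Garden self-deployed models.
PROJECT_ID: Your project ID.
JOB_NAME: A display name for the batch prediction job.
MODEL: The name of the model to use for batch prediction, for example, llama-3.1-8b-instruct.
PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
INPUT_URI: The BigQuery table where your batch prediction input is located, for example, myproject.mydataset.input_table.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, for example, myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, for example, gs://mybucket/path/to/outputfile.
MACHINE_TYPE: The set of resources to deploy for your model, for example, g2-standard-4.
ACC_TYPE: The accelerator to add to your batch prediction job to improve performance on intensive workloads, for example, NVIDIA_L4.
ACC_COUNT: The number of accelerators to use in your batch prediction job.

HTTP method and URL: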

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "dedicated_resources": {
    "machine_spec": {
      "machine_type": "MACHINE_TYPE",
      "accelerator_type": "ACC_TYPE",
      "accelerator_count": ACC_COUNT
    },
    "starting_replica_count": 1
  }
}
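Filling in the placeholders by hand makes it easy to leave a stray quote or trailing comma in request.json. As a rough sketch, the body can instead be assembled as a Python dictionary and serialized, which always yields valid JSON. The function build_bigquery_job and the job name my-batch-job are hypothetical; the field names and the other example values are taken from the request body and replacement list above.

```python
import json

# Maps OUTPUT_FORMAT to (DESTINATION, OUTPUT_URI_FIELD_NAME), per the replacement list.
OUTPUT_FIELDS = {
    "bigquery": ("bigqueryDestination", "outputUri"),
    "jsonl": ("gcsDestination", "outputUriPrefix"),
}


def build_bigquery_job(job_name, publisher, model, input_uri, output_format,
                       output_uri, machine_type, acc_type, acc_count):
    """Build the request body for a batch prediction job with BigQuery input."""
    destination, uri_field = OUTPUT_FIELDS[output_format]
    return {
        "displayName": job_name,
        "model": f"publishers/{publisher}/models/{model}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": output_format,
            destination: {uri_field: output_uri},
        },
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": acc_type,
                "accelerator_count": acc_count,
            },
            "starting_replica_count": 1,
        },
    }


body = build_bigquery_job(
    job_name="my-batch-job",  # hypothetical display name
    publisher="meta",
    model="llama-3.1-8b-instruct",
    input_uri="myproject.mydataset.input_table",
    output_format="bigquery",
    output_uri="myproject.mydataset.output_result",
    machine_type="g2-standard-4",
    acc_type="NVIDIA_L4",
    acc_count=1,
)
print(json.dumps(body, indent=2))
```

Redirecting the printed output to a file named request.json gives a body that can be sent with the commands in the next step.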

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

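Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: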

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
"name":
  "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat":"bigquery",
    "bigquerySource":{
      "inputUri" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
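The job ID that the status request needs is the last segment of the name field in this response. A minimal sketch of extracting it follows; job_id_from_name is a hypothetical helper, and the project, region, and ID in the usage line are made-up values.

```python
def job_id_from_name(name: str) -> str:
    """Return the job ID from a batchPredictionJobs resource name."""
    parts = name.split("/")
    expected = (
        len(parts) == 6
        and parts[0] == "projects"
        and parts[2] == "locations"
        and parts[4] == "batchPredictionJobs"
        and parts[5] != ""
    )
    if not expected:
        raise ValueError(f"unexpected batch prediction job name: {name!r}")
    return parts[5]


# Made-up resource name, for illustration only.
print(job_id_from_name(
    "projects/my-project/locations/us-central1/batchPredictionJobs/1234567890"
))
```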

Cloud Storage

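Specify the Cloud Storage location of your JSONL file, the model, and the output location.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Model Garden self-deployed models.
PROJECT_ID: Your project ID.
JOB_NAME: A display name for the batch prediction job.
MODEL: The name of the model to use for batch prediction, for example, llama-3.1-8b-instruct.
PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, for example, gs://bucketname/path/to/jsonl.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, for example, myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, for example, gs://mybucket/path/to/outputfile.
MACHINE_TYPE: The set of resources to deploy for your model, for example, g2-standard-4.
ACC_TYPE: The accelerator to add to your batch prediction job to improve performance on intensive workloads, for example, NVIDIA_L4.
ACC_COUNT: The number of accelerators to use in your batch prediction job.

HTTP method and URL: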

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "INPUT_URI"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "dedicated_resources": {
    "machine_spec": {
      "machine_type": "MACHINE_TYPE",
      "accelerator_type": "ACC_TYPE",
      "accelerator_count": ACC_COUNT
    },
    "starting_replica_count": 1
  }
}
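For Cloud Storage input, the part of the body that differs from the BigQuery case is inputConfig. The sketch below builds only that fragment, in the shape that the response echoes back (a gcsSource object holding a uris list), and rejects values that aren't gs:// locations. The helper gcs_input_config is hypothetical.

```python
import json


def gcs_input_config(input_uri: str) -> dict:
    """Build the inputConfig fragment for a JSONL file in Cloud Storage."""
    if not input_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// location, got {input_uri!r}")
    return {
        "instancesFormat": "jsonl",
        # uris is a list, even when there is a single input file.
        "gcsSource": {"uris": [input_uri]},
    }


print(json.dumps(gcs_input_config("gs://bucketname/path/to/jsonl"), indent=2))
```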

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

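Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: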

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
"name":
  "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "INPUT_URI"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}

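Get the status of a batch prediction job

Get the state of your batch prediction job to check whether it has completed successfully. How long the job takes depends on the number of input items that you submitted.

REST

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
LOCATION: The region where your batch job is located.
JOB_ID: The batch job ID that was returned when you created the job.

HTTP method and URL: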

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"

PowerShell

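Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: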

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
"name":
  "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat":"bigquery",
    "bigquerySource":{
      "inputUri" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_SUCCEEDED",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
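Because job duration varies with input size, the status request is usually repeated until the state field reaches a final value. The following sketch shows one way to structure that loop. Here fetch_job is a hypothetical callable that performs the GET request above and returns the parsed JSON; the set of terminal states comes from the Vertex AI JobState enum rather than from this page, which only shows JOB_STATE_PENDING and JOB_STATE_SUCCEEDED.

```python
import time

# Terminal values of the Vertex AI JobState enum (assumption: not listed on this page).
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
}


def wait_for_job(fetch_job, poll_seconds=60.0, max_polls=120, sleep=time.sleep):
    """Call fetch_job() until the job reaches a terminal state, then return it."""
    for attempt in range(max_polls):
        job = fetch_job()
        if job.get("state") in TERMINAL_STATES:
            return job
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job did not finish after {max_polls} polls")


# Simulated sequence of responses, standing in for real GET requests.
fake_responses = iter([
    {"state": "JOB_STATE_PENDING"},
    {"state": "JOB_STATE_RUNNING"},
    {"state": "JOB_STATE_SUCCEEDED"},
])
final = wait_for_job(lambda: next(fake_responses), poll_seconds=0)
print(final["state"])
```

Passing the sleep function as a parameter keeps the loop testable without real delays; in practice the default time.sleep and a polling interval of a minute or more are reasonable starting points.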

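Retrieve output

When a batch prediction job completes, retrieve the output from the location that you specified:

For BigQuery, the output is in the response(JSON) column of your destination BigQuery table.
For Cloud Storage, the output is saved as a JSONL file in the output Cloud Storage location.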
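Once the JSONL output has been downloaded, each line can be parsed and matched back to its request. The sketch below assumes that every output record carries custom_id and response fields, mirroring the BigQuery columns described earlier; this page doesn't spell out the JSONL record layout, so treat the field names as an assumption. The helper read_responses and the sample lines are hypothetical.

```python
import json


def read_responses(lines):
    """Return (custom_id, response) pairs parsed from output JSONL lines."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields come back as None instead of raising.
        pairs.append((record.get("custom_id"), record.get("response")))
    return pairs


# Hypothetical output lines, for illustration only.
sample_lines = [
    '{"custom_id": "req-1", "response": {"status": "ok"}}',
    "",
    '{"custom_id": "req-2", "response": {"status": "ok"}}',
]
for custom_id, response in read_responses(sample_lines):
    print(custom_id, response)
```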

Supported models

Vertex AI supports batch predictions for the following self-deployed models:

Llama
- publishers/meta/models/llama3_1@llama-3.1-8b-instruct
- publishers/meta/models/llama3_1@llama-3.1-70b-instruct
- publishers/meta/models/llama3_1@llama-3.1-405b-instruct-fp8
- publishers/meta/models/llama3-2@llama-3.2-1b-instruct
- publishers/meta/models/llama3-2@llama-3.2-3b-instruct
- publishers/meta/models/llama3-2@llama-3.2-90b-vision-instruct
Gemma
- publishers/google/models/gemma@gemma-1.1-2b-it
- publishers/google/models/gemma@gemma-7b-it
- publishers/google/models/gemma@gemma-1.1-7b-it
- publishers/google/models/gemma@gemma-2b-it
- publishers/google/models/gemma2@gemma-2-2b-it
- publishers/google/models/gemma2@gemma-2-9b-it
- publishers/google/models/gemma2@gemma-2-27b-it
Mistral
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.2
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.3
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.1
- publishers/mistral-ai/models/mistral@mistral-nemo-instruct-2407
Deepseek
- publishers/deepseek-ai/models/deepseek-r1@deepseek-r1-distill-llama-8b
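Each entry in this list packs the PUBLISHER, PUBLISHER_MODEL_ID, and VERSION_ID values used in the resource query into one string. As a small illustration, the following sketch splits an entry into those three parts; parse_model_name is a hypothetical helper.

```python
def parse_model_name(name: str) -> dict:
    """Split 'publishers/P/models/M@V' into publisher, model ID, and version ID."""
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "publishers" or parts[2] != "models":
        raise ValueError(f"unexpected model name: {name!r}")
    model_id, separator, version_id = parts[3].partition("@")
    if not separator or not model_id or not version_id:
        raise ValueError(f"expected MODEL_ID@VERSION_ID in: {name!r}")
    return {
        "publisher": parts[1],
        "publisher_model_id": model_id,
        "version_id": version_id,
    }


print(parse_model_name("publishers/meta/models/llama3_1@llama-3.1-8b-instruct"))
```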
