Batch prediction is a valuable technique for applying machine learning models to large datasets efficiently. Instead of processing individual data points, you can submit a batch of data to Gemini for prediction, saving time and computational resources. Unlike online prediction, where you are limited to one input prompt at a time, you can send a large number of multimodal prompts in a single batch request. Your responses then populate asynchronously in your BigQuery or Cloud Storage output location.
Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.
Batch prediction use case
Consider an online bookstore with thousands of books in its database. Instead of generating descriptions individually for each book, which would be time-consuming, the store can use Gemini batch prediction to process all book information at once. This approach dramatically improves efficiency by reducing overall processing time and minimizing the computational resources required.
Batch prediction can also improve consistency with automation. By processing all descriptions simultaneously, the model maintains a uniform tone and style across book descriptions, reinforcing brand identity. This bookstore can also integrate batch prediction into their workflow to automatically generate descriptions for new book entries, eliminating manual effort and ensuring their website remains up-to-date with minimal human intervention.
Multimodal models that support batch predictions
The following multimodal models support batch predictions.
gemini-1.5-flash-002
gemini-1.5-flash-001
gemini-1.5-pro-002
gemini-1.5-pro-001
gemini-1.0-pro-002
gemini-1.0-pro-001
Batch requests for multimodal models accept BigQuery storage sources and Cloud Storage sources. You can independently choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.
Batch prediction for Cloud Storage
Prepare your inputs
Cloud Storage input
- File format: JSON Lines (JSONL)
- Located in us-central1
- Appropriate read permissions for the service account
- Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the fileData field for Gemini.
- The fileData support is limited to certain Gemini models.
Example input (JSONL):
{"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mimeType": "video/mp4"}}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mimeType": "image/jpeg"}}]}]}}
{"request":{"contents": [{"role": "user", "parts": [{"text": "Describe what is happening in this video."}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/video/another_video.mov", "mimeType": "video/mov"}}]}]}}
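If you generate the input file programmatically, serialize one request object per line. The following Python sketch writes a small JSONL file locally; the prompts and output filename are illustrative, and uploading the file to your Cloud Storage bucket (for example, with gsutil or the google-cloud-storage client) is left out:

import json

# Each element becomes one line in the JSONL input file.
requests = [
    {"request": {"contents": [{"role": "user", "parts": [
        {"text": "Describe what is happening in this video."},
        {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
                      "mimeType": "video/mp4"}}]}]}},
    {"request": {"contents": [{"role": "user", "parts": [
        {"text": "Write a one-sentence caption for this image."},
        {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
                      "mimeType": "image/jpeg"}}]}]}},
]

# JSONL means one compact JSON object per line, with no enclosing array.
with open("batch_input.jsonl", "w") as f:
    for item in requests:
        f.write(json.dumps(item) + "\n")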
Request a batch prediction job
Specify your Cloud Storage input location, model, and output location.
REST
To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Gemini models.
- PROJECT_ID: Your project ID.
- INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/file.jsonl.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{ "displayName": "my-cloud-storage-batch-prediction-job", "model": "publishers/google/models/gemini-1.5-flash-002", "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris" : "INPUT_URI" } }, "outputConfig": { "predictionsFormat": "OUTPUT_FORMAT", "DESTINATION": { "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI" } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response that includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
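For example, the following sketch submits a Cloud Storage batch prediction job with the SDK's BatchPredictionJob helper and polls until it finishes. The project ID and bucket paths are placeholders:

import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

# Placeholder project; the region must support Gemini models.
vertexai.init(project="PROJECT_ID", location="us-central1")

# Submit the job with a JSONL input file and a Cloud Storage output prefix.
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset="gs://bucketname/path/to/file.jsonl",
    output_uri_prefix="gs://mybucket/path/to/output",
)

# Refresh until the job reaches a terminal state.
while not job.has_ended:
    time.sleep(10)
    job.refresh()

if job.has_succeeded:
    print("Job output location:", job.output_location)
else:
    print("Job failed:", job.error)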
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Retrieve batch output
When a batch prediction task completes, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request. For succeeded rows, model responses are stored in the response column; otherwise, error details are stored in the status column for further inspection.
Cloud Storage request example
PROJECT_ID=[PROJECT ID]
REGION="us-central1"
API_VERSION="v1"
MODEL_URI="publishers/google/models/gemini-1.5-flash-002"
INPUT_URI="[CLOUD STORAGE INPUT URI]"
OUTPUT_URI="[OUTPUT URI]"
# Setting variables based on parameters
ENDPOINT="${REGION}-aiplatform.googleapis.com"
BP_JOB_NAME="BP_testing_`date +%Y%m%d_%H%M%S`"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/${API_VERSION}/projects/${PROJECT_ID}/locations/${REGION}/batchPredictionJobs \
-d '{
"name": "'${BP_JOB_NAME}'",
"displayName": "'${BP_JOB_NAME}'",
"model": "'${MODEL_URI}'",
"inputConfig": {
"instancesFormat":"jsonl",
"gcsSource":{
"uris" : "'${INPUT_URI}'"
}
},
"outputConfig": {
"predictionsFormat":"jsonl",
"gcsDestination":{
"outputUriPrefix": "'${OUTPUT_URI}'"
}
}
}'
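After the job succeeds, the predictions are written as JSONL files under the output URI prefix. A sketch like the following collects them for downstream processing; it assumes the google-cloud-storage client library, and the bucket and prefix names are placeholders:

import json

from google.cloud import storage

client = storage.Client(project="PROJECT_ID")
bucket = client.bucket("mybucket")

records = []
# List everything under the output prefix and parse the JSONL files.
for blob in bucket.list_blobs(prefix="path/to/output"):
    if not blob.name.endswith(".jsonl"):
        continue
    for line in blob.download_as_text().splitlines():
        # Succeeded lines carry a response; failed lines carry status details.
        records.append(json.loads(line))

print(f"Collected {len(records)} prediction records")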
Batch prediction for BigQuery
Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.
Prepare your inputs
BigQuery storage input
- A request column is required, and must be valid JSON. This JSON data represents your input for the model.
- The content in the JSON instructions must match the structure of a GenerateContentRequest.
- Your input table can have column data types other than request. These columns can have BigQuery data types except for the following: array, struct, range, datetime, and geography. These columns are ignored for content generation but included in the output table. The system reserves two column names for output: response and status. These are used to provide information about the outcome of the batch prediction job.
- Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the fileData field for Gemini.
- The fileData support is limited to certain Gemini models.
Example input (JSON):
{"contents": [{"role": "user", "parts": [{"text": "Give me a recipe for banana bread."}]}]}
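If you build the input table from code, one option is the google-cloud-bigquery client. The following sketch assumes a STRING column holding serialized JSON is used for request; the project, dataset, and table names are placeholders:

import json

from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# One row per prompt; each value must match the GenerateContentRequest structure.
rows = [{"request": json.dumps(
    {"contents": [{"role": "user", "parts": [{"text": "Give me a recipe for banana bread."}]}]}
)}]

# Load the rows, creating the table with a single request column if needed.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("request", "STRING")],
)
load_job = client.load_table_from_json(
    rows, "myproject.mydataset.input_table", job_config=job_config
)
load_job.result()  # Wait for the load to complete.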
Request a batch prediction job
REST
To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Gemini models.
- PROJECT_ID: Your project ID.
- INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table. Multi-region datasets are not supported.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{ "displayName": "my-bigquery-batch-prediction-job", "model": "publishers/google/models/gemini-1.5-flash-002", "inputConfig": { "instancesFormat": "bigquery", "bigquerySource":{ "inputUri" : "INPUT_URI" } }, "outputConfig": { "predictionsFormat": "OUTPUT_FORMAT", "DESTINATION": { "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI" } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response that includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
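For example, the same BatchPredictionJob helper shown earlier accepts bq:// URIs, so a BigQuery-to-BigQuery job looks like the following sketch; the project, dataset, and table names are placeholders:

import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="PROJECT_ID", location="us-central1")

# Read prompts from a BigQuery table and write results to another table.
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset="bq://myproject.mydataset.input_table",
    output_uri_prefix="bq://myproject.mydataset.output_result",
)

# Refresh until the job reaches a terminal state.
while not job.has_ended:
    time.sleep(10)
    job.refresh()

print("Succeeded" if job.has_succeeded else f"Failed: {job.error}")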
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
For succeeded rows, model responses are stored in the response column; otherwise, error details are stored in the status column for further inspection.
BigQuery output example
| request | response | status |
|---|---|---|
| {"content":[{...}]} | { "candidates": [ { "content": { "role": "model", "parts": [ { "text": "In a medium bowl, whisk together the flour, baking soda, baking powder." } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE", "probabilityScore": 0.14057204, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.14270912 } ] } ], "usageMetadata": { "promptTokenCount": 8, "candidatesTokenCount": 396, "totalTokenCount": 404 } } | |
What's next
- Learn how to tune a Gemini model in Overview of model tuning for Gemini.
- Learn more about the Batch prediction API.