# Get batch text embeddings predictions

Getting responses in a batch is a way to efficiently send large numbers of non-latency-sensitive embeddings requests. Unlike getting online responses, where you are limited to one input request at a time, you can send a large number of LLM requests in a single batch request. Similar to how batch prediction works for [tabular data in Vertex AI](/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions), you determine your output location, add your input, and your responses asynchronously populate into your output location.
Text embeddings models that support batch predictions
------------------------------------------------------
All stable versions of text embedding models support batch predictions. Stable versions are versions that are no longer in preview and are fully supported for production environments. To see the full list of supported embedding models, see [Embedding models and versions](/vertex-ai/generative-ai/docs/learn/model-versioning#embedding_models_and_versions).
Prepare your inputs
-------------------
The input for batch requests is a list of prompts, stored either in a BigQuery table or as a [JSON Lines (JSONL)](https://jsonlines.org/) file in Cloud Storage. Each request can include up to 30,000 prompts.
### JSONL example
This section shows examples of how to format JSONL input and output.
#### JSONL input example
{"content":"Give a short description of a machine learning model:"}{"content":"Best recipe for banana bread:"}
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-07-16(UTC)"],[],[],null,["# Get batch text embeddings predictions\n\nGetting responses in a batch is a way to efficiently send large numbers of non-latency\nsensitive embeddings requests. Different from getting online responses,\nwhere you are limited to one input request at a time, you can send a large number\nof LLM requests in a single batch request. Similar to how batch prediction is done\nfor [tabular data in Vertex AI](/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions),\nyou determine your output location, add your input, and your responses asynchronously\npopulate into your output location.\n\nText embeddings models that support batch predictions\n-----------------------------------------------------\n\nAll stable versions of text embedding models support batch predictions. Stable\nversions are versions which are no longer in preview and are fully supported for\nproduction environments. To see the full list of supported embedding models, see\n[Embedding model and versions](/vertex-ai/generative-ai/docs/learn/model-versioning#embedding_models_and_versions).\n\nPrepare your inputs\n-------------------\n\nThe input for batch requests are a list of prompts that can either be stored in\na BigQuery table or as a\n[JSON Lines (JSONL)](https://jsonlines.org/) file in\nCloud Storage. Each request can include up to 30,000 prompts.\n\n### JSONL example\n\nThis section shows examples of how to format JSONL input and output.\n\n#### JSONL input example\n\n {\"content\":\"Give a short description of a machine learning model:\"}\n {\"content\":\"Best recipe for banana bread:\"}\n\n#### JSONL output example\n\n {\"instance\":{\"content\":\"Give...\"},\"predictions\": [{\"embeddings\":{\"statistics\":{\"token_count\":8,\"truncated\":false},\"values\":[0.2,....]}}],\"status\":\"\"}\n {\"instance\":{\"content\":\"Best...\"},\"predictions\": [{\"embeddings\":{\"statistics\":{\"token_count\":3,\"truncated\":false},\"values\":[0.1,....]}}],\"status\":\"\"}\n\n### BigQuery example\n\nThis section shows examples of how to format BigQuery input and output.\n\n#### BigQuery input example\n\nThis example shows a single column BigQuery table.\n\n#### BigQuery output example\n\nRequest a batch response\n------------------------\n\nDepending on the number of input items that you've submitted, a\nbatch generation task can take some time to complete. \n\n### REST\n\nTo test a text prompt by using the Vertex AI API, send a POST request to the\npublisher model endpoint.\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: The ID of your Google Cloud project.\n- \u003cvar translate=\"no\"\u003eBP_JOB_NAME\u003c/var\u003e: The job name.\n- \u003cvar translate=\"no\"\u003eINPUT_URI\u003c/var\u003e: The input source URI. 
### BigQuery example

This section shows examples of how to format BigQuery input and output.

#### BigQuery input example

This example shows a single-column BigQuery table.

| content |
| --- |
| Give a short description of a machine learning model: |
| Best recipe for banana bread: |

#### BigQuery output example

The output table mirrors the fields of the JSONL output: each row carries the original `content` along with `predictions` (the embedding values and token statistics) and `status` columns.
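To stage this input programmatically, a sketch along these lines can create and populate the table. The `google-cloud-bigquery` client library and the dataset and table names are assumptions, not part of this page:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Hypothetical fully qualified table ID; replace with your own.
TABLE_ID = "your-project.your_dataset.text_input"

client = bigquery.Client()

# Single STRING column named "content", matching the JSONL input format.
table = bigquery.Table(TABLE_ID, schema=[bigquery.SchemaField("content", "STRING")])
client.create_table(table, exists_ok=True)

rows = [
    {"content": "Give a short description of a machine learning model:"},
    {"content": "Best recipe for banana bread:"},
]
errors = client.insert_rows_json(TABLE_ID, rows)
if errors:
    raise RuntimeError(f"Insert failed: {errors}")
```

The batch job then references this table with an `INPUT_URI` such as `bq://your-project.your_dataset.text_input`.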
Request a batch response
------------------------

Depending on the number of input items that you've submitted, a batch generation task can take some time to complete.

### REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- `PROJECT_ID`: The ID of your Google Cloud project.
- `BP_JOB_NAME`: The job name.
- `INPUT_URI`: The input source URI. This is either a BigQuery table URI or a JSONL file URI in Cloud Storage.
- `OUTPUT_URI`: Output target URI.

HTTP method and URL:

```
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
```

Request JSON body:

```json
{
  "name": "BP_JOB_NAME",
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}
```

To send your request, choose one of these options:

#### curl

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login), or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI. You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
```

#### PowerShell

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login). You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```powershell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```json
{
  "name": "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://project_name.dataset_name.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://project_name.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}
```

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the `BATCH_JOB_ID` until the job `state` is `JOB_STATE_SUCCEEDED`. For example:

```bash
curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
```

**Note:** You can run only one batch response job at a time. Custom service accounts, live progress, CMEK, and VPC-SC reports aren't supported at this time.
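If you would rather poll from a script than re-run `curl`, here is a minimal Python sketch of the same GET call. It assumes the `google-auth` library, and the project and job IDs are placeholders to fill in from the creation response above:

```python
import time

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Hypothetical values; use your project ID and the job ID returned at creation.
PROJECT_ID = "your-project"
BATCH_JOB_ID = "1234567890123456789"
URL = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}"
)

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

# Poll until the job reaches a terminal state.
while True:
    state = session.get(URL).json()["state"]
    print(state)
    if state in ("JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"):
        break
    time.sleep(30)
```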
### Python

#### Install

```bash
pip install --upgrade google-genai
```

To learn more, see the [SDK reference documentation](https://googleapis.github.io/python-genai/).

Set environment variables to use the Gen AI SDK with Vertex AI:

```bash
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
```

```python
import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# TODO(developer): Update and un-comment below line
# output_uri = "gs://your-bucket/your-prefix"

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
    model="text-embedding-005",
    # Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
    src="gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
    if job.state == JobState.JOB_STATE_FAILED:
        print(f"Error: {job.error}")
        break

# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED
```

Retrieve batch output
---------------------

When a batch prediction task is complete, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request.

What's next
-----------

- Learn how to [get text embeddings](/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).