Google Distributed Cloud (GDC) 에어갭의 Vertex AI 광학 문자 인식 (OCR) 서비스는 BatchAnnotateImages API 메서드를 사용하여 이미지에서 텍스트를 감지합니다. 이 서비스는 이미지에 JPEG 및 PNG 파일을 지원합니다.
이 페이지에서는 Distributed Cloud에서 OCR API를 사용하여 이미지 텍스트를 감지하는 방법을 보여줍니다.
시작하기 전에
OCR API를 사용하려면 OCR API가 사용 설정된 프로젝트와 적절한 사용자 인증 정보가 있어야 합니다.
클라이언트 라이브러리를 설치하여 API를 호출할 수도 있습니다. 자세한 내용은 문자 인식 프로젝트 설정을 참고하세요.
JPEG 및 PNG 파일에서 텍스트 감지
BatchAnnotateImages 메서드는 JPEG 또는 PNG 파일 배치에서 텍스트를 감지합니다.
텍스트를 감지할 파일을 API 요청의 콘텐츠로 직접 전송합니다. 시스템은 API 응답에서 감지된 텍스트를 JSON 형식으로 반환합니다.
API 요청의 JSON 본문에 있는 필드의 값을 지정해야 합니다. 다음 표에는 텍스트 감지 요청에 BatchAnnotateImages API 메서드를 사용할 때 제공해야 하는 요청 본문 필드에 관한 설명이 포함되어 있습니다.
요청 본문 필드
필드 설명
content
감지할 텍스트가 포함된 이미지입니다. 바이너리 이미지 데이터의 Base64 표현 (ASCII 문자열)을 제공합니다.
type
이미지에서 필요한 텍스트 감지 유형입니다.
다음 두 가지 주석 기능 중 하나를 지정합니다.
TEXT_DETECTION은 임의의 이미지에서 텍스트를 감지하고 추출합니다. JSON 응답에는 추출된 문자열, 개별 단어, 해당 경계 상자가 포함됩니다.
DOCUMENT_TEXT_DETECTION도 이미지에서 텍스트를 추출하지만, 서비스는 밀집된 텍스트와 문서에 맞게 응답을 최적화합니다. JSON은 페이지, 블록, 단락, 단어, 줄바꿈 정보를 포함합니다.
fromgoogle.cloudimportvisionimportgoogle.authfromgoogle.auth.transportimportrequestsfromgoogle.api_core.client_optionsimportClientOptionsaudience="https://ENDPOINT:443"api_endpoint="ENDPOINT:443"defvision_client(creds):opts=ClientOptions(api_endpoint=api_endpoint)returnvision.ImageAnnotatorClient(credentials=creds,client_options=opts)defmain():creds=Nonetry:creds,project_id=google.auth.default()creds=creds.with_gdch_audience(audience)req=requests.Request()creds.refresh(req)print("Got token: ")print(creds.token)exceptExceptionase:print("Caught exception"+str(e))raiseereturncredsdefvision_func(creds):vc=vision_client(creds)image={"content":"BASE64_ENCODED_IMAGE"}features=[{"type_":vision.Feature.Type.FEATURE_TYPE}]# Each requests element corresponds to a single image. To annotate more# images, create a request element for each image and add it to# the array of requestsreq={"image":image,"features":features}metadata=[("x-goog-user-project","projects/PROJECT_ID")]resp=vc.annotate_image(req,metadata=metadata)print(resp)if__name__=="__main__":creds=main()vision_func(creds)
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[[["\u003cp\u003eVertex AI's OCR service on Google Distributed Cloud (GDC) air-gapped uses the \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e API to detect text in JPEG and PNG images.\u003c/p\u003e\n"],["\u003cp\u003eTo use the OCR API, users must have a project with the API enabled and appropriate credentials, potentially including installing client libraries.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e method requires the image data as Base64 encoded content within the API request and allows specifying the type of detection (\u003ccode\u003eTEXT_DETECTION\u003c/code\u003e or \u003ccode\u003eDOCUMENT_TEXT_DETECTION\u003c/code\u003e).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003elanguage_hints\u003c/code\u003e field in the API request is optional and allows users to specify the language of the text for improved detection accuracy, following BCP 47 language tag formatting.\u003c/p\u003e\n"],["\u003cp\u003eText detection can be performed via a REST API method or a Python script, both requiring authentication and the specification of an endpoint, project ID, and the Base64-encoded image.\u003c/p\u003e\n"]]],[],null,["# Detect text in images\n\nThe Optical Character Recognition (OCR) service of Vertex AI on\nGoogle Distributed Cloud (GDC) air-gapped detects text in images using the\n`BatchAnnotateImages` API method. The service supports JPEG and PNG files for\nimages.\n\nThis page shows you how to detect image text using the OCR API on\nDistributed Cloud.\n\nBefore you begin\n----------------\n\nBefore you can start using the OCR API, you must have a project\nwith the OCR API enabled and have the appropriate credentials.\nYou can also install client libraries to help you make calls to the API. For\nmore information, see [Set up a character recognition project](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr).\n\nDetect text from JPEG and PNG files\n-----------------------------------\n\nThe `BatchAnnotateImages` method detects text from a batch of JPEG or PNG files.\nYou send the file from which you want to detect text directly as content in the\nAPI request. The system returns the resulting detected text in JSON format in\nthe API response.\n\nYou must specify values for the fields in the JSON body of your API request. The\nfollowing table contains a description of the request body fields you must\nprovide when you use the `BatchAnnotateImages` API method for your text\ndetection requests:\n\nFor information about the complete JSON representation, see\n[`AnnotateImageRequest`](/distributed-cloud/hosted/docs/latest/gdch/apis/vertex-ai/ocr/rest/v1/AnnotateImageRequest).\n\n### Make an API request\n\nMake a request to the OCR pre-trained API using the REST API\nmethod. Otherwise, interact with the OCR pre-trained API from a\nPython script to detect text from JPEG or PNG files.\n| **Note:** The `BatchAnnotateImages` API method only supports a single request per batch call.\n\nThe following examples show how to detect text in an image using\nOCR: \n\n### REST\n\nFollow these steps to detect text in images using the REST API method:\n\n1. Save the following `request.json` file for your request body:\n\n cat \u003c\u003c- EOF \u003e request.json\n {\n \"requests\": [\n {\n \"image\": {\n \"content\": \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-err\"\u003eBASE\u003c/span\u003e\u003cspan class=\"devsite-syntax-mi\"\u003e64\u003c/span\u003e\u003cspan class=\"devsite-syntax-err\"\u003e_ENCODED_IMAGE\u003c/span\u003e\u003c/var\u003e\n },\n \"features\": [\n {\n \"type\": \"\u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e\"\n }\n ],\n \"image_context\": {\n \"language_hints\": [\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_1\u003c/var\u003e\",\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_2\u003c/var\u003e\",\n ...\n ]\n }\n }\n ]\n }\n EOF\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003eLANGUAGE_HINT\u003c/var\u003e: the BCP 47 language tags to use as language hints for text detection, such as `en-t-i0-handwrit`. This field is optional and the system interprets an empty value as automatic language detection.\n2. [Get an authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n3. Make the request:\n\n ### curl\n\n curl -X POST \\\n -H \"Authorization: Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\" \\\n -H \"x-goog-user-project: projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n ### PowerShell\n\n $headers = @{\n \"Authorization\" = \"Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\"\n \"x-goog-user-project\" = \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\"\n }\n\n Invoke-WebRequest\n -Method POST\n -Headers $headers\n -ContentType: \"application/json; charset=utf-8\"\n -InFile request.json\n -Uri \"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\" | Select-Object -Expand Content\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n### Python\n\nFollow these steps to use the OCR service from a Python\nscript to detect text in an image:\n\n1. [Install the latest version of the OCR client library](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-install-libraries).\n\n2. [Set the required environment variables on a Python script](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr#set-env-var).\n\n3. [Authenticate your API request](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n4. Add the following code to the Python script you created:\n\n from google.cloud import vision\n import google.auth\n from google.auth.transport import requests\n from google.api_core.client_options import ClientOptions\n\n audience = \"https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n api_endpoint=\"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n\n def vision_client(creds):\n opts = ClientOptions(api_endpoint=api_endpoint)\n return vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.services.image_annotator.ImageAnnotatorClient.html(credentials=creds, client_options=opts)\n\n def main():\n creds = None\n try:\n creds, project_id = google.auth.default()\n creds = creds.with_gdch_audience(audience)\n req = requests.Request()\n creds.refresh(req)\n print(\"Got token: \")\n print(creds.token)\n except Exception as e:\n print(\"Caught exception\" + str(e))\n raise e\n return creds\n\n def vision_func(creds):\n vc = vision_client(creds)\n image = {\"content\": \"\u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e\"}\n features = [{\"type_\": vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.types.Feature.html.Type.\u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eFEATURE_TYPE\u003c/span\u003e\u003c/var\u003e}]\n # Each requests element corresponds to a single image. To annotate more\n # images, create a request element for each image and add it to\n # the array of requests\n req = {\"image\": image, \"features\": features}\n\n metadata = [(\"x-goog-user-project\", \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\")]\n\n resp = vc.annotate_image(req,metadata=metadata)\n\n print(resp)\n\n if __name__==\"__main__\":\n creds = main()\n vision_func(creds)\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n5. Save the Python script.\n\n6. Run the Python script to detect text in the image:\n\n python \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eSCRIPT_NAME\u003c/span\u003e\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eSCRIPT_NAME\u003c/var\u003e with the name you gave to your\n Python script, such as `vision.py`."]]