The Optical Character Recognition (OCR) service of Vertex AI on Google Distributed Cloud (GDC) air-gapped detects text in images using the BatchAnnotateImages API method. The service supports JPEG and PNG files for images.
This page shows you how to detect image text using the OCR API on Distributed Cloud.
Before you begin
Before you can start using the OCR API, you must have a project with the OCR API enabled and have the appropriate credentials. You can also install client libraries to help you make calls to the API. For more information, see Set up a character recognition project.
Detect text from JPEG and PNG files
The BatchAnnotateImages method detects text from a batch of JPEG or PNG files. You send the file from which you want to detect text directly as content in the API request. The system returns the detected text in JSON format in the API response.
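As an illustration only, a trimmed TEXT_DETECTION response has roughly the following shape; the values here are invented and a real response carries additional fields:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "Hello world",
          "boundingPoly": {
            "vertices": [
              { "x": 10, "y": 8 },
              { "x": 94, "y": 8 },
              { "x": 94, "y": 24 },
              { "x": 10, "y": 24 }
            ]
          }
        }
      ]
    }
  ]
}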
You must specify values for the fields in the JSON body of your API request. The following list describes the request body fields that you must provide when you use the BatchAnnotateImages API method for text detection requests:
content
The image with the text to detect. You provide the Base64 representation (ASCII string) of your binary image data (see the encoding sketch after this field list).

type
The type of text detection you need from the image. Specify one of the two annotation features:
- TEXT_DETECTION detects and extracts text from any image. The JSON response includes the extracted string, each word, and its bounding box.
- DOCUMENT_TEXT_DETECTION also extracts text from an image, but the service optimizes the response for dense text and documents. The JSON includes page, block, paragraph, word, and break information.

language_hints
Optional. A list of languages to use for text detection. The system interprets an empty value for this field as automatic language detection. You don't need to set the language_hints field for languages based on the Latin alphabet. If you know the language of the text in the image, setting a hint improves results.
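If you need to produce the Base64 value for the content field, a minimal sketch like the following works; the file name image.jpg is only an assumed example:

import base64

# Read the binary image data and encode it as an ASCII-safe Base64 string.
with open("image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

# For a JPEG, the string begins with characters like /9j/4QAYRXhpZgAA...
print(encoded[:40])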
For information about the complete JSON representation, see AnnotateImageRequest.
Make an API request
Make a request to the OCR pre-trained API using the REST API method. Alternatively, interact with the OCR pre-trained API from a Python script to detect text from JPEG or PNG files.

Note: The BatchAnnotateImages API method only supports a single request per batch call.

The following examples show how to detect text in an image using OCR:
REST
Follow these steps to detect text in images using the REST API method:

1. Save the following request.json file for your request body:

cat <<- EOF > request.json
{
  "requests": [
    {
      "image": {
        "content": BASE64_ENCODED_IMAGE
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "image_context": {
        "language_hints": [
          "LANGUAGE_HINT_1",
          "LANGUAGE_HINT_2",
          ...
        ]
      }
    }
  ]
}
EOF

Replace the following:

BASE64_ENCODED_IMAGE: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
FEATURE_TYPE: the type of text detection you need from the image. Allowed values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
LANGUAGE_HINT: the BCP 47 language tags to use as language hints for text detection, such as en-t-i0-handwrit. This field is optional and the system interprets an empty value as automatic language detection.

2. Get an authentication token.

3. Make the request:

curl -X POST \
  -H "Authorization: Bearer TOKEN" \
  -H "x-goog-user-project: projects/PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  https://ENDPOINT/v1/images:annotate

Replace the following:

TOKEN: the authentication token you obtained.
PROJECT_ID: your project ID.
ENDPOINT: the OCR endpoint that you use for your organization. For more information, view service status and endpoints.
Python

Follow these steps to use the OCR service from a Python script to detect text in an image:

1. Install the latest version of the OCR client library.

2. Set the required environment variables on a Python script.

3. Authenticate your API request.

4. Add the following code to the Python script you created:
from google.cloud import vision
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint = "ENDPOINT:443"

def vision_client(creds):
    opts = ClientOptions(api_endpoint=api_endpoint)
    return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)

def main():
    creds = None
    try:
        creds, project_id = google.auth.default()
        creds = creds.with_gdch_audience(audience)
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
    except Exception as e:
        print("Caught exception " + str(e))
        raise e
    return creds

def vision_func(creds):
    vc = vision_client(creds)
    image = {"content": "BASE64_ENCODED_IMAGE"}
    features = [{"type_": vision.Feature.Type.FEATURE_TYPE}]
    # Each requests element corresponds to a single image. To annotate more
    # images, create a request element for each image and add it to
    # the array of requests
    req = {"image": image, "features": features}

    metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

    resp = vc.annotate_image(req, metadata=metadata)

    print(resp)

if __name__ == "__main__":
    creds = main()
    vision_func(creds)
Replace the following:
ENDPOINT: the OCR endpoint that you use for your organization. For more information, view service status and endpoints.
BASE64_ENCODED_IMAGE: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
FEATURE_TYPE: the type of text detection you need from the image. Allowed values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
PROJECT_ID: your project ID.
5. Save the Python script.

6. Run the Python script to detect text in the image:

python SCRIPT_NAME

Replace SCRIPT_NAME with the name you gave to your Python script, such as vision.py.
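If you want to pull just the recognized text out of the response object that vision_func prints, a minimal sketch along these lines works with the client library's response types; print_text is a hypothetical helper name, not part of the documented script:

def print_text(resp):
    # TEXT_DETECTION: the first text_annotations entry holds the full
    # detected string; later entries are the individual words.
    if resp.text_annotations:
        print(resp.text_annotations[0].description)
    # DOCUMENT_TEXT_DETECTION: full_text_annotation carries the page, block,
    # paragraph, word, and break structure; .text is the plain string.
    if resp.full_text_annotation.text:
        print(resp.full_text_annotation.text)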
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eVertex AI's OCR service on Google Distributed Cloud (GDC) air-gapped uses the \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e API to detect text in JPEG and PNG images.\u003c/p\u003e\n"],["\u003cp\u003eTo use the OCR API, users must have a project with the API enabled and appropriate credentials, potentially including installing client libraries.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e method requires the image data as Base64 encoded content within the API request and allows specifying the type of detection (\u003ccode\u003eTEXT_DETECTION\u003c/code\u003e or \u003ccode\u003eDOCUMENT_TEXT_DETECTION\u003c/code\u003e).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003elanguage_hints\u003c/code\u003e field in the API request is optional and allows users to specify the language of the text for improved detection accuracy, following BCP 47 language tag formatting.\u003c/p\u003e\n"],["\u003cp\u003eText detection can be performed via a REST API method or a Python script, both requiring authentication and the specification of an endpoint, project ID, and the Base64-encoded image.\u003c/p\u003e\n"]]],[],null,["# Detect text in images\n\nThe Optical Character Recognition (OCR) service of Vertex AI on\nGoogle Distributed Cloud (GDC) air-gapped detects text in images using the\n`BatchAnnotateImages` API method. The service supports JPEG and PNG files for\nimages.\n\nThis page shows you how to detect image text using the OCR API on\nDistributed Cloud.\n\nBefore you begin\n----------------\n\nBefore you can start using the OCR API, you must have a project\nwith the OCR API enabled and have the appropriate credentials.\nYou can also install client libraries to help you make calls to the API. For\nmore information, see [Set up a character recognition project](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr).\n\nDetect text from JPEG and PNG files\n-----------------------------------\n\nThe `BatchAnnotateImages` method detects text from a batch of JPEG or PNG files.\nYou send the file from which you want to detect text directly as content in the\nAPI request. The system returns the resulting detected text in JSON format in\nthe API response.\n\nYou must specify values for the fields in the JSON body of your API request. The\nfollowing table contains a description of the request body fields you must\nprovide when you use the `BatchAnnotateImages` API method for your text\ndetection requests:\n\nFor information about the complete JSON representation, see\n[`AnnotateImageRequest`](/distributed-cloud/hosted/docs/latest/gdch/apis/vertex-ai/ocr/rest/v1/AnnotateImageRequest).\n\n### Make an API request\n\nMake a request to the OCR pre-trained API using the REST API\nmethod. 
Otherwise, interact with the OCR pre-trained API from a\nPython script to detect text from JPEG or PNG files.\n| **Note:** The `BatchAnnotateImages` API method only supports a single request per batch call.\n\nThe following examples show how to detect text in an image using\nOCR: \n\n### REST\n\nFollow these steps to detect text in images using the REST API method:\n\n1. Save the following `request.json` file for your request body:\n\n cat \u003c\u003c- EOF \u003e request.json\n {\n \"requests\": [\n {\n \"image\": {\n \"content\": \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-err\"\u003eBASE\u003c/span\u003e\u003cspan class=\"devsite-syntax-mi\"\u003e64\u003c/span\u003e\u003cspan class=\"devsite-syntax-err\"\u003e_ENCODED_IMAGE\u003c/span\u003e\u003c/var\u003e\n },\n \"features\": [\n {\n \"type\": \"\u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e\"\n }\n ],\n \"image_context\": {\n \"language_hints\": [\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_1\u003c/var\u003e\",\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_2\u003c/var\u003e\",\n ...\n ]\n }\n }\n ]\n }\n EOF\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003eLANGUAGE_HINT\u003c/var\u003e: the BCP 47 language tags to use as language hints for text detection, such as `en-t-i0-handwrit`. This field is optional and the system interprets an empty value as automatic language detection.\n2. [Get an authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n3. Make the request:\n\n ### curl\n\n curl -X POST \\\n -H \"Authorization: Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\" \\\n -H \"x-goog-user-project: projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. 
For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n ### PowerShell\n\n $headers = @{\n \"Authorization\" = \"Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\"\n \"x-goog-user-project\" = \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\"\n }\n\n Invoke-WebRequest\n -Method POST\n -Headers $headers\n -ContentType: \"application/json; charset=utf-8\"\n -InFile request.json\n -Uri \"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\" | Select-Object -Expand Content\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n### Python\n\nFollow these steps to use the OCR service from a Python\nscript to detect text in an image:\n\n1. [Install the latest version of the OCR client library](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-install-libraries).\n\n2. [Set the required environment variables on a Python script](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr#set-env-var).\n\n3. [Authenticate your API request](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n4. Add the following code to the Python script you created:\n\n from google.cloud import vision\n import google.auth\n from google.auth.transport import requests\n from google.api_core.client_options import ClientOptions\n\n audience = \"https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n api_endpoint=\"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n\n def vision_client(creds):\n opts = ClientOptions(api_endpoint=api_endpoint)\n return vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.services.image_annotator.ImageAnnotatorClient.html(credentials=creds, client_options=opts)\n\n def main():\n creds = None\n try:\n creds, project_id = google.auth.default()\n creds = creds.with_gdch_audience(audience)\n req = requests.Request()\n creds.refresh(req)\n print(\"Got token: \")\n print(creds.token)\n except Exception as e:\n print(\"Caught exception\" + str(e))\n raise e\n return creds\n\n def vision_func(creds):\n vc = vision_client(creds)\n image = {\"content\": \"\u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e\"}\n features = [{\"type_\": vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.types.Feature.html.Type.\u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eFEATURE_TYPE\u003c/span\u003e\u003c/var\u003e}]\n # Each requests element corresponds to a single image. 
To annotate more\n # images, create a request element for each image and add it to\n # the array of requests\n req = {\"image\": image, \"features\": features}\n\n metadata = [(\"x-goog-user-project\", \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\")]\n\n resp = vc.annotate_image(req,metadata=metadata)\n\n print(resp)\n\n if __name__==\"__main__\":\n creds = main()\n vision_func(creds)\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n5. Save the Python script.\n\n6. Run the Python script to detect text in the image:\n\n python \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eSCRIPT_NAME\u003c/span\u003e\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eSCRIPT_NAME\u003c/var\u003e with the name you gave to your\n Python script, such as `vision.py`."]]