This page shows you how to detect text in files using the
Optical Character Recognition (OCR) API on Google Distributed Cloud (GDC) air-gapped appliance.
The OCR service of Vertex AI on GDC air-gapped appliance detects text in PDF
and TIFF files using the BatchAnnotateFiles API method.
Note: The BatchAnnotateFiles API method supports only a single request per batch call.
Before you begin
Before you can start using the OCR API, you must have a project
with the OCR API enabled and have the appropriate credentials.
You can also install client libraries to help you make calls to the API. For
more information, see Set up a character recognition project.
Detect text with inline requests
The BatchAnnotateFiles method detects text from a batch of PDF or TIFF files.
You send the file from which you want to detect text directly as content in the
API request. The system returns the resulting detected text in JSON format in
the API response.
Specify the fields in the JSON body of your API request. The following table
describes the request body fields for text detection requests that use the
BatchAnnotateFiles API method:
Request body fields

| Field | Description |
|---|---|
| content | The file with text to detect. Provide the Base64 representation (ASCII string) of your binary file content. |
| mime_type | The source file type. You must set it to one of the following values: application/pdf for PDF files, or image/tiff for TIFF files. |
| type | The type of text detection you need from the file. Specify one of the two annotation features: TEXT_DETECTION detects and extracts text from any file, and the JSON response includes the extracted string, individual words, and their bounding boxes. DOCUMENT_TEXT_DETECTION also extracts text from a file, but the service optimizes the response for dense text and documents, and the JSON response includes page, block, paragraph, word, and break information. |
| language_hints | Optional. A list of languages to use for text detection. The system interprets an empty value as automatic language detection. You don't need to set this field for languages based on the Latin alphabet. If you know the language of the text in the file, setting a hint improves results. |
| pages | Optional. A list of the page numbers in the file to process for text detection. You can specify at most five pages. If you leave this field empty, the service processes the first five pages of the file. |
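The content, mime_type, and pages fields can be prepared programmatically before you build a request. The following Python sketch is illustrative only: the helper names (encode_content, encode_file, mime_type_for, check_pages) and the extension-to-MIME mapping are assumptions, not part of the OCR API. It Base64-encodes file bytes into the ASCII string that content expects, picks the mime_type value from the file extension, and enforces the five-page limit described in the table.

```python
import base64
import pathlib

# Hypothetical mapping from file extension to the mime_type values that
# the request body accepts. Only PDF and TIFF files are supported.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}

# The service processes at most five pages per file.
MAX_PAGES = 5


def encode_content(data: bytes) -> str:
    """Return the Base64 representation (ASCII string) of raw file bytes."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a file from disk and Base64-encode it for the content field."""
    return encode_content(pathlib.Path(path).read_bytes())


def mime_type_for(path: str) -> str:
    """Choose the mime_type value from the file extension (case-insensitive)."""
    suffix = pathlib.Path(path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return MIME_TYPES[suffix]


def check_pages(pages: list) -> list:
    """Return a copy of the page list, or fail if it exceeds the limit."""
    if len(pages) > MAX_PAGES:
        raise ValueError(f"at most {MAX_PAGES} pages are supported, got {len(pages)}")
    return list(pages)
```

For example, encode_file("scan.pdf") returns the string to use in place of BASE64_ENCODED_FILE, and mime_type_for("scan.tiff") returns image/tiff.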
Make an inline API request
You can make a request to the OCR pre-trained API by using the REST API
method, or you can interact with the OCR pre-trained API from a Python script
to detect text in PDF or TIFF files.
The following examples show how to detect text in a file using
OCR:
REST
Follow these steps to detect text in files using the REST API method:
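Save your request body in a file named request.json. The body contains a
requests array with a single element that sets input_config (with content set
to BASE64_ENCODED_FILE and mime_type set to application/pdf or image/tiff),
features (with type set to FEATURE_TYPE), an optional image_context (with
language_hints set to one or more LANGUAGE_HINT values), and an optional pages
array.
Replace the following:
- BASE64_ENCODED_FILE: the Base64 representation (ASCII string) of your binary
  file content. For a PDF file, this string typically begins with JVBERi0; for
  a TIFF file, it begins with SUkq or TU0A.
- FEATURE_TYPE: the type of text detection you need from the file. Allowed
  values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
- LANGUAGE_HINT: the BCP 47 language tags to use as language hints for text
  detection, such as en-t-i0-handwrit. This field is optional; the system
  interprets an empty value as automatic language detection.
Get an authentication token.
Make the request by sending a POST request to https://ENDPOINT/v1/files:annotate
with request.json as the body and the following headers:
- Authorization: Bearer TOKEN
- x-goog-user-project: projects/PROJECT_ID
- Content-Type: application/json; charset=utf-8
Replace the following:
- TOKEN: the authentication token you obtained.
- PROJECT_ID: your project ID.
- ENDPOINT: the OCR endpoint that you use for your organization. For more
  information, see View service status and endpoints.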
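The REST request can also be scripted. The following Python sketch uses only the standard library, and its function names and example values are hypothetical. It builds a request.json body with the fields described in the request body table (one element in the requests array, because BatchAnnotateFiles supports a single request per batch call), prepares a POST to https://ENDPOINT/v1/files:annotate with the Authorization: Bearer TOKEN and x-goog-user-project: projects/PROJECT_ID headers, and reads fullTextAnnotation.text from the response. The parsing step assumes the response follows the Cloud Vision v1 BatchAnnotateFilesResponse shape, and send() assumes the default SSL context trusts your endpoint's certificate.

```python
import json
import urllib.request


def build_request_body(content_b64, feature_type, mime_type="application/pdf",
                       language_hints=None, pages=None):
    """Build a request.json body that holds a single file request."""
    file_request = {
        "input_config": {"content": content_b64, "mime_type": mime_type},
        "features": [{"type": feature_type}],
        "pages": list(pages or []),
    }
    if language_hints:
        # Omitting language_hints leaves automatic language detection on.
        file_request["image_context"] = {"language_hints": list(language_hints)}
    # BatchAnnotateFiles supports only one request per batch call.
    return {"requests": [file_request]}


def build_http_request(endpoint, token, project_id, body):
    """Prepare (but do not send) the POST to the files:annotate method."""
    return urllib.request.Request(
        url=f"https://{endpoint}/v1/files:annotate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "x-goog-user-project": f"projects/{project_id}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send(http_request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(http_request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_text(response):
    """Join fullTextAnnotation.text from every page of every file.

    Assumes the shape
    {"responses": [{"responses": [{"fullTextAnnotation": {"text": ...}}]}]}.
    """
    texts = []
    for file_response in response.get("responses", []):
        for page in file_response.get("responses", []):
            annotation = page.get("fullTextAnnotation", {})
            texts.append(annotation.get("text", ""))
    return "\n".join(texts)
```

A hypothetical end-to-end run builds the body with build_request_body(content, "DOCUMENT_TEXT_DETECTION"), optionally writes it to request.json with json.dump, and then prints extract_text(send(build_http_request("ENDPOINT", "TOKEN", "PROJECT_ID", body))). Keeping request construction separate from send() lets you inspect the exact payload before anything leaves the machine.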
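Python
Follow these steps to use the OCR service from a Python script to detect text
in a file:
Install the latest version of the OCR client library.
Set the required environment variables in a Python script.
Authenticate your API request.
Add the following code to the Python script you created: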
```python
from google.cloud import vision
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint = "ENDPOINT:443"


def vision_client(creds):
    # Point the client at your organization's OCR endpoint.
    opts = ClientOptions(api_endpoint=api_endpoint)
    return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)


def main():
    creds = None
    try:
        # Load the default credentials and scope them to the OCR endpoint.
        creds, project_id = google.auth.default()
        creds = creds.with_gdch_audience(audience)
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
    except Exception as e:
        print("Caught exception: " + str(e))
        raise
    return creds


def vision_func(creds):
    vc = vision_client(creds)
    # Use "image/tiff" as the mime_type for TIFF files.
    input_config = {"content": "BASE64_ENCODED_FILE", "mime_type": "application/pdf"}
    features = [{"type_": vision.Feature.Type.FEATURE_TYPE}]
    # Each call annotates a single file. To annotate more files, build a
    # request like this one for each file and call annotate_file once per file.
    req = {"input_config": input_config, "features": features}

    metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

    resp = vc.annotate_file(req, metadata=metadata)

    print(resp)


if __name__ == "__main__":
    creds = main()
    vision_func(creds)
```
Replace the following:
- ENDPOINT: the OCR endpoint that you use for your organization. For more
  information, see View service status and endpoints.
- BASE64_ENCODED_FILE: the Base64 representation (ASCII string) of your file
  content. For a PDF file, this string typically begins with JVBERi0; for a
  TIFF file, it begins with SUkq or TU0A.
- FEATURE_TYPE: the type of text detection you need from the file. Allowed
  values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
- PROJECT_ID: your project ID.
Save the Python script.
Run the Python script to detect text in the file:
```shell
python SCRIPT_NAME
```
Replace SCRIPT_NAME with the name you gave to your
Python script, such as vision.py.
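Because BatchAnnotateFiles handles one request per call, processing several files from the Python script means calling annotate_file once per file. The following sketch is hypothetical: annotate_documents, read_documents, and their parameters are illustrative names, not part of the client library. It takes the annotate function as an argument (for example, vc.annotate_file from the script above) so that you can exercise it without a live endpoint, and it assumes, as the script does, that the client accepts a Base64 string for content.

```python
import base64
import pathlib

# Hypothetical extension-to-MIME mapping for the supported file types.
MIME_TYPES = {".pdf": "application/pdf", ".tif": "image/tiff", ".tiff": "image/tiff"}


def annotate_documents(annotate, documents, feature_type, project_id):
    """Annotate each document with its own call and return results by name.

    annotate: a callable shaped like vc.annotate_file(req, metadata=...).
    documents: a mapping from file name to the file's raw bytes.
    feature_type: for example, vision.Feature.Type.DOCUMENT_TEXT_DETECTION.
    """
    metadata = [("x-goog-user-project", f"projects/{project_id}")]
    results = {}
    for name, data in documents.items():
        req = {
            "input_config": {
                # Base64 (ASCII string) of the file, as in the script above.
                "content": base64.b64encode(data).decode("ascii"),
                "mime_type": MIME_TYPES[pathlib.Path(name).suffix.lower()],
            },
            "features": [{"type_": feature_type}],
        }
        # One file per call: the method supports a single request per batch.
        results[name] = annotate(req, metadata=metadata)
    return results


def read_documents(paths):
    """Load files from disk into the mapping that annotate_documents expects."""
    return {path: pathlib.Path(path).read_bytes() for path in paths}
```

Inside vision_func, a call such as annotate_documents(vc.annotate_file, read_documents(["a.pdf", "b.tiff"]), vision.Feature.Type.DOCUMENT_TEXT_DETECTION, "PROJECT_ID") would return one response per file name.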
Last updated 2025-09-03 UTC.