This page shows you how to detect text in files using the Optical Character Recognition (OCR) API on Google Distributed Cloud (GDC) air-gapped appliance.
The OCR service of Vertex AI on GDC air-gapped appliance detects text in PDF and
TIFF files using the BatchAnnotateFiles API method.
Before you begin
Before you can start using the OCR API, you must have a project with the OCR API enabled and have the appropriate credentials. You can also install client libraries to help you make calls to the API. For more information, see Set up a character recognition project.
Detect text with inline requests
The BatchAnnotateFiles method detects text from a batch of PDF or TIFF files.
You send the file from which you want to detect text directly as content in the
API request. The system returns the resulting detected text in JSON format in
the API response.
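Once the JSON response arrives, you usually want only the recognized text. This page doesn't show the response layout, so the following Python sketch assumes it mirrors the Cloud Vision files:annotate response, where each file entry holds a nested responses list with one fullTextAnnotation.text value per page. The extract_text helper and these field names are assumptions to verify against a real response from your endpoint.

```python
def extract_text(response_body):
    """Collect the detected text of every page, grouped by input file.

    Assumed layout (not confirmed by this page):
    {"responses": [{"responses": [{"fullTextAnnotation": {"text": "..."}}]}]}
    """
    texts_per_file = []
    for file_response in response_body.get("responses", []):
        page_texts = []
        for page_response in file_response.get("responses", []):
            # Pages without detected text yield an empty string.
            annotation = page_response.get("fullTextAnnotation", {})
            page_texts.append(annotation.get("text", ""))
        texts_per_file.append(page_texts)
    return texts_per_file
```

The function takes the response already parsed into a dictionary, for example with json.loads, and returns one list of page texts per file in the request.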
You must specify values for the fields in the JSON body of your API request. The
following table describes the request body fields you must provide when you use
the BatchAnnotateFiles API method for your text detection requests:
Request body fields | Field description |
---|---|
content | The file with text to detect. You provide the Base64 representation (ASCII string) of your binary file content. |
mime_type | The source file type. You must set it to one of the following values: application/pdf for PDF files or image/tiff for TIFF files. |
type | The type of text detection you need from the file. Specify one of the two annotation features: TEXT_DETECTION or DOCUMENT_TEXT_DETECTION. |
language_hints | Optional. List of languages to use for the text detection. The system interprets an empty value for this field as automatic language detection. You don't need to set the language_hints field for languages based on the Latin alphabet. If you know the language of the text in the file, setting a hint improves results. |
pages | Optional. The number of pages from the file to process for text detection. The maximum number of pages that you can specify is five. If you don't specify the number of pages, the service processes the first five pages of the file. |
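The constraints in the table can be checked before you send anything. The following Python sketch validates one element of the requests array against them. The validate_file_request helper is hypothetical, the image/tiff MIME type is an assumption based on the standard type for TIFF files, and pages is treated as a list because the sample request body on this page passes it as an array.

```python
import base64

# Assumed allowed values; confirm them against your deployment.
ALLOWED_MIME_TYPES = {"application/pdf", "image/tiff"}
ALLOWED_FEATURE_TYPES = {"TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"}
MAX_PAGES = 5


def validate_file_request(request):
    """Return a list of problems found in one element of the requests array."""
    problems = []
    input_config = request.get("input_config", {})

    # content must be the Base64 (ASCII) form of the file bytes.
    content = input_config.get("content")
    try:
        if not content:
            raise ValueError("empty content")
        base64.b64decode(content, validate=True)
    except (TypeError, ValueError):
        problems.append("content must be a non-empty Base64 string")

    if input_config.get("mime_type") not in ALLOWED_MIME_TYPES:
        problems.append("mime_type must be one of the allowed file types")

    features = request.get("features", [])
    if not features:
        problems.append("at least one feature type is required")
    for feature in features:
        if feature.get("type") not in ALLOWED_FEATURE_TYPES:
            problems.append("unsupported feature type: %r" % feature.get("type"))

    if len(request.get("pages", [])) > MAX_PAGES:
        problems.append("at most %d pages can be specified" % MAX_PAGES)

    return problems
```

An empty list means the element passed every check. The optional language_hints field is left unchecked because an empty value is valid.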
Make an inline API request
Make a request to the OCR pre-trained API using the REST API method. Alternatively, interact with the OCR pre-trained API from a Python script to detect text from PDF or TIFF files.
The following examples show how to detect text in a file using OCR:
REST
Follow these steps to detect text in files using the REST API method:
Save the following request.json file for your request body:

    cat <<- EOF > request.json
    {
      "requests": [
        {
          "input_config": {
            "content": "BASE64_ENCODED_FILE",
            "mime_type": "application/pdf"
          },
          "features": [
            {
              "type": "FEATURE_TYPE"
            }
          ],
          "image_context": {
            "language_hints": [
              "LANGUAGE_HINT_1",
              "LANGUAGE_HINT_2",
              ...
            ]
          },
          "pages": []
        }
      ]
    }
    EOF
Replace the following:

- BASE64_ENCODED_FILE: the Base64 representation (ASCII string) of your binary file content. This string looks similar to /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
- FEATURE_TYPE: the type of text detection you need from the file. Allowed values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
- LANGUAGE_HINT_1, LANGUAGE_HINT_2: the BCP 47 language tags to use as language hints for text detection, such as en-t-i0-handwrit. This field is optional and the system interprets an empty value as automatic language detection.
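Pasting a long Base64 string by hand is error-prone, so you might generate request.json from the source file instead. The following Python sketch reads a file, encodes it, and writes a body with the same layout as the request shown above. The write_request_json helper, its default values, and the example file name are hypothetical.

```python
import base64
import json
from pathlib import Path


def write_request_json(source_path, output_path="request.json",
                       mime_type="application/pdf",
                       feature_type="DOCUMENT_TEXT_DETECTION",
                       language_hints=()):
    """Base64-encode source_path and write the request body to output_path."""
    encoded = base64.b64encode(Path(source_path).read_bytes()).decode("ascii")
    body = {
        "requests": [
            {
                "input_config": {"content": encoded, "mime_type": mime_type},
                "features": [{"type": feature_type}],
                # An empty list leaves language detection automatic.
                "image_context": {"language_hints": list(language_hints)},
                "pages": [],
            }
        ]
    }
    Path(output_path).write_text(json.dumps(body, indent=2), encoding="utf-8")
    return body
```

For example, write_request_json("scan.pdf", feature_type="TEXT_DETECTION") writes request.json in the current directory, ready to pass to the request step.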
Make the request:
curl
    curl -X POST \
      -H "Authorization: Bearer TOKEN" \
      -H "x-goog-user-project: projects/PROJECT_ID" \
      -H "Content-Type: application/json; charset=utf-8" \
      -d @request.json \
      https://ENDPOINT/v1/files:annotate
Replace the following:

- TOKEN: the authentication token you obtained.
- PROJECT_ID: your project ID.
- ENDPOINT: the OCR endpoint that you use for your organization. For more information, view service status and endpoints.
PowerShell
    $headers = @{
      "Authorization" = "Bearer TOKEN"
      "x-goog-user-project" = "projects/PROJECT_ID"
    }

    Invoke-WebRequest `
      -Method POST `
      -Headers $headers `
      -ContentType: "application/json; charset=utf-8" `
      -InFile request.json `
      -Uri "https://ENDPOINT/v1/files:annotate" | Select-Object -Expand Content
Replace the following:

- TOKEN: the authentication token you obtained.
- PROJECT_ID: your project ID.
- ENDPOINT: the OCR endpoint that you use for your organization. For more information, view service status and endpoints.
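If neither curl nor PowerShell is available, the same POST request can be assembled with Python's standard library. The following sketch mirrors the headers and URL of the REST examples. The build_annotate_request and send_annotate_request helpers are hypothetical, and the sketch assumes your machine already trusts the endpoint's TLS certificate.

```python
import json
import urllib.request


def build_annotate_request(endpoint, token, project_id, body):
    """Build, but do not send, the POST request for files:annotate."""
    return urllib.request.Request(
        url=f"https://{endpoint}/v1/files:annotate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "x-goog-user-project": f"projects/{project_id}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send_annotate_request(request, timeout=60):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the build step separate from the send step lets you inspect the URL and headers before any network call is made.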
Python
Follow these steps to use the OCR service from a Python script to detect text in a file:
Add the following code to the Python script you created:
    from google.cloud import vision
    import google.auth
    from google.auth.transport import requests
    from google.api_core.client_options import ClientOptions

    audience = "https://ENDPOINT:443"
    api_endpoint = "ENDPOINT:443"

    def vision_client(creds):
        opts = ClientOptions(api_endpoint=api_endpoint)
        return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)

    def main():
        creds = None
        try:
            creds, project_id = google.auth.default()
            creds = creds.with_gdch_audience(audience)
            req = requests.Request()
            creds.refresh(req)
            print("Got token: ")
            print(creds.token)
        except Exception as e:
            print("Caught exception" + str(e))
            raise e
        return creds

    def vision_func(creds):
        vc = vision_client(creds)
        input_config = {"content": "BASE64_ENCODED_FILE"}
        features = [{"type_": vision.Feature.Type.FEATURE_TYPE}]
        # Each requests element corresponds to a single file. To annotate more
        # files, create a request element for each file and add it to
        # the array of requests
        req = {"input_config": input_config, "features": features}

        metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

        resp = vc.annotate_file(req, metadata=metadata)

        print(resp)

    if __name__ == "__main__":
        creds = main()
        vision_func(creds)
Replace the following:

- ENDPOINT: the OCR endpoint that you use for your organization. For more information, view service status and endpoints.
- BASE64_ENCODED_FILE: the Base64 representation (ASCII string) of your file content. This string looks similar to /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
- FEATURE_TYPE: the type of text detection you need from the file. Allowed values are TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.
- PROJECT_ID: your project ID.
Save the Python script.
Run the Python script to detect text in the file:
python SCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your Python script, such as vision.py.