Optical Character Recognition (OCR) is one of the three Vertex AI pre-trained APIs available on Google Distributed Cloud (GDC) air-gapped.
Use the OCR feature of Vertex AI to detect text in various file types. Vertex AI detects typed text in a photo image or handwritten text.
Learn more about OCR-supported languages detected by the text-recognition feature of OCR in a single image.
Before you begin
To get the permissions you need to use the Vertex AI Optical Character Recognition (OCR) pre-trained API, ask your Project IAM Admin to grant you the AI OCR Developer (ai-ocr-developer
) role in your project namespace.
Examples of files with detected text
The examples illustrate how Vertex AI detects and extracts text from images.
Road sign photograph
Figure 1 is a photograph that contains a street or traffic sign. Vertex AI returns a JSON file with the extracted string, individual words, and their bounding boxes.
Figure 1. Road sign photograph where Vertex AI detects words and their bounding boxes.
Scanned image of typed text
Figure 2 is a scanned image of typed text. Vertex AI returns a JSON file containing page, block, paragraph, word, and break information.
Figure 2. Scanned image of typed text where Vertex AI detects information such as words, pages, and paragraphs.
Image of handwriting
Figure 3 is an image of handwritten text. Vertex AI detects and extracts text from these images. For a list of handwriting scripts that are supported for handwriting recognition, see Handwriting scripts.
Figure 3. Handwriting image where Vertex AI detects text.
Feature differences from Google Cloud
This section describes the differences between OCR on Distributed Cloud and Vision and OCR on Google Cloud.
The primary difference is that Distributed Cloud only supports OCR. Vision on Distributed Cloud doesn't provide other features available in Vision on Google Cloud, such as image recognition, facial recognition, and crop hint detection.
The following table describes the supported features in Distributed Cloud.
Feature | GDC feature |
---|---|
OCR API methods | OCR on GDC supports the following two methods:
|
Language support | OCR on GDC supports a subset of the languages supported on Google Cloud. |
Asynchronous methods | AsyncBatchAnnotateFiles |
BatchAnnotateFiles method |
The following subset of fields supported on Google Cloud are supported on GDC:
If you set any other fields in a request, they are ignored or cause an error. |
BatchAnnotateImages method |
The following subset of fields supported on Google Cloud are supported on GDC:
If you set any other fields in a request, they are ignored or cause an error. |
File location | In GDC, you can only process images for OCR if they are stored locally. |
Use the API to detect text in files
Vertex AI on Distributed Cloud supports the following two methods for extracting text from files and images:
BatchAnnotateImages
: Extract text from JPEG and PNG files.BatchAnnotateFiles
: Extract text from PDF and TIFF files.
Vertex AI on Distributed Cloud doesn't support any other OCR API methods that are supported on Google Cloud.
Use the BatchAnnotateImages
method
Use the BatchAnnotateImages
method to detect and extract text from a batch of
JPEG and PNG files. Specify the following fields in the request:
type
: The type of the text to extract. Specify one of two OCR types:TEXT_DETECTION
orDOCUMENT_TEXT_DETECTION
.content
: The images with text to detect. You can only process images that are stored locally in your Distributed Cloud environment. The system can't access images available to the public or stored in a Google Cloud bucket. Therefore, the system doesn't support them.language_hints
: Optional. List of languages to use for theTEXT_DETECTION
orDOCUMENT_TEXT_DETECTION
OCR types. In most cases, an empty value yields the best results, because it enables automatic language detection. For languages based on the Latin alphabet, you don't need to set thelanguage_hints
field. In rare cases, when you know the language of the text in the image, setting a hint improves results. Use thelanguage_hints
field with caution. If a hint is wrong, it can significantly impede text detection.
The BatchAnnotateImages
method on Distributed Cloud supports a
subset of the parameters that you can specify when you call BatchAnnotateImages
in Vertex AI on Google Cloud. If you specify any other parameters in a
BatchAnnotateImages
request on Distributed Cloud, they are ignored
or result in an error.
For more information, see BatchAnnotateImages
in the
Distributed Cloud API reference.
Use the BatchAnnotateFiles
method
Use the BatchAnnotateFiles
method to detect and extract text from a batch of
PDF and TIFF files. Specify the following fields in the request:
type
: The type of the text to extract. Specify one of two OCR types:TEXT_DETECTION
orDOCUMENT_TEXT_DETECTION
.content
: The images with text to detect. You can only process images that are stored locally in your Distributed Cloud environment. The system can't access images available to the public or stored in a Google Cloud bucket. Therefore, the system doesn't support them.language_hints
: Optional. List of languages to use for theTEXT_DETECTION
orDOCUMENT_TEXT_DETECTION
OCR types. In most cases, an empty value yields the best results, because it enables automatic language detection. For languages based on the Latin alphabet, you don't need to set thelanguage_hints
field. In rare cases, when you know the language of the text in the image, setting a hint improves results. Use thelanguage_hints
field with caution. If a hint is wrong, it can significantly impede text detection.mime_type
: The type of the file. You must set it to one of the following values:application/pdf
image/tiff
pages
: Optional. The pages of the file that are processed for text detection. The maximum number of pages that you can specify is five. If you don't specify the number of pages, the first five pages of the file are processed.
The BatchAnnotateFiles
method on GDC supports a subset
of the parameters that you can specify on Google Cloud. If you specify any other
parameters in a BatchAnnotateFiles
request on Distributed Cloud,
they are ignored or result in an error.
For more information, see BatchAnnotateFiles
in the
Distributed Cloud API reference.
Use the OCR client library
The Optical Character Recognition (OCR) API detects text on a local image by sending the contents of an image file as a base64-encoded string in the body of a request.
Prerequisites to use the OCR client library
To use the OCR client library, make sure the following prerequisites are complete:
Install the Vertex AI client libraries. For more information, see Install Vertex AI client libraries.
Get the OCR endpoint. Make a note of the endpoint and use it where you see
OCR_ENDPOINT
in the following client library sample code. For more information, see Get the OCR endpoint.
Create OCR text detection requests
REST
To send an OCR request using Python, first install the
Vertex AI client libraries.
Before you send an OCR request using REST, replace the
BASE64_ENCODED_IMAGE
variable with the base64 ASCII
string representation of your binary image data. This string begins with
characters that look similar to the following string:
/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==
The sample JSON request body:
{ "requests": [ { "image": { "content": BASE64_ENCODED_IMAGE }, "features": [ { "type": "TEXT_DETECTION" } ] } ] }
To send your request, choose one of the following options:
curl
Save the request body in a file named
request.json
, then run the following command:
curl -X POST \ -H "Content-Type: application/json; charset=utf-8" \ -H "x-goog-user-project: projects/PROJECT_ID" \ -d @request.json \ "OCR_ENDPOINT:v1/images:annotate"
PowerShell
Save the request body in a file named
request.json
, then run the following command:
$cred = gcloud auth application-default print-access-token $headers = @{ "Authorization" = "Bearer $cred" "x-goog-user-project" = "projects/PROJECT_ID" } Invoke-WebRequest -Method POST -Headers $headers -ContentType: "application/json; charset=utf-8" -InFile request.json -Uri "OCR_ENDPOINT/v1/images:annotate" | Select-Object -Expand Content
Python
To send an OCR request using Python, first install the Vertex AI client libraries and Python 3.7.
The Python code sample:
Optional: Specify the language in a request
The BatchAnnotateFiles
and BatchAnnotateImages
methods support one or more
language hints to specify the language of any text in the image. If you don't
specify a language, Vertex AI enables automatic language detection,
which usually results in the most accurate results. For languages based on the
Latin alphabet, you don't need to set language hints. In rare cases, when the
language of the text in the image is known, setting a hint improves the
results. However, if the hint is incorrect, it might cause a significant
impediment. Text detection returns an error if a specified language isn't one
of the supported languages.
Add one or more supported languages to the imageContext.languageHints
request in the request.json
file to provide a language hint. The following
code sample is a demonstration:
{ "requests": [ { "image": { "content": BASE64_ENCODED_IMAGE }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ], "imageContext": { "languageHints": ["en-t-i0-handwrit"] } } ] }
Get the OCR endpoint
To get the endpoint for OCR, see View service statuses and endpoints.
OCR limits
The following table lists the current limits in Optical Character Recognition (OCR) on Google Distributed Cloud (GDC) air-gapped.
File limit for OCR | Value |
---|---|
Maximum number of pages | 5 |
Maximum file size | 20 MB |
Maximum image size | 20 million pixels (length x width) |
Submitted files for OCR that exceed the maximum number of pages or the maximum file size return an error. Submitted files that exceed the maximum image size are downsized to 20 million pixels.
Supported file types for OCR
The Optical Character Recognition (OCR) pre-trained API detects and transcribes text from the following file types:
- TIFF
- JPG
- PNG
You must store the files locally in your Google Distributed Cloud air-gapped (GDC) environment. You can't access files hosted in Cloud Storage or files that are publicly available for text detection.