Detect text in images

Optical Character Recognition (OCR) is one of the three Vertex AI pre-trained APIs available on Google Distributed Cloud Hosted (GDCH).

Use the OCR feature of Vertex AI to detect text in various file types. Vertex AI detects both typed text in photographic images and handwritten text.

Learn more about the languages that the text-recognition feature of OCR can detect in a single image.

Examples of files with detected text

The following examples illustrate how Vertex AI detects and extracts text from images.

Road sign photograph

Figure 1 is a photograph that contains a street or traffic sign. Vertex AI returns a JSON file with the extracted string, individual words, and their bounding boxes.


Figure 1. Road sign photograph where Vertex AI detects words and their bounding boxes.

Scanned image of typed text

Figure 2 is a scanned image of typed text. Vertex AI returns a JSON file containing page, block, paragraph, word, and break information.


Figure 2. Scanned image of typed text where Vertex AI detects information such as words, pages, and paragraphs.

Image of handwriting

Figure 3 is an image of handwritten text. Vertex AI detects and extracts text from such images. For a list of scripts that are supported for handwriting recognition, see Handwriting scripts.


Figure 3. Handwriting image where Vertex AI detects text.

Feature differences from Google Cloud

This section describes how OCR on Google Distributed Cloud Hosted (GDCH) differs from Vision and OCR on Google Cloud.

The primary difference is that Vision on GDCH supports only OCR. It doesn't provide other functionality available in Vision on Google Cloud, such as image recognition, facial recognition, and crop hint detection.

The following table describes the supported features in GDCH.

Feature GDCH functionality
OCR API methods OCR on GDCH supports the following two methods:
  • BatchAnnotateImages
  • BatchAnnotateFiles
Language support OCR on GDCH supports a subset of the languages supported on Google Cloud.
Asynchronous methods Asynchronous methods, such as AsyncBatchAnnotateFiles, aren't supported.
BatchAnnotateFiles method The following subset of the fields supported on Google Cloud is supported on GDCH:
  • type
  • content
  • language_hints
  • mime_type
  • pages

If you set any other fields in a request, they are ignored or cause an error.
BatchAnnotateImages method The following subset of the fields supported on Google Cloud is supported on GDCH:
  • type
  • content
  • language_hints

If you set any other fields in a request, they are ignored or cause an error.
File location In GDCH, you can process images for OCR only if they are stored locally.

Use the API to detect text in files

Vertex AI on GDCH supports the following two methods for extracting text from files and images:

  • BatchAnnotateImages
  • BatchAnnotateFiles

Vertex AI on GDCH doesn't support any other OCR API methods that are supported on Google Cloud.

Use the BatchAnnotateImages method

Use the BatchAnnotateImages method to detect and extract text from a batch of JPEG and PNG files. Specify the following fields in the request:

  • type: The type of the text to extract. Specify one of two OCR types: TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.

  • content: The images with text to detect. You can process only images that are stored locally in your GDCH environment. Publicly available images and images stored in a Cloud Storage bucket aren't supported.

  • language_hints: Optional. List of languages to use for the TEXT_DETECTION or DOCUMENT_TEXT_DETECTION OCR types. In most cases, an empty value yields the best results, because it enables automatic language detection. For languages based on the Latin alphabet, you don't need to set the language_hints field. In rare cases, when you know the language of the text in the image, setting a hint improves results. Use the language_hints field with caution. If a hint is wrong, it can significantly impede text detection.

The BatchAnnotateImages method on GDCH supports a subset of the parameters that you can specify when you call BatchAnnotateImages in Vertex AI on Google Cloud. If you specify any other parameters in a BatchAnnotateImages request on GDCH, they are ignored or result in an error.

For more information, see BatchAnnotateImages in the GDCH API reference.
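
As an illustration, the following minimal Python sketch shows one way to call BatchAnnotateImages through the client library. OCR_ENDPOINT and PATH_TO_IMAGE_FILE are placeholders, and the credential setup mirrors the client library sample later on this page.

import google.auth
from google.api_core.client_options import ClientOptions
from google.auth.transport import requests
from google.cloud import vision

# Obtain GDCH credentials and point the client at your OCR endpoint.
credentials, _ = google.auth.default()
credentials = credentials.with_gdch_audience('https://OCR_ENDPOINT:443')
credentials.refresh(requests.Request())
client = vision.ImageAnnotatorClient(
    credentials=credentials,
    client_options=ClientOptions(api_endpoint='OCR_ENDPOINT:443'))

# Read a locally stored image; GDCH can only process local files.
with open('PATH_TO_IMAGE_FILE', 'rb') as image_file:
    content = image_file.read()

# One request per image; the type is TEXT_DETECTION or
# DOCUMENT_TEXT_DETECTION.
request = vision.AnnotateImageRequest(
    image=vision.Image(content=content),
    features=[vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION)])

# Send the batch and print each detected string.
response = client.batch_annotate_images(requests=[request])
for annotation in response.responses[0].text_annotations:
    print(annotation.description)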

Use the BatchAnnotateFiles method

Use the BatchAnnotateFiles method to detect and extract text from a batch of PDF and TIFF files. Specify the following fields in the request:

  • type: The type of the text to extract. Specify one of two OCR types: TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.

  • content: The images with text to detect. You can process only images that are stored locally in your GDCH environment. Publicly available images and images stored in a Cloud Storage bucket aren't supported.

  • language_hints: Optional. List of languages to use for the TEXT_DETECTION or DOCUMENT_TEXT_DETECTION OCR types. In most cases, an empty value yields the best results, because it enables automatic language detection. For languages based on the Latin alphabet, you don't need to set the language_hints field. In rare cases, when you know the language of the text in the image, setting a hint improves results. Use the language_hints field with caution. If a hint is wrong, it can significantly impede text detection.

  • mime_type: The type of the file. You must set it to one of the following values:

    • application/pdf
    • image/tiff
  • pages: Optional. The pages of the file that are processed for text detection. The maximum number of pages that you can specify is five. If you don't specify the number of pages, the first five pages of the file are processed.

The BatchAnnotateFiles method on GDCH supports a subset of the parameters that you can specify on Google Cloud. If you specify any other parameters in a BatchAnnotateFiles request on GDCH, they are ignored or result in an error.

For more information, see BatchAnnotateFiles in the GDCH API reference.
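
Similarly, the following is a minimal sketch of a BatchAnnotateFiles call through the Python client library. PATH_TO_PDF_FILE and OCR_ENDPOINT are placeholders, and the setup assumes the same GDCH credentials as the client library sample on this page.

import google.auth
from google.api_core.client_options import ClientOptions
from google.auth.transport import requests
from google.cloud import vision

# Configure the client for your GDCH OCR endpoint.
credentials, _ = google.auth.default()
credentials = credentials.with_gdch_audience('https://OCR_ENDPOINT:443')
credentials.refresh(requests.Request())
client = vision.ImageAnnotatorClient(
    credentials=credentials,
    client_options=ClientOptions(api_endpoint='OCR_ENDPOINT:443'))

# Read a locally stored PDF file; GDCH can only process local files.
with open('PATH_TO_PDF_FILE', 'rb') as pdf_file:
    content = pdf_file.read()

# One request per file: set mime_type to application/pdf or image/tiff,
# and optionally select up to five pages.
request = vision.AnnotateFileRequest(
    input_config=vision.InputConfig(
        content=content, mime_type='application/pdf'),
    features=[vision.Feature(
        type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)],
    pages=[1, 2])

# Each file response contains one response per processed page.
response = client.batch_annotate_files(requests=[request])
for file_response in response.responses:
    for page_response in file_response.responses:
        print(page_response.full_text_annotation.text)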

Use the OCR client library

The Optical Character Recognition (OCR) API detects text in a local image when you send the contents of the image file as a base64-encoded string in the body of a request.

Prerequisites to use the OCR client library

To use the OCR client library, make sure the following prerequisites are complete:

  • Install the Vertex AI client libraries. For more information, see Install Vertex AI client libraries.

  • Get the OCR endpoint. Make a note of the endpoint and use it where you see OCR_ENDPOINT in the following client library sample code. For more information, see Get the OCR endpoint.

Create OCR text detection requests

REST



Before you send an OCR request using REST, replace the BASE64_ENCODED_IMAGE variable with the base64-encoded ASCII string representation of your binary image data. This string begins with characters similar to the following:

  • /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==
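
If you need to generate this string, one straightforward way is Python's standard base64 module. The following snippet is a minimal sketch; my_image.jpg is a placeholder for a local image file:

import base64

# Encode a local image as the base64 ASCII string that replaces
# BASE64_ENCODED_IMAGE in the request body.
with open('my_image.jpg', 'rb') as image_file:
    encoded = base64.b64encode(image_file.read()).decode('ascii')
print(encoded)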

The following sample shows the JSON request body:

{
  "requests": [
    {
      "image": {
        "content": BASE64_ENCODED_IMAGE
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}

To send your request, choose one of the following options:

curl


Save the request body in a file named request.json, then run the following command:

curl -X POST \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"OCR_ENDPOINT:v1/images:annotate"

PowerShell


Save the request body in a file named request.json, then run the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "OCR_ENDPOINT/v1/images:annotate" | Select-Object -Expand Content
 

Python


To send an OCR request using Python, first install the Vertex AI client libraries and Python 3.7.

The Python code sample:

import io

import google.auth
from google.api_core.client_options import ClientOptions
from google.auth.transport import requests
from google.cloud import vision

# Get your credentials
def get_credentials():
    try:
        credentials, _ = google.auth.default()
        credentials = credentials.with_gdch_audience('https://OCR_ENDPOINT:443')
        req = requests.Request()
        credentials.refresh(req)
    except Exception as e:
        print('Caught exception: ' + str(e))
        raise
    return credentials

# Use your credentials to access the client library
def get_vision_client(credentials):
    opts = ClientOptions(api_endpoint='OCR_ENDPOINT:443')
    return vision.ImageAnnotatorClient(credentials=credentials, client_options=opts)

# Define the function that detects text in an image file
def detect_text(path):
    credentials = get_credentials()
    client = get_vision_client(credentials)

    # Read the local image file and send its contents in the request.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    # Print each detected string and the vertices of its bounding box.
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = ['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices]
        print('bounds: {}'.format(','.join(vertices)))

    # Raise an exception if the API returned an error.
    if response.error.message:
        raise Exception(response.error.message)

# Call the "detect_text" function using the path to your image file
if __name__ == '__main__':
    detect_text('PATH_TO_IMAGE_FILE')

Optional: Specify the language in a request

The BatchAnnotateFiles and BatchAnnotateImages methods support one or more language hints to specify the language of any text in the image. If you don't specify a language, Vertex AI enables automatic language detection, which usually produces the most accurate results. For languages based on the Latin alphabet, you don't need to set language hints. In rare cases, when the language of the text in the image is known, setting a hint improves the results. However, if the hint is incorrect, it can significantly impede text detection. Text detection returns an error if a specified language isn't one of the supported languages.

To provide a language hint, add one or more supported languages to the imageContext.languageHints field in the request.json file, as the following sample demonstrates:

{
  "requests": [
    {
      "image": {
        "content": BASE64_ENCODED_IMAGE
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "imageContext": {
        "languageHints": ["en-t-i0-handwrit"]
      }
    }
  ]
}
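
If you use the Python client library instead of REST, the equivalent hint goes in an ImageContext attached to the request. The following is a minimal sketch, assuming a locally stored image and the client setup shown earlier on this page; PATH_TO_IMAGE_FILE is a placeholder:

from google.cloud import vision

# Read a locally stored image.
with open('PATH_TO_IMAGE_FILE', 'rb') as image_file:
    content = image_file.read()

# Attach the language hint through an ImageContext.
# "en-t-i0-handwrit" requests handwriting recognition for English.
request = vision.AnnotateImageRequest(
    image=vision.Image(content=content),
    features=[vision.Feature(
        type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)],
    image_context=vision.ImageContext(language_hints=['en-t-i0-handwrit']))

Send the request with client.batch_annotate_images(requests=[request]), as in the earlier sketches.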

Get the OCR endpoint

To get the endpoint for OCR, see View service statuses and endpoints.

Handwriting scripts

The following scripts are supported for handwriting recognition. To learn which languages use each script, see the language tables on this page.

Script tag   Name         Support level
Beng         Bengali      Experimental
Cyrl         Cyrillic     Experimental
Deva         Devanagari   Experimental
Grek         Greek        Experimental
Hani         Chinese      Experimental
Jpan         Japanese     Supported
Kore         Korean       Supported
Latn         Latin        Supported
vi           Vietnamese   Experimental

OCR limits

The following table lists the current limits in Optical Character Recognition (OCR) on Google Distributed Cloud Hosted (GDCH).

File limit for OCR          Value
Maximum number of pages     5
Maximum file size           20 MB
Maximum image size          20 million pixels (length x width)

Files submitted for OCR that exceed the maximum number of pages or the maximum file size return an error. Files that exceed the maximum image size are downsized to 20 million pixels.
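
If you want to fail fast in client code, you can validate the file size before calling the API. The following is a hypothetical sketch; the MAX_FILE_SIZE_BYTES constant and check_ocr_file_size helper are illustrative names, not part of the API, and the check assumes the 20 MB limit is measured in binary megabytes:

import os

# Illustrative client-side guard against the 20 MB OCR file-size limit.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024

def check_ocr_file_size(path):
    # Raise an error locally instead of waiting for the API to reject
    # a file that exceeds the maximum file size.
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f'{path} is {size} bytes, which exceeds the 20 MB OCR limit.')

check_ocr_file_size('PATH_TO_IMAGE_FILE')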

Supported file types for OCR

The Optical Character Recognition (OCR) pre-trained API detects and transcribes text from the following file types:

  • PDF
  • TIFF
  • JPG
  • PNG

You must store the files locally in your Google Distributed Cloud Hosted (GDCH) environment. Files hosted in Cloud Storage and publicly available files can't be accessed for text detection.