Try Optical Character Recognition (OCR)

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Optical Character Recognition (OCR) pre-trained API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Follow these steps before trying OCR:

  1. Set up a project using the GDC console to group the Vertex AI services. For information about creating and using projects, see Create a project.

  2. Ask your Project IAM Admin to grant you the AI OCR Developer (ai-ocr-developer) role in your project namespace.

  3. Enable the OCR pre-trained API.

  4. Download the gdcloud command-line interface (CLI).

Set up your service account

Create a service account and a service key for it. Replace SERVICE_ACCOUNT with the name of your service account, PROJECT_ID with your project ID, and SERVICE_KEY with the name of the key file.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT  --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create "SERVICE_KEY".json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant the OCR service account access to project resources by providing your project ID, the name of your service account, and the ai-ocr-developer role.

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-ocr-developer

Set your environment variables

Before running the OCR pre-trained service, set your environment variable.
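The Python samples in this guide call google.auth.default(), which in the google-auth library reads the path of a credentials file from the GOOGLE_APPLICATION_CREDENTIALS environment variable. Assuming that this is the variable meant here, and that it should point to the service key file created earlier, the following minimal sketch checks the setting before you run anything. The helper name check_credentials_env is hypothetical.

```python
import json
import os


def check_credentials_env(var_name="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the parsed key file that var_name points to, or raise.

    Assumption: the service key is a JSON file and var_name holds its path.
    """
    path = os.environ.get(var_name)
    if not path:
        raise RuntimeError(f"{var_name} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{var_name} points to a missing file: {path}")
    with open(path, encoding="utf-8") as f:
        # json.load raises ValueError if the file is not valid JSON.
        return json.load(f)
```

Calling check_credentials_env() with no arguments checks the default variable name. A RuntimeError means the variable is unset or points to a file that does not exist.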


Authenticate the request

You must get a token to authenticate the requests to the OCR pre-trained service. Use either the gdcloud CLI or the Python client library.

To use the gdcloud CLI, export the identity token for the specified account to an environment variable:

export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"

Replace ENDPOINT with the OCR endpoint. For more information, see View service statuses and endpoints.

To use Python, follow these steps:

  1. Install the google-auth client library.

    pip install google-auth
  2. Save the following code to a Python script, and replace ENDPOINT with the OCR endpoint. For more information, see View service statuses and endpoints.

    import google.auth
    from google.auth.transport import requests

    api_endpoint = "https://ENDPOINT"

    # Load the default credentials and bind them to the OCR endpoint audience.
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(api_endpoint)

    def test_get_token():
      req = requests.Request()
      # Refresh the credentials to fetch a token, then print it.
      creds.refresh(req)
      print(creds.token)

    if __name__=="__main__":
      test_get_token()

  3. Run the script to fetch the token.

You must add the fetched token to the header of the curl requests as in the following example:

-H "Authorization: Bearer TOKEN"

Make the curl request:

echo '{"requests": [{"image": {"content": "'iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII='" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ] }] }' | curl --cacert CERTIFICATE_NAME --data-binary @- -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" -H "x-goog-user-project: projects/PROJECT_ID" https://ENDPOINT/v1/images:annotate
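For reference, the same request can be assembled in Python with only the standard library. This is a sketch, not part of the documented workflow: the helper names build_annotate_request and send are hypothetical, and it assumes the endpoint accepts the same JSON body and headers as the curl command above. The endpoint, token, project ID, and certificate arguments correspond to the ENDPOINT, TOKEN, PROJECT_ID, and CERTIFICATE_NAME placeholders.

```python
import base64
import json
import ssl
import urllib.request


def build_annotate_request(endpoint, token, project_id, image_bytes):
    """Build (but do not send) the images:annotate POST request."""
    body = {
        "requests": [
            {
                # The JSON body carries the image as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }
    return urllib.request.Request(
        url=f"https://{endpoint}/v1/images:annotate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
            "x-goog-user-project": f"projects/{project_id}",
        },
        method="POST",
    )


def send(request, ca_cert_path):
    """Send the request, trusting the CA file that curl receives as --cacert."""
    context = ssl.create_default_context(cafile=ca_cert_path)
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read())
```

Keeping the build step separate from the send step makes it possible to inspect the URL, headers, and body before any network call is made.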

Run the OCR pre-trained API sample script

This example shows you how to interact with an OCR pre-trained API.

  1. Check whether the client library for OCR is installed.

      pip freeze | grep vision
      # output example: google-cloud-vision==3.0.0

    If the existing version doesn't match the client library in https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall the client library.

      pip uninstall google-cloud-vision
  2. Download the client library for OCR from the console endpoint, as in the following example.

       wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-vision
  3. Extract the tar file, and install it using pip. If pip reports that a package can't be found, install the missing dependencies.

    tar -xvzf CLIENT_LIBRARY
    pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
  4. Use the OCR client library script to generate the token, and make requests to the OCR service.

  5. Set your environment variable, as described in Set your environment variables.


Run the OCR sample

Replace the ENDPOINT with the OCR endpoint that you use for your organization.

from google.cloud import vision
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
# Host and port passed to the client; same ENDPOINT as the audience.
api_endpoint = "ENDPOINT:443"

def vision_client(creds):
  """Create vision client."""
  opts = ClientOptions(api_endpoint=api_endpoint)
  return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)

def main():
  """Fetch credentials bound to the OCR endpoint audience."""
  creds = None
  try:
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    # Refresh the credentials so that a token is available.
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception: " + str(e))
    raise e
  return creds

def vision_func(creds):
  vc = vision_client(creds)
  # Replace BASE64_ENCODED_IMAGE with your image data, for example the
  # base64-encoded sample string from the curl request.
  image = {"content": "BASE64_ENCODED_IMAGE"}
  features = [{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}]
  # Each requests element corresponds to a single image.  To annotate more
  # images, create a request element for each image and add it to
  # the array of requests
  req = {"image": image, "features": features}

  metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

  resp = vc.annotate_image(req, metadata=metadata)

  print(resp)

if __name__=="__main__":
  creds = main()
  vision_func(creds)

Replace PROJECT_ID with the ID of the project that you want to use.
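The annotate_image call in the sample returns a response object. As a sketch of what to do with that result, the following helper reads the detected text from it. It assumes the response follows the AnnotateImageResponse shape of the google-cloud-vision library, with an error.message field and a full_text_annotation.text field. The helper name extract_text is hypothetical.

```python
def extract_text(resp):
    """Return the full detected text, or raise if the service reported an error."""
    # Assumption: an unset error carries an empty message string.
    if resp.error.message:
        raise RuntimeError(f"OCR request failed: {resp.error.message}")
    return resp.full_text_annotation.text
```

Inside vision_func, the helper could be called as print(extract_text(resp)) after the annotate_image call.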

What's next