Try Optical Character Recognition (OCR)

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Optical Character Recognition (OCR) pre-trained API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Follow these steps before trying OCR:

  1. Set up a project using the GDC console to group the Vertex AI services. For information about creating and using projects, see Create a project.

  2. Ask your Project IAM Admin to grant you the AI OCR Developer (ai-ocr-developer) role in your project namespace.

  3. Enable the OCR pre-trained API.

  4. Download the gdcloud command-line interface (CLI).

Set up your service account

Create a service account and a service key for it. Replace SERVICE_ACCOUNT with the name of your service account, PROJECT_ID with your project ID, and SERVICE_KEY with the name of your service key file.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT  --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create "SERVICE_KEY".json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant the service account access to the OCR pre-trained API by providing your project ID, the name of your service account, and the role ai-ocr-developer.

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-ocr-developer

Set your environment variables

Before running the OCR pre-trained service, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to your service key file.

  export GOOGLE_APPLICATION_CREDENTIALS="SERVICE_KEY".json
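
As a quick sanity check before requesting a token, the following minimal sketch verifies that GOOGLE_APPLICATION_CREDENTIALS points to a readable JSON file. The helper name check_credentials_env is hypothetical, and the sketch only confirms that the file parses as JSON; it makes no assumption about the fields inside the key.

```python
import json
import os


def check_credentials_env(environ=os.environ):
    """Return (ok, message) describing the state of the credentials variable.

    Hypothetical helper for illustration; not part of gdcloud or google-auth.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, "no file found at " + path
    try:
        # Only checks that the file is valid JSON, not what it contains.
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError) as err:
        return False, "could not read key file: " + str(err)
    return True, "key file looks readable: " + path


if __name__ == "__main__":
    ok, message = check_credentials_env()
    print(message)
```

Running the script in the same shell where you exported the variable reports whether the key file can be found and parsed.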

Authenticate the gdcloud CLI

You must get a token to authenticate the requests that you send to the OCR pre-trained service. Follow these steps:

  1. Install the google-auth client library.

    pip install google-auth
    
  2. Save the following code to a Python script, and update the ENDPOINT to the OCR endpoint. For more information, see View service statuses and endpoints.

    import google.auth
    from google.auth.transport import requests
    
    api_endpoint = "https://ENDPOINT"
    
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(api_endpoint)
    
    def test_get_token():
      req = requests.Request()
      creds.refresh(req)
      print(creds.token)
    
    if __name__=="__main__":
      test_get_token()
    
  3. Run the script to fetch the token.

    You must add the fetched token to the header of the grpcurl and curl requests as in the following example:

    -H "Authorization: Bearer TOKEN"
    
  4. Make the grpcurl or curl request:

    grpcurl

    1. If you don't have grpcurl installed, download and install it from a resource outside of Distributed Cloud (https://github.com/fullstorydev/grpcurl#from-source).

    2. Make the request:

      echo '{ "requests": [{"features": [{"type": "TEXT_DETECTION"}], "image": {"content": "'iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII='" } }] }' | grpcurl -vv --cacert ourcert.crt -authority ENDPOINT -H "Authorization: Bearer TOKEN" -max-msg-sz 50000000 -d @ ENDPOINT:443 google.cloud.vision.v1.ImageAnnotator.BatchAnnotateImages
      

    curl

    echo '{"requests": [{"image": {"content": "'iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII='" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ] }] }' | curl --cacert CERTIFICATE_NAME --data-binary @- -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v1/images:annotate
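
The sample requests above embed a small base64-encoded PNG image. To send your own image, the request body can be built along the following lines. This is a minimal sketch: the helper names build_annotate_request and bearer_headers are hypothetical, and the field names mirror the JSON used in the curl example.

```python
import base64
import json


def build_annotate_request(image_bytes, feature_type="DOCUMENT_TEXT_DETECTION"):
    """Build the JSON body used by the curl example for a single image."""
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": feature_type}],
            }
        ]
    }


def bearer_headers(token):
    """Headers matching the -H options in the curl example."""
    return {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + token,
    }


if __name__ == "__main__":
    # The first 8 bytes of any PNG file; a real call would read an image file.
    png_signature = b"\x89PNG\r\n\x1a\n"
    body = build_annotate_request(png_signature)
    print(json.dumps(body, indent=2))
    print(bearer_headers("TOKEN"))
```

The printed JSON can be piped to curl in place of the echo command shown in the curl example.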
    

Run the OCR pre-trained API sample script

This example shows you how to interact with an OCR pre-trained API.

  1. Check whether the client library for OCR is installed.

      pip freeze | grep vision
      # output example: google-cloud-vision==3.0.0
    

    If the existing version doesn't match the client library in https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall the client library.

      pip uninstall google-cloud-vision
    
  2. Download the client library for OCR from the console endpoint. Replace CONSOLE_ENDPOINT with your console endpoint.

       wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-vision
    
  3. Extract the tar file, and install it using pip. Replace CLIENT_LIBRARY with the downloaded file and FOLDER with the extracted folder. If the installation fails because a dependency isn't found, install the missing dependency.

    tar -xvzf CLIENT_LIBRARY
    
    pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
    
  4. Use the OCR client library script to generate the token, and make requests to the OCR service.

  5. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable in your script.

    import os
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "SERVICE_KEY.json"
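
Step 1 compares the installed client library version by hand. The same check can be sketched in Python with the standard library's importlib.metadata module. The helper name installed_version is hypothetical, and EXPECTED_VERSION is a placeholder for the version listed on your client libraries page.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # EXPECTED_VERSION is a placeholder, not a real version number.
    expected = "EXPECTED_VERSION"
    found = installed_version("google-cloud-vision")
    if found is None:
        print("google-cloud-vision is not installed")
    elif found != expected:
        print("installed " + found + ", expected " + expected)
    else:
        print("google-cloud-vision " + found + " matches")
```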
    

Run the OCR sample

Replace the ENDPOINT with the OCR endpoint that you use for your organization.

from google.cloud import vision
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def vision_client(creds):
  """Create vision client."""
  opts = ClientOptions(api_endpoint=api_endpoint)
  return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)

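def main():
  """Load the default credentials and refresh them to get a token."""
  try:
    # Load the credentials referenced by GOOGLE_APPLICATION_CREDENTIALS.
    creds, project_id = google.auth.default()
    # Use the OCR endpoint as the audience of the token.
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception: " + str(e))
    raise
  return creds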

def vision_func(creds):
  vc = vision_client(creds)
  image = {"content": "iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII="}
  features = [{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}]
  # Each requests element corresponds to a single image.  To annotate more
  # images, create a request element for each image and add it to
  # the array of requests
  req = {"image": image, "features": features}
  resp = vc.annotate_image(req)
  print(resp)

if __name__=="__main__":
  creds = main()
  vision_func(creds)
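
The sample prints the whole response. To pull out only the recognized text, something like the following sketch could be used. It assumes the response exposes the full_text_annotation.text and error.message fields of the Vision AnnotateImageResponse message. The helper name extract_text is hypothetical, and the stand-in object at the bottom replaces a real response so that the snippet runs without a server.

```python
from types import SimpleNamespace


def extract_text(resp):
    """Return the detected text, raising if the response carries an error."""
    error_message = getattr(getattr(resp, "error", None), "message", "")
    if error_message:
        raise RuntimeError("OCR request failed: " + error_message)
    annotation = getattr(resp, "full_text_annotation", None)
    return getattr(annotation, "text", "") or ""


if __name__ == "__main__":
    # Stand-in for the object returned by vc.annotate_image(req).
    fake_resp = SimpleNamespace(
        error=SimpleNamespace(message=""),
        full_text_annotation=SimpleNamespace(text="Hello"),
    )
    print(extract_text(fake_resp))
```

In the sample, print(extract_text(resp)) could replace print(resp) in vision_func.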

What's next