Learn how to perform optical character recognition (OCR) on Google Cloud. This tutorial demonstrates how to upload image files to Cloud Storage, extract text from the images using the Cloud Vision API, translate the text using the Google Cloud Translation API, and save your translations back to Cloud Storage. Pub/Sub is used to queue various tasks and trigger the right Cloud Run functions to carry them out.
For more information about sending a text detection (OCR) request, see Detect text in images, Detect handwriting in images, or Detect text in files (PDF/TIFF).
Objectives
- Write and deploy several Background Cloud Run functions.
- Upload images to Cloud Storage.
- Extract, translate and save text contained in uploaded images.
Costs
In this document, you use the following billable components of Google Cloud:
- Cloud Run functions
- Pub/Sub
- Cloud Storage
- Cloud Translation API
- Cloud Vision
To generate a cost estimate based on your projected usage, use the pricing calculator.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Cloud Functions, Cloud Build, Cloud Pub/Sub, Cloud Storage, Cloud Translation, and Cloud Vision APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- Prepare your development environment.
If you already have the gcloud CLI installed, update it by running the following command:
gcloud components update
Visualizing the flow of data
The flow of data in the OCR tutorial application involves several steps:
- An image that contains text in any language is uploaded to Cloud Storage.
- A Cloud Run function is triggered, which uses the Vision API to extract the text and detect the source language.
- The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language.
- If a target language matches the source language, the translation queue is skipped, and text is sent to the result queue, which is a different Pub/Sub topic.
- A Cloud Run function uses the Translation API to translate the text in the translation queue. The translated result is sent to the result queue.
- Another Cloud Run function saves the translated text from the result queue to Cloud Storage.
- The results are found in Cloud Storage as text files for each translation.
It may help to visualize the steps:
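The routing rule in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the sample's actual code: the function name `route_messages`, the message fields (`text`, `filename`, `lang`, `src_lang`), and the topic names in the usage example are hypothetical.

```python
def route_messages(text, filename, src_lang, to_langs,
                   translate_topic, result_topic):
    """Return (topic, message) pairs, one per target language.

    A target language that matches the source language needs no
    translation, so its message goes straight to the result topic.
    """
    routed = []
    for target_lang in to_langs:
        topic = result_topic if target_lang == src_lang else translate_topic
        message = {
            "text": text,
            "filename": filename,
            "lang": target_lang,
            "src_lang": src_lang,
        }
        routed.append((topic, message))
    return routed


if __name__ == "__main__":
    # Hypothetical topic names; English text with four target languages.
    for topic, message in route_messages(
        "Hello", "sign.png", "en", ["es", "en", "fr", "ja"],
        "translate-topic", "result-topic",
    ):
        print(topic, message["lang"])
```

With English source text, only the `en` message would bypass the translation topic; the other three would be queued for translation.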
Preparing the application
Create a Cloud Storage bucket to upload images to, where YOUR_IMAGE_BUCKET_NAME is a globally unique bucket name:
gcloud storage buckets create gs://YOUR_IMAGE_BUCKET_NAME
Create a Cloud Storage bucket to save text translations to, where YOUR_RESULT_BUCKET_NAME is a globally unique bucket name:
gcloud storage buckets create gs://YOUR_RESULT_BUCKET_NAME
Create a Pub/Sub topic to publish translation requests to, where YOUR_TRANSLATE_TOPIC_NAME is the name of your translation request topic:
gcloud pubsub topics create YOUR_TRANSLATE_TOPIC_NAME
Create a Pub/Sub topic to publish finished translation results to, where YOUR_RESULT_TOPIC_NAME is the name of your translation result topic:
gcloud pubsub topics create YOUR_RESULT_TOPIC_NAME
Clone the sample app repository to your local machine:
Node.js
git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Python
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Go
git clone https://github.com/GoogleCloudPlatform/golang-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Java
git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Change to the directory that contains the Cloud Run functions sample code:
Node.js
cd nodejs-docs-samples/functions/ocr/app/
Python
cd python-docs-samples/functions/ocr/app/
Go
cd golang-samples/functions/ocr/app/
Java
cd java-docs-samples/functions/ocr/ocr-process-image/
Understanding the code
Importing dependencies
The application must import several dependencies to communicate with Google Cloud services:
Node.js
Python
Go
Java
Processing images
The following function reads an uploaded image file from Cloud Storage and calls a function to detect whether the image contains text:
Node.js
Python
Go
Java
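As a rough sketch of this first step, a function triggered by Cloud Storage can turn the event it receives into a `gs://` URI that the Vision API can read. The sketch assumes the event is a dictionary carrying the uploaded object's `bucket` and `name` fields; the helper name `gcs_uri_from_event` is hypothetical and not taken from the sample.

```python
def gcs_uri_from_event(event):
    """Build the gs:// URI of the uploaded object from a Storage event."""
    bucket = event.get("bucket")
    name = event.get("name")
    if not bucket:
        raise ValueError("event is missing the 'bucket' field")
    if not name:
        raise ValueError("event is missing the 'name' field")
    return f"gs://{bucket}/{name}"
```

Validating the event up front means a malformed trigger fails with a clear message before any API call is made.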
The following function extracts text from the image using the Vision API and queues the text for translation:
Node.js
Python
Go
Java
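Queuing the text amounts to serializing a message and publishing it to a topic. The sketch below assumes a publisher object with a `publish(topic, data)` method that accepts bytes, which is how the Pub/Sub Python client's `PublisherClient.publish` is called. The JSON field names and the `FakePublisher` stand-in are hypothetical, used here so the example runs without credentials.

```python
import json


def publish_message(publisher, topic_path, message):
    """Serialize a dict as UTF-8 JSON and publish it to the given topic."""
    data = json.dumps(message).encode("utf-8")
    return publisher.publish(topic_path, data)


class FakePublisher:
    """In-memory stand-in for a Pub/Sub publisher, for local testing only."""

    def __init__(self):
        self.sent = []

    def publish(self, topic, data):
        self.sent.append((topic, data))
        return len(self.sent)  # the real client returns a future instead
```

Passing the publisher in as an argument keeps the serialization logic testable without a live Pub/Sub topic.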
Translating text
The following function translates the extracted text and queues the translated text to be saved back to Cloud Storage:
Node.js
Python
Go
Java
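A background function triggered by Pub/Sub receives the message payload base64-encoded in the event's `data` field. The following sketch decodes such a payload and checks for the fields a translation step would need. The field names (`text`, `filename`, `lang`) and the helper name `parse_translation_request` are assumptions for illustration.

```python
import base64
import json

REQUIRED_FIELDS = ("text", "filename", "lang")


def parse_translation_request(event):
    """Decode a Pub/Sub event payload and validate its required fields."""
    if "data" not in event:
        raise ValueError("event has no 'data' payload")
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    missing = [field for field in REQUIRED_FIELDS if not message.get(field)]
    if missing:
        raise ValueError(f"message is missing fields: {', '.join(missing)}")
    return message
```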
Saving the translations
Finally, the following function receives the translated text and saves it back to Cloud Storage:
Node.js
Python
Go
Java
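Saving a result comes down to choosing an object name and writing the text to the result bucket. The sketch below assumes a bucket object exposing `blob(name)` and a blob exposing `upload_from_string(text)`, as the Cloud Storage Python client does. The `<filename>_<lang>.txt` naming scheme and the in-memory `FakeBucket` are hypothetical.

```python
def save_translation(bucket, filename, lang, text):
    """Write translated text to the bucket as '<filename>_<lang>.txt'."""
    result_name = f"{filename}_{lang}.txt"
    bucket.blob(result_name).upload_from_string(text)
    return result_name


class FakeBlob:
    """Records uploads in a shared dict instead of calling Cloud Storage."""

    def __init__(self, store, name):
        self._store, self._name = store, name

    def upload_from_string(self, data):
        self._store[self._name] = data


class FakeBucket:
    """In-memory stand-in for a Cloud Storage bucket, for local testing only."""

    def __init__(self):
        self.objects = {}

    def blob(self, name):
        return FakeBlob(self.objects, name)
```

Including the target language in the object name keeps the translations of one image from overwriting each other.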
Deploying the functions
To deploy the image processing function with a Cloud Storage trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-extract \
--runtime nodejs20 \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--entry-point processImage \
--set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-extract \
--runtime python312 \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--entry-point process_image \
--set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-extract \
--runtime go121 \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--entry-point ProcessImage \
--set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-extract \
--entry-point functions.OcrProcessImage \
--runtime java17 \
--memory 512MB \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
where YOUR_IMAGE_BUCKET_NAME is the name of your Cloud Storage bucket where you will be uploading images.
To deploy the text translation function with a Pub/Sub trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-translate \
--runtime nodejs20 \
--trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
--entry-point translateText \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-translate \
--runtime python312 \
--trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
--entry-point translate_text \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-translate \
--runtime go121 \
--trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
--entry-point TranslateText \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-translate \
--entry-point functions.OcrTranslateText \
--runtime java17 \
--memory 512MB \
--trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
To deploy the function that saves results to Cloud Storage with a Cloud Pub/Sub trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-save \
--runtime nodejs20 \
--trigger-topic YOUR_RESULT_TOPIC_NAME \
--entry-point saveResult \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-save \
--runtime python312 \
--trigger-topic YOUR_RESULT_TOPIC_NAME \
--entry-point save_result \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-save \
--runtime go121 \
--trigger-topic YOUR_RESULT_TOPIC_NAME \
--entry-point SaveResult \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-save \
--entry-point functions.OcrSaveResult \
--runtime java17 \
--memory 512MB \
--trigger-topic YOUR_RESULT_TOPIC_NAME \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
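In the `ocr-extract` commands, the `^:^` prefix tells gcloud to use `:` instead of `,` as the delimiter between variables, which lets `TO_LANG` hold a comma-separated list of languages. A function could read that configuration along these lines. The `load_config` helper and the shape of its return value are hypothetical; only the variable names come from the commands above.

```python
import os


def load_config(env=None):
    """Read the extract function's settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "project": env["GCP_PROJECT"],
        "translate_topic": env["TRANSLATE_TOPIC"],
        "result_topic": env["RESULT_TOPIC"],
        # TO_LANG is a comma-separated list such as "es,en,fr,ja".
        "to_langs": [
            lang.strip() for lang in env["TO_LANG"].split(",") if lang.strip()
        ],
    }
```

Accepting an optional mapping makes the helper easy to exercise locally without changing the process environment.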
Uploading an image
Upload an image to your image Cloud Storage bucket:
gcloud storage cp PATH_TO_IMAGE gs://YOUR_IMAGE_BUCKET_NAME
where
- PATH_TO_IMAGE is a path to an image file (that contains text) on your local system.
- YOUR_IMAGE_BUCKET_NAME is the name of the bucket where you are uploading images.
You can download one of the images from the sample project.
Watch the logs to be sure the executions have completed:
gcloud functions logs read --limit 100
You can view the saved translations in the Cloud Storage bucket you used for YOUR_RESULT_BUCKET_NAME.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Deleting the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting the function
Deleting Cloud Run functions does not remove any resources stored in Cloud Storage.
To delete the Cloud Run functions you created in this tutorial, run the following commands:
gcloud functions delete ocr-extract
gcloud functions delete ocr-translate
gcloud functions delete ocr-save
You can also delete Cloud Run functions from the Google Cloud console.