Learn how to perform optical character recognition (OCR) on Google Cloud Platform. This tutorial demonstrates how to upload image files to Cloud Storage, extract text from the images using Cloud Vision, translate the text using the Cloud Translation API, and save your translations back to Cloud Storage. Pub/Sub is used to queue various tasks and trigger the right Cloud Run functions to carry them out.
For more information about sending a text detection (OCR) request, see Detect text in images, Detect handwriting in images, or Detect text in files (PDF/TIFF).
Objectives
- Write and deploy several event-driven functions.
- Upload images to Cloud Storage.
- Extract, translate and save text contained in uploaded images.
Costs
In this document, you use the following billable components of Google Cloud:
- Cloud Run functions
- Cloud Build
- Pub/Sub
- Artifact Registry
- Eventarc
- Cloud Run
- Cloud Logging
- Cloud Storage
- Cloud Translation API
- Cloud Vision
To generate a cost estimate based on your projected usage, use the pricing calculator.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Cloud Functions, Cloud Build, Cloud Run, Artifact Registry, Eventarc, Logging, Pub/Sub, Cloud Storage, Cloud Translation, and Cloud Vision APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- Prepare your development environment.
If you already have the gcloud CLI installed, update it by running the following command:
gcloud components update
Visualize the flow of data
The flow of data in the OCR tutorial application involves several steps:
- An image that contains text in any language is uploaded to Cloud Storage.
- A Cloud Run function is triggered, which uses the Vision API to extract the text and detect the source language.
- The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language.
- If a target language matches the source language, the translation queue is skipped, and text is sent to the result queue, which is a different Pub/Sub topic.
- A Cloud Run function uses the Cloud Translation API to translate the text in the translation queue. The translated result is sent to the result queue.
- Another Cloud Run function saves the translated text from the result queue to Cloud Storage.
- The results are found in Cloud Storage as text files for each translation.
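The routing rule in the steps above (translate when the target language differs from the source, otherwise go straight to the result queue) can be sketched in a few lines. This is a minimal illustration, not the sample's code: the function name `choose_topics` and the default topic names are hypothetical.

```python
def choose_topics(src_lang, target_langs,
                  translate_topic="translate-topic",   # hypothetical topic name
                  result_topic="result-topic"):        # hypothetical topic name
    """Map each target language to the Pub/Sub topic its message should go to."""
    topics = {}
    for lang in target_langs:
        if lang == src_lang:
            # Nothing to translate: skip the translation queue.
            topics[lang] = result_topic
        else:
            topics[lang] = translate_topic
    return topics


if __name__ == "__main__":
    # Text detected as Spanish, with the target languages used later in this tutorial.
    for lang, topic in choose_topics("es", ["es", "en", "fr", "ja"]).items():
        print(lang, "->", topic)
```

With a Spanish source image, only the `es` entry is routed to the result topic; the other three languages go through the translation topic first.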
It may help to visualize the steps:
Prepare the application
Create a Cloud Storage bucket to upload images to, where YOUR_IMAGE_BUCKET_NAME is a globally unique bucket name:
gcloud storage buckets create gs://YOUR_IMAGE_BUCKET_NAME
Create a Cloud Storage bucket to save text translations to, where YOUR_RESULT_BUCKET_NAME is a globally unique bucket name:
gcloud storage buckets create gs://YOUR_RESULT_BUCKET_NAME
Create a Pub/Sub topic to publish translation requests to, where YOUR_TRANSLATE_TOPIC_NAME is the name of your translation request topic:
gcloud pubsub topics create YOUR_TRANSLATE_TOPIC_NAME
Create a Pub/Sub topic to publish finished translation results to, where YOUR_RESULT_TOPIC_NAME is the name of your translation result topic:
gcloud pubsub topics create YOUR_RESULT_TOPIC_NAME
Clone the sample app repository to your local machine:
Node.js
git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Python
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Go
git clone https://github.com/GoogleCloudPlatform/golang-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Java
git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
Alternatively, you can download the sample as a zip file and extract it.
Change to the directory that contains the Cloud Run functions sample code:
Node.js
cd nodejs-docs-samples/functions/v2/ocr/app/
Python
cd python-docs-samples/functions/v2/ocr/
Go
cd golang-samples/functions/functionsv2/ocr/app/
Java
cd java-docs-samples/functions/v2/ocr/ocr-process-image/
Understand the code
This section describes the dependencies and functions that make up the OCR sample.
Import dependencies
The application must import several dependencies in order to communicate with Google Cloud Platform services:
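The sample's own import lists are in the cloned repository. As a rough aid for the Python version, the following sketch checks whether a set of client library modules can be found in your environment. The module names in `LIKELY_MODULES` are assumptions about what a Python implementation of this tutorial would import, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed module names for the services used in this tutorial.
LIKELY_MODULES = [
    "functions_framework",
    "google.cloud.pubsub_v1",
    "google.cloud.storage",
    "google.cloud.translate_v2",
    "google.cloud.vision",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (for example, `google`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(LIKELY_MODULES):
        print("not installed:", name)
```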
Process images
The following function reads an uploaded image file from Cloud Storage and calls a function to detect whether the image contains text:
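The sample's own function is in the cloned repository. As a minimal sketch of the first step only, the code below pulls the bucket and object name out of a Cloud Storage event payload and builds the `gs://` URI that identifies the uploaded image. The payload fields `bucket` and `name` are part of the Cloud Storage event data; the helper names are hypothetical.

```python
def parse_storage_event(data):
    """Return (bucket, object_name) from a Cloud Storage event payload."""
    bucket = data.get("bucket")
    name = data.get("name")
    if not bucket or not name:
        raise ValueError("event payload is missing 'bucket' or 'name'")
    return bucket, name


def image_uri(data):
    """Build the gs:// URI for the uploaded object."""
    bucket, name = parse_storage_event(data)
    return f"gs://{bucket}/{name}"


if __name__ == "__main__":
    event_data = {"bucket": "my-images", "name": "menu.jpg"}
    print(image_uri(event_data))
```

Validating the payload up front keeps a malformed event from reaching the text detection step.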
The following function extracts text from the image using the Vision API and queues the text for translation:
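Pub/Sub carries message data as bytes, so the extracted text has to be serialized before it is queued. The sketch below shows one way to do that with JSON. The field names (`text`, `filename`, `lang`, `src_lang`) are assumptions for illustration and may differ from the sample's actual message format.

```python
import json


def encode_message(text, filename, lang, src_lang):
    """Serialize one translation request as UTF-8 JSON bytes."""
    message = {
        "text": text,          # text detected in the image
        "filename": filename,  # name of the uploaded image
        "lang": lang,          # target language
        "src_lang": src_lang,  # detected source language
    }
    return json.dumps(message).encode("utf-8")


if __name__ == "__main__":
    payload = encode_message("Hola", "sign.png", "en", "es")
    print(payload)
```

The resulting bytes are what a Pub/Sub publisher client would send as the message data.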
Translate text
The following function translates the extracted text and queues the translated text to be saved back to Cloud Storage:
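A Pub/Sub-triggered function receives the message data base64-encoded under `message.data` in the event payload. The following sketch decodes such a payload, translates the text, and builds the message for the result queue. It is an illustration under stated assumptions: `translate_fn` is a hypothetical stand-in for the Cloud Translation call, and the message fields match the assumed format of a JSON object with `text`, `filename`, `lang`, and `src_lang`.

```python
import base64
import json


def decode_pubsub_message(event_data):
    """Decode the base64 JSON body of a Pub/Sub event payload."""
    encoded = event_data["message"]["data"]
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def translate_message(event_data, translate_fn):
    """Translate one queued request and return the message for the result queue.

    translate_fn(text, target_lang, source_lang) -> str is a hypothetical
    callable standing in for the Cloud Translation API client.
    """
    request = decode_pubsub_message(event_data)
    translated = translate_fn(request["text"], request["lang"], request["src_lang"])
    return {
        "text": translated,
        "filename": request["filename"],
        "lang": request["lang"],
    }


if __name__ == "__main__":
    body = json.dumps({"text": "Hola", "filename": "sign.png",
                       "lang": "en", "src_lang": "es"}).encode("utf-8")
    event = {"message": {"data": base64.b64encode(body).decode("ascii")}}
    # A fake translator keeps the example runnable without credentials.
    print(translate_message(event, lambda text, target, source: f"[{target}] {text}"))
```

Passing the translator in as a parameter makes the decoding logic easy to test without calling the real service.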
Save the translations
Finally, the following function receives the translated text and saves it back to Cloud Storage:
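Because each translation is stored as its own text file, the save step needs an object name that is unique per image and language. The sketch below derives one and hands the text to an uploader. The naming scheme `FILENAME_LANG.txt` and the `upload_fn` callable are assumptions for illustration, not the sample's confirmed behavior.

```python
def result_object_name(filename, lang):
    """Build a per-image, per-language object name (assumed naming scheme)."""
    return f"{filename}_{lang}.txt"


def save_result(message, upload_fn):
    """Save one translated message and return the object name used.

    upload_fn(object_name, text) is a hypothetical callable standing in
    for a Cloud Storage upload to the result bucket.
    """
    name = result_object_name(message["filename"], message["lang"])
    upload_fn(name, message["text"])
    return name


if __name__ == "__main__":
    fake_bucket = {}  # in-memory stand-in for the result bucket
    save_result({"text": "Hello", "filename": "sign.png", "lang": "en"},
                fake_bucket.__setitem__)
    print(fake_bucket)
```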
Deploy the functions
To deploy the image processing function with a Cloud Storage trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-extract \
  --gen2 \
  --runtime=nodejs20 \
  --region=REGION \
  --source=. \
  --entry-point=processImage \
  --trigger-bucket YOUR_IMAGE_BUCKET_NAME \
  --set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-extract \
  --gen2 \
  --runtime=python312 \
  --region=REGION \
  --source=. \
  --entry-point=process_image \
  --trigger-bucket YOUR_IMAGE_BUCKET_NAME \
  --set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-extract \
  --gen2 \
  --runtime=go121 \
  --region=REGION \
  --source=. \
  --entry-point=process-image \
  --trigger-bucket YOUR_IMAGE_BUCKET_NAME \
  --set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-extract \
  --gen2 \
  --runtime=java17 \
  --region=REGION \
  --source=. \
  --entry-point=functions.OcrProcessImage \
  --memory=512MB \
  --trigger-bucket YOUR_IMAGE_BUCKET_NAME \
  --set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
Replace the following:
- REGION: The name of the Google Cloud region where you want to deploy your function (for example, us-west1).
- YOUR_IMAGE_BUCKET_NAME: The name of your Cloud Storage bucket where you will be uploading images. When deploying Cloud Run functions, specify the bucket name alone without the leading gs://; for example, --trigger-event-filters="bucket=my-bucket".
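The image processing deploy command prefixes its --set-env-vars value with ^:^. This is the gcloud CLI's alternate-delimiter syntax: the character between the carets replaces the default comma as the separator, which lets the TO_LANG value itself contain commas. The following sketch imitates that splitting rule to show the effect. It is an illustration only, not the gcloud CLI's implementation, and `split_env_vars` is a hypothetical helper.

```python
def split_env_vars(spec):
    """Split a KEY=VALUE list, honoring an optional ^DELIM^ prefix."""
    delimiter = ","
    if spec.startswith("^"):
        end = spec.index("^", 1)      # position of the closing caret
        delimiter = spec[1:end]
        spec = spec[end + 1:]
    pairs = {}
    for item in spec.split(delimiter):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    env = split_env_vars("^:^GCP_PROJECT=my-project:TO_LANG=es,en,fr,ja")
    # The function code can then split TO_LANG into individual target languages.
    print(env["TO_LANG"].split(","))
```

Without the ^:^ prefix, the commas inside TO_LANG would be read as separators between environment variables, which is why the two Pub/Sub-triggered deployments, whose values contain no commas, use the default comma-separated form.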
To deploy the text translation function with a Pub/Sub trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-translate \
  --gen2 \
  --runtime=nodejs20 \
  --region=REGION \
  --source=. \
  --entry-point=translateText \
  --trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-translate \
  --gen2 \
  --runtime=python312 \
  --region=REGION \
  --source=. \
  --entry-point=translate_text \
  --trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-translate \
  --gen2 \
  --runtime=go121 \
  --region=REGION \
  --source=. \
  --entry-point=translate-text \
  --trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-translate \
  --gen2 \
  --runtime=java17 \
  --region=REGION \
  --source=. \
  --entry-point=functions.OcrTranslateText \
  --memory=512MB \
  --trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
To deploy the function that saves results to Cloud Storage with a Pub/Sub trigger, run the following command in the directory that contains the sample code (or in the case of Java, the pom.xml file):
Node.js
gcloud functions deploy ocr-save \
  --gen2 \
  --runtime=nodejs20 \
  --region=REGION \
  --source=. \
  --entry-point=saveResult \
  --trigger-topic YOUR_RESULT_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Node.js version to run your function.
Python
gcloud functions deploy ocr-save \
  --gen2 \
  --runtime=python312 \
  --region=REGION \
  --source=. \
  --entry-point=save_result \
  --trigger-topic YOUR_RESULT_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Python version to run your function.
Go
gcloud functions deploy ocr-save \
  --gen2 \
  --runtime=go121 \
  --region=REGION \
  --source=. \
  --entry-point=save-result \
  --trigger-topic YOUR_RESULT_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Go version to run your function.
Java
gcloud functions deploy ocr-save \
  --gen2 \
  --runtime=java17 \
  --region=REGION \
  --source=. \
  --entry-point=functions.OcrSaveResult \
  --memory=512MB \
  --trigger-topic YOUR_RESULT_TOPIC_NAME \
  --set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"
Use the --runtime flag to specify the runtime ID of a supported Java version to run your function.
Upload an image
Upload an image to your image Cloud Storage bucket:
gcloud storage cp PATH_TO_IMAGE gs://YOUR_IMAGE_BUCKET_NAME
where:
- PATH_TO_IMAGE is a path to an image file (that contains text) on your local system.
- YOUR_IMAGE_BUCKET_NAME is the name of the bucket where you are uploading images.
You can download one of the images from the sample project.
Watch the logs to be sure the executions have completed:
gcloud functions logs read --limit 100
You can view the saved translations in the Cloud Storage bucket you used for YOUR_RESULT_BUCKET_NAME.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the functions
Deleting Cloud Run functions does not remove any resources stored in Cloud Storage.
To delete the Cloud Run functions you created in this tutorial, run the following commands:
gcloud functions delete ocr-extract
gcloud functions delete ocr-translate
gcloud functions delete ocr-save
You can also delete Cloud Run functions from the Google Cloud console.