Optical Character Recognition (OCR) Tutorial

Learn how to perform optical character recognition (OCR) on Google Cloud Platform. This tutorial demonstrates how to upload image files to Google Cloud Storage, extract text from the images using the Google Cloud Vision API, translate the text using the Google Cloud Translation API, and save your translations back to Cloud Storage. Google Cloud Pub/Sub is used to queue various tasks and trigger the right Cloud Functions to carry them out.

Objectives

  • Write and deploy several Background Cloud Functions.
  • Upload images to Cloud Storage.
  • Extract, translate, and save text contained in uploaded images.

Costs

This tutorial uses billable components of Cloud Platform, including:

  • Google Cloud Functions
  • Google Cloud Pub/Sub
  • Google Cloud Storage
  • Google Cloud Translation API
  • Google Cloud Vision API

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Projects page

  3. Enable billing for your project.

    Enable billing

  4. Enable the Cloud Functions, Cloud Pub/Sub, Cloud Storage, Cloud Translation, and Cloud Vision APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK.
  6. Update and install gcloud components:
    gcloud components update &&
    gcloud components install beta

Visualizing the flow of data

The flow of data in the OCR tutorial application involves several steps:

  1. An image that contains text in any language is uploaded to Cloud Storage.
  2. A Cloud Function is triggered, which uses the Vision API to extract the text, and queues the text to be translated into the configured translation languages.
  3. For each queued translation, a Cloud Function is triggered which uses the Translation API to translate the text and queue it to be saved to Cloud Storage.
  4. For each translated text, a Cloud Function is triggered which saves the translated text to Cloud Storage.


Preparing the application

  1. Create a Cloud Storage bucket to stage your Cloud Functions files, where [YOUR_STAGING_BUCKET_NAME] is a globally unique bucket name:

    gsutil mb gs://[YOUR_STAGING_BUCKET_NAME]

  2. Create a Cloud Storage bucket for uploading your images, where [YOUR_IMAGE_BUCKET_NAME] is a globally unique bucket name:

    gsutil mb gs://[YOUR_IMAGE_BUCKET_NAME]

  3. Create a Cloud Storage bucket to save the translations, where [YOUR_TEXT_BUCKET_NAME] is a globally unique bucket name:

    gsutil mb gs://[YOUR_TEXT_BUCKET_NAME]

  4. Create a directory on your local system for the application code:

    • Linux or Mac OS X:

      mkdir ~/gcf_ocr
      cd ~/gcf_ocr
      
    • Windows

      mkdir %HOMEPATH%\gcf_ocr
      cd %HOMEPATH%\gcf_ocr
      
  5. Download the index.js and package.json files from the Cloud Functions sample project on GitHub and save them to the gcf_ocr directory.

  6. Create a config.json file in the gcf_ocr directory with the following contents:

    {
      "RESULT_TOPIC": "[YOUR_RESULT_TOPIC_NAME]",
      "RESULT_BUCKET": "[YOUR_TEXT_BUCKET_NAME]",
      "TRANSLATE_TOPIC": "[YOUR_TRANSLATE_TOPIC_NAME]",
      "TRANSLATE": true,
      "TO_LANG": ["en", "fr", "es", "ja", "ru"]
    }
    
    • Replace [YOUR_RESULT_TOPIC_NAME] with a topic name to be used for saving results.
    • Replace [YOUR_TEXT_BUCKET_NAME] with a bucket name used for saving results.
    • Replace [YOUR_TRANSLATE_TOPIC_NAME] with a topic name to be used for translating results.
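The functions below fail at runtime if config.json is missing a key, so it can help to sanity-check the file before deploying. The following is a hypothetical helper (not part of the sample) that verifies a parsed config object has the shape this tutorial expects:

```javascript
// Hypothetical helper (not part of the sample): checks that a parsed
// config.json object has the keys the Cloud Functions below rely on.
function validateConfig (config) {
  const required = ['RESULT_TOPIC', 'RESULT_BUCKET', 'TRANSLATE_TOPIC'];
  const missing = required.filter((key) => typeof config[key] !== 'string');

  if (missing.length > 0) {
    throw new Error(`config.json is missing keys: ${missing.join(', ')}`);
  }
  if (config.TRANSLATE && !Array.isArray(config.TO_LANG)) {
    throw new Error('TO_LANG must be an array of language codes when TRANSLATE is true');
  }
  return true;
}

// Example: validate the shape used in this tutorial
console.log(validateConfig({
  RESULT_TOPIC: 'my-result-topic',
  RESULT_BUCKET: 'my-text-bucket',
  TRANSLATE_TOPIC: 'my-translate-topic',
  TRANSLATE: true,
  TO_LANG: ['en', 'fr', 'es', 'ja', 'ru']
}));
```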

Understanding the code

Importing dependencies

The application must import several dependencies in order to communicate with Google Cloud Platform services:

Node.js

const config = require('./config.json');

// Get a reference to the Pub/Sub component
const pubsub = require('@google-cloud/pubsub')();
// Get a reference to the Cloud Storage component
const storage = require('@google-cloud/storage')();
// Get a reference to the Cloud Vision API component
const vision = require('@google-cloud/vision')();
// Get a reference to the Translate API component
const translate = require('@google-cloud/translate')();

Processing images

The following function reads an uploaded image file from Cloud Storage and calls the detectText function:

Node.js

/**
 * Cloud Function triggered by Cloud Storage when a file is uploaded.
 *
 * @param {object} event The Cloud Functions event.
 * @param {object} event.data A Google Cloud Storage File object.
 */
exports.processImage = function processImage (event) {
  let file = event.data;

  return Promise.resolve()
    .then(() => {
      if (file.resourceState === 'not_exists') {
        // This was a deletion event, we don't want to process this
        return;
      }

      if (!file.bucket) {
        throw new Error('Bucket not provided. Make sure you have a "bucket" property in your request');
      }
      if (!file.name) {
        throw new Error('Filename not provided. Make sure you have a "name" property in your request');
      }

      file = storage.bucket(file.bucket).file(file.name);

      return detectText(file);
    })
    .then(() => {
      console.log(`File ${file.name} processed.`);
    });
};

The processImage function is exported by the module and is executed when a file is uploaded to the Cloud Storage bucket you created for uploading images.
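The Cloud Storage trigger delivers the file metadata in event.data. As a local illustration (a sketch, not part of the sample), the guard logic at the top of processImage can be exercised with a hand-built event:

```javascript
// Mirrors the validation at the top of processImage so it can be
// exercised locally; the real function goes on to call detectText.
function checkStorageEvent (event) {
  const file = event.data;

  if (file.resourceState === 'not_exists') {
    return 'skipped'; // deletion event, nothing to process
  }
  if (!file.bucket) {
    throw new Error('Bucket not provided');
  }
  if (!file.name) {
    throw new Error('Filename not provided');
  }
  return 'ok';
}

// A mock of the event shape a Cloud Storage upload produces
console.log(checkStorageEvent({ data: { bucket: 'my-image-bucket', name: 'menu.jpg' } })); // 'ok'
console.log(checkStorageEvent({ data: { resourceState: 'not_exists' } })); // 'skipped'
```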

The following function extracts text from the image using the Cloud Vision API and queues the text for translation:

Node.js

/**
 * Detects the text in an image using the Google Vision API.
 *
 * @param {object} file Cloud Storage File instance.
 * @returns {Promise}
 */
function detectText (file) {
  let text;

  console.log(`Looking for text in image ${file.name}`);
  return vision.detectText(file)
    .then(([_text]) => {
      if (Array.isArray(_text)) {
        text = _text[0];
      } else {
        text = _text;
      }
      console.log(`Extracted text from image (${text.length} chars)`);
      return translate.detect(text);
    })
    .then(([detection]) => {
      if (Array.isArray(detection)) {
        detection = detection[0];
      }
      console.log(`Detected language "${detection.language}" for ${file.name}`);

      // Submit a message to the bus for each language we're going to translate to
      const tasks = config.TO_LANG.map((lang) => {
        let topicName = config.TRANSLATE_TOPIC;
        if (detection.language === lang) {
          topicName = config.RESULT_TOPIC;
        }
        const messageData = {
          text: text,
          filename: file.name,
          lang: lang,
          from: detection.language
        };

        return publishResult(topicName, messageData);
      });

      return Promise.all(tasks);
    });
}
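Both branches above call a publishResult helper that is defined in the sample's index.js but not shown here. A minimal sketch of what such a helper might look like, written with an injectable Pub/Sub client so the logic can be exercised without GCP (the real sample may differ, for example by creating the topic if it does not exist):

```javascript
// Sketch of a publishResult-style helper; `pubsubClient` is assumed to
// expose the topic(name).publish(message) interface of the
// @google-cloud/pubsub client used elsewhere in this tutorial.
function publishResult (pubsubClient, topicName, data) {
  return pubsubClient.topic(topicName).publish({ data: data });
}

// Exercise it with a fake client that just records the call
const published = [];
const fakePubsub = {
  topic: (name) => ({
    publish: (message) => {
      published.push({ topic: name, message: message });
      return Promise.resolve();
    }
  })
};

publishResult(fakePubsub, 'my-translate-topic', { text: 'Hello', lang: 'fr' })
  .then(() => console.log(published[0].topic)); // 'my-translate-topic'
```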

Translating text

The following function translates the extracted text and queues the translated text to be saved back to Cloud Storage:

Node.js

/**
 * Translates text using the Google Translate API. Triggered from a message on
 * a Pub/Sub topic.
 *
 * @param {object} event The Cloud Functions event.
 * @param {object} event.data The Cloud Pub/Sub Message object.
 * @param {string} event.data.data The "data" property of the Cloud Pub/Sub
 * Message. This property will be a base64-encoded string that you must decode.
 */
exports.translateText = function translateText (event) {
  const pubsubMessage = event.data;
  const jsonStr = Buffer.from(pubsubMessage.data, 'base64').toString();
  const payload = JSON.parse(jsonStr);

  return Promise.resolve()
    .then(() => {
      if (!payload.text) {
        throw new Error('Text not provided. Make sure you have a "text" property in your request');
      }
      if (!payload.filename) {
        throw new Error('Filename not provided. Make sure you have a "filename" property in your request');
      }
      if (!payload.lang) {
        throw new Error('Language not provided. Make sure you have a "lang" property in your request');
      }

      const options = {
        from: payload.from,
        to: payload.lang
      };

      console.log(`Translating text into ${payload.lang}`);
      return translate.translate(payload.text, options);
    })
    .then(([translation]) => {
      const messageData = {
        text: translation,
        filename: payload.filename,
        lang: payload.lang
      };

      return publishResult(config.RESULT_TOPIC, messageData);
    })
    .then(() => {
      console.log(`Text translated to ${payload.lang}`);
    });
};

The translateText function is exported by the module and is executed when a message is published to the Cloud Pub/Sub topic specified by the TRANSLATE_TOPIC value in the config.json file.
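Pub/Sub delivers the message payload as a base64-encoded string, which is why translateText and saveResult decode event.data.data before parsing it. The round trip looks like this, as a standalone illustration using only Node's Buffer:

```javascript
// Encoding: what the Pub/Sub infrastructure effectively does to the
// object published by the extraction step.
const payload = { text: 'Bonjour', filename: 'menu.jpg', lang: 'en', from: 'fr' };
const encoded = Buffer.from(JSON.stringify(payload)).toString('base64');

// Decoding: the first two lines of translateText and saveResult.
const jsonStr = Buffer.from(encoded, 'base64').toString();
const decoded = JSON.parse(jsonStr);

console.log(decoded.text); // 'Bonjour'
console.log(decoded.lang); // 'en'
```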

Saving the translations

Finally, the following function receives the translated text and saves it back to Cloud Storage:

Node.js

/**
 * Saves the data packet to a file in GCS. Triggered from a message on a Pub/Sub
 * topic.
 *
 * @param {object} event The Cloud Functions event.
 * @param {object} event.data The Cloud Pub/Sub Message object.
 * @param {string} event.data.data The "data" property of the Cloud Pub/Sub
 * Message. This property will be a base64-encoded string that you must decode.
 */
exports.saveResult = function saveResult (event) {
  const pubsubMessage = event.data;
  const jsonStr = Buffer.from(pubsubMessage.data, 'base64').toString();
  const payload = JSON.parse(jsonStr);

  return Promise.resolve()
    .then(() => {
      if (!payload.text) {
        throw new Error('Text not provided. Make sure you have a "text" property in your request');
      }
      if (!payload.filename) {
        throw new Error('Filename not provided. Make sure you have a "filename" property in your request');
      }
      if (!payload.lang) {
        throw new Error('Language not provided. Make sure you have a "lang" property in your request');
      }

      console.log(`Received request to save file ${payload.filename}`);

      const bucketName = config.RESULT_BUCKET;
      const filename = renameImageForSave(payload.filename, payload.lang);
      const file = storage.bucket(bucketName).file(filename);

      console.log(`Saving result to ${filename} in bucket ${bucketName}`);

      return file.save(payload.text);
    })
    .then(() => {
      console.log(`File saved.`);
    });
};

The saveResult function is exported by the module and is executed when a message is published to the Cloud Pub/Sub topic specified by the RESULT_TOPIC value in the config.json file.
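saveResult calls a renameImageForSave helper (defined in the sample's index.js but not shown above) to derive the output object name. A plausible sketch, assuming the output name simply appends the target language to the source filename:

```javascript
// Hypothetical version of renameImageForSave: the sample's actual helper
// may format the name differently, but the idea is to make the output
// object name unique per source file and target language.
function renameImageForSave (filename, lang) {
  return `${filename}_to_${lang}.txt`;
}

console.log(renameImageForSave('menu.jpg', 'en')); // 'menu.jpg_to_en.txt'
```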

Deploying the functions

  1. To deploy the processImage function with a Cloud Storage trigger, run the following command in the gcf_ocr directory:

    gcloud beta functions deploy ocr-extract --stage-bucket [YOUR_STAGING_BUCKET_NAME] --trigger-bucket [YOUR_IMAGE_BUCKET_NAME] --entry-point processImage
    

    where

    • [YOUR_STAGING_BUCKET_NAME] is the name of your staging Cloud Storage Bucket.
    • [YOUR_IMAGE_BUCKET_NAME] is the name of your Cloud Storage Bucket where you will be uploading images.
  2. To deploy the translateText function with a Cloud Pub/Sub trigger, run the following command in the gcf_ocr directory:

    gcloud beta functions deploy ocr-translate --stage-bucket [YOUR_STAGING_BUCKET_NAME] --trigger-topic [YOUR_TRANSLATE_TOPIC_NAME] --entry-point translateText
    

    where

    • [YOUR_STAGING_BUCKET_NAME] is the name of your staging Cloud Storage Bucket.
    • [YOUR_TRANSLATE_TOPIC_NAME] is the name of your Cloud Pub/Sub topic with which translations will be triggered.
  3. To deploy the saveResult function with a Cloud Pub/Sub trigger, run the following command in the gcf_ocr directory:

    gcloud beta functions deploy ocr-save --stage-bucket [YOUR_STAGING_BUCKET_NAME] --trigger-topic [YOUR_RESULT_TOPIC_NAME] --entry-point saveResult
    

    where

    • [YOUR_STAGING_BUCKET_NAME] is the name of your staging Cloud Storage Bucket.
    • [YOUR_RESULT_TOPIC_NAME] is the name of your Cloud Pub/Sub topic with which saving of results will be triggered.

Uploading an image

  1. Upload an image to your image Cloud Storage bucket:

    gsutil cp [PATH_TO_IMAGE] gs://[YOUR_IMAGE_BUCKET_NAME]

    where

    • [PATH_TO_IMAGE] is a path to an image file (that contains text) on your local system.
    • [YOUR_IMAGE_BUCKET_NAME] is the name of the bucket where you are uploading images.

    You can download one of the images from the sample project.

  2. Watch the logs to be sure the executions have completed:

    gcloud beta functions logs read --limit 100
    
  3. You can view the saved translations in the Cloud Storage bucket specified by the RESULT_BUCKET value in the config.json file.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

Deleting the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

To delete the project:

  1. In the Cloud Platform Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the checkbox next to the project you want to delete, and then click Delete project.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting the Cloud Functions

Deleting the Cloud Functions removes the functions themselves, but you must manually remove any resources that remain in Cloud Storage and Cloud Pub/Sub.

Delete a Cloud Function:

gcloud beta functions delete [NAME_OF_FUNCTION]

You can also delete Cloud Functions from the Google Cloud Platform Console.
