Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

How RealtimeCRM built a business card reader using machine learning

Friday, June 22, 2018

By Ben Miles, Head of Product, RealtimeCRM

Lately, we at RealtimeCRM have attended a lot of networking events, and invariably we take home a small mound of business cards. Due to our own procrastination, the mound grows bigger and bigger until it consumes a sizeable chunk of desk space, at which point we finally take it on. The problem we face is that manually reading and entering the information on the cards is among the least interesting tasks imaginable, and it takes a really, really long time. Then we thought, what if we could just take a picture of a business card and then all the useful contact information we needed was automatically added into RealtimeCRM?

That’s how our Card Reader came about. We thought it would be a great feature that our users would appreciate, too. In this post, we’ll take you through what we did in general terms, then, for those of you who want to try this yourself, we’ll show you how we did it with an actual example script.

The problem

Up to this point, we needed a human to do the data entry. This is because a human can do two things that RealtimeCRM can’t: 

  1. They can visually process the information.
  2. They can provide context for that information.

Therefore, they know where to enter the information to create a new contact record.

The solution

Google Cloud offers a range of tools to inject machine learning into your products to improve them. For us, there were two in particular that were of interest in solving the problem:

The Google Cloud Vision API deals with the “visual processing” problem as it enables RealtimeCRM to understand the content of an image, including objects such as logos, products, and most importantly to us, Optical Character Recognition (OCR)—the ability to extract text from an image.

So we’ve got the seeing part solved. We need to be able to “provide context” for the information we’ve pulled out. That’s where Cloud Natural Language comes into play. It can be used to recognise entities such as a person, location, or organisation—as well as the overall sentiment of a given block of text.

Cloud Natural Language is able to recognise that “Google” is an organisation and that “Mountain View” is a location. So now we’re able to provide context for the text we previously extracted using the Vision API, which is the missing piece in this puzzle.

What we need to do next is combine the Vision API and the Natural Language API so that we can do something useful: create new contact records in RealtimeCRM from business cards, just as the above flow chart demonstrates.

We know that Joe Example is a text string and that it is a name, so it should go into the name field in the contact record inside RealtimeCRM, and the other information on the business card naturally follows the same flow.

How we did it

RealtimeCRM is built in JavaScript using NodeJS and React (via MeteorJS), so what we’ll provide here is a simple example of how this works that you can adapt to any NodeJS project.


First off, you’ll need a Google API key. If you don’t already have one, you can find instructions on how to create one here.

Step 1: Display an input

Let’s start with a simple image input. We want something that allows people to take images using their camera if they are on mobile, or select a file if they are on desktop. Thankfully, you can do this all with a simple HTML input.
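A minimal version of that input might look like this:

```html
<input type="file" accept="image/*" capture="environment" />
```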

type="file" tells the browser that the input should be a file selector, accept="image/*" tells it to accept only image files, and capture="environment" tells mobiles that have two cameras (front and back) to use the back camera.

Step 2: Read the image file

There are many ways of reading a file from an input in JavaScript, depending on which libraries you are using, but we chose the standard JavaScript FileReader. We want to read the file and send it off for processing as soon as the user has selected an image or taken a picture, so we do the following in the `onChange` event of the input. We also want the function to be asynchronous, so we use async/await.

While developing, we realized that high-quality images were taking a fair amount of time (10 to 20 seconds) to process via the Google Cloud APIs, so we decided to reduce the quality of the images before sending them to the API, which cut the processing time while still giving a good reading. For this we used the npm package browser-image-compression. To install it, use the following command:

npm i browser-image-compression

import imageCompression from 'browser-image-compression';

async function handleImage(event) {
  // Get the file from the JS event
  const file =[0];

  // We can safely reduce the quality of images to reduce load time
  const maxSize = 0.5; // maximum size in MB
  const maxWidthOrHeight = 640;
  const compressedFile = await imageCompression(file, maxSize, maxWidthOrHeight);

  const reader = new FileReader();
  reader.onload = async readerEvent => {
    // We read the image as a data URL to send to Google Cloud, so strip out
    // the base64 part we need (everything after the comma)
    const imgString =',')[1];

    try {
      // Call your server method that will interpret the image using the APIs
      const cardData = await SERVER_METHOD(imgString);

      // Display cardData in some way in your UI
    } catch (err) {
      // Handle errors
    }
  };
  reader.readAsDataURL(compressedFile);
}

// Call the function we just defined above from the input's onChange event
Step 3: Interpreting the image

This is where the magic happens. Now that we have our image as compressed data, we can send it off to the Google APIs for processing. There are three steps to this: first, the Cloud Vision API reads the text in the image; then Cloud Natural Language determines the context of that text; and finally our own special blend of regexes and methods pulls out the information we need from the text the Vision API gives us.

Thankfully, Google provides handy `npm` packages that wrap a lot of the boilerplate up for us. You’ll want to install both the Cloud Vision and the Cloud Natural Language client libraries.

npm i @google-cloud/language @google-cloud/vision

Again, we need this to be an asynchronous function, because we need to wait for the API responses before we do anything with them.

Here are the regex functions we used to pull out information that has a well-specified format (emails, postcodes, phone numbers, and domain names). We remove any line containing a match so that we can pass a much cleaner string to the Natural Language API:

/**
 * Parse matches from a string and remove any line containing a match
 * @param {string} str The string to parse
 * @param {RegExp} regex The regex to match against
 * @returns {Object} result
 * @returns {string[]} result.matches
 * @returns {string} result.cleanedText
 */
const removeByRegex = (str, regex) => {
  const matches = [];
  const cleanedText = str
    .filter(line => {
      const hits = line.match(regex);
      if (hits != null) {
        return false;
      return true;
    .join('\n');
  return { matches, cleanedText };
};

// from
const emailRegex = /(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))/;

export const removeEmails = str => {
  const { matches, cleanedText } = removeByRegex(str, emailRegex);
  return { emails: matches, stringWithoutEmails: cleanedText };
};

// Regex taken from
const postcodeRegex = /([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))\s?[0-9][A-Za-z]{2})/i;

export const removePostcodes = str => {
  const { matches, cleanedText } = removeByRegex(str, postcodeRegex);
  return { postcodes: => s.toUpperCase()), stringWithoutPostcodes: cleanedText };
};

// from
const UKphoneRegex = /((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?/;

export const removePhonenumbers = str => {
  const { matches, cleanedText } = removeByRegex(str, UKphoneRegex);
  return { phonenumbers: matches, stringWithoutPhonenumbers: cleanedText };
};

// from
const domainRegex = /[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})/;

export const removeDomains = str => {
  const { matches, cleanedText } = removeByRegex(str, domainRegex);
  return { domains: matches, stringWithoutDomains: cleanedText };
};

Here is the function that does the actual parsing of the image:

import vision from '@google-cloud/vision';
import language from '@google-cloud/language';
import _ from 'lodash';
import {
} from '/path/to/functions.js';

// Path to your Google Cloud keyfile
const GOOGLE_CLOUD_KEYFILE = '/path/to/keyfile.json';

async function parseCardImage(file) {
  // Send the image to the Vision API for OCR
  const visionClient = new vision.ImageAnnotatorClient({
    keyFilename: GOOGLE_CLOUD_KEYFILE,

  let visionResults;
  try {
    const buf = Buffer.from(file, 'base64');
    visionResults = await visionClient.textDetection(buf);
  } catch (err) {
    // Throw an error
    throw err;
  }

  // Text will be a string of all the text detected in the image.
  // E.g. "John Smith\n Test Company\n 01234567890\n 1 Random Place\n P0S TC0DE\n\n\n" (note the newlines)
  let { text } = visionResults[0].fullTextAnnotation;

  // Take a copy of the original text to reference later
  const originalText = _.cloneDeep(text);

  // Remove postcodes
  const { postcodes, stringWithoutPostcodes } = removePostcodes(text);
  text = stringWithoutPostcodes;

  // Remove phone numbers
  const { phonenumbers, stringWithoutPhonenumbers } = removePhonenumbers(text);
  text = stringWithoutPhonenumbers;

  // Remove detected emails
  const { emails, stringWithoutEmails } = removeEmails(text);
  text = stringWithoutEmails;

  // Remove detected domains
  const { domains, stringWithoutDomains } = removeDomains(text);
  text = stringWithoutDomains;

  // Replace newlines with spaces and send the result to the Natural Language API
  const cleanedText = _.replace(_.cloneDeep(text), /\r?\n|\r/g, ' ');

  const languageClient = new language.LanguageServiceClient({
    keyFilename: GOOGLE_CLOUD_KEYFILE,

  let languageResults;
  try {
    languageResults = await languageClient.analyzeEntities({
      document: {
        content: cleanedText,
        type: 'PLAIN_TEXT',
      },
  } catch (err) {
    // Throw an error
    throw err;
  }

  // Go through the detected entities and collect the types we care about
  const { entities } = languageResults[0];
  const requiredEntities = { ORGANIZATION: '', PERSON: '', LOCATION: '' };
  _.each(entities, entity => {
    const { type } = entity;
    if (_.has(requiredEntities, type)) {
      requiredEntities[type] += ` ${}`;
    }
  });

  // Return your data
  return { requiredEntities, postcodes, phonenumbers, emails, domains, originalText };
}

Step 4: The rest is up to you!

With the difficult part behind you, you should have a concise object with all the information you need from an image! You can now put that data somewhere on your page by returning the values from the server method. We also recommend showing a loading icon or explanatory text while the image is being processed as the Google APIs can sometimes take a few seconds to return with a value.


We introduced our business card reader without much fanfare, but the response has been really positive. Our users really like it, and it adds a little bit of magic to our live demos. If you’re interested in learning more about RealtimeCRM, visit our website.
