Jump to Content
Data Analytics

How RealtimeCRM built a business card reader using machine learning

June 22, 2018
Ben Miles

RealtimeCRM

Lately, we at RealtimeCRM have attended a lot of networking events, and invariably we take home a small mound of business cards. Due to our own procrastination, the mound grows bigger and bigger until it consumes a sizeable chunk of desk space, at which point we finally take it on. The problem we face is that manually reading and entering the information on the cards is among the least interesting tasks imaginable, and it takes a really, really long time. Then we thought, what if we could just take a picture of a business card and then all the useful contact information we needed was automatically added into RealtimeCRM?

That’s how we came about creating our Card Reader. We thought it would be a great feature that our users would appreciate, too. In this post, we’ll take you through what we did in general terms, then for those of you who want to try this yourself, we’ll show you how we did it with an  actual example script.

The problem

Up to this point, we needed a human to do the data entry. This is because a human can do two things that RealtimeCRM can’t:

  1. They can visually process the information.
  2. They can provide context for that information.

Therefore, they know where to enter the information to create a new contact record.

The solution

Google Cloud offers a range of tools to inject machine learning into your products to improve them. For us, there were two in particular that were of interest in solving the problem:

The Google Cloud Vision API deals with the “visual processing” problem as it enables RealtimeCRM to understand the content of an image, including objects such as logos, products, and most importantly to us, Optical Character Recognition (OCR)—the ability to extract text from an image.

So we’ve got the seeing part solved. We need to be able to “provide context” for the information we’ve pulled out. That’s where Cloud Natural Language comes into play. It can be used to recognise entities such as a person, location, or organisation—as well as the overall sentiment of a given block of text.

https://storage.googleapis.com/gweb-cloudblog-publish/images/realtimecrm-1f2u4.max-800x800.PNG

As you can see, Cloud Natural Language was able to recognise that “Google” is an organisation and that “Mountain View” is a location. So now we’re able to provide a context for the text we previously extracted using the Vision API, which is the missing piece in this puzzle.

https://storage.googleapis.com/gweb-cloudblog-publish/images/realtimecrm-2l1q5.max-700x700.PNG

What we need to do next is combine the Vision API and the Natural Language API so that we can do something useful: create new contact records in RealtimeCRM from business cards, just as the above flow chart demonstrates.

We know that Joe Example is a text string and that it is a name, so it should go into the name field in the contact record inside RealtimeCRM, and the other information on the business card naturally follows the same flow.

How we did it

RealtimeCRM is built in Javascript using NodeJS and React (via MeteorJS), so what we’ll provide here is a simple example of how this works that you can adapt into any NodeJS project.

Preparation

First off, you’ll need a Google API key. If you don’t already have one, you can find instructions on how to create one here.

Step 1: Display an input

Let’s start with a simple image input. We want something that allows people to take images using their camera if they are on mobile, or select a file if they are on desktop. Thankfully, you can do this all with a simple HTML input.

type="file" tells us that the input should be a file selector, accept="image/*" tells us it should be an image file and capture="environment" tells us that on mobiles that have two cameras (front and back) that it should use the back camera.

Step 2: Read the image file

There are many ways of reading a file from an input in JavaScript, depending on which libraries you are using, but we chose the standard JavaScript FileReader. We want to read the file and send it off for processing as soon as the user has selected an image/taken a picture so we do the following on the `onChange` event of the input. We also want the function to be asynchronous, so we use the async/await functionality.

While developing, we realized that high quality images were taking a fair amount of time (up to 10/20 seconds) to process via Google Cloud APIs, so we decided we could reduce the quality of the images before we sent them to the API in order to reduce time and still get a good reading. For this we used the npm package browser-image-compression. To install this, use the following command:

Loading...

Loading...

Step 3: Interpreting the image

This is where the magic happens. Now we have our image as a compressed data, we can send its URL off to the Google API for processing. There are 3 steps to this: first, the Cloud Vision, which reads the text on the image, Cloud Natural Language to determine the context of the text, and our own special blend of regex and methods to pull out the information we need from the text given to us by the Vision API.

Thankfully, Google have provided a handy `npm` package that wraps a lot of the boilerplate up for us. You’ll want to install both the Cloud Vision and the Cloud Natural Language client libraries.


Loading...

Again, we need this to be an asynchronous function because we need to wait for the responses of the APIs before we try to do anything with the output responses.

Here are the regex functions we used to pull out information that has a good specified format (email, postcode, phone numbers and domain names). We remove anything we find from the string so that we can give a much clearer string to the Natural Language API:

Loading...

Here is the function that does the actual parsing of the image:

Loading...

Step 4: The rest is up to you!

With the difficult part behind you, you should have a concise object with all the information you need from an image! You can now put that data somewhere on your page by returning the values from the server method. We also recommend showing a loading icon or explanatory text while the image is being processed as the Google APIs can sometimes take a few seconds to return with a value.

Conclusion

We introduced our business card reader without much fanfare, but the response has been really positive. Our users really like it, and it adds a little bit of magic to our live demos. If you’re interested in learning more about RealtimeCRM, visit our website.

Posted in