How RealtimeCRM built a business card reader using machine learning
Lately, we at RealtimeCRM have attended a lot of networking events, and invariably we take home a small mound of business cards. Due to our own procrastination, the mound grows bigger and bigger until it consumes a sizeable chunk of desk space, at which point we finally take it on. The problem we face is that manually reading and entering the information on the cards is among the least interesting tasks imaginable, and it takes a really, really long time. Then we thought, what if we could just take a picture of a business card and then all the useful contact information we needed was automatically added into RealtimeCRM?
That’s how we came to create our Card Reader. We thought it would be a great feature that our users would appreciate, too. In this post, we’ll take you through what we did in general terms, then, for those of you who want to try this yourself, we’ll show you how we did it with an actual example script.
The problem
Up to this point, we needed a human to do the data entry. This is because a human can do two things that RealtimeCRM can’t:
- They can visually process the information.
- They can provide context for that information.
Therefore, they know where to enter the information to create a new contact record.
The solution
Google Cloud offers a range of tools to inject machine learning into your products to improve them. For us, two in particular were of interest in solving the problem: the Cloud Vision API, which reads the text in an image, and the Cloud Natural Language API.
So we’ve got the seeing part solved. We need to be able to “provide context” for the information we’ve pulled out. That’s where Cloud Natural Language comes into play. It can be used to recognise entities such as a person, location, or organisation, as well as the overall sentiment of a given block of text.
As you can see, Cloud Natural Language was able to recognise that “Google” is an organisation and that “Mountain View” is a location. So now we’re able to provide a context for the text we previously extracted using the Vision API, which is the missing piece in this puzzle.
What we need to do next is combine the Vision API and the Natural Language API so that we can do something useful: create new contact records in RealtimeCRM from business cards, just as the above flow chart demonstrates.
We know that Joe Example is a text string and that it is a name, so it should go into the name field in the contact record inside RealtimeCRM, and the other information on the business card naturally follows the same flow.
How we did it
RealtimeCRM is built in JavaScript using NodeJS and React (via MeteorJS), so what we’ll provide here is a simple example of how this works that you can adapt into any NodeJS project.
Preparation
First off, you’ll need a Google API key. If you don’t already have one, you can find instructions on how to create one here.
Step 1: Display an input
Let’s start with a simple image input. We want something that allows people to take images using their camera if they are on mobile, or select a file if they are on desktop. Thankfully, you can do this all with a simple HTML input.
<input type="file" accept="image/*" capture="environment" />
type="file" tells us that the input should be a file selector, accept="image/*" tells us it should be an image file, and capture="environment" tells us that on mobiles with two cameras (front and back) it should use the back camera.
Step 2: Read the image file
There are many ways of reading a file from an input in JavaScript, depending on which libraries you are using, but we chose the standard JavaScript FileReader. We want to read the file and send it off for processing as soon as the user has selected an image or taken a picture, so we do the following on the `onChange` event of the input. We also want the function to be asynchronous, so we use the async/await functionality.
While developing, we realized that high quality images were taking a fair amount of time (up to 10 to 20 seconds) to process via Google Cloud APIs, so we decided to reduce the quality of the images before sending them to the API, which cuts the processing time while still giving a good reading. For this we used the npm package browser-image-compression. To install it, use the following command:
npm i browser-image-compression
import _ from 'lodash';
import imageCompression from 'browser-image-compression';

// Handler for the onChange event of the input (the function name is a placeholder)
async function handleImageChange(event) {
  // Get the file from the JS event
  const file = event.target.files[0];
  // We can safely reduce the quality of images to reduce load time.
  // Current versions of browser-image-compression take an options object
  // as the second argument.
  const options = { maxSizeMB: 0.5, maxWidthOrHeight: 640 };
  const compressedFile = await imageCompression(file, options);
  const reader = new FileReader();
  reader.onload = async readerEvent => {
    // We read the image as a data URL to send to Google Cloud, so strip out
    // everything before the comma and keep only the base64 part
    const imgString = _.chain(readerEvent)
      .get('target.result')
      .split(',')
      .last()
      .value();
    try {
      // Call your server method that will interpret the image using the APIs
      const cardData = await SERVER_METHOD(imgString);
      // Display cardData in some way in your UI
      console.log(cardData);
    } catch (err) {
      // Handle errors
      console.log(err);
    }
  };
  // Start reading the file; this triggers the onload handler defined above
  reader.readAsDataURL(compressedFile);
}
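The "strip out the part we need" step can also be done without lodash. The following is a minimal sketch, not the code RealtimeCRM uses: `base64FromDataUrl` is a hypothetical helper name, and the sample data URL is made up for illustration. It takes everything after the first comma of a data URL, which is the base64 payload.

```javascript
// Hypothetical helper (not from the original post): returns the base64
// payload of a data URL such as "data:image/jpeg;base64,/9j/4AAQ...".
function base64FromDataUrl(dataUrl) {
  if (typeof dataUrl !== 'string') {
    throw new TypeError('Expected a data URL string');
  }
  const commaIndex = dataUrl.indexOf(',');
  if (!dataUrl.startsWith('data:') || commaIndex === -1) {
    throw new Error('Not a data URL');
  }
  // Everything after the first comma is the encoded content
  return dataUrl.slice(commaIndex + 1);
}

// Made-up example: "hello" encoded as base64
const payload = base64FromDataUrl('data:text/plain;base64,aGVsbG8=');
console.log(payload); // aGVsbG8=
```

Failing loudly on anything that is not a data URL makes it harder to send a malformed string to the server by accident.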
Step 3: Interpreting the image
This is where the magic happens. Now that we have our image as a compressed data URL, we can send it off to the Google APIs for processing. There are three steps to this: Cloud Vision reads the text on the image, Cloud Natural Language determines the context of the text, and our own special blend of regex and methods pulls out the information we need from the text given to us by the Vision API.
Thankfully, Google have provided handy `npm` packages that wrap a lot of the boilerplate up for us. You’ll want to install both the Cloud Vision and the Cloud Natural Language client libraries.
npm i @google-cloud/language @google-cloud/vision
Again, we need this to be an asynchronous function because we need to wait for the responses of the APIs before we try to do anything with the output responses.
Here are the regex functions we used to pull out information that has a well-defined format (emails, postcodes, phone numbers and domain names). We remove anything we find from the string so that we can give a much cleaner string to the Natural Language API:
/**
 * Parse matches from string and remove any line containing a match.
 * Only the first match on each line is recorded.
 * @param {string} str The string to parse
 * @param {RegExp} regex The pattern to look for on each line
 * @returns {Object} result
 * @returns {string[]} result.matches
 * @returns {string} result.cleanedText
 */
const removeByRegex = (str, regex) => {
  const matches = [];
  const cleanedText = str
    .split('\n')
    .filter(line => {
      const hits = line.match(regex);
      if (hits != null) {
        matches.push(hits[0]);
        return false;
      }
      return true;
    })
    .join('\n');
  return { matches, cleanedText };
};

// from http://emailregex.com
const emailRegex = /(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))/;
export const removeEmails = str => {
  const { matches, cleanedText } = removeByRegex(str, emailRegex);
  return { emails: matches, stringWithoutEmails: cleanedText };
};

// Regex taken from https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
const postcodeRegex = /([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))\s?[0-9][A-Za-z]{2})/i;
export const removePostcodes = str => {
  const { matches, cleanedText } = removeByRegex(str, postcodeRegex);
  return { postcodes: matches.map(s => s.toUpperCase()), stringWithoutPostcodes: cleanedText };
};

// from http://www.regexlib.com/REDetails.aspx?regexp_id=589
const UKphoneRegex = /((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?/;
export const removePhonenumbers = str => {
  const { matches, cleanedText } = removeByRegex(str, UKphoneRegex);
  return { phonenumbers: matches, stringWithoutPhonenumbers: cleanedText };
};

// from https://stackoverflow.com/a/20046959
const domainRegex = /[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})/;
export const removeDomains = str => {
  const { matches, cleanedText } = removeByRegex(str, domainRegex);
  return { domains: matches, stringWithoutDomains: cleanedText };
};
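To make the behaviour of removeByRegex concrete, here is a small worked example. It is only a sketch: the helper is repeated so the snippet runs on its own, `simpleEmailRegex` is a deliberately simplified pattern assumed for illustration (not the one used above), and the card text is made up.

```javascript
// Copy of the helper from above so this snippet is self-contained
const removeByRegex = (str, regex) => {
  const matches = [];
  const cleanedText = str
    .split('\n')
    .filter(line => {
      const hits = line.match(regex);
      if (hits != null) {
        matches.push(hits[0]);
        return false; // drop the whole line that contained a match
      }
      return true;
    })
    .join('\n');
  return { matches, cleanedText };
};

// Simplified email pattern, for illustration only
const simpleEmailRegex = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Made-up OCR output, one card field per line
const cardText = 'John Smith\nTest Company\njohn.smith@testcompany.com\n1 Random Place';

const { matches, cleanedText } = removeByRegex(cardText, simpleEmailRegex);
console.log(matches); // [ 'john.smith@testcompany.com' ]
console.log(cleanedText.split('\n')); // [ 'John Smith', 'Test Company', '1 Random Place' ]
```

Note that the whole line containing a match is removed, not just the matched text. The order in which the patterns are applied also matters: a domain pattern can match part of an email address, so it makes sense to remove emails before domains.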
Here is the function that does the actual parsing of the image:
import _ from 'lodash';
import vision from '@google-cloud/vision';
import language from '@google-cloud/language';
import {
  removeDomains,
  removePhonenumbers,
  removePostcodes,
  removeEmails,
} from '/path/to/functions.js';

// Path to your Google Cloud keyfile
const GOOGLE_CLOUD_KEYFILE = '/path/to/keyfile.json';

// The function name is a placeholder; `file` is the base64 image string sent by the client
export async function parseBusinessCard(file) {
  // Send image to the Vision API for OCR
  const visionClient = new vision.ImageAnnotatorClient({
    keyFilename: GOOGLE_CLOUD_KEYFILE,
  });
  let visionResults;
  try {
    const buf = Buffer.from(file, 'base64');
    visionResults = await visionClient.textDetection(buf);
  } catch (err) {
    // Log and rethrow so we never carry on with undefined results
    console.log(err);
    throw err;
  }
  // Text will be a string of all the text detected in the image.
  // E.g "John Smith\n Test Company\n 01234567890\n 1 Random Place\n P0S TC0DE\n www.johnsmith.com\n john.smith@testcompany.com\n" (Note the newlines)
  let { text } = visionResults[0].fullTextAnnotation;
  // Keep the original text to reference later (strings are immutable, so no copy is needed)
  const originalText = text;
  // Remove postcodes
  const { postcodes, stringWithoutPostcodes } = removePostcodes(text);
  text = stringWithoutPostcodes;
  // Remove phonenumbers
  const { phonenumbers, stringWithoutPhonenumbers } = removePhonenumbers(text);
  text = stringWithoutPhonenumbers;
  // Remove detected emails
  const { emails, stringWithoutEmails } = removeEmails(text);
  text = stringWithoutEmails;
  // Remove detected domains
  const { domains, stringWithoutDomains } = removeDomains(text);
  text = stringWithoutDomains;
  // Replace newlines with spaces and send the result to the Natural Language API
  const cleanedText = _.replace(text, /\r?\n|\r/g, ' ');
  const languageClient = new language.LanguageServiceClient({
    keyFilename: GOOGLE_CLOUD_KEYFILE,
  });
  let languageResults;
  try {
    languageResults = await languageClient.analyzeEntities({
      document: {
        content: cleanedText,
        type: 'PLAIN_TEXT',
      },
    });
  } catch (err) {
    // Log and rethrow so we never carry on with undefined results
    console.log(err);
    throw err;
  }
  // Go through detected entities and collect the names of the types we care about
  const { entities } = languageResults[0];
  const requiredEntities = { ORGANIZATION: '', PERSON: '', LOCATION: '' };
  _.each(entities, entity => {
    const { type } = entity;
    if (_.has(requiredEntities, type)) {
      requiredEntities[type] += ` ${entity.name}`;
    }
  });
  // Return your data; this shape is only one possibility
  return { originalText, emails, phonenumbers, postcodes, domains, entities: requiredEntities };
}
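The last step, turning the parsed values into a contact record, depends on your own data model. As a hedged sketch, assume the parsed values are gathered into an object with `entities`, `emails`, `phonenumbers` and `postcodes`; the helper name `toContactRecord` and the field names on the record are hypothetical and are not RealtimeCRM's actual schema.

```javascript
// Hypothetical helper: maps parsed card data onto an assumed contact record shape.
function toContactRecord({ entities, emails = [], phonenumbers = [], postcodes = [] }) {
  // Entity names are concatenated with a leading space, so tidy the whitespace
  const clean = value => (value || '').replace(/\s+/g, ' ').trim();
  return {
    name: clean(entities.PERSON),
    company: clean(entities.ORGANIZATION),
    // Join the detected location and the first postcode, skipping empty parts
    address: [clean(entities.LOCATION), postcodes[0]].filter(Boolean).join(', '),
    email: emails[0] || '',
    phone: phonenumbers[0] || '',
  };
}

// Made-up input, shaped like the values collected while parsing the card
const record = toContactRecord({
  entities: { PERSON: ' John Smith', ORGANIZATION: ' Test Company', LOCATION: ' 1 Random Place' },
  emails: ['john.smith@testcompany.com'],
  phonenumbers: ['01234567890'],
  postcodes: [],
});
console.log(record);
```

Taking only the first email and phone number is a simplification; a card with several numbers would need a rule for choosing between them.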