Everyday AI: beyond spell check, how Google Docs is smart enough to correct grammar
Written communication is at the heart of what drives businesses. Proposals, presentations, emails to colleagues—this all keeps work moving forward. This is why we’ve built features into G Suite to help you communicate effectively, like Smart Compose and Smart Reply, which use machine learning smarts to help you draft and respond to messages quickly. More recently, we’ve introduced machine translation techniques into Google Docs to flag grammatical errors within your documents as you draft them.
If you’ve ever questioned whether to use “a” versus “an” in a sentence, or if you’re using the correct verb tense or preposition, you’re not alone. Grammar is nuanced and tricky, which makes it a great problem to solve with the help of artificial intelligence. Here’s a look at how we built grammar suggestions in Docs.
The gray areas of grammar
Although we generally think of grammar as a set of rules, these rules are often complex and subjective. In spelling, you can reference a resource that tells you whether a word exists or how it’s spelled: dictionaries (Remember those?).
Grammar is different. It’s a harder problem to tackle because its rules aren’t fixed. It varies based on language and context, and may change over time, too. To make things more complicated, there are many different style books—whether it be MLA, AP or some other style—which makes consistency a challenge.
Given these nuances, even the experts don’t always agree on what’s correct. For our grammar suggestions, we worked with professional linguists to proofread sample sentences to get a sense of the true subjectivity of grammar. During that process, we found that linguists disagreed on grammar about 25 percent of the time. This raised the obvious question: how do we automate something that doesn’t run on definitive rules?
Where machine translation makes a mark
Much like having someone red-line your document with suggestions on how to replace “incorrect” grammar with “correct” grammar, we can use machine translation technology to help automate that process. At a basic level, machine translation performs substitution and reorders words from a source language to a target language, for example, substituting a “source” word in English (“hello!”) for a “target” word in Spanish (¡hola!). Machine translation techniques have been developed and refined over the last two decades throughout the industry, in academia and at Google, and have even helped power Google Translate.
Along similar lines, we use machine translation techniques to flag “incorrect” grammar within Docs using blue underlines, but instead of translating from one language to another like with Google Translate, we treat text with incorrect grammar as the “source” language and correct grammar as the “target.”
Working with the experts
Before we could train models, we needed to define “correct” and "incorrect” grammar. What better way to do so than to consult the experts? Our engineers worked with a collection of computational and analytical linguists, with specialties ranging from sociology to machine learning. This group supports a host of linguistic projects at Google and helps bridge the gap between how humans and machines process language (and not just in English—they support over 40 languages and counting).
For several months, these linguists reviewed thousands of grammar samples to help us refine machine translation models, from classic cases like “there” versus “their” versus “they’re,” to more complex rules involving prepositions and verb tenses. Each sample received close attention—three linguists reviewed each case to identify common patterns and make corrections. The third linguist served as the “tie-breaker” in case of disagreement (which happened a quarter of the time).
Once we identified the samples, we then fed them into statistical learning algorithms—along with “correct” text gathered from high-quality web sources (billions of words!)—to help us predict outcomes using stats like the frequency at which we’ve seen a specific correction occur. This process helped us build a basic spelling and grammar correction model.
We iterated over these models by rolling them out to a small portion of people who use Docs, and then refined them based on user feedback and interactions. For example, in earlier models of grammar suggestions, we received feedback that suggestions for verb tenses and the correct singular or plural form of a noun or verb were inaccurate. We’ve since adjusted the model to solve for these specific issues, resulting in more precise suggestions. And we are also committed to actively researching unintended bias and ways we can avoid them in our products. As language understanding models use billions of common phrases and sentences to automatically learn about the world, they can also reflect human cognitive biases. Being aware of this is a good start, and we have ongoing work underway on how to address.
Better grammar. No ifs, ands or buts.
So if you’ve ever asked yourself “how does it know what to suggest when I write in Google Docs?” these grammar suggestion models are the answer. They’re working in the background to analyze your sentence structure, and the semantics of your sentence, to help you find mistakes or inconsistencies. With the help of machine translation, here are some mistakes that Docs can help you catch:
Evolving grammar suggestions, just like language
When it comes to grammar, we’re constantly improving the quality of each suggestion to make corrections as useful and relevant as possible. With our AI-first approach, G Suite is in the best position to help you communicate smarter and faster, without sweating the small stuff. Learn more.