Spam does not bring us joy—ridding Gmail of 100 million more spam messages with TensorFlow
1.5 billion people use Gmail every month, and 5 million paying businesses use Gmail in the workplace as a part of G Suite. For consumers and businesses alike, a big part of Gmail’s draw is its built-in security protections.
Good security means constantly staying ahead of threats, and our existing machine learning (ML) models are highly effective at doing this. In conjunction with our other protections, they help block more than 99.9 percent of spam, phishing, and malware from reaching Gmail inboxes. Just as we evolve our security protections, we also look to advance our ML capabilities to protect you even better.
That’s why we recently implemented new protections powered by TensorFlow, an open-source ML framework developed at Google. These new protections complement existing ML and rules-based protections, and they’ve measurably improved our detection capabilities: with TensorFlow, we are now blocking around 100 million additional spam messages every day.
Where did we find these 100 million extra spam messages? We’re now blocking spam categories that used to be very hard to detect. Using TensorFlow has helped us block image-based messages, emails with hidden embedded content, and messages from newly created domains that try to hide a low volume of spammy messages within legitimate traffic.
Given that we’re already blocking the vast majority of spammy emails in Gmail, blocking millions more with precision is a feat. TensorFlow helps us catch the spammers in the remaining less than 0.1 percent who slip through, without accidentally blocking messages that are important to users.
One person’s spam is another person’s treasure
ML helps us catch spam by identifying patterns in large data sets that the humans who write rules might miss, and it lets us adapt quickly to ever-changing spam tactics.
ML-based protections help us make granular decisions based on many different factors. Consider that every email has thousands of potential signals. An email whose characteristics partly match those commonly considered “spammy” is not necessarily spam. ML allows us to look at all of these signals together to make a determination.
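To make “looking at all of these signals together” concrete, here is a minimal sketch of a logistic scoring function, one common way to combine many signals into a single probability. It assumes each email has already been reduced to a dictionary of named numeric signals; the signal names, weights, and bias are hypothetical values invented for illustration, not Gmail’s actual features or model.

```python
import math
from typing import Mapping

# Hypothetical learned weights: positive values push toward spam,
# negative values push toward a legitimate message.
WEIGHTS = {
    "has_spammy_phrase": 1.5,
    "image_only_body": 2.0,
    "sender_domain_age_days": -0.01,
    "user_replied_to_sender_before": -3.0,
}
BIAS = -1.0  # hypothetical intercept


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(signals: Mapping[str, float]) -> float:
    """Weigh every signal together and squash the sum into (0, 1).

    Signals without a known weight contribute nothing.
    """
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in signals.items())
    return _sigmoid(z)


def is_spam(signals: Mapping[str, float], threshold: float = 0.5) -> bool:
    return spam_probability(signals) >= threshold
```

For example, `{"has_spammy_phrase": 1, "user_replied_to_sender_before": 1}` scores below 0.5: the single “spammy” characteristic is outweighed by the fact that the user has corresponded with this sender, which is the point the paragraph above makes.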
Finally, ML helps us personalize our spam protections to each user: what one person considers spam, another might consider an important message (think newsletter subscriptions or regular email notifications from an application).
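One simple way to picture per-user personalization is to layer each user’s own feedback on top of a shared model’s score. The sketch below is a hypothetical illustration, not how Gmail personalizes: it assumes a global spam score in [0, 1] is already available, and `UserPrefs`, `wanted_senders`, and the per-user `threshold` are names invented here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UserPrefs:
    """Hypothetical per-user state learned from that user's own actions."""
    wanted_senders: Set[str] = field(default_factory=set)  # senders the user rescued with "Not spam"
    threshold: float = 0.5  # per-user decision threshold on the global score


def record_not_spam(prefs: UserPrefs, sender: str) -> None:
    """Remember that this user wants mail from `sender`."""
    prefs.wanted_senders.add(sender)


def personalized_verdict(global_score: float, sender: str, prefs: UserPrefs) -> str:
    # A sender this user explicitly rescued is trusted for them, even if the
    # same newsletter looks like unwanted bulk mail to other users.
    if sender in prefs.wanted_senders:
        return "inbox"
    return "spam" if global_score >= prefs.threshold else "inbox"
```

With the same global score of 0.7 for a newsletter, a user who never interacted with it gets `"spam"`, while a user who marked it “Not spam” gets `"inbox"`.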
Using TensorFlow to power ML
By complementing our existing ML models with TensorFlow, we’re able to refine these models even further, while the team can focus less on the underlying ML framework and more on solving the problem: ridding your inbox of spam!
Applying ML at scale can be complex and time-consuming. TensorFlow includes many tools that make the ML process easier and more efficient, so we can iterate faster. For example, TensorBoard lets us both comprehensively monitor our model training pipelines and quickly evaluate new models to determine how useful we expect them to be.
TensorFlow also gives us the flexibility to easily train and experiment with different models in parallel to develop the most effective approach, instead of running one experiment at a time.
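The general pattern of running several experiments side by side and keeping the best one can be sketched in plain Python. This is not TensorFlow’s tooling; it is a standard-library analogue under stated assumptions: a hypothetical toy 1-D dataset, a hand-rolled logistic regression, and a learning-rate sweep standing in for “different models.” `ThreadPoolExecutor` runs the trials concurrently to keep the sketch simple; because CPython’s GIL limits parallelism for pure-Python CPU work, a real sweep would more likely use `ProcessPoolExecutor` or separate training jobs.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_toy_data(n, seed):
    """Hypothetical 1-D data: label is 1 when x > 0, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = 1 if x > 0 else 0
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def train_and_evaluate(lr, train, valid, epochs=50):
    """Train one logistic-regression model with SGD; return (lr, validation accuracy)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x  # gradient of log loss w.r.t. w
            b -= lr * (p - y)      # gradient of log loss w.r.t. b
    correct = sum((w * x + b > 0) == (y == 1) for x, y in valid)
    return lr, correct / len(valid)


def run_sweep(learning_rates, seed=0):
    """Run one experiment per learning rate concurrently; return the best (lr, accuracy)."""
    train = make_toy_data(200, seed)
    valid = make_toy_data(100, seed + 1)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda lr: train_and_evaluate(lr, train, valid), learning_rates))
    return max(results, key=lambda result: result[1])
```

Calling `run_sweep([0.001, 0.01, 0.1, 1.0])` trains four models and returns the learning rate with the highest validation accuracy, instead of waiting for each experiment to finish before starting the next.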
As an open-source project, TensorFlow is used by teams and researchers all over the world (there have been 71,000 forks of the public code and other open-source contributions!). This strong community support means new research and ideas can be applied quickly. It also means we can collaborate with other teams within Google more quickly and easily to best protect our users.
All in all, these benefits allow us to scale our ML efforts, so that fewer engineers can run more experiments and protect users more effectively.
This is just one example of how we’re using machine learning to keep users and businesses safe, and just one application of TensorFlow. Even within Gmail, we’re currently experimenting with TensorFlow in other security-related areas, such as phishing and malware detection, as part of our continuous efforts to keep users safe.
And you can use it, too. Google open-sourced TensorFlow in 2015 to make ML accessible to everyone, so that many different organizations can take advantage of the technology that powers critical capabilities like spam prevention in Gmail and more.