How Google Does It: Finding, tracking, and fixing vulnerabilities
Ana Oprea
Manager, Product Security Engineering
Anton Chuvakin
Security Advisor, Office of the CISO
Ever wondered how Google does security? As part of our new “How Google Does It” series, we’ll share insights, observations, and top tips about how Google approaches some of today’s most pressing security topics, challenges, and concerns, straight from Google experts. In this edition, Ana Oprea, manager, product security engineering, and previously the European lead for Google’s Vulnerability Coordination Center, shares some of the core practices behind Google’s vulnerability management program.
Security vulnerabilities are a given, no matter how carefully you design and implement IT systems. As soon as you find and patch one, another inevitably pops up. The challenge is not so much eliminating every vulnerability as managing and remediating them effectively.
At Google, we believe understanding our vulnerabilities and ensuring they are addressed across the company is the key to keeping our customers and users safe. As with many organizations, the cornerstones of our approach are identifying vulnerabilities, prioritizing them, and implementing processes and technologies to resolve them.
We coordinate resolutions and ensure our customers understand the steps we take to fix issues. We also engage in many proactive activities, such as performing regular data analysis, so we can share valuable insights about new trends and classes of vulnerabilities to help engineering teams reduce risk.
Let’s take a closer look at some of the key aspects of our approach that have helped us build a successful vulnerability management program.
Impact assessment
Given the scope and scale of our work at Google, one of the biggest challenges we face is deciding which vulnerabilities to prioritize and how quickly we need to react. To do this, we use an impact assessment process to help us identify critical vulnerabilities, the systems that are in scope, and the relative risk level for each of those systems.
In general, there is no single dimension that we use for prioritization. However, we tend to start by looking at the severity of a vulnerability and whether it affects our systems. Thousands of vulnerabilities are discovered every day, and only a small subset of them will matter to your company or team. For instance, a Patch Tuesday fix for a TCP/IP remote code execution (RCE) vulnerability in Windows isn’t necessarily urgent for an organization whose fleet runs only on macOS.
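As a minimal sketch of that first relevance check, the Python below keeps only advisories that affect a platform actually present in the fleet. The `Advisory` record, its fields, and the platform labels are hypothetical; real advisory feeds and asset inventories carry far richer data, and this is not Google’s internal tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    # Hypothetical advisory record; real feeds use much richer schemas.
    vuln_id: str
    severity: str                       # e.g. "critical", "high", "medium", "low"
    affected_platforms: frozenset[str]  # hypothetical platform labels


def relevant_advisories(advisories: list[Advisory], fleet_platforms: set[str]) -> list[Advisory]:
    """Keep only advisories that affect at least one platform present in the fleet."""
    return [a for a in advisories if a.affected_platforms & fleet_platforms]


# Example: a macOS-only fleet can set aside a Windows-only advisory.
advisories = [
    Advisory("VULN-1", "critical", frozenset({"windows"})),
    Advisory("VULN-2", "high", frozenset({"macos", "linux"})),
]
fleet = {"macos"}
print([a.vuln_id for a in relevant_advisories(advisories, fleet)])  # ['VULN-2']
```

Filtering happens before any ranking, so severity alone never pulls an irrelevant advisory to the top of the queue.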
Once we understand whether a vulnerability is critical, we will start looking at other dimensions to help us decide where to start, including the criticality of an asset, any deadlines related to embargoes, the timelines in our Service Level Agreements (SLAs), and available resources.
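One way to combine those dimensions is an ordered sort key, sketched below under stated assumptions: the severity labels, the 1-to-3 asset-criticality scale, the `Finding` fields, and the `capacity` parameter (standing in for available resources) are all hypothetical. The effective deadline is whichever comes first, the SLA due date or the embargo lift date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical severity labels; lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    vuln_id: str
    severity: str
    asset_criticality: int                # hypothetical scale: 1 (most critical) to 3
    sla_due: date                         # due date derived from an internal SLA
    embargo_lifts: Optional[date] = None  # public disclosure date, if embargoed

    def deadline(self) -> date:
        # Whichever comes first, the SLA due date or the embargo lift, is the real deadline.
        if self.embargo_lifts is None:
            return self.sla_due
        return min(self.sla_due, self.embargo_lifts)


def work_queue(findings: list[Finding], capacity: int) -> list[Finding]:
    """Order by severity, then asset criticality, then deadline; keep what capacity allows."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f.severity], f.asset_criticality, f.deadline()),
    )
    return ordered[:capacity]


findings = [
    Finding("VULN-A", "high", 1, date(2024, 7, 30)),
    Finding("VULN-B", "critical", 2, date(2024, 8, 15), embargo_lifts=date(2024, 7, 10)),
    Finding("VULN-C", "critical", 2, date(2024, 7, 20)),
]
print([f.vuln_id for f in work_queue(findings, capacity=2)])  # ['VULN-B', 'VULN-C']
```

Sorting on a tuple keeps the ordering explicit and easy to audit; a weighted score is a common alternative, but it makes it harder to see why one item outranks another.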
Accelerated remediation
Google may be a big ship, but we can move fast when needed. In part, this is because we have invested in establishing cross-company processes for vulnerability management. If something does happen to go wrong, we have explicit plans to help us figure out exactly what to do.
Our products support billions of users, so even a vulnerability that affects just one percent of them can be a huge problem. We have developed procedures that define how to deal with a wide range of vulnerability categories, how to handle embargoes, and the restrictions around deadlines and the information we can share.
Another crucial aspect of responding quickly to vulnerabilities is being able to accelerate our normal timelines when rolling out a patch. For remediation, we typically start by verifying that we have a patch that addresses the specific vulnerability, and then push it out using the same deployment framework we use for regular releases.
Here, it’s crucial to consider speed when you design your build pipelines, so that you can expedite a release to meet a shorter timeline if needed. For instance, our deployment framework has been adapted to meet our vulnerability management timelines, enabling us to test a patch on a small percentage of our user base first, make adjustments, and then release it more broadly once we’re confident it works.
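The staged approach can be sketched as a loop over increasing rollout percentages that halts at the first failed health check. The stage percentages, the `expedited` flag, and the `deploy` and `healthy` callbacks are hypothetical placeholders for whatever your own deployment framework provides; a real system would also roll back, not just stop.

```python
from typing import Callable, Sequence

# Hypothetical rollout plans, as percentages of users per stage.
NORMAL_STAGES = (1, 5, 25, 50, 100)
EXPEDITED_STAGES = (1, 10, 100)  # fewer stages for an urgent security fix


def staged_rollout(
    deploy: Callable[[int], None],
    healthy: Callable[[int], bool],
    expedited: bool = False,
) -> int:
    """Deploy in increasing percentages and stop at the first unhealthy stage.

    Returns the last percentage that passed its health check (0 if none did).
    """
    stages: Sequence[int] = EXPEDITED_STAGES if expedited else NORMAL_STAGES
    reached = 0
    for percent in stages:
        deploy(percent)
        if not healthy(percent):
            # Halt so the patch can be adjusted before it reaches more users.
            return reached
        reached = percent
    return reached


# Example: the expedited plan fails its health check at full rollout.
log: list[int] = []
result = staged_rollout(deploy=log.append, healthy=lambda p: p < 100, expedited=True)
print(result, log)  # 10 [1, 10, 100]
```

Keeping the expedited path inside the same function as the normal one reflects the point above: urgent fixes reuse the regular release machinery, just with a shorter schedule.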
Connections and communication
We believe that building strong connections and communication channels — both in and outside of Google — is foundational for effective vulnerability management.
We work with our leadership to ensure that all remediation and coordination efforts are appropriately resourced. We collaborate closely with our Communications and Support teams to help us convey remediation plans and to engage with the many groups in the company that deal with vulnerabilities, such as the Google Cloud Vulnerability Reward Program, one of our bug bounty programs that offers incentives for finding vulnerabilities in Google software.
Externally, we keep regulators informed and respond to their inquiries, and we maintain close relationships with industry partners and other members of the security research community.
Ideally, these partnerships should be cultivated and built up before you actually need them. The first time you speak with key stakeholders shouldn’t be when you’re in the midst of a crisis or response effort. We do a lot of pre-action bridge building to ensure that everyone we work with clearly understands their roles and responsibilities, so we can respond consistently to issues.
For instance, we use responsibility assignment matrices, known as RACI (Responsible, Accountable, Consulted, Informed) models, to chart out roles and responsibilities. We also find it useful to run role-playing scenario exercises to simulate situations and test our response techniques.
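A RACI matrix is easy to keep as data and check automatically. The sketch below assumes a hypothetical set of tasks and teams and enforces two widely used RACI conventions: each task has exactly one Accountable party and at least one Responsible party.

```python
from collections import Counter

# Hypothetical RACI matrix for a vulnerability response; tasks and teams are illustrative.
RACI = {
    "triage report": {"security-eng": "R", "vuln-coord": "A", "product-team": "C"},
    "develop patch": {"product-team": "R", "security-eng": "A", "vuln-coord": "I"},
    "notify customers": {"comms": "R", "vuln-coord": "A", "support": "C", "leadership": "I"},
}


def raci_problems(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Flag tasks that break common RACI rules: exactly one 'A' and at least one 'R'."""
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())  # missing letters count as 0
        if counts["A"] != 1:
            problems.append(f"{task}: expected exactly one Accountable, found {counts['A']}")
        if counts["R"] == 0:
            problems.append(f"{task}: no one is Responsible")
    return problems


print(raci_problems(RACI))  # []

# A copy with two Accountable parties for one task is flagged.
broken = {**RACI, "notify customers": {"comms": "R", "vuln-coord": "A", "leadership": "A"}}
print(raci_problems(broken))  # ['notify customers: expected exactly one Accountable, found 2']
```

Running a check like this before a scenario exercise surfaces ownership gaps while there is still time to fix them, rather than mid-incident.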
Threat detection and intelligence
Ultimately, you can’t stop attackers from trying to exploit vulnerabilities, so it’s imperative that you also try to anticipate as many of an attacker’s potential actions as possible. Threat detection plays a central role in our vulnerability management program, helping us to stay ahead of the latest techniques and put measures in place to mitigate risk, especially if a vulnerability can’t be patched.
Whenever we notice activity or reconnaissance for certain classes of attacks, we try to add detection signals for them. As with vulnerabilities, you’ll need to take into account prioritization, potential system impact, and other factors specific to your organization. It’s not possible to write and maintain detections for everything; if your environment doesn’t contain an asset affected by a particular vulnerability, there’s no point in adding a signal for it.
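The two conditions in that paragraph (observed activity, and an affected asset actually present) can be sketched as a filter over a hypothetical exposure list. Putting exposures with no available patch first is an assumption, drawn from the earlier point that detection matters most when a vulnerability can’t be patched; the `Exposure` fields and asset labels are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    # Hypothetical record linking a vulnerability to the asset types it affects.
    vuln_id: str
    affected_asset_types: frozenset[str]
    patch_available: bool


def detection_backlog(
    exposures: list[Exposure], observed_activity: set[str], inventory: set[str]
) -> list[str]:
    """Propose detection signals only where activity was observed and an affected asset exists.

    Exposures without an available patch are listed first.
    """
    candidates = [
        e for e in exposures
        if e.vuln_id in observed_activity and e.affected_asset_types & inventory
    ]
    # False sorts before True, so unpatchable exposures come first; sorted() is stable.
    return [e.vuln_id for e in sorted(candidates, key=lambda e: e.patch_available)]


exposures = [
    Exposure("VULN-X", frozenset({"vpn-appliance"}), patch_available=True),
    Exposure("VULN-Y", frozenset({"legacy-plc"}), patch_available=False),
    Exposure("VULN-Z", frozenset({"mainframe"}), patch_available=False),
]
observed = {"VULN-X", "VULN-Y", "VULN-Z"}
inventory = {"vpn-appliance", "legacy-plc"}  # no mainframe, so VULN-Z gets no signal
print(detection_backlog(exposures, observed, inventory))  # ['VULN-Y', 'VULN-X']
```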
The way you gather threat intelligence will likely come down to available resources, your industry or sector, and the maturity of your security program. For instance, not everyone has the ability to build a dedicated threat intelligence team, and organizations may have varying degrees of access to data sources, such as open-source intelligence, commercial feeds, or internal data.
Whatever your organization’s individual situation, it’s useful to consider ways you can study or passively monitor known or likely threats that may impact your systems and infrastructure.
This article includes insight from the Cloud Security Podcast episode, “How Google Does Vulnerability Management: The Not So Secret Secrets”.