How to avoid cloud misconfigurations and move towards continuous compliance
Ken Zhang
Head of Security Hong Kong, Customer Engineering, Google Cloud
Zeal Somani
Solutions Manager, Security Solutions, Google Cloud
Security is often seen as a zero-sum game between “go fast” and “stay secure.” We would like to challenge this school of thought and introduce a framework that turns it into a “win-win game,” so you can do both: “go fast” and “stay secure.”
Historically, application security tools have been implemented much like a gate at a parking lot. The parking lot has perimeter-based ingress and egress boom gates. The boom gates let one car through at a time, and vehicles often are backed up at the gates during busy hours. However, there are few controls once you get inside. You can access nearly any space on any level and easily move between levels.
When you apply this analogy to application development, AppSec tools are often implemented as "toll gates" within waterfall-native workflows. Developers are required to get in line, submit to a security scan, and wait to see the results. When the results are produced, developers spend significant time and energy investigating red flags raised by security. This process is slow and, not surprisingly, not very popular with developers. It’s why they often view traditional security programs as inhibitors to innovation.
Guardrails not gates
We suggest a workflow that’s less like a parking lot gate and more like a freeway with common-sense safety measures. Freeways have directive rules for all users. Speed limits, single direction of travel, and mandatory speed reduction zones when exiting contribute to freeway safety. Some freeways implement preventative measures based on these rules, such as physical walls dividing opposite flows of traffic and protective guardrails to reduce collisions and keep vehicles from veering off the road. While driving on a freeway comes with its own complications, there are no boom-style gates blocking your path.
Following the same directive rules, there are detective and responsive controls, such as speed detectors, cameras, signs reminding drivers which direction they are going, and how fast they are traveling. Some freeways have deployed rumble strips to remind a dozing driver to stay in their lane.
Applying lessons from freeways to application development and compliance in the cloud represents the perfect opportunity to build software more securely.
Modern application security tools should be fully automated, largely invisible to developers, and minimize friction within the DevOps pipeline. To do this, these security tools should work the way developers want to work. Security controls should integrate into the development lifecycle early and everywhere. These controls should live within the developer's preferred tools and create rapid feedback loops so mistakes can be remediated as soon as possible.
A typical compliance cycle looks like this:
Here, we highlight the gap between the desired state and the actual state, which becomes problematic when audit time comes. This gap increases the overall cost of the audit and the time spent generating evidence of controls.
Instead, this is what we need.
We need the actual state to track the desired state continuously. We need continuous preventative controls to stop insecure resources from being introduced. We need detective controls to find non-compliant resources promptly and constantly. We need responsive controls to fix non-compliant resources automatically. In all, we need continuous compliance.
Infrastructure continuous compliance reference architecture
How do we get started with continuous compliance? Here is the reference architecture that enables you to develop this capability.
The architecture is centered on building a closed loop of directive, preventative, detective, and responsive controls. It is also open and extensible. Although we reference Google Cloud architectures in this blog, you can use them for other cloud service platforms or even on-premises.
The National Institute of Standards and Technology’s Open Security Controls Assessment Language (OSCAL) is a helpful resource to express your control library in a machine-readable format. OSCAL can allow organizations to define a set of security and privacy requirements, which are represented as controls, which then can be grouped together as a control catalog. Organizations can use these catalogs to establish security and privacy control baselines through a process that may aggregate and tailor controls from multiple source catalogs. Using the OSCAL profile model to express a baseline makes the mappings between the control catalog and the profile explicit and machine-readable.
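As a rough illustration of how a profile selects a baseline from a catalog, the sketch below uses a heavily simplified, OSCAL-inspired structure. The field names, control IDs, and the `resolve_baseline` helper are assumptions made for illustration and do not follow the full OSCAL schema.

```python
import json

# Simplified, OSCAL-inspired catalog. This is NOT the real OSCAL schema;
# the ids and titles are hypothetical.
catalog = {
    "title": "Example control catalog",
    "controls": [
        {"id": "ex-1", "title": "Restrict public access to storage"},
        {"id": "ex-2", "title": "Enable audit logging"},
        {"id": "ex-3", "title": "Rotate service account keys"},
    ],
}

# A profile names the controls it includes, forming a baseline.
profile = {
    "title": "Example baseline",
    "include_controls": ["ex-1", "ex-2"],
}


def resolve_baseline(catalog, profile):
    """Return the catalog controls selected by the profile, in catalog order."""
    by_id = {control["id"]: control for control in catalog["controls"]}
    missing = [cid for cid in profile["include_controls"] if cid not in by_id]
    if missing:
        # An explicit mapping means a dangling reference can be caught early.
        raise KeyError(f"profile references unknown controls: {missing}")
    selected = set(profile["include_controls"])
    return [c for c in catalog["controls"] if c["id"] in selected]


baseline = resolve_baseline(catalog, profile)
print(json.dumps(baseline, indent=2))
```

Because the selection is data rather than prose, a tool can check that every control in the baseline actually exists in the source catalog.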
Directive controls
The starting point of the closed loop is a set of directive, harmonized controls. Next, you should have control mappings that rationalize the technical controls against your compliance requirements. These requirements can come from various sources, such as the threat landscape of your industry, your internal security policies and standards, your external regulatory compliance, and industry best practice frameworks.
Control mappings will form a Technical Control Library. The library is a dataset mapping out harmonized controls to requirements written in different compliance frameworks. The control mapping justifies the security controls. It builds the linkage between security and compliance and helps you reduce your compliance audit cost. This dataset should be a living document.
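One possible shape for such a library is sketched below. The control IDs, framework names, and requirement identifiers are placeholders rather than real framework references, and `controls_for_framework` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalControl:
    control_id: str
    description: str
    # Framework name -> requirement identifiers this control helps satisfy.
    mappings: dict = field(default_factory=dict)


# Hypothetical library entries; a real library would be maintained as data.
library = [
    TechnicalControl(
        "TC-001",
        "Storage buckets are not publicly accessible",
        {"framework-a": ["A.1"], "framework-b": ["B.7", "B.9"]},
    ),
    TechnicalControl(
        "TC-002",
        "Audit logging is enabled",
        {"framework-a": ["A.4"]},
    ),
]


def controls_for_framework(library, framework):
    """Index one framework's requirements by the technical controls mapped to them."""
    index = {}
    for control in library:
        for requirement in control.mappings.get(framework, []):
            index.setdefault(requirement, []).append(control.control_id)
    return index


evidence_index = controls_for_framework(library, "framework-a")
print(evidence_index)
```

An index like this lets one technical control serve as evidence for several frameworks at once, which is where the reduction in audit cost comes from.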
An easy first step in building such a library is to begin with the CIS Google Cloud Platform Foundation Benchmark. The benchmark is lightweight, and it constitutes the foundational security any entity should get right on Google Cloud. In addition, Security Command Center Premium’s Security Health Analytics can help you monitor your Google Cloud environment against these benchmarks on a continuous basis across all the projects within your organization.
The Technical Control Library will guide the rest of the closed loop. For every directive control, you should have a corresponding preventative control to stop non-compliant resources from being deployed. You should have a detective control that looks over the entire environment for non-compliant resources. And you should have a responsive control that remedies non-compliant resources automatically or kicks off a responsive workflow with your Security Operations function. Finally, every policy evaluation point should have a feedback loop to the engineers. A prompt and meaningful feedback loop provides a better engineering experience and increases development velocity in the short run. In the long run, these feedback loops encourage engineers to write better and more secure code.
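The pairing described above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical inventory that records which control types have been implemented for each directive control; all IDs and implementation names are made up.

```python
CONTROL_TYPES = ("preventative", "detective", "responsive")

# Directive control id -> control type -> name of the implementation.
# Every entry here is hypothetical.
implementations = {
    "TC-001": {
        "preventative": "ci-policy-bucket-access",
        "detective": "runtime-scan-bucket-access",
        "responsive": "auto-remediate-bucket-access",
    },
    "TC-002": {
        "detective": "runtime-scan-audit-logging",
    },
}


def coverage_gaps(implementations):
    """Return, per directive control, the control types that are still missing."""
    gaps = {}
    for control_id, implemented in implementations.items():
        missing = [t for t in CONTROL_TYPES if t not in implemented]
        if missing:
            gaps[control_id] = missing
    return gaps


gaps = coverage_gaps(implementations)
print(gaps)
```

Running a check like this whenever the library changes is one way to keep the loop closed as new directive controls are added.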
Preventative controls
Almost every action on Google Cloud is an API call, such as creating, configuring, or deleting resources, so preventative controls are all about API call constraints. There are different wrappers for these API calls, including Infrastructure-as-Code (IaC) solutions such as Terraform or Google Cloud Deployment Manager, the Cloud Console interface, Cloud Shell, and the Python or Go SDKs.
As with any other application code deployment, the IaC solutions should use a Continuous Integration (CI) solution. In the CI pipeline, you could orchestrate IaC constraints, similar to writing unit tests for application code. Since IaC configurations either come in JSON format or can be converted to it, you can use Open Policy Agent (OPA) as the IaC constraint solution. OPA’s Rego policy language is declarative and flexible, which allows you to construct almost any policy in Rego.
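To show the shape of such a constraint, here is a minimal sketch that uses plain Python as a stand-in for a Rego policy. It assumes the plan has been exported with `terraform show -json`, which lists planned changes under `resource_changes`. The sample plan, the bucket names, and the specific rule (require uniform bucket-level access on storage buckets) are assumptions chosen for illustration.

```python
def find_violations(plan):
    """Return addresses of planned buckets without uniform bucket-level access."""
    violations = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "google_storage_bucket":
            continue
        after = change.get("change", {}).get("after")
        if after is None:
            # The resource is being deleted, so there is nothing to evaluate.
            continue
        if not after.get("uniform_bucket_level_access", False):
            violations.append(change["address"])
    return violations


# Hypothetical, trimmed-down plan in the layout of `terraform show -json`.
sample_plan = {
    "resource_changes": [
        {
            "address": "google_storage_bucket.logs",
            "type": "google_storage_bucket",
            "change": {
                "actions": ["create"],
                "after": {"name": "example-logs", "uniform_bucket_level_access": True},
            },
        },
        {
            "address": "google_storage_bucket.public_assets",
            "type": "google_storage_bucket",
            "change": {
                "actions": ["create"],
                "after": {"name": "example-assets", "uniform_bucket_level_access": False},
            },
        },
        {
            "address": "google_storage_bucket.retired",
            "type": "google_storage_bucket",
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}

violations = find_violations(sample_plan)
for address in violations:
    print(f"policy violation: {address}")
```

In a CI job, the plan would be loaded from a file with `json.load`, and the job would exit with a non-zero status when the list is not empty, which gives the engineer the same fast feedback as a failing unit test.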
For the input sources that are not IaC, you could fall back to organization policies and IAM, as these two controls have the closest proximity to Google Cloud. That said, it’s considered a best practice to restrict non-IaC inputs for higher environments, such as production-like or production, so that you can codify your infrastructure and apply controls and workflows in the source repository.
Detective and responsive controls
Even if you've nailed the preventative controls and the cloud environment is sterile, you still need detective and responsive controls. Here’s why.
For one, not all controls can be safely implemented as preventative controls in the real world. For instance, we may not want to fail every Google Compute Engine deployment in CI just because the VMs have external IP addresses, since external IP addresses may be required for specific software or use cases. Another reason is that we want to produce time-stamped compliance status for audit purposes. Taking CIS compliance as an example, we could have enforced all the CIS checks in CI and set IaC as the only deployment source for cloud infrastructure. However, we would still need to demonstrate runtime CIS compliance with a report from Security Command Center.
Responsive security controls are not limited to remediation actions. They can also take the form of notifications via email, messaging tools, or integration with ITSM systems. If you use Terraform to deploy the infrastructure and Cloud Functions for auto-remediation, you need to pay attention to the Terraform state. Since auto-remediation actions performed by Cloud Functions are not recorded in the Terraform state file, you will need to inform the engineers to update the source Terraform code.
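The decision logic behind a responsive control can be sketched as follows. This is an illustration only: the finding fields, the category names, and the `plan_response` helper are hypothetical, and the actions are returned as plain strings instead of calling any real remediation or notification API.

```python
def plan_response(finding, auto_remediable):
    """Return the list of response actions for one detective finding."""
    resource = finding["resource"]
    actions = []
    if finding["category"] in auto_remediable:
        actions.append(f"remediate {resource}")
        if finding.get("managed_by_terraform", False):
            # The fix changes the live resource only, so the Terraform source
            # must be updated by the engineers to match.
            actions.append(f"notify engineers to update Terraform source for {resource}")
    else:
        # Not safe to fix automatically: hand over to a human workflow.
        actions.append(f"open ticket for {resource}")
    return actions


# Hypothetical categories that are considered safe to fix automatically.
AUTO_REMEDIABLE = {"EXAMPLE_PUBLIC_BUCKET"}

findings = [
    {"category": "EXAMPLE_PUBLIC_BUCKET", "resource": "bucket-a", "managed_by_terraform": True},
    {"category": "EXAMPLE_EXTERNAL_IP", "resource": "vm-b", "managed_by_terraform": True},
]

for finding in findings:
    print(plan_response(finding, AUTO_REMEDIABLE))
```

Keeping the Terraform notification as an explicit action makes it harder for an automated fix to silently diverge from the code in the source repository.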
The future
The fact that manual processes around security and compliance don’t scale points to automation as the next enabler. The economics of automation require a systemic discipline and holistic enterprise-wide approach to regulatory compliance and cloud risk management.
By defining a data model of the compliance process, the aforementioned OSCAL represents a game-changer for automation in risk management and regulatory compliance. While we realize that adopting “as code” practices is a long-term investment for most of our customers, Risk and Compliance as Code (RCaC) has a number of building blocks to get you started. By adopting the RCaC tenets you shift towards codified policies and infrastructure for a secure cloud transformation. Stay tuned as we introduce exciting new capabilities and features to Google Cloud Risk and Compliance as Code in the months to come.