The reference architecture described in this guide helps you transfer sensitive information to the cloud. This guide is intended for Google Cloud customers in healthcare and life sciences. The guide also extends to enterprise security teams across industries who seek a baseline implementation of Google Cloud best practices for data protection.
- This guide does not constitute legal advice on the proper administrative, technical, and physical safeguards you must implement in order to comply with HIPAA or any other data privacy legislation.
- The scope of this guide is limited to implementation of a reference architecture. The implementation in this architecture is not an official Google product; it is intended as a reference implementation. The code is open source, published as the Google Cloud Data Protection Suite under the Apache License, Version 2.0. You may use the guide as a quickstart and configure it to fit your use cases. You are responsible for ensuring that the environment and applications that you build on Google Cloud are properly configured and secured according to HIPAA or any other applicable regulatory requirements.
- Implementation of the solution guide or reference architecture does not automatically cover any data assets that are stored or processed by other Google Cloud storage services. Similar protective measures must be applied to all other data stored across the environment.
- The implementation of this solution may vary in customer environments based on the choice of products and configuration options.
The Data Protection Toolkit is an open source utility released by Google. The toolkit is part of the Data Protection Suite, a repository of utilities that focuses on placing guardrails around sensitive data and providing a centralized view that helps you understand who did what, where, and when with that sensitive data. The toolkit provides an automation framework for provisioning and managing Google Cloud projects. It combines infrastructure-as-code best practices, security configurations, and best practices for deploying Google Cloud resources to store and process sensitive data.
The toolkit uses Terraform Engine, a tool that generates complete Terraform deployments through Terraform root modules called templates. These templates are deployed through recipes, which contain multiple templates. The toolkit also uses continuous integration/continuous delivery (CI/CD) pipelines for approval, tests, and auto-deployment of resources in order to ensure that the environment is kept up to date. You can also use the toolkit to set up Forseti, an open source tool for continuous configuration monitoring of deployed projects and their resources. Because of its expressive deployment capabilities and integrated monitoring tools, the toolkit is a powerful tool for privacy-, security-, and compliance-focused use cases.
Toolkit templates can help you with the following tasks:
- Deploying identical environments (for example, development, test, and production) with minimal manual intervention.
- Minimizing errors by using validated templates and automated deployments.
- Deploying, testing, and validating Google Cloud workloads, with zero downtime.
- Enabling rapid redeployment of failed workloads to aid in disaster recovery.
- Using built-in policies to align with compliance requirements and best practices.
- Deploying infrastructure-related auditing and monitoring tools in parallel with workload deployment.
- Reducing maintenance costs by monitoring capacity and automating removal of unused resources.
You can drive development efficiency by using these templates to update or restore deployments to the required state. Also, you can track changes to the templates by maintaining them in a code repository. This tracking helps you drive accountability and maintain discipline and quality control.
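Restoring a deployment to its required state amounts to comparing what the templates declare with what is actually deployed. The following is a minimal sketch of that comparison, not the toolkit's own logic: the function `plan_changes`, the `Plan` type, and the resource names and attributes are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, NamedTuple


class Plan(NamedTuple):
    """Changes needed to bring the actual state back to the desired state."""
    create: List[str]
    update: List[str]
    delete: List[str]


def plan_changes(desired: Dict[str, Dict[str, Any]],
                 actual: Dict[str, Dict[str, Any]]) -> Plan:
    """Compare two resource maps keyed by resource name."""
    # Declared in the templates but not deployed yet.
    create = sorted(name for name in desired if name not in actual)
    # Deployed but no longer declared in any template.
    delete = sorted(name for name in actual if name not in desired)
    # Present on both sides but with differing settings (drift).
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return Plan(create=create, update=update, delete=delete)


# Hypothetical example data: "desired" stands in for templates kept in a code
# repository; "actual" stands in for what is currently deployed.
desired = {
    "audit-logs-bucket": {"location": "US", "versioning": True},
    "audit-dataset": {"location": "US"},
}
actual = {
    "audit-logs-bucket": {"location": "US", "versioning": False},  # drifted
    "scratch-bucket": {"location": "US"},  # not declared in any template
}

plan = plan_changes(desired, actual)
print(plan)
```

In practice Terraform performs this comparison itself when it plans a deployment; the sketch only illustrates why keeping the templates in a repository gives you a single source of truth to compare against.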
You can clone the Data Protection Suite repository and use it to deploy the templates. The current templates focus on the healthcare industry, but you can use the toolkit to support use cases related to banking and finance, gaming, marketing, and education.
- Macro quickstart: Package tutorials and quickstarts as deployable templates to bring up all necessary projects and resources.
- Customer onboarding: Define templates that contain all base resources needed to get a minimal footprint up and running quickly on Google Cloud. Accelerate customer adoption in a security- and governance-forward approach.
- Data analysis/R&D: Quickly provision multiple identical environments and sandboxes for researchers and analysts to use for experimenting on the data lake without disrupting production usage. A built-in central governance layer helps IT and security teams manage their organization's resources.
- Compliance and regulations alignment: Build compliance requirements and guidelines into templates to ensure that resources are deployed appropriately from the beginning, so that you don't need to add security monitoring after the fact. These templates cover Health Insurance Portability and Accountability Act (HIPAA), GxP, General Data Protection Regulation (GDPR), and other environments with sensitive data.
- Repeatable and reproducible deployments: Define identical environments (for example, dev, test, and prod) and reproducible deployments. Use the toolkit to define a template once, and deploy it consistently as many times as needed for projects that handle various industry-focused scenarios.
- Share Google Cloud development: Share templates across teams and organizations, including the open source community. This approach offers an easy way to provision and manage Google Cloud resources for common use cases.
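The idea of defining a template once and deploying it to several identical environments can be sketched as follows. This is an illustration only and does not use Terraform Engine: it renders a small Terraform snippet with Python's standard `string.Template`. The organization prefix `example-org`, the bucket naming scheme, and the helper `render_environments` are assumptions made for this sketch.

```python
from string import Template
from typing import Dict, List

# Hypothetical template text. The resource type google_storage_bucket comes
# from the Terraform Google provider; the naming scheme is an assumption.
BUCKET_TEMPLATE = Template(
    'resource "google_storage_bucket" "data" {\n'
    '  name     = "${org}-${env}-data"\n'
    '  location = "${location}"\n'
    '}\n'
)


def render_environments(org: str, location: str,
                        environments: List[str]) -> Dict[str, str]:
    """Render the same template once per environment name."""
    return {
        env: BUCKET_TEMPLATE.substitute(org=org, env=env, location=location)
        for env in environments
    }


rendered = render_environments("example-org", "US", ["dev", "test", "prod"])
for env, text in rendered.items():
    print(f"# {env}\n{text}")
```

Because every environment is produced from the same template, the rendered configurations differ only in the environment name, which is what makes the deployments reproducible.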
The Data Protection Toolkit provides a framework for deploying your entire organizational structure, including central DevOps, auditing, monitoring, and data hosting projects. Because Cloud projects delineate clear boundaries between resources and functions of an organization, the toolkit makes it easier to create these projects securely and repeatably.
You can use the toolkit to manage your organization's projects completely, by creating and managing all resources in the projects, or partially, by managing only a subset of those resources. For example, suppose your organization has a highly customized solution or needs many resources that the toolkit doesn't support natively. In this case, you can still use the toolkit to set up the foundational pieces such as data projects and centralized auditing, DevOps, and monitoring. You can then manage resources in the project through another process, for example, directly using Terraform Engine.
The architecture in this guide helps you build a Google Cloud–based infrastructure by treating the configuration as code. The following diagram illustrates how the architecture helps you meet security and compliance best practices by using reusable building blocks—specifically, Terraform Engine recipes and CI/CD pipelines.
This diagram shows how the Data Protection Toolkit manages DevOps, auditing, and monitoring projects. Consider the architecture as a quickstart that you can customize for deploying a specific use case or product; it doesn't produce a final product that you can simply deploy.
- DevOps Project: Hosts the CI/CD resources and Terraform remote state. Contains a Cloud Storage bucket, Cloud Build triggers, Service Accounts, and necessary IAM permissions.
- Audit Project: Consists of audit log resources, including a BigQuery dataset, a Cloud Storage bucket, and log sinks to export the audit logs.
- Monitor Project: Contains Forseti resources for monitoring the scoped environment for compliance.
- Cloud Monitoring Account: Provides monitoring for security and IT admin purposes.
- IAM: Provides access and authorization for administrators and the CI/CD pipeline to create resources.
- Organization Policies: Includes best practice policies to control cloud resources.
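The project-level components listed above can be written down as data and used to check whether an environment contains everything the base architecture expects. The sketch below is hypothetical: the project keys, the resource kind labels, and the function `missing_components` are identifiers invented for this example, and the check is not part of the toolkit.

```python
from typing import Dict, List, Set

# Resource kinds taken from the component list above; the dictionary keys and
# kind labels are hypothetical identifiers chosen for this sketch.
BASE_LAYOUT: Dict[str, Set[str]] = {
    "devops": {"storage_bucket", "cloud_build_triggers",
               "service_accounts", "iam_bindings"},
    "audit": {"bigquery_dataset", "storage_bucket", "log_sinks"},
    "monitor": {"forseti"},
}


def missing_components(deployed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per project, the expected resource kinds that are absent."""
    missing: Dict[str, List[str]] = {}
    for project, required in BASE_LAYOUT.items():
        absent = required - deployed.get(project, set())
        if absent:
            missing[project] = sorted(absent)
    return missing


# Hypothetical inventory: the audit project lacks log sinks and the monitor
# project has not been created yet.
deployed = {
    "devops": {"storage_bucket", "cloud_build_triggers",
               "service_accounts", "iam_bindings"},
    "audit": {"bigquery_dataset", "storage_bucket"},
}

gaps = missing_components(deployed)
print(gaps)
```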
This section provides a high-level summary of the deployment process. For a detailed solution deployment guide, see the Data Protection Suite repository on GitHub.
The toolkit and its associated templates are available in the public Data Protection Suite GitHub repository free of charge. However, standard Google Cloud consumption charges apply to the Cloud projects once they are deployed and in active use. Google Cloud offers customers a limited-duration free trial and a perpetual always-free usage tier, both of which apply to several of the services used in this tutorial. For more information, see the Google Cloud Free Tier page.
Depending on how much data or how many logs you accumulate while executing this implementation, you might be able to complete the implementation without exceeding the free-trial or free-tier limits. To generate a cost estimate based on your projected usage, you can use the pricing calculator.
When you finish this implementation, you can avoid continued billing by deleting the resources you created.
Before you can deploy the toolkit base architecture, you must do the following:
- Review the onboarding best practices guide.
- Make sure that your basic Google Cloud setup is complete:
Deploying the base Data Protection Toolkit (DPT) architecture involves the following steps:
- Bootstrap: Initiated by the administrator, Terraform Engine creates the DevOps project consisting of the Cloud Storage bucket for the Terraform state, Cloud Build triggers for integration with the CI/CD pipeline, and service accounts.
- CI/CD Deployment: Establishes CI/CD pipelines by enabling the Cloud Build APIs, creating service accounts, connecting your repository, and creating pull request and push triggers.
- Push Configurations: The administrator commits and pushes newly generated Terraform Engine configurations to the repository and creates a pull request.
- Auto Deploy Monitoring and Audit: When the pull request is approved, this step triggers an auto-deployment of the Monitoring and Audit projects and their resources.
- Solution Deployment: Your organization might choose to protect newly deployed or existing projects. For more information, see this video demonstration.
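The steps above form a chain in which each step depends on the one before it. As a rough sketch, and not a description of how the toolkit sequences its work, the ordering can be modeled as a dependency graph with Python's standard `graphlib` module (Python 3.9 or later). The step identifiers and the helpers `deployment_order` and `can_run` are hypothetical names for this example.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set

# Each step maps to the set of steps that must finish before it can start.
# The identifiers are hypothetical labels for the steps listed above.
STEPS: Dict[str, Set[str]] = {
    "bootstrap": set(),
    "cicd_deployment": {"bootstrap"},
    "push_configurations": {"cicd_deployment"},
    "auto_deploy_monitoring_and_audit": {"push_configurations"},
    "solution_deployment": {"auto_deploy_monitoring_and_audit"},
}


def deployment_order(steps: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


def can_run(step: str, completed: Iterable[str]) -> bool:
    """A step can run once all of its prerequisites are completed."""
    return STEPS[step] <= set(completed)


order = deployment_order(STEPS)
print(order)
print(can_run("cicd_deployment", ["bootstrap"]))
print(can_run("solution_deployment", ["bootstrap"]))
```

Modeling the steps as a graph rather than a fixed list makes it easy to add a step later, such as a hypothetical approval gate, without reworking the ordering logic.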