This guide describes an automation framework for deploying Google Cloud Platform (GCP) resources in order to store and process healthcare data, including protected health information (PHI) as defined by the US Health Insurance Portability and Accountability Act (HIPAA).
- This guide presents a reference implementation, and does not constitute legal advice on the proper administrative, technical, and physical safeguards you must implement in order to comply with HIPAA or any other data privacy legislation.
- The scope of this guide is limited to protecting and monitoring data that is persisted by in-scope resources; following this implementation doesn't automatically cover derivative data assets that are stored or processed by other GCP storage services. You must apply similar protective measures to derivative data assets.
- The implementation in this guide is not an official Google product; it is intended as reference material. The open source code is available on GitHub as the Google Cloud healthcare deployment automation utility under the Apache License, Version 2.0. You can use the framework as a starting point and configure it to fit your use cases. You are responsible for ensuring that the environment and applications that you build on top of GCP are properly configured and secured according to HIPAA requirements.
- This guide walks you through a snapshot of the code in GitHub, which may be updated or changed over time. You might find more resource types—for example, Compute Engine instances or Kubernetes clusters—included in the reference implementation than what this guide covers. For the latest scope, see the README file.
The guide is intended for healthcare organizations that are getting started with GCP and looking for an example of how to configure a GCP project for data storage or analytics use cases. This setup includes many of the security and privacy best-practice controls recommended for healthcare data, such as configuring appropriate access, maintaining audit logs, and monitoring for suspicious activities.
Although the guide walks through various GCP services that are capable of storing and processing PHI, it doesn't cover all GCP resource types and use cases. Instead, the guide focuses on a subset of resource types. For a list of GCP services that support HIPAA compliance under Google's business associate agreement (BAA), review HIPAA Compliance on Google Cloud Platform. You might also want to review GCP documentation related to security, privacy, and compliance.
Objectives
The objective of this guide is to provide a reference infrastructure as code (IaC) implementation for setting up a HIPAA-aligned GCP project. This implementation strategy automates the following processes:
GCP offers customers a limited-duration free trial and a perpetual always-free usage tier, which apply to several of the services used in this tutorial. For more information, see the GCP Free Tier page.
Depending on how much data or how many logs you accumulate while executing this implementation, you might be able to complete the implementation without exceeding the limits of the free trial or free tier. You can use the pricing calculator to generate a cost estimate based on your projected usage.
When you finish this implementation, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.
Before you begin
- Review HIPAA compliance on GCP.
- Ensure that you have a valid GCP account, or sign up for one.
- If you are using GCP services in connection with protected health information, execute a BAA with GCP.
Make sure you have the recommended user groups:
- If your organization is a G Suite customer or you plan on using the accompanying helper script to provision resources for an existing GCP organization, work with your administrator and have them provide you with the user groups. The administrator uses Google Admin console for this purpose. Use Cloud Identity help to find out who your admin is.
- Alternatively, if you plan on testing the helper script outside of a commercial account, you can use Google Groups.
For more information, see recommended user groups.
Decide whether to use local or remote audit logs.
Initializing your environment
In your shell, download the Google Cloud deployment automation utility and install Python dependencies:
pip install -r requirements.txt
Install tools and dependency services:
Set up authentication:
gcloud auth login
Definitions and concepts
The reference implementation is based on the following best practices for managing healthcare data:
- Apply the principle of least privilege.
- Grant resource access to groups rather than individual users.
- Control who can alter Cloud Identity and Access Management (Cloud IAM) policies.
- Audit activity logs regularly.
- Be on the alert for suspicious activities.
- Rotate service account keys (key rotation is not discussed in this guide).
- Disable unnecessary APIs.
You specify resources in your deployment configuration.
This guide refers to any of the following activities as suspicious:
- Cloud IAM policies are altered.
- Permissions are altered on in-scope resources.
- Anyone other than the users defined in the expected_users collection, as specified in the deployment configuration, accesses in-scope resources.
Recommended user groups
Four types of groups or roles are recommended: owners, auditors, data read/write, and data read-only. You use your deployment configuration to specify user groups for each group type.
We recommend that you apply a consistent naming convention when naming the user groups. This guide uses a convention that combines the group type with your domain; for example, with domain=google.com, the Owners group is a group address under google.com.
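The four group types lend themselves to a small naming helper. The following sketch is illustrative only — the "prefix-type@domain" convention shown here is an assumption, so substitute your organization's own convention:

```python
# Illustrative only: generates group addresses for the four recommended
# group types using a hypothetical "<prefix>-<type>@<domain>" convention.
GROUP_TYPES = ["owners", "auditors", "readwrite", "readonly"]

def group_emails(prefix, domain):
    """Return a dict mapping each group type to a group email address."""
    return {t: f"{prefix}-{t}@{domain}" for t in GROUP_TYPES}

print(group_emails("hipaa-sample", "google.com"))
```

Whatever convention you choose, keeping it consistent makes the Cloud IAM bindings in the next table easier to audit.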
The following table summarizes the Cloud IAM roles for each group type. For details on the listed Cloud IAM roles, refer to project access control, Cloud IAM roles for Cloud Storage, BigQuery access control, and Cloud Pub/Sub access control.
| Group | Project | Log bucket (in Cloud Storage) | Logs in BigQuery | Data buckets (in Cloud Storage) | Data in BigQuery | Cloud Pub/Sub topics |
| --- | --- | --- | --- | --- | --- | --- |
Storage location for audit logs: local vs. remote
You have two options for storing audit logs:
- Remote mode: Logs are stored in a separate GCP project, independent from the project where the core data is stored. With this arrangement, you can centralize all audit logs in one project and set common access policies across your organization.
- Local mode: Logs are stored in the same GCP project as the core data that they track. With this arrangement, you can maintain separate audit log repositories for each project.
Remote mode is a natural choice for large organizations that have multiple teams and initiatives and a central data governance team. In smaller organizations, or large organizations that have delegated data governance, local mode might be more suitable.
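The difference between the two modes boils down to which project hosts the log resources. A minimal sketch of that decision (the function and argument names here are hypothetical, not part of the helper script):

```python
def logs_project_for(mode, data_project, audit_logs_project=None):
    """Return the project ID that hosts audit logs for a data project.

    mode: "remote" centralizes logs in a separate audit logs project;
    "local" keeps logs in the same project as the data.
    """
    if mode == "remote":
        if not audit_logs_project:
            raise ValueError("remote mode requires an audit logs project")
        return audit_logs_project
    return data_project

# Local mode: logs live next to the data they track.
assert logs_project_for("local", "hipaa-sample-project") == "hipaa-sample-project"
```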
Running the script
The guide refers to create_project.py as the helper script. This script creates GCP projects according to your deployment configuration. A later section, Understanding the code, explains the behind-the-scenes details, but running the helper script is straightforward.
In your environment, go to the folder where you cloned the Google Cloud deployment automation utility. Among other files are the create_project.py and BUILD files. You must specify a few parameters to run the script in one of three modes: dry run, standard, or resume:
bazel run :create_project -- \
  --project_yaml=config \
  --projects=project_list \
  --output_yaml_path=output_resume_config \
  --output_cleanup_path=output_cleanup_script \
  [--nodry_run|--dry_run] \
  --verbosity=verbosity_level
The parameters are defined as follows:
- --project_yaml: Relative or absolute path to the deployment configuration.
- --projects: List of project IDs from the deployment configuration. You can use * to indicate all projects listed in the deployment configuration.
- --output_yaml_path: Path where the deployment script outputs a modified deployment configuration. The resulting deployment configuration contains the original configuration plus other fields that are generated during the deployment process, such as project numbers and information required to resume the script after a failure.
- --output_cleanup_path: Path where the deployment script outputs a cleanup script. The resulting script contains shell commands to clean up configurations and resources in your projects that you didn't request in the deployment configuration but that were detected during the deployment. These shell commands are commented out by default to prevent accidental actions. After reviewing the commands, you can uncomment them and execute the script to get a clean deployment. Here's a sample cleanup command that disables the Container Registry API, if it is deemed unnecessary:
gcloud services disable containerregistry.googleapis.com --project hipaa-sample-project
- --nodry_run: Option that specifies a standard run.
- --dry_run: Default option that specifies a dry run. If you omit both --nodry_run and --dry_run, the action defaults to a dry run.
- --verbosity: Level of verbosity, up to 1, with higher values producing more information:
  - The lowest level produces FATAL logs only.
  - The default level adds less severe log messages to the FATAL logs.
  - The highest level produces the most detailed logs.
The default mode of execution for the
create_project.py script is a dry
run. This mode runs the logic of the script but doesn't create or update any
resources. Performing a dry run allows you to preview your
deployment configuration. For example, for a
local audit arrangement, using the sample
deployment configuration, you run the following bash command:
bazel run :create_project -- \
  --project_yaml=./samples/project_with_local_audit_logs.yaml \
  --output_yaml_path=/tmp/output.yaml \
  --dry_run
After you examine the commands in a dry-run execution, when you are ready to do the deployment, run the script with the --nodry_run option:
bazel run :create_project -- \
  --project_yaml=config \
  --output_yaml_path=/tmp/output.yaml \
  --nodry_run
If the script fails at any point, after first addressing the underlying issue, resume from the failed step by specifying both the --resume_from_project and --resume_from_step parameters:
bazel run :create_project -- \
  --project_yaml=config \
  --output_yaml_path=/tmp/output.yaml \
  --nodry_run \
  --resume_from_project=project_ID \
  --resume_from_step=step_number
For the list of steps, refer to
Verifying the results
The helper script encapsulates many commands. Technically, you could achieve the same results by executing the commands on the command line or interactively through the GCP Console. If you already practice IaC principles, you'll appreciate why it's a good idea to automate the process instead. This section highlights what to expect in the GCP Console after you successfully run the script.
Verify that the GCP Console shows your project or projects:
Cloud IAM console
For each project, verify that the IAM console shows OWNERS_GROUP as the project owner and AUDITORS_GROUP as the security reviewer for the project.
Although the preceding screenshot shows only the membership of AUDITORS_GROUP, you likely see several service accounts that have project-level access because of the APIs that you have enabled in the project. The most common service accounts are as follows:
- firstname.lastname@example.org: a Google-managed service account used by Google API service agents such as the Deployment Manager API.
- email@example.com: a user-managed service account used by the Compute Engine API.
- firstname.lastname@example.org: the Container Registry service account.
Look for the following information in Storage browser:
For buckets that store the logs, verify that the values for Name, Default storage class, and Location all follow the deployment configuration. The following screenshot shows a local log arrangement. In a remote log arrangement, this bucket is in a different project from the data and consolidates logs from all data projects. In a local log mode, each project has its own logs bucket.
Verify that object lifecycle management is enabled for the logs bucket. Look for a Delete action that matches the value specified by ttl_days in the deployment configuration.
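The Delete action corresponds to a standard Cloud Storage object lifecycle rule. The following sketch builds that policy JSON from a ttl_days value; the helper script produces this configuration for you, so this is only to show the shape of the policy:

```python
import json

def lifecycle_config(ttl_days):
    """Build a Cloud Storage lifecycle policy (standard GCS lifecycle
    JSON shape) that deletes objects older than ttl_days days."""
    return {"rule": [{"action": {"type": "Delete"},
                      "condition": {"age": ttl_days}}]}

# For example, with a one-year retention period:
print(json.dumps(lifecycle_config(365), indent=2))
```

If you ever need to set such a policy by hand, you can write this JSON to a file and apply it with gsutil lifecycle set.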
Go back to the main Storage Browser page, and in the upper right, click Show info panel. Except for email@example.com, verify that the permissions match Table 1. To understand why firstname.lastname@example.org has write access to the bucket, see the product documentation.
For the buckets that store the data logs, verify that the values for Name, Default storage class, and Location match the specifications in the deployment configuration.
Verify that objects in each data bucket are versioned. Run the following command to verify, replacing bucket_name with the name of your bucket:
gsutil versioning get gs://bucket_name
Verify that access and storage logs for the data buckets are captured and stored in the logs bucket; logging started when the data bucket was created. Run the following command to verify:
gsutil logging get gs://bucket_name
Verify that permissions for each bucket are set according to Table 1.
In the API console, verify that the BigQuery API is enabled.
In the Logging console, verify that a new export sink is shown. Make a note of the values for Destination and Writer Identity and compare to what you will see next in the BigQuery console.
Verify that logs-based metrics are set up to count incidents of suspicious activities in audit logs.
In the BigQuery console, verify that the dataset where Stackdriver sinks Cloud Audit Logs is shown. Also verify that the values for Description, Dataset ID, and Data location match the specifications in the deployment configuration and logging export sink that you saw previously.
Verify that access to the dataset is set according to Table 1. Also verify that the service account that streams Stackdriver logs into BigQuery is given edit rights to the dataset.
Verify that the newly created datasets for storing data are shown and that the Description, Dataset ID, and Data location values, and the labels for each dataset, match the specifications in the deployment configuration.
Verify that access to the dataset is set according to Table 1. You likely see other service accounts with inherited permissions, depending on the APIs that you've enabled in your project.
Cloud Pub/Sub console
Verify that the Cloud Pub/Sub console shows the newly created topic and that the topic name, list of subscriptions, and details of each subscription—for example, Delivery type and Acknowledgement deadline—match the specifications in the deployment configuration.
Also verify that access rights for the topic match the deployment configuration. For instance, the following screenshot shows the OWNERS_GROUP inheriting ownership of the topic and the READ_WRITE_GROUP having the topic editor role. Depending on the APIs that you have enabled in the project, you likely see other service accounts with inherited permissions.
Stackdriver Alerting console
In the Stackdriver Alerting console, verify that alerting policies are shown that trigger based on the corresponding logs-based metrics.
With the audit logs streamed into BigQuery, you can use the following SQL query to organize log history in chronological order by type of suspicious activity. Use this query in the BigQuery editor or through the bq command-line tool as a starting point for the queries that you must write to meet your requirements.
SELECT
  timestamp,
  resource.labels.project_id AS project,
  protopayload_auditlog.authenticationInfo.principalEmail AS offender,
  'IAM Policy Tampering' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE
  resource.type = 'project'
  AND protopayload_auditlog.serviceName = 'cloudresourcemanager.googleapis.com'
  AND protopayload_auditlog.methodName = 'SetIamPolicy'
UNION DISTINCT
SELECT
  timestamp,
  resource.labels.project_id AS project,
  protopayload_auditlog.authenticationInfo.principalEmail AS offender,
  'Bucket Permission Tampering' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE
  resource.type = 'gcs_bucket'
  AND protopayload_auditlog.serviceName = 'storage.googleapis.com'
  AND (protopayload_auditlog.methodName = 'storage.setIamPermissions'
       OR protopayload_auditlog.methodName = 'storage.objects.update')
UNION DISTINCT
SELECT
  timestamp,
  resource.labels.project_id AS project,
  protopayload_auditlog.authenticationInfo.principalEmail AS offender,
  'Unexpected Bucket Access' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_data_access_*`
WHERE
  resource.type = 'gcs_bucket'
  AND (protopayload_auditlog.resourceName LIKE '%hipaa-sample-project-logs'
       OR protopayload_auditlog.resourceName LIKE '%hipaa-sample-project-bio-medical-data')
  AND protopayload_auditlog.authenticationInfo.principalEmail NOT IN (
    'email@example.com',
    'firstname.lastname@example.org')
The following image shows a sample result when you run the query by using the BigQuery command-line interface.
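The query's classification logic can also be expressed in application code, for example when post-processing exported audit-log rows. The sketch below mirrors the query's three offense types and returns the first matching one; the row keys, bucket suffixes, and expected users are the sample values from the query, not fixed names:

```python
def classify_offense(row):
    """Classify one audit-log row dict into an offense type, or None.

    Mirrors the three UNION branches of the suspicious-activity SQL query.
    """
    expected_users = {"email@example.com", "firstname.lastname@example.org"}
    monitored_suffixes = ("hipaa-sample-project-logs",
                          "hipaa-sample-project-bio-medical-data")

    # Branch 1: project-level IAM policy changes.
    if (row["resource_type"] == "project"
            and row["service"] == "cloudresourcemanager.googleapis.com"
            and row["method"] == "SetIamPolicy"):
        return "IAM Policy Tampering"
    # Branch 2: bucket or object permission changes.
    if (row["resource_type"] == "gcs_bucket"
            and row["service"] == "storage.googleapis.com"
            and row["method"] in ("storage.setIamPermissions",
                                  "storage.objects.update")):
        return "Bucket Permission Tampering"
    # Branch 3: access to monitored buckets by anyone outside expected_users.
    if (row["resource_type"] == "gcs_bucket"
            and row["resource_name"].endswith(monitored_suffixes)
            and row["principal"] not in expected_users):
        return "Unexpected Bucket Access"
    return None
```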
Understanding the code
At a high level, the Google Cloud deployment automation utility uses Cloud Deployment Manager to provision some resources and to set up IAM and logging according to best practices. This section explains the structure of the GitHub code repository to help you understand what's going on behind the scenes.
From the top:
- The samples folder contains sample deployment configurations.
- The templates folder contains reusable deployment templates.
- The utils folder includes various utility functions that the helper script uses.
- BUILD is a Bazel build file.
- README.md is a lighter version of this guide.
- create_project.py is the helper script that simplifies the execution. Refer to Running the script.
- create_project_test.py contains the unit tests, which are not discussed in this guide.
- project_config.yaml.schema defines the schema for the deployment configurations (.yaml files) in the samples folder.
- requirements.txt is a pip frozen requirements file.
The helper script: create_project.py
The helper script
create_project.py reads its configurations from a
YAML file and creates or modifies projects that are listed in that configuration
file. It creates an audit logs project if
audit_logs_project is provided, and
then creates a data-hosting project for each project that is listed under
projects. For each listed project, the script performs the following:
- If the project is not already present, the script creates the project.
- It enables billing on the project.
- It enables the Deployment Manager API and runs the data_project.py template to deploy resources in the project. The script grants temporary owners permissions to the Deployment Manager service account while running the template.
- When setting up for remote audit logs, the script creates audit log resources in the audit logs project by using the remote_audit_logs.py template.
- It prompts you to create or select a Stackdriver Workspace, which you must do by using the Stackdriver UI. For more details, see the Stackdriver guide.
- If they are not already present, the script creates logs-based metrics and Stackdriver alerts for monitoring suspicious activities.
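The per-project flow above can be summarized in Python pseudocode. The step names paraphrase the list above; none of the function names here are the actual functions in create_project.py:

```python
def deploy_project(project, dry_run=True):
    """Illustrative paraphrase of the helper script's per-project steps."""
    steps = [
        ("create project if missing", lambda: f"create {project}"),
        ("enable billing", lambda: f"billing {project}"),
        ("run data_project.py template via Deployment Manager",
         lambda: f"deploy {project}"),
        ("set up audit logs (local or remote)", lambda: f"logs {project}"),
        ("create logs-based metrics and Stackdriver alerts",
         lambda: f"alerts {project}"),
    ]
    executed = []
    for name, action in steps:
        if dry_run:
            # Dry run: report the plan without touching any resources.
            print(f"[dry run] would: {name}")
        else:
            executed.append(action())
    return executed

# A dry run prints the plan and performs nothing.
assert deploy_project("hipaa-sample-project") == []
```

This also shows why resuming is possible: the steps run in a fixed order, so a failed deployment can pick up at the step where it stopped.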
Sample deployment configurations
With Cloud Deployment Manager, you use a configuration to describe all the resources that you want for a single deployment. A configuration file is in YAML format and lists each of the resources that you want to create and their respective properties—for example:
- Cloud Storage buckets, BigQuery datasets, and the Cloud Pub/Sub topic are each specified by using the corresponding Deployment Manager resource type.
You can choose from a growing number of
sample configuration files.
Depending on your choice between local or remote audit logs, you start from the corresponding sample configuration. Before using any of the samples, review and customize the values to reflect the configuration you want.
The schema for these YAML files is defined in project_config.yaml.schema.
- Define the billing details that apply to all projects. If you're not deploying your projects in a GCP organization, you can omit the organization_id. We recommend that you track all projects to your organization, though.
- If you are using remote audit logs, define the project that will host audit logs in the audit_logs_project section.
- List all the required data-hosting projects under projects.
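Tying the points above together, you can sanity-check a parsed configuration for these sections before running the script. The keys below (projects, audit_logs_project, organization_id) come from this guide; treat the check itself as a hypothetical sketch and rely on project_config.yaml.schema as the source of truth:

```python
def check_config(config):
    """Minimal structural check for a parsed deployment configuration dict.

    Returns (uses_remote_logs, list_of_issues). Illustrative only; the
    authoritative validation is project_config.yaml.schema.
    """
    issues = []
    # At least one data-hosting project is required.
    if not config.get("projects"):
        issues.append("at least one data-hosting project is required under 'projects'")
    # audit_logs_project is present only when using remote audit logs.
    uses_remote_logs = "audit_logs_project" in config
    # organization_id may be omitted outside a GCP organization,
    # but tracking projects to an organization is recommended.
    if "organization_id" not in config:
        issues.append("warning: no organization_id; projects won't be tracked to an organization")
    return uses_remote_logs, issues
```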
Deployment Manager templates
A Deployment Manager template is essentially a part of the configuration file that has been abstracted into a reusable building block. The templates folder includes a growing number of them, two of which are the focus of this guide. As you review the .py files, note the matching .schema files under the templates folder. Those schema files validate the fields in the templates.
The data_project.py template sets up a new project for hosting data and potentially for audit logs. It does the following:
- Grants exclusive project ownership to the OWNERS_GROUP.
- Creates BigQuery datasets for storing data with the recommended access controls according to Table 1.
- Creates Cloud Storage buckets for storing data with the recommended access control according to Table 1, turns on object versioning, and enables access and storage logs for the bucket.
- Creates a Cloud Pub/Sub topic and subscription with the access controls according to Table 1.
- If setting up for local audit logs:
- Creates a log sink to continuously export all audit logs into BigQuery.
- Creates logs-based metrics for capturing the number of incidents when:
- project-level IAM policies are changed. This includes IAM policies for Cloud Pub/Sub topics.
- permissions to Cloud Storage buckets or individual objects are changed.
- permissions to BigQuery datasets are changed.
- anyone other than the users defined in the expected_users collection, as specified in the deployment configuration, accesses in-scope resources. Currently, this applies only to Cloud Storage buckets.
- Enables data access logging on all services.
The remote_audit_logs.py template sets up resources to store logs in a project that is separate from where the data is stored. It does the following:
- logs_bigquery_dataset specifies the name of the BigQuery dataset for storing Cloud Audit Logs. Access to this dataset is arranged according to Table 1.
- logs_gcs_bucket specifies a Cloud Storage bucket for storing access and storage logs. Access to this bucket is arranged according to Table 1. Time to Live is defined according to ttl_days in the configuration file.
Cleaning up
- In the GCP Console, go to the Projects page.
- In the project list, select the project that you want to delete and click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Bridge the gap between your existing health IT and GCP by using the Cloud Healthcare API.
- Classify and redact sensitive data by using the Data Loss Prevention (DLP) API.
- Define security perimeters for sensitive data by using VPC Service Controls.
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.