This tutorial steps through the Google Cloud Healthcare Data Protection Toolkit, an automation framework for deploying Google Cloud resources to store and process healthcare data, including protected health information (PHI) as defined by the US Health Insurance Portability and Accountability Act (HIPAA). This tutorial is designed to be read alongside the HIPAA-aligned Google Cloud Healthcare architecture article.
Overview
The tutorial is for healthcare organizations that are getting started with Google Cloud and looking for an example of how to configure a Google Cloud infrastructure for data storage, analytics, or application development. This setup includes many of the security and privacy best-practice controls recommended for healthcare data, such as configuring appropriate access, maintaining audit logs, and monitoring for suspicious activities. For details about these best practices, see the Google Cloud HIPAA white paper.
The tutorial walks through various Google Cloud services that are capable of storing and processing PHI, but it doesn't cover all Google Cloud resource types and use cases. Instead, the tutorial focuses on a subset of resource types. For a list of Google Cloud services that support HIPAA compliance under Google's business associate agreement (BAA), review HIPAA Compliance on Google Cloud. You might also want to review Google Cloud documentation related to security, privacy, and compliance.
Disclaimer
- This tutorial explains a reference architecture, and does not constitute legal advice on the proper administrative, technical, and physical safeguards you must implement in order to comply with HIPAA or any other data privacy legislation.
- The scope of this tutorial is limited to protecting and monitoring data that is stored by in-scope resources. Implementing the tutorial doesn't automatically cover derivative data assets that are stored or processed by other Google Cloud storage services. You must apply similar protective measures to derivative data assets.
- The implementation in this tutorial is not an official Google product; it is intended as a reference implementation. The code is in an open-source project, the Google Cloud Healthcare Data Protection Toolkit, which is available under the Apache License, Version 2.0. You can use the framework as a starting point and configure it to fit your use cases. You are responsible for ensuring that the environment and applications that you build on top of Google Cloud are properly configured and secured according to HIPAA requirements.
- This tutorial walks you through a snapshot of the code in GitHub, which may be updated or changed over time. You might find more resource types—for example, Cloud SQL instances or Kubernetes clusters—included in the reference implementation than what this tutorial covers. Consult the README file for the latest scope.
Objectives
The objective of this tutorial is to provide a reference infrastructure as code (IaC) implementation for setting up a HIPAA-aligned Google Cloud project. This implementation strategy automates the following processes:
- Create a Google Cloud project.
- Provision resources.
- Manage access to resources.
- Establish a collection of audit logs.
- Create a Cloud Monitoring Workspace.
- Deploy Forseti Security monitoring tools.
- Enable Cloud Monitoring metrics and alerts to detect suspicious activities.
Costs
Google Cloud offers customers a limited-duration free trial and a perpetual always-free usage tier, which apply to several of the services used in this tutorial. For more information, see the Google Cloud Free Tier page.
Depending on how much data or how many logs you accumulate while executing this implementation, you might be able to complete the implementation without exceeding the limits of the free trial or free tier. You can use the pricing calculator to generate a cost estimate based on your projected usage.
When you finish this implementation, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.
Before you begin
- Review the onboarding best practices guide.
- Review HIPAA compliance on Google Cloud.
- If you are using Google Cloud services in connection with PHI, execute a BAA with Google Cloud.
- Make sure that your basic Google Cloud setup is complete:
- Set up Cloud Identity for the customer organization, following best practices.
- Add an organization admin in Google Cloud.
- Set up a billing account.
Initializing your environment
In your shell, download the Google Cloud Healthcare Data Protection Toolkit.

Note: If you are going to run the helper script in Cloud Shell, you can stop here and skip the next steps.
Install tools and dependency services:
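The exact dependency list is in the toolkit's README. As a hedged sketch, the following script checks that the tools used later in this tutorial (these tool names are assumptions inferred from the commands that appear below, such as `bazel` and `gcloud`) are available on your PATH:

```shell
#!/usr/bin/env bash
# Sketch: verify that the tools this tutorial invokes are installed.
# The dependency list is an assumption; consult the toolkit README.
required_tools="gcloud bazel git python3"
missing=0
for tool in $required_tools; do
  echo "checking: $tool"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "MISSING: $tool -- install it before running the helper scripts"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all dependencies found"
```

Run the script before the deployment steps; install anything it reports as missing.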
Set up authentication:
gcloud auth login
Deployment preparation checklist
Before you begin, make sure you have the following information noted down.
Recommended user groups
Google Cloud uses Google Accounts and groups in supported Google Workspace or Cloud Identity domains or public Google Groups for permission control. In Google Cloud, it's a best practice to create a set of groups for predefined roles, so that you can control individual permissions easily by group membership without modifying the project.
Create, or verify the existence of, the user groups described in the concept document. If your organization is a Google Workspace customer, or if you plan to use the script to provision resources in an existing Google Cloud organization, ask your administrator to create the user groups in the Google Admin console. Use Cloud Identity help to find out who your administrator is. Organizational users are set up in Cloud Identity. By default, the email account designated as admin during setup is assigned the Organization Admin role; this person can create and assign other IAM roles. Access to the various resources needed for data transfer, analysis, and security audit is controlled by IAM groups and their associated roles.
In addition, the user running the deployment script must be in the owners group(s) of all the projects that will be created, including the security project. You can remove that user from these groups after your deployment is successful. Add the user that you are currently using as a member of the `OWNERS_GROUP` that you created above. Log in to the Google Workspace Admin Console.
Alternatively, if you plan on testing outside a commercial account, you can use Google Groups.
Create each of the Google Groups, and apply the following secure settings for basic permissions when you create them:
- Group type: Email list
- View Topics: All Members of Group (and Managers)
- Post: Only Managers and Owners
- Join the group: Only Invited Users
Collecting environment info and naming project resources
The configuration file is in YAML format and lists each of the resources that you want to create along with their respective properties. You can choose from a growing number of sample configuration files. We recommend that you use the remote audit log configuration file.
The `overall` section contains organization and billing details that apply to all projects.

- Define the project that will host audit logs in the `audit_logs_project` section.
- Define the project that will host Forseti resources in the `forseti` section.
- List all the required data-hosting projects under `projects`.
Collect the following information for your Google Cloud environment and update the file.
| Parameter | Sample value |
|---|---|
| `organization_id` | `12345678` |
| `billing_account` | `000000-000000-000000` |
| `location` | `US`, `EU`, or `Asia` (for BigQuery and Cloud Storage). The default setting is multi-regional data storage within US, EU, or Asia boundaries. |
| `ttl_days` | This field in the audit section determines the retention time for audit logs (currently only for storage buckets). An example value is `365`. |
| `project_id` | Provide unique names for the three projects. |
| `stackdriver_alert_email` | [your admin email id] |
| `owners_group` | `hipaa-sample-project-owner@google.com` |
| `auditors_group` | `hipaa-sample-project-auditor@google.com` |
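Plugged into a configuration file, these values might look like the following sketch. The section names (`overall`, `audit_logs_project`, `forseti`, `projects`) come from this tutorial, but the exact field layout is an assumption; verify it against the sample configuration file you copy, because the schema can differ between toolkit versions.

```yaml
# Illustrative only -- check the sample configs for the real schema.
overall:
  organization_id: '12345678'
  billing_account: 000000-000000-000000

audit_logs_project:
  project_id: hipaa-sample-project-logs        # hypothetical project name
  owners_group: hipaa-sample-project-owner@google.com
  auditors_group: hipaa-sample-project-auditor@google.com

forseti:
  project_id: hipaa-sample-project-forseti     # hypothetical project name

projects:
- project_id: hipaa-sample-project-data        # hypothetical project name
  owners_group: hipaa-sample-project-owner@google.com
  auditors_group: hipaa-sample-project-auditor@google.com
  stackdriver_alert_email: admin@example.com   # your admin email
```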
The toolkit allows you to follow best practices, such as consistent naming conventions and labeling of resources. Make decisions on project names, Cloud Storage bucket and BigQuery dataset names, and VPC names. For best practices in naming, see the following documents:
- VPC design best practices
- Best practices for organizations, for project naming
- Best practices for Cloud Storage, for bucket naming
Use the following table to collect the necessary information, which you can then add to the deployment configuration file. Consider the components illustrated in the following example when establishing your naming conventions:
- Company name: Altostrat Company: altoco
- Business unit: Human Resources: hr
- Application code: Compensation system: comp
- Region code: north-america-northeast1: na-ne1, europe-west1: eu-we1
- Environment codes: dev, test, uat, stage, prod
| Parameter | Sample naming convention |
|---|---|
| `project_id` | Decide on project naming conventions. Example: `{org-name}-{app/bu/purpose}-{environment}`, such as `altoco-edw-prod`. Define the project that will host audit logs in the `audit_logs_project` section, define the project that will host Forseti resources in the `forseti` section, and list all the required data-hosting projects under `projects`. |
| `{storage_name}` bucket name | Example: `{org-name hash}-{source}-{type}-{bu/region}-{seq#}`, such as `08uy6-cerner-inpatient-east-1`. Use a hash of the org name for better security. |
| BigQuery dataset name | Example: `{org-name}_{bu/region}_{app/module}`, such as `altoco_east_edw`. |
| VPC name | Example: `{org-name}-{bu/region}-{environment}-{seq#}`, such as `altoco-east-prod-vpc-1`. |
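A small shell helper can apply these conventions consistently. This is a sketch under assumed patterns: the validation regex and the 5-character md5 prefix are illustrative choices for the "org-name hash" idea above, not toolkit requirements.

```shell
#!/usr/bin/env bash
# Illustrative naming helpers; patterns and hash length are assumptions.

# Short, stable prefix derived from the org name ("org-name hash" above).
org_hash() {
  printf '%s' "$1" | md5sum | cut -c1-5
}

# Check a project ID of the form {org-name}-{app/bu/purpose}-{environment}:
# lowercase alphanumerics separated by at least two hyphens.
valid_project_id() {
  echo "$1" | grep -Eq '^[a-z][a-z0-9]*(-[a-z0-9]+){2,}$'
}

echo "bucket prefix for altoco: $(org_hash altoco)"

for id in altoco-edw-prod Bad_Name; do
  if valid_project_id "$id"; then
    echo "ok: $id"
  else
    echo "invalid: $id"
  fi
done
```

Running the sketch prints a 5-character bucket prefix, accepts `altoco-edw-prod`, and rejects `Bad_Name`.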
Understanding the scripts
At a high level, the Google Cloud Healthcare Data Protection Toolkit uses Terraform, Cloud Foundation Toolkit, and custom logic where appropriate to provision resources, set up IAM and logging according to best practices, and enable Forseti monitoring.
Helper scripts
The main helper script, `apply.go`, reads its configuration from a YAML configuration file and creates or modifies the projects that are listed in that file. The script creates an audit logs project and a Forseti project. Then the script creates a data-hosting project for each project that is listed under `projects`. For each listed project, the script performs the following steps:
- Creates the project (if the project is not already present).
- Enables billing on the project.
- Creates a bucket to store the Terraform state (either locally within the project or remotely in the central devops project).
- Prompts you to create or select a Workspace, which you must do by using Cloud Monitoring. For more details, see the Monitoring guide.
- Creates Monitoring Alerts for monitoring security rules.
- Deploys resources within the project, including services, monitoring resources, and data-holding resources.
- Creates audit log data resources and sinks (either locally within the project or remotely in the central audit project).
- Creates Forseti monitoring resources.
The helper script `rule_generator.go` reads its configuration from a YAML configuration file to generate Forseti rules and writes them to a local directory or a Cloud Storage bucket.
NOTE: Rule generation currently works only with the older Deployment Manager-based configs.
Running the scripts
With the Data Protection Toolkit, you use a configuration file to describe all the resources that you want for a single deployment. The configuration file is in YAML format and lists each of the resources that you want to create along with their respective properties.
Copy a sample configuration file into the root folder. Make sure you are in the `./healthcare/deploy` folder:

cp ./samples/simple/config.yaml .
nano config.yaml
Update the parameters that you collected previously in appropriate places in the configuration file.
The parameters of the main helper script `apply.go` are defined as follows:

- `--config_path`: The relative or absolute path to the deployment configuration.
- `--projects`: A comma-separated list of project IDs from the deployment configuration. Skip this parameter to indicate all projects listed in the deployment configuration. The script creates new projects and updates existing ones.
- `--dry_run`: Specifies a dry run. Leave out `--dry_run` to perform a standard run.
The parameters of the helper script `rule_generator.go` are defined as follows:

- `--config_path`: The relative or absolute path to the deployment configuration.
- `--output_path`: The path where the rule generation script outputs rule files. Can be a local directory or a Cloud Storage bucket. If unset, the script writes directly to the Forseti server bucket.
Dry-run mode
Performing a dry run lets you review everything that the script would do with the specified deployment configuration, without actually doing it. You can confirm that the deployment will behave as expected before you perform it for real.
bazel run cmd/apply:apply -- \
  --config_path=[CONFIG_FILE] \
  --projects=[PROJECTS] \
  --dry_run
Project creation mode
After you examine the commands in a dry-run execution, when you are ready to do
the deployment, run the script without the --dry_run
parameter:
bazel run cmd/apply:apply -- \
  --config_path=[CONFIG_FILE] \
  --projects=[PROJECTS]
Update mode
If you want to add a resource to an existing project, or modify an existing
setting, specify the previously deployed project in the --projects
flag.
bazel run cmd/apply:apply -- \
  --config_path=[CONFIG_FILE] \
  --projects=[PROJECTS]
Rule generation
If you have deployed a Forseti project using the main script, you might want to run the rule generation script to generate and upload rules. By default, the rules will be uploaded to the Forseti server bucket. If you want to create your own set of Forseti rules, skip this step.
bazel run cmd/rule_generator:rule_generator -- \
  --config_path=[CONFIG_FILE]
Verifying the results
The helper script encapsulates many commands. This section highlights what to expect in the Cloud Console after you successfully run the script.
Cloud Console
Your project(s) are available in the Cloud Console:
IAM
For each project, the Cloud Console shows `OWNERS_GROUP` as the project owner and `AUDITORS_GROUP` as the security reviewer for the project.
Although the preceding screenshot shows only the membership of the `OWNERS_GROUP` and `AUDITORS_GROUP`, you likely see several service accounts that have project-level access because of the APIs that you have enabled in the project. The most common service accounts are as follows:

- `project_number@cloudservices.gserviceaccount.com` is a Google-managed service account used by Google API service agents such as the Deployment Manager API.
- `project_number-compute@developer.gserviceaccount.com` is a user-managed service account used by the Compute Engine API.
- `service-project_number@containerregistry.iam.gserviceaccount.com` is the Container Registry service account.
Storage browser
Storage browser shows the newly created bucket for storing logs. The bucket name, storage class, and location all follow the deployment configuration. This example shows a local log arrangement. In a remote log arrangement, this bucket is in a different project from the data and consolidates logs from all data projects, whereas in a local log arrangement each project has its own logs bucket.
Lifecycle
Lifecycle management is enabled for the logs bucket. View the lifecycle policies to see that a Delete action is specified for objects that are older than the value of `ttl_days` in the deployment configuration.
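Assuming `ttl_days: 365`, the applied policy corresponds to a standard Cloud Storage lifecycle configuration like the following fragment, which you can inspect with `gsutil lifecycle get gs://bucket_name`:

```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
```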
Aside from the permissions for `cloud-storage-analytics@google.com`, the rest of the permissions are set as you specified. Refer to the documentation on logs to understand why `cloud-storage-analytics@google.com` must have write access to the bucket.

Storage browser shows the newly created bucket(s) for storing data. The bucket name, default storage class, and location match the specifications in the deployment configuration.
Objects in each data bucket are versioned. Run the following command to verify, replacing `bucket_name` with the name of your bucket:

gsutil versioning get gs://bucket_name
Access and storage logs for the data buckets are captured and stored in the logs bucket; logging started when the data bucket was created. Run the following command to verify:
gsutil logging get gs://bucket_name
Permissions for each bucket are set according to your specifications.
Google Cloud Console
In the Cloud Console, Data Access audit logs are turned on for all services.
Cloud Console
In the Cloud Console, verify that the BigQuery API is enabled.
BigQuery
In the Cloud Console, verify that the dataset to which Cloud Logging exports Cloud Audit Logs is shown. Also verify that the values for Description, Dataset ID, and Data location match the specifications in the deployment configuration and the logging export sink that you saw previously.
Verify that access to the dataset is set according to your specifications.
Also verify that the service account that streams Cloud Logging logs into BigQuery is given edit rights to the dataset.
Verify that the newly created datasets for storing data are shown and that the Description, Dataset ID, and Data location values, and the labels for each dataset, match the specifications in the deployment configuration.
Verify that access to the dataset is set as you specified. You likely see other service accounts with inherited permissions, depending on the APIs that you've enabled in your project.
Pub/Sub
Verify that Pub/Sub shows the newly created topic and that the topic name, list of subscriptions, and details of each subscription—for example, Delivery type and Acknowledgement deadline—match the specifications in the deployment configuration.
Also verify that access rights for the topic match the deployment configuration. For instance, the following screenshot shows the `OWNERS_GROUP` inheriting ownership of the topic and the `READ_WRITE_GROUP` having the topic editor role. Depending on the APIs that you have enabled in the project, you likely see other service accounts with inherited permissions.
Cloud Logging
In the Cloud Console, verify that a new export sink is shown. Make a note of the values for Destination and Writer Identity and compare to what you will see next in BigQuery.
Verify that logs-based metrics are set up to count incidents of suspicious activities in audit logs.
Cloud Monitoring alerting
Verify that Cloud Monitoring contains alerting policies for the corresponding logs-based metrics:
In the Google Cloud Console, go to Monitoring.
In the Alerting menu, select Policies overview.
Query logs
With the audit logs streamed into BigQuery, you can use the following SQL query to organize log history in chronological order by type of suspicious activity. Use this query in the BigQuery editor or through the BigQuery command-line interface as a starting point to define the queries that you must write to meet your requirements.
```sql
SELECT timestamp,
       resource.labels.project_id AS project,
       protopayload_auditlog.authenticationInfo.principalEmail AS offender,
       'IAM Policy Tampering' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE resource.type = "project"
  AND protopayload_auditlog.serviceName = "cloudresourcemanager.googleapis.com"
  AND protopayload_auditlog.methodName = "SetIamPolicy"
UNION DISTINCT
SELECT timestamp,
       resource.labels.project_id AS project,
       protopayload_auditlog.authenticationInfo.principalEmail AS offender,
       'Bucket Permission Tampering' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE resource.type = "gcs_bucket"
  AND protopayload_auditlog.serviceName = "storage.googleapis.com"
  AND ( protopayload_auditlog.methodName = "storage.setIamPermissions"
     OR protopayload_auditlog.methodName = "storage.objects.update" )
UNION DISTINCT
SELECT timestamp,
       resource.labels.project_id AS project,
       protopayload_auditlog.authenticationInfo.principalEmail AS offender,
       'Unexpected Bucket Access' AS offenseType
FROM `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_data_access_*`
WHERE resource.type = 'gcs_bucket'
  AND ( protopayload_auditlog.resourceName LIKE '%hipaa-sample-project-logs'
     OR protopayload_auditlog.resourceName LIKE '%hipaa-sample-project-bio-medical-data' )
  AND protopayload_auditlog.authenticationInfo.principalEmail NOT IN (
      'user1@google.com', 'user2@google.com' )
```
The following image shows a sample result when you run the query by using the BigQuery command-line interface.
Cleaning up
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- For a list of Google Cloud services that support HIPAA compliance under the Google business associate agreement (BAA), review HIPAA Compliance on Google Cloud.
- Review Google Cloud documentation related to security, privacy, and compliance.
- Bridge the gap between your existing health IT and Google Cloud by using the Cloud Healthcare API.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.