Setting up a HIPAA-aligned project

This tutorial steps through the Google Cloud Healthcare Data Protection Toolkit, an automation framework for deploying Google Cloud Platform (GCP) resources to store and process healthcare data, including protected health information (PHI) as defined by the US Health Insurance Portability and Accountability Act (HIPAA). This tutorial is designed to be read alongside the HIPAA-aligned GCP Cloud Healthcare architecture article.

Overview

The tutorial is for healthcare organizations that are getting started with GCP and are looking for an example of how to configure GCP infrastructure for data storage, analytics, or application development. This setup includes many of the security and privacy best-practice controls recommended for healthcare data, such as configuring appropriate access, maintaining audit logs, and monitoring for suspicious activities. For details about these best practices, see the GCP HIPAA white paper.

The tutorial walks through various GCP services that are capable of storing and processing PHI, but it doesn't cover all GCP resource types and use cases. Instead, the tutorial focuses on a subset of resource types. For a list of GCP services that support HIPAA compliance under Google's business associate agreement (BAA), review HIPAA Compliance on Google Cloud Platform. You might also want to review GCP documentation related to security, privacy, and compliance.

Disclaimer

  • This tutorial explains a reference architecture, and does not constitute legal advice on the proper administrative, technical, and physical safeguards you must implement in order to comply with HIPAA or any other data privacy legislation.
  • The scope of this tutorial is limited to protecting and monitoring data that is stored by in-scope resources. Implementing the tutorial doesn't automatically cover derivative data assets that are stored or processed by other Google Cloud Platform storage services. You must apply similar protective measures to derivative data assets.
  • The implementation in this tutorial is not an official Google product; it is intended as a reference implementation. The code is in an open-source project, the Google Cloud Healthcare Data Protection Toolkit, which is available under the Apache License, Version 2.0. You can use the framework as a starting point and configure it to fit your use cases. You are responsible for ensuring that the environment and applications that you build on top of GCP are properly configured and secured according to HIPAA requirements.
  • This tutorial walks you through a snapshot of the code in GitHub, which may be updated or changed over time. You might find more resource types—for example, Cloud SQL instances or Kubernetes clusters—included in the reference implementation than what this tutorial covers. Consult the README file for the latest scope.

Objectives

The objective of this tutorial is to provide a reference infrastructure as code (IaC) implementation for setting up a HIPAA-aligned GCP project. This implementation strategy automates the following processes: creating the audit logs, Forseti, and data-hosting projects; enabling billing and the required APIs; configuring IAM permissions and audit logging; and setting up Forseti monitoring and Stackdriver alerts.

Costs

GCP offers customers a limited-duration free trial and a perpetual always-free usage tier, which apply to several of the services used in this tutorial. For more information, see the GCP Free Tier page.

Depending on how much data or how many logs you accumulate while executing this implementation, you might be able to complete the implementation without exceeding the limits of the free trial or free tier. You can use the pricing calculator to generate a cost estimate based on your projected usage.

When you finish this implementation, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Before you begin

  1. Review the onboarding best practices guide.
  2. Review HIPAA compliance on GCP.
  3. If you are using GCP services in connection with PHI, execute a BAA with GCP.
  4. Make sure that your basic GCP setup is complete:
    • Set up Cloud Identity for the customer organization, following best practices.
    • Add an organization admin in GCP.
    • Set up a billing account.

Initializing your environment

  1. Install the tools and dependency services:

    • Bazel - An open source build and test tool.
    • pip - A package manager for Python packages.
    • Cloud SDK - A set of tools for managing resources and applications hosted on GCP.
    • Git - A distributed version control system.
  2. In your shell, download the Google Cloud Healthcare Data Protection Toolkit and install its Python dependencies. The clone URL below assumes the toolkit lives in the GoogleCloudPlatform/healthcare repository that contains the deploy folder used later in this tutorial; check the project README if it has moved:

    git clone https://github.com/GoogleCloudPlatform/healthcare.git
    cd healthcare/deploy
    pip install -r requirements.txt

  3. Set up authentication:

    gcloud auth login
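
Depending on how the deployment tools obtain credentials, you might also need application default credentials in addition to your user login. This step is an assumption rather than a documented requirement of the toolkit; check the project README before relying on it:

    gcloud auth application-default login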

Deployment preparation checklist

Before you begin, make sure you have the following information noted down.

GCP uses Google Accounts and groups in supported G Suite or Cloud Identity domains or public Google Groups for permission control. In GCP, it's a best practice to create a set of groups for predefined roles, so that you can control individual permissions easily by group membership without modifying the project.
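
For example, granting a role to a group instead of to individual users means that access changes are handled by group membership alone, without editing the project policy. The following is a minimal sketch with a hypothetical project ID, group address, and role:

    gcloud projects add-iam-policy-binding example-data-project \
        --member="group:example-project-readonly@example.com" \
        --role="roles/bigquery.dataViewer"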

Create or verify the existence of the user groups described in the concept document. If your organization is a G Suite customer, or if you plan to use the script to provision resources for an existing GCP organization, ask your administrator to create the groups in the Google Admin console. (Use Cloud Identity help to find out who your administrator is.) Organizational users are set up in Cloud Identity; by default, the email account designated as admin during setup is assigned the Organization Admin role and can create and assign other IAM roles. Access to the resources needed for data transfer, analysis, and security audit is controlled through these IAM groups and their associated roles.

In addition, the user running the deployment script must be in the owners group of every project that will be created, including the security project. In the G Suite Admin console, add the account you are currently using as a member of the OWNERS_GROUP that you created above. You can remove that user from these groups after your deployment succeeds.

Alternatively, if you plan on testing outside a commercial account, you can use Google Groups.

Create each of the Google groups, applying the following secure basic-permission settings as you create them:

  • Group type: Email list
  • View Topics: All Members of Group (and Managers)
  • Post: Only Managers and Owners
  • Join the group: Only Invited Users

Collecting environment info and naming project resources

The configuration file is in YAML format and lists each of the resources that you want to create along with their properties. You can choose from a growing number of sample configuration files; we recommend that you use the remote audit logs configuration file.

The overall section contains organization and billing details that apply to all projects.

  • Define the project that will host audit logs in the audit_logs_project section.
  • Define the project that will host Forseti resources in the forseti section.
  • List all the required data-hosting projects under projects.

Collect the following information for your GCP environment and update the file.

| Parameter | Sample value |
| --- | --- |
| organization_id | 12345678 |
| billing_account | 000000-000000-000000 |
| location | US, EU, or Asia (for BigQuery and Cloud Storage). The default setting is multi-regional data storage within US, EU, or Asia boundaries. |
| ttl_days | This field in the audit_logs section determines the retention time, in days, for audit logs. An example value is 365. |
| project_id | Provide unique names for the three projects: the audit logs project (audit_logs_project), the Forseti project (forseti), and the dataset project (projects.project_id). |
| stackdriver_alert_email | [your admin email ID] |
| owners_group | hipaa-sample-project-owner@google.com |
| auditors_group | hipaa-sample-project-auditor@google.com |
| data_readwrite_groups | hipaa-sample-project-readwrite@google.com |
| data_readonly_groups | hipaa-sample-project-readonly@google.com |
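
To show where these values go, the following is a minimal sketch of a deployment configuration. It is illustrative only: the exact field names and nesting must follow the sample configuration file that you copy later in this tutorial, and every ID, name, and email address below is a placeholder.

```yaml
# Illustrative sketch only; follow the structure of the sample file
# (for example, project_with_remote_audit_logs.yaml) for the real schema.
overall:
  organization_id: '12345678'
  billing_account: 000000-000000-000000

audit_logs_project:
  project_id: altoco-audit-logs-prod

forseti:
  project_id: altoco-forseti-prod

projects:
- project_id: altoco-edw-prod
  stackdriver_alert_email: admin@example.com
  owners_group: hipaa-sample-project-owner@google.com
  auditors_group: hipaa-sample-project-auditor@google.com
  data_readwrite_groups:
  - hipaa-sample-project-readwrite@google.com
  data_readonly_groups:
  - hipaa-sample-project-readonly@google.com
  audit_logs:
    ttl_days: 365
```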

The toolkit allows you to follow best practices, such as consistent naming conventions and labeling of resources. Make decisions on project names, Cloud Storage bucket and BigQuery dataset names, and VPC names. For naming best practices, see the GCP documentation on naming conventions.

Use the following to collect necessary information, which you can add to the deployment configuration file. Consider the components illustrated in the following example when establishing your naming conventions:

  • Company name: Altostrat Company: altoco
  • Business unit: Human Resources: hr
  • Application code: Compensation system: comp
  • Region code: north-america-northeast1: na-ne1, europe-west1: eu-we1
  • Environment codes: dev, test, uat, stage, prod
| Parameter | Sample naming convention |
| --- | --- |
| project_id | Decide on project naming conventions, for example {org-name}-{app/bu/purpose}-{environment}, such as altoco-edw-prod. Define the project that hosts audit logs in the audit_logs_project section, define the project that hosts Forseti resources in the forseti section, and list all the required data-hosting projects under projects. |
| {storage_name} bucket name | Example: {org-name hash}-{source}-{type}-{bu/region}-{seq#}, such as 08uy6-cerner-inpatient-east-1. Use a hash of the org name for better security. |
| BigQuery dataset name | Example: {org-name}_{bu/region}_{app/module}, such as altoco_east_edw. |
| VPC name | Example: {org-name}-{bu/region}-{environment}-{seq#}, such as altoco-east-prod-vpc-1. |

Understanding the script

At a high level, the Google Cloud Healthcare Data Protection Toolkit uses Cloud Foundation Toolkit templates and custom logic where appropriate to provision resources, set up IAM and logging according to best practices, and enable Forseti monitoring.

Helper script

The helper script create_project.py reads its configurations from a YAML configuration file and creates or modifies projects that are listed in the configuration file. The script creates an audit logs project and a Forseti project. Then the script creates a data-hosting project for each project that is listed under projects. For each listed project, the script performs the following:

  • Creates the project (if the project is not already present).
  • Enables billing on the project.
  • Enables the Deployment Manager API and grants temporary Owners and Storage Admin permissions to the Deployment Manager service account.
  • Creates two deployments:
    • data-protect-toolkit-resources: This deployment contains all the resources that exist in the project.
    • data-protect-toolkit-audit-${PROJECT_ID}: This deployment contains all audit resources for the data project. If a remote audit logs project was specified, this deployment is created in the remote project; otherwise, it is created in the data project itself, which then hosts the audit logs locally.
  • Creates Forseti monitoring resources and uploads security rules.
  • Prompts you to create or select a Stackdriver Workspace, which you must do by using the Stackdriver UI. For more details, see the Stackdriver guide.
  • Creates Stackdriver Alerts for monitoring security rules.

Running the script

With Data Protection Toolkit, you use a configuration file to describe all the resources that you want for a single deployment. A configuration file is in YAML format and lists each of the resources that you want to create and their respective properties.

  1. Make sure that you are in the ./healthcare/deploy folder.

  2. Copy a sample configuration file to the root folder and open it for editing:

    cp ./samples/project_with_remote_audit_logs.yaml .
    nano project_with_remote_audit_logs.yaml

  3. Update the configuration file with the parameters that you collected previously.

The script's command-line parameters are defined as follows:

--project_yaml
The relative or absolute path to the deployment configuration.
--projects
Comma-separated list of project IDs from the deployment configuration. You can use * to indicate all projects listed in the deployment configuration. The script creates new projects and updates existing ones.
--output_yaml_path
The path where the deployment script outputs a modified deployment configuration. The resulting deployment configuration contains the original configuration plus other fields that are generated during the deployment process, such as project numbers and information required to resume the script after a failure.
--nodry_run
Used to specify a standard run.
--dry_run
Used to specify a dry run. Alternatively, you can leave out both --dry_run and --nodry_run to the same effect.
--verbosity

Level of verbosity, from -3 to 1, with higher values producing more information:

  • -3: FATAL logs only, the lowest level of verbosity.
  • -2: ERROR and FATAL logs.
  • -1: WARNING, ERROR, and FATAL logs.
  • 0: INFO, WARNING, ERROR, and FATAL logs. This is the default level.
  • 1: DEBUG, INFO, WARNING, ERROR, and FATAL logs, the highest level of verbosity.

Dry-run mode

Performing a dry run lets you review everything that the script would do with the specified deployment configuration without actually doing it, so you can confirm that the deployment will behave as expected before you run it for real.

bazel run :create_project -- \
    --project_yaml=[CONFIG_FILE] \
    --projects=[PROJECTS] \
    --output_yaml_path=[CONFIG_FILE] \
    --dry_run

Project creation mode

After you have examined the commands in a dry run and are ready to deploy, run the script with the --nodry_run parameter:

bazel run :create_project -- \
    --project_yaml=[CONFIG_FILE] \
    --projects=[PROJECTS] \
    --output_yaml_path=[CONFIG_FILE] \
    --nodry_run

Update mode

If you want to add a resource to an existing project, or modify an existing setting, specify the previously deployed project in the --projects flag.

bazel run :create_project -- \
    --project_yaml=[CONFIG_FILE] \
    --projects=[PROJECTS] \
    --output_yaml_path=[CONFIG_FILE] \
    --nodry_run

Verifying the results

The helper script encapsulates many commands. This section highlights what to expect in the GCP Console after you successfully run the script.

GCP Console

Your project(s) are available in the GCP Console:

projects listed in the GCP Console

Cloud IAM console

For each project, the IAM console shows OWNERS_GROUP as the project owner and AUDITORS_GROUP as the security reviewer for the project.

Permissions for your project in the console

Although the preceding screenshot shows only membership of the OWNERS_GROUP and AUDITORS_GROUP, you likely see several service accounts that have project-level access because of the APIs that you have enabled in the project. The most common service accounts are as follows:

  • project_number@cloudservices.gserviceaccount.com is a Google-managed service account used by Google API service agents such as the Deployment Manager API.
  • project_number-compute@developer.gserviceaccount.com is a user-managed service account used by the Compute Engine API.
  • service-project_number@containerregistry.iam.gserviceaccount.com is the Container Registry service account.

Storage browser

Storage browser shows the newly created bucket for storing logs. The bucket name, storage class, and location all follow the deployment configuration. This example is for a local log arrangement. In a remote log arrangement, this bucket is in a different project from the data and consolidates logs from all data projects, while in a local log arrangement, each project has its own logs bucket.

Consolidating logs for all data projects while in a local log mode

Lifecycle

Lifecycle is enabled for the logs bucket. View the lifecycle policies to see that a Delete action is specified for objects that are older than the value of ttl_days in the deployment configuration.

Viewing the lifecycle policies
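
You can also inspect the policy from the command line; for example (replace logs_bucket_name with the name of your logs bucket):

    gsutil lifecycle get gs://logs_bucket_name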

  • Aside from the permissions for cloud-storage-analytics@google.com, the rest of the permissions are set as you specified. Refer to the documentation on logs to understand why cloud-storage-analytics@google.com must have write access to the bucket.

    Groups with write permissions

  • Storage browser shows the newly created bucket(s) for storing data. Bucket Name, Default storage class, and Location match the specifications in the deployment configuration.

    Newly created buckets

  • Objects in each data bucket are versioned. Run the following command to verify, replacing bucket_name with the name of your bucket:

    gsutil versioning get gs://bucket_name
    
  • Access and storage logs for the data buckets are captured and stored in the logs bucket; logging started when the data bucket was created. Run the following command to verify:

    gsutil logging get gs://bucket_name
    
  • Permissions for each bucket are set according to your specifications.

    Groups with write permissions to the bucket
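
The preceding gsutil commands print short confirmations. As a rough guide (exact formatting can vary between gsutil versions), successful output looks similar to the following:

    $ gsutil versioning get gs://bucket_name
    gs://bucket_name: Enabled

    $ gsutil logging get gs://bucket_name
    {"logBucket": "logs_bucket_name", "logObjectPrefix": "bucket_name"}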

Admin console

In the Admin console, Data access logs are turned on for all services.

Data access logs in the Admin console

API console

In the API console, verify that the BigQuery API is enabled.

BigQuery API enabled in API console
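
You can run the same check from the command line. For example, assuming gcloud is pointed at the data-hosting project, the following lists enabled services and filters for BigQuery:

    gcloud services list --enabled | grep bigquery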

BigQuery console

In the BigQuery console, verify that the dataset where Stackdriver sinks Cloud Audit Logs is shown. Also verify that the values for Description, Dataset ID, and Data location match the specifications in the deployment configuration and the logging export sink that is described later in the Logging console section.

BigQuery console shows the dataset where Stackdriver sinks Cloud Audit Logs

Verify that access to the dataset is set according to your specifications.

Also verify that the service account that streams Stackdriver logs into BigQuery is given edit rights to the dataset.

BigQuery data permissions
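
One way to inspect the dataset's access entries, including the log-streaming service account's write access, is with the bq command-line tool. The project and dataset IDs below are the sample values used in this tutorial; substitute your own:

    bq show --format=prettyjson hipaa-sample-project:cloudlogs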

Verify that the newly created datasets for storing data are shown and that the Description, Dataset ID, and Data location values, and the labels for each dataset, match the specifications in the deployment configuration.

The BigQuery console shows the newly created datasets

Verify that access to the dataset is set as you specified. You likely see other service accounts with inherited permissions, depending on the APIs that you've enabled in your project.

BigQuery data permissions for storing data

Cloud Pub/Sub console

Verify that the Cloud Pub/Sub console shows the newly created topic and that the topic name, list of subscriptions, and details of each subscription—for example, Delivery type and Acknowledgement deadline—match the specifications in the deployment configuration.

Also verify that access rights for the topic match the deployment configuration. For instance, the following screenshot shows the OWNERS_GROUP inheriting ownership of the topic and the READ_WRITE_GROUP having the topic editor role. Depending on the APIs that you have enabled in the project, you likely see other service accounts with inherited permissions.

Cloud Pub/Sub console shows the newly created topic
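
The same verification can be scripted; for example, the following lists topics in the current project and prints a topic's IAM policy (the topic name is a placeholder):

    gcloud pubsub topics list
    gcloud pubsub topics get-iam-policy [TOPIC_NAME]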

Logging console

In the Logging console, verify that a new export sink is shown. Make a note of the values for Destination and Writer Identity, and compare them with what you saw in the BigQuery console.

The Logging console showing a new export sink
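
To view the sink from the command line, including its Destination and Writer Identity, you can use the Cloud SDK (the sink name is a placeholder; use the name shown in the Logging console):

    gcloud logging sinks list
    gcloud logging sinks describe [SINK_NAME]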

Verify that logs-based metrics are set up to count incidents of suspicious activities in audit logs.

Stackdriver Logging console shows logs-based metrics that are set up to count incidents of suspicious activities

Stackdriver Alerting console

In the Stackdriver Alerting console, verify that alerting policies are shown that trigger based on the corresponding logs-based metrics.

Stackdriver Alerting console shows alerting policies that trigger based on the corresponding logs-based metrics

Query logs

With the audit logs streamed into BigQuery, you can use the following SQL query to organize log history in chronological order by type of suspicious activity. Use this query in the BigQuery editor or through the BigQuery command-line interface as a starting point to define the queries that you must write to meet your requirements.

```sql
SELECT timestamp,
       resource.labels.project_id                              AS project,
       protopayload_auditlog.authenticationinfo.principalemail AS offender,
       'IAM Policy Tampering'                                  AS offenseType
FROM   `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE  resource.type = "project"
       AND protopayload_auditlog.servicename =
           "cloudresourcemanager.googleapis.com"
       AND protopayload_auditlog.methodname = "setiampolicy"
UNION DISTINCT
SELECT timestamp,
       resource.labels.project_id                              AS project,
       protopayload_auditlog.authenticationinfo.principalemail AS offender,
       'Bucket Permission Tampering'                           AS offenseType
FROM   `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_activity_*`
WHERE  resource.type = "gcs_bucket"
       AND protopayload_auditlog.servicename = "storage.googleapis.com"
       AND ( protopayload_auditlog.methodname = "storage.setiampermissions"
              OR protopayload_auditlog.methodname = "storage.objects.update" )
UNION DISTINCT
SELECT timestamp,
       resource.labels.project_id                              AS project,
       protopayload_auditlog.authenticationinfo.principalemail AS offender,
       'Unexpected Bucket Access'                              AS offenseType
FROM   `hipaa-sample-project.cloudlogs.cloudaudit_googleapis_com_data_access_*`
WHERE  resource.type = 'gcs_bucket'
       AND ( protopayload_auditlog.resourcename LIKE
             '%hipaa-sample-project-logs'
              OR protopayload_auditlog.resourcename LIKE
                 '%hipaa-sample-project-bio-medical-data' )
       AND protopayload_auditlog.authenticationinfo.principalemail NOT IN(
           'user1@google.com', 'user2@google.com' )
```
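
If you prefer the command-line interface, you can pass the query to bq with legacy SQL disabled. A minimal sketch, assuming the query above has been saved to a local file named suspicious_activity.sql (a hypothetical file name):

    bq query --use_legacy_sql=false "$(cat suspicious_activity.sql)"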

The following image shows a sample result when you run the query by using the BigQuery command-line interface.

Sample result when you run the query by using the BigQuery command-line interface

Cleaning up

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next
