
How Cloud tools help with healthcare data security

September 30, 2022

Jeffrey Vasquez

Chrome Enterprise Solutions Architect

As the use of electronic health records (EHR) continues to rise, government agencies and healthcare providers need digital assurances that health information is protected and complies with key regulations such as HIPAA. They also need to ensure that sensitive data stays protected, no matter the scale and speed at which they operate. Two Google Cloud capabilities, Healthcare De-identification and Cloud Data Loss Prevention, can be instrumental in meeting these needs and keeping health data secure.

Healthcare De-identification (De-ID)

With Healthcare De-ID, users can automate the de-identification of healthcare data in its native formats, with no need to extract text or images first. Healthcare De-ID can automatically remove protected health information (PHI) from Digital Imaging and Communications in Medicine (DICOM) and Fast Healthcare Interoperability Resources (FHIR) data. This enables multiple use cases, including:

When sharing health information with non-privileged parties
When creating datasets from multiple sources and analyzing them
When anonymizing data so that it can be used in machine learning models
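To make "removing PHI from FHIR data" concrete, here is a minimal pure-Python sketch of what de-identifying a single FHIR Patient resource can look like. It does not call the Healthcare API: the list of identifying elements, the keyed-hash (HMAC-SHA-256) pseudonymization, and the function names `deidentify_patient` and `pseudonymize` are hypothetical choices for illustration, not Healthcare De-ID's actual behavior or configuration.

```python
import hashlib
import hmac
from copy import deepcopy

# Hypothetical list of Patient elements treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = ("identifier", "name", "telecom", "address", "photo", "contact")


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA-256): the same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify_patient(resource: dict, key: bytes) -> dict:
    """Return a de-identified copy of a FHIR Patient resource; the input is not modified."""
    out = deepcopy(resource)
    for element in DIRECT_IDENTIFIERS:
        out.pop(element, None)  # drop direct identifiers entirely
    if "id" in out:
        # Replace the resource ID with a stable pseudonym so records can still be linked.
        out["id"] = pseudonymize(out["id"], key)
    if "birthDate" in out:
        out["birthDate"] = out["birthDate"][:4]  # generalize to birth year only
    return out


patient = {
    "resourceType": "Patient",
    "id": "example-123",
    "identifier": [{"system": "urn:example:mrn", "value": "00042"}],
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "telecom": [{"system": "phone", "value": "555-0100"}],
    "gender": "female",
    "birthDate": "1980-07-14",
}
print(deidentify_patient(patient, key=b"demo-secret-key"))
```

Because the pseudonym is deterministic for a given key, applying the same function to references in other resources (for example, an Observation's `subject`) would keep the records linked. A production pipeline would also need to handle free-text fields, extensions, and dates elsewhere in the record.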

Cloud Data Loss Prevention (DLP)

With Cloud Data Loss Prevention (DLP), Google Cloud has created a comprehensive set of technologies that automate the discovery, classification, and protection of your most sensitive data. Cloud DLP works on text and images, and has three key features:

Data discovery and classification - Includes more than 150 built-in infoType detectors (for example, name, address, and Social Security number) and the ability to create custom infoTypes. Native support for scanning, classifying, and profiling sensitive data in Cloud Storage (GCS), BigQuery (BQ), and Datastore. A streaming content API supports additional data sources, custom workloads, and applications.
Automated data de-identification, masking, and tokenization - Automatically mask, tokenize, and transform sensitive elements to help you manage your data and make it easier to use for analytics. Preserve the utility of your data for joining, analytics, and AI while protecting raw sensitive identifiers.
Measuring re-identification risk - Quasi-identifiers are data elements, or combinations of elements, that are partially identifying and may link a record to a single person or a very small group (for example, ZIP code, birth date, and sex taken together). Cloud DLP lets you measure statistical properties such as k-anonymity and l-diversity, which helps you understand and protect data privacy.
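The two risk metrics named above can be computed directly on a small table. The following is a minimal pure-Python sketch, not Cloud DLP's risk-analysis implementation. It assumes the quasi-identifiers have already been generalized (as in the made-up sample rows), computes k as the size of the smallest group of records sharing the same quasi-identifier values, and computes distinct l-diversity as the smallest number of different sensitive values found in any such group.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(classes.values())


def l_diversity(records, quasi_ids, sensitive):
    """Smallest number of distinct sensitive values within any equivalence class."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[q] for q in quasi_ids)].add(r[sensitive])
    return min(len(v) for v in values.values())


# Made-up, already-generalized sample rows.
records = [
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "945**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "945**", "age": "40-49", "diagnosis": "flu"},
]
qi = ["zip", "age"]
print(k_anonymity(records, qi))               # 2
print(l_diversity(records, qi, "diagnosis"))  # 1
```

Here the table is 2-anonymous, yet every record in the second group has the same diagnosis (l = 1), so anyone who knows a person falls in that group learns the diagnosis. That gap is why it helps to measure both properties.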

Healthcare De-ID and Cloud DLP use machine learning (ML), and the models that identify sensitive data continue to improve over time as Google continues to train them.

Using Healthcare De-ID and Cloud DLP to protect healthcare data

Cloud DLP and Healthcare De-ID can be an essential part of a healthcare data security suite. With the Healthcare De-ID and Cloud DLP APIs, agencies can now automate the identification and redaction of sensitive information, such as Personally Identifiable Information (PII) and Protected Health Information (PHI), in medical images. In fact, a large federal healthcare agency is currently using Google Cloud Healthcare De-ID for this purpose. The agency has coupled Healthcare De-ID with the Google Cloud Healthcare API to perform medical imaging de-identification on over 400 medical images. By automating de-identification, the agency is saving time while adding stronger layers of protection to its sensitive data.

Cloud DLP can identify and de-identify both streaming and stored data. There are two main ways to do this, and both offer the same level of healthcare data security.

“Content” methods:

Stream data directly into the API
Payload data is not stored or persisted by the API
Supports full classification and de-identification/redaction
Works on data from virtually anywhere (Google Cloud, on-premises, or another cloud provider)

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screenshot_2022-09-27_9.33.41_PM.max-900x900.png
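A content method takes a payload, returns findings, and can return a redacted copy without persisting the data. The sketch below mimics that request/response flow in pure Python so the idea is concrete. The regex detectors, the `Finding` type, and the `inspect`/`redact` names are hypothetical stand-ins, not the Cloud DLP API; real infoType detection is considerably more sophisticated.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple regex detectors keyed by infoType-style names.
# Cloud DLP's built-in detectors use much richer logic than a single pattern.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@(?:[\w-]+\.)+[A-Za-z]{2,}"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


@dataclass
class Finding:
    info_type: str
    quote: str
    start: int
    end: int


def inspect(text: str) -> list:
    """Return findings sorted by position; nothing is stored."""
    findings = [
        Finding(name, m.group(), m.start(), m.end())
        for name, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def redact(text: str) -> str:
    """Replace each finding with its infoType name (assumes findings do not overlap).

    Works right to left so earlier offsets stay valid after each replacement.
    """
    for f in sorted(inspect(text), key=lambda f: f.start, reverse=True):
        text = text[: f.start] + f"[{f.info_type}]" + text[f.end :]
    return text


note = "Patient Jane Doe, SSN 123-45-6789, phone 555-867-5309, email jane.doe@example.com"
for f in inspect(note):
    print(f.info_type, f.quote)
print(redact(note))
```

Note that the patient's name passes through untouched: pattern matching alone cannot reliably find names, which is one reason the managed detectors rely on ML and context rather than regular expressions.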

“Storage” methods:

Native support for Cloud Storage, BigQuery, and Datastore
Currently supports classification methods
Saves detailed findings to BigQuery
Supports risk analysis on BigQuery tables (k-anonymity, etc.)

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screenshot_2022-09-27_9.37.19_PM.max-900x900.png
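With storage methods, you describe where the data lives and where findings should go, and Cloud DLP runs the scan as a job. The sketch below assembles such a job configuration as a plain Python dict. Its field names follow the DLP v2 API's inspect-job configuration (`storage_config`, `big_query_options`, `inspect_config`, `actions`, `save_findings`); verify them against the current API reference before relying on this. The project, dataset, and table names are hypothetical. The submission helper calls `DlpServiceClient.create_dlp_job` from the `google-cloud-dlp` client library, which needs that library and credentials, so it is only defined here, not run.

```python
from typing import Iterable


def bq_table(project_id: str, dataset_id: str, table_id: str) -> dict:
    """A BigQuery table reference in the shape the DLP v2 API expects."""
    return {"project_id": project_id, "dataset_id": dataset_id, "table_id": table_id}


def build_inspect_job(project_id: str, source: tuple, findings: tuple,
                      info_types: Iterable[str]) -> dict:
    """Scan one BigQuery table and save detailed findings to another BigQuery table."""
    return {
        "storage_config": {
            "big_query_options": {"table_reference": bq_table(project_id, *source)}
        },
        "inspect_config": {"info_types": [{"name": n} for n in info_types]},
        "actions": [
            {"save_findings": {"output_config": {"table": bq_table(project_id, *findings)}}}
        ],
    }


def submit_inspect_job(project_id: str, inspect_job: dict):
    """Submit the job; requires google-cloud-dlp and credentials (not exercised here)."""
    from google.cloud import dlp_v2  # imported lazily so the builder works offline

    client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/global"
    return client.create_dlp_job(request={"parent": parent, "inspect_job": inspect_job})


job = build_inspect_job(
    "my-project",                                   # hypothetical project ID
    source=("ehr_raw", "encounters"),               # hypothetical dataset and table
    findings=("dlp_results", "encounter_findings"),  # hypothetical output table
    info_types=["PERSON_NAME", "US_SOCIAL_SECURITY_NUMBER", "EMAIL_ADDRESS"],
)
print(job["actions"][0])
```

Writing findings to BigQuery means the results of a scan can be queried like any other table, for example to count findings per infoType.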

How Healthcare De-ID and Cloud DLP obscure sensitive data

Cloud DLP provides options for transforming sensitive data through techniques such as masking and bucketing. The sample figure below shows examples of Cloud DLP masking phone numbers with hashes and replacing other sensitive identifiers, like email addresses and Social Security numbers, with generic category labels. Healthcare De-ID offers similar options for transforming sensitive data.
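Masking and bucketing are simple to express in code. This pure-Python sketch is not the Cloud DLP API; the function names, the `#` masking character, and the 10-year bucket width are assumptions for illustration. It masks every digit of a phone number while keeping separators, can optionally leave the last few digits visible, and replaces an exact age with the fixed-width range that contains it.

```python
def mask_digits(value: str, masking_char: str = "#", keep_last: int = 0) -> str:
    """Character masking: replace digits with `masking_char`, keeping separators.

    The last `keep_last` digits are left visible (0 masks every digit).
    """
    digit_positions = [i for i, ch in enumerate(value) if ch.isdigit()]
    to_mask = set(digit_positions[: max(0, len(digit_positions) - keep_last)])
    return "".join(masking_char if i in to_mask else ch for i, ch in enumerate(value))


def bucket_age(age: int, width: int = 10) -> str:
    """Bucketing: replace an exact value with the fixed-width range containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


row = {"phone": "555-867-5309", "age": 47}
print(mask_digits(row["phone"]))               # ###-###-####
print(mask_digits(row["phone"], keep_last=4))  # ###-###-5309
print(bucket_age(row["age"]))                  # 40-49
```

Masking hides a value while preserving its format, so downstream systems that expect a phone-number shape keep working. Bucketing keeps values useful for aggregate analysis (for example, counts per age range) while making individual records harder to single out.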

The same methods can be used on unstructured data such as images. The images below show an example of Cloud DLP de-identifying an x-ray image, automatically removing all identifiable information.

https://storage.googleapis.com/gweb-cloudblog-publish/images/xray_original.max-1100x1100.png

Original image
Image credit: https://cloud.google.com/healthcare-api/docs/how-tos/dicom-deidentify (Disclaimer: No real PII was used in these samples)

https://storage.googleapis.com/gweb-cloudblog-publish/images/dicom_attribute_confidentiality_basic_prof.max-1100x1100.png


https://storage.googleapis.com/gweb-cloudblog-publish/images/xray_redact_all_text.max-1100x1100.png

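De-identifying a medical image involves two parts: removing identifying attributes from the DICOM header and redacting identifying text burned into the pixels. The pure-Python sketch below illustrates both on toy data structures (a dict keyed by `(group, element)` tags and a list-of-lists pixel grid) rather than real DICOM files. The tag subset, function names, and bounding boxes are hypothetical, and the text-detection step that would produce the boxes is not implemented. The how-to guide credited in the image credit describes how the Healthcare API actually configures DICOM de-identification.

```python
# A tiny, hypothetical subset of DICOM tags treated as PHI in this sketch.
# The real Attribute Confidentiality Basic Profile (DICOM PS3.15, Annex E)
# lists many more attributes, each with its own action.
PHI_TAGS = {
    (0x0010, 0x0010): "PatientName",
    (0x0010, 0x0020): "PatientID",
    (0x0010, 0x0030): "PatientBirthDate",
    (0x0008, 0x0080): "InstitutionName",
    (0x0008, 0x0090): "ReferringPhysicianName",
}


def strip_phi_tags(dataset: dict) -> dict:
    """Return a copy of a tag -> value mapping without the PHI tags above."""
    return {tag: value for tag, value in dataset.items() if tag not in PHI_TAGS}


def redact_regions(pixels, boxes, fill=0):
    """Black out rectangular regions (e.g. burned-in text) in a 2-D pixel grid.

    `boxes` are (row, col, height, width) tuples, assumed to come from a separate
    text-detection step. Boxes that extend past the image edge are clipped.
    """
    out = [list(row) for row in pixels]  # copy so the input is not modified
    for r0, c0, h, w in boxes:
        for r in range(r0, min(r0 + h, len(out))):
            for c in range(c0, min(c0 + w, len(out[r]))):
                out[r][c] = fill
    return out


header = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0010, 0x0020): "123456",
    (0x0008, 0x0060): "CR",  # Modality: not identifying, so it is kept
}
image = [[200] * 6 for _ in range(4)]
clean_header = strip_phi_tags(header)
clean_image = redact_regions(image, boxes=[(0, 0, 1, 3)])
print(clean_header)
print(clean_image[0])
```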

One of the biggest advantages of Google's data de-identification solutions is their ability to scale an organization's de-identification capabilities to meet its needs. By automating the process, organizations free up staff resources and reduce the chance of human error.

Keeping data more secure with de-identification

Data de-identification has been another challenge for healthcare organizations. The nature of healthcare data and patient privacy laws like HIPAA have made identifying and redacting sensitive data a labor-intensive task: Personally Identifiable Information (PII) and Protected Health Information (PHI) often require manual review. Google Cloud's Healthcare De-ID uses machine learning to identify, tokenize, and redact sensitive data in both text-based FHIR data and image-based DICOM data, making de-identification practical at scale.

Healthcare De-ID is integrated with Google Cloud's Healthcare FHIR API and Healthcare DICOM API, and Cloud DLP capabilities are built into Google Cloud's native services such as BigQuery and Cloud Storage (Google Cloud's managed data warehouse and object storage solutions, respectively). With Google Cloud data de-identification solutions, organizations can reduce the risk of leaking sensitive data.

These de-identification solutions are just one example of how Google Cloud is helping government and healthcare organizations solve their biggest data problems with the power of ML. You can learn more about these technologies on the Healthcare De-ID and Cloud DLP webpages, and try them yourself with the live interactive demo.

Google Cloud also provides a series of How-to Guides to help you get started quickly with Healthcare De-ID.
