De-identification is the process of removing identifying information
from data. The Cloud Healthcare API de-identifies health information through
the deidentify operation. The API detects sensitive data such as personally identifiable
information (PII), and then uses a de-identification transformation to mask,
delete, or otherwise obscure the data.
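The following toy Python sketch makes the detect-then-transform idea concrete. It applies three kinds of transformation (masking, deletion, and replacement with the name of the detected type) to matches from two hand-written regular expressions. The patterns, the `deidentify_text` function, and its `mode` values are hypothetical illustrations, not part of the Cloud Healthcare API. The API's own detection covers far more kinds of sensitive data.

```python
import re

# Hypothetical, deliberately simple detectors; the Cloud Healthcare API uses
# its own, much broader detection rather than patterns like these.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def deidentify_text(text: str, mode: str = "mask") -> str:
    """Detect sensitive spans in `text`, then obscure each one.

    mode="mask"    replaces every character of a match with "*"
    mode="delete"  removes the match entirely
    mode="replace" substitutes the detected type, e.g. "[PHONE_NUMBER]"
    """
    if mode not in ("mask", "delete", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    for info_type, pattern in PATTERNS.items():
        if mode == "mask":
            text = pattern.sub(lambda m: "*" * len(m.group()), text)
        elif mode == "delete":
            text = pattern.sub("", text)
        else:
            # A function replacement avoids escape processing in the template.
            text = pattern.sub(lambda m: f"[{info_type}]", text)
    return text


if __name__ == "__main__":
    note = "Call 555-123-4567 or email jo@example.com."
    print(deidentify_text(note, "replace"))
    # Call [PHONE_NUMBER] or email [EMAIL_ADDRESS].
```

Masking preserves the length of each sensitive value, while deletion and replacement do not. Which transformation is appropriate depends on how the de-identified data will be used.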
Some use cases for de-identification are:
- When sharing health information with non-privileged parties
- When creating datasets from multiple sources and analyzing them
- When anonymizing data so that it can be used in machine learning models
De-identification in the Cloud Healthcare API occurs at the dataset level.
This means that, when you want to de-identify health data, you run the
deidentify operation on the entire dataset in which the data resides. The
de-identified data is then copied to a new dataset.
You cannot, for example, de-identify the resources in a specific FHIR store within a dataset. If a dataset contains FHIR stores and DICOM stores that hold medical data, then de-identification occurs on all of the FHIR and DICOM data in that dataset.
De-identification does not impact the original dataset or its data. Instead, de-identified copies of the original data are written to a new dataset. In other words, the API returns the same items you gave it, in the same format, but with sensitive information processed according to your configuration.
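The operation is scoped to a dataset, so a request names only a source dataset, a destination dataset, and a de-identification configuration. The following Python sketch builds the URL and JSON body for the v1 `datasets.deidentify` REST method using only the standard library. The project, location, and dataset IDs are hypothetical placeholders, and the sketch omits sending the request (an authenticated HTTP POST). Check the field names against the current API reference before relying on them.

```python
import json

API_ROOT = "https://healthcare.googleapis.com/v1"


def dataset_name(project: str, location: str, dataset: str) -> str:
    """Return the full resource name of a Cloud Healthcare API dataset."""
    return f"projects/{project}/locations/{location}/datasets/{dataset}"


def build_deidentify_request(project: str, location: str, source_dataset: str,
                             destination_dataset: str, config: dict) -> tuple[str, dict]:
    """Return (url, body) for a dataset-level deidentify call.

    The source dataset appears only in the URL, because the server-side
    operation covers the whole dataset. De-identified copies are written to
    `destination_dataset`; nothing here modifies the source.
    """
    url = f"{API_ROOT}/{dataset_name(project, location, source_dataset)}:deidentify"
    body = {
        "destinationDataset": dataset_name(project, location, destination_dataset),
        "config": config,  # a DeidentifyConfig object; empty here for brevity
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    url, body = build_deidentify_request(
        "my-project", "us-central1", "source-dataset", "deid-dataset", {})
    print("POST", url)
    print(json.dumps(body, indent=2))
```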
A DICOM instance contains a set of key-value metadata elements (also known as
“tags”) and one or more images. The
deidentify operation can remove specific
tags that contain sensitive data. The operation can also use automated optical
character recognition (OCR) to redact burnt-in text on images contained in
the instance.
For examples of how to de-identify DICOM data, see De-identifying DICOM data.
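The two DICOM capabilities described above map to separate parts of a de-identification configuration: a list of tags to remove, and an image setting that turns on OCR-based redaction of burnt-in text. The Python sketch below builds such a fragment. The field names (`dicom.removeList.tags`, `image.textRedactionMode`) and the `REDACT_ALL_TEXT` value follow the v1 `DeidentifyConfig` reference as of writing and are worth double-checking. The helper name and the tag keywords are illustrative assumptions.

```python
import json


def dicom_deidentify_config(tags_to_remove, text_redaction_mode="REDACT_ALL_TEXT"):
    """Build a DICOM-oriented DeidentifyConfig fragment (sketch).

    tags_to_remove: DICOM attribute keywords to strip, such as "PatientName".
    text_redaction_mode: OCR redaction setting for burnt-in text, or None to
    omit the image section entirely.
    """
    config = {"dicom": {"removeList": {"tags": list(tags_to_remove)}}}
    if text_redaction_mode is not None:
        config["image"] = {"textRedactionMode": text_redaction_mode}
    return config


if __name__ == "__main__":
    cfg = dicom_deidentify_config(["PatientName", "PatientBirthDate"])
    print(json.dumps(cfg, indent=2))
```

The resulting dictionary would be passed as the `config` value of a deidentify request.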
Each FHIR resource is a JSON-like object that contains key-value elements.
Some elements are standardized, while others are free text. You can use the
deidentify operation to:
- Remove specific values in the resource
- Process free-text elements so that only the sensitive portions are removed, leaving the rest of the data as is
For examples of how to de-identify FHIR data, see De-identifying FHIR data.
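The two FHIR options above can be expressed as a list of field rules, each pairing FHIR paths with an action. In the Python sketch below, whole-value removal maps to the `TRANSFORM` action and selective processing of free text maps to `INSPECT_AND_TRANSFORM`. These names follow the v1 `FhirConfig` and `FieldMetadata` reference as of writing and are worth verifying. The helper and the example paths (`Patient.telecom`, `Observation.note`) are illustrative assumptions.

```python
import json


def fhir_deidentify_config(whole_value_paths, free_text_paths):
    """Build a FHIR-oriented DeidentifyConfig fragment (sketch).

    whole_value_paths: FHIR paths whose entire value is transformed.
    free_text_paths: free-text paths that are scanned so that only the
    sensitive portions are transformed.
    """
    rules = []
    if whole_value_paths:
        rules.append({"paths": list(whole_value_paths), "action": "TRANSFORM"})
    if free_text_paths:
        rules.append({"paths": list(free_text_paths), "action": "INSPECT_AND_TRANSFORM"})
    return {"fhir": {"fieldMetadataList": rules}}


if __name__ == "__main__":
    cfg = fhir_deidentify_config(["Patient.telecom"], ["Observation.note"])
    print(json.dumps(cfg, indent=2))
```

Keeping the two kinds of rule separate mirrors the distinction in the list above: some elements are dropped wholesale, while free text is inspected and only partly changed.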