The tutorial and its accompanying conceptual document, De-identification of medical images through the Cloud Healthcare API, are intended for researchers, data scientists, IT teams, and healthcare and life sciences organizations. This tutorial guides you through two common use cases of de-identifying medical image data by using the Cloud Healthcare API. The conceptual document explains the rationale of DICOM data de-identification and outlines its high-level steps.
This tutorial assumes that you have a fundamental knowledge of Linux. A basic understanding of Google Cloud and DICOM standards is also helpful. Run all commands in this tutorial in a Linux terminal.
Objectives
- Use the DICOM de-identification operation of the Cloud Healthcare API to remove or modify PII and PHI in DICOM instances in a DICOM store.
- Remove or modify PII and PHI metadata and burned-in text in one Cloud Healthcare API call.
- Use either the curl command-line tool or the Google Cloud CLI to make DICOM de-identification Cloud Healthcare API calls.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
Before you begin
This tutorial assumes that your DICOM images have already been imported into a DICOM store. For information about creating DICOM stores on Google Cloud, see Creating and managing DICOM stores. For information about importing DICOM data into DICOM stores, see Importing and exporting DICOM data using Cloud Storage.
Additionally, this tutorial assumes that:
- You're working in a project called MyProj.
- You've created a dataset called dataset1 in the us-central1 Google Cloud region in MyProj.
- You've created a DICOM store called dicomstore1 in dataset1.
If your resources are named differently, you'll need to modify the commands listed in this document accordingly.
- In the Google Cloud console, go to the Project selector page.
- Select a Google Cloud project called MyProj.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Cloud Healthcare API.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- In a shell, run the gcloud components update command to make sure that you have the latest version of the gcloud CLI that includes Cloud Healthcare API-related functionality.
Creating an IAM service account
The Healthcare Dataset Administrator role includes all of the permissions required for this tutorial.
Assign the Healthcare Dataset Administrator role to the service account.
Activate your service account key:
gcloud auth activate-service-account --key-file=path-to-key-file
The output is the following:
Activated service account credentials for: [key-name@project-name.iam.gserviceaccount.com]
key-name is the name that you assigned to the service account key. project-name is the name of your Google Cloud project.
Using a medical image viewer
This tutorial uses the Mach7 diagnostic viewer as a medical image viewer. You can request a demonstration version of the viewer at the Mach7 website.
To use this viewer, assign the Healthcare DICOM Viewer role to your user account by performing the following steps:
As an administrator in the Google Cloud console, go to the IAM page.
Click Add.
In the New principals field, enter your user account or your Gmail address.
In the Select a role drop-down list, select Cloud Healthcare.
Hold your pointer over Cloud Healthcare and then select the Healthcare DICOM Viewer role.
Click Save.
To use the viewer for production purposes, you need to obtain a full version.
Obtaining an OAuth 2.0 access token
To call the Cloud Healthcare API, you need an OAuth 2.0 access token, which the commands in this tutorial obtain for you. Some of the example Cloud Healthcare API requests in this tutorial use the curl command-line tool. These examples use the gcloud auth print-access-token command to obtain an OAuth 2.0 bearer token and include it in the request's authorization header. For more information about this command, see gcloud auth print-access-token.
This tutorial covers two of the most common use cases of removing identifying information from DICOM data. In both cases, the solution is provided by using either the curl command-line tool or the Google Cloud CLI. For more information about de-identifying DICOM data by using the Cloud Healthcare API, configuration options, and sample curl and Windows PowerShell commands, see De-identifying DICOM data.
Set up environment variables
This step applies to both use cases.
Export the environment variables based on the location and attributes of the DICOM store where your images are stored.
export PROJECT_ID=MyProj
export REGION=us-central1
export SOURCE_DATASET_ID=dataset1
export DICOM_STORE_ID=dicomstore1
export DESTINATION_DATASET_ID=deid-dataset1
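Every later command interpolates these variables, so an unset or misspelled variable produces a malformed resource path. The following is a minimal sketch of a guard you might run first. The helper names require_vars and dataset_path are hypothetical and are not part of the gcloud CLI or the Cloud Healthcare API.

```shell
# Hypothetical helper (not part of gcloud): fail early when a variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print the full resource name of a dataset
# in the configured project and region.
dataset_path() {
  printf 'projects/%s/locations/%s/datasets/%s\n' "$PROJECT_ID" "$REGION" "$1"
}
```

For example, require_vars PROJECT_ID REGION SOURCE_DATASET_ID DICOM_STORE_ID DESTINATION_DATASET_ID returns a non-zero status and names each missing variable, and dataset_path "$DESTINATION_DATASET_ID" prints the destination dataset's resource name.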
Use case I: Removing all metadata and redacting all burned-in text
This use case shows how to de-identify a dataset containing DICOM stores and DICOM data by removing all metadata (except for the minimum data required for a valid DICOM resource) and by redacting all burned-in text from DICOM images. The operation involves the following steps:
- Create a POST request and provide the name of the destination dataset and an access token.
- Remove all metadata and create a set of minimum keepList tags to have a valid DICOM resource.
- Redact all sensitive burned-in text from the DICOM image by creating a DeidentifyConfig object with image.text_redaction_mode set to REDACT_ALL_TEXT.
You can perform all of these steps in one curl command like the following:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
--data "{
'destinationDataset': 'projects/$PROJECT_ID/locations/$REGION/datasets/$DESTINATION_DATASET_ID',
'config': {
'dicom': {'keepList': {
'tags': [
'StudyInstanceUID',
'SOPInstanceUID',
'TransferSyntaxUID',
'PixelData',
'Columns',
'NumberOfFrames',
'PixelRepresentation',
'MediaStorageSOPClassUID',
'MediaStorageSOPInstanceUID',
'Rows',
'SamplesPerPixel',
'BitsAllocated',
'HighBit',
'PhotometricInterpretation',
'BitsStored' ] }
},
'image': {
'textRedactionMode': 'REDACT_ALL_TEXT'
}
}
}" "https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID:deidentify"
Alternatively, you can complete the same de-identification operation without knowing or specifying any tag name by using the MINIMAL_KEEP_LIST_PROFILE tag filter profile. See the following example:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
--data "{
'destinationDataset': 'projects/$PROJECT_ID/locations/$REGION/datasets/$DESTINATION_DATASET_ID',
'config': {
'dicom':{'filterProfile':'MINIMAL_KEEP_LIST_PROFILE'},
'image': {
'textRedactionMode': 'REDACT_ALL_TEXT'
}
}
}" "https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID:deidentify"
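The inline --data string uses single-quoted keys so that the shell can expand variables inside the surrounding double quotes. As an alternative sketch, the same request body can be emitted as standard double-quoted JSON from a here-document and saved to a file. The helper name deid_request_body is hypothetical; the field names and values are the ones used in the preceding command.

```shell
# Hypothetical helper: print the de-identification request body as standard JSON.
# Assumes PROJECT_ID, REGION, and DESTINATION_DATASET_ID are exported as shown earlier.
deid_request_body() {
  cat <<EOF
{
  "destinationDataset": "projects/${PROJECT_ID}/locations/${REGION}/datasets/${DESTINATION_DATASET_ID}",
  "config": {
    "dicom": { "filterProfile": "MINIMAL_KEEP_LIST_PROFILE" },
    "image": { "textRedactionMode": "REDACT_ALL_TEXT" }
  }
}
EOF
}
```

You could then run deid_request_body > request.json and pass --data @request.json to curl in place of the inline string, which also lets you inspect the body before sending it.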
In all the preceding commands, if the request is successful, the server returns a response in JSON format, like the following:
{ "name": "projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME" }
The response contains an operation name. You can use the operation name with the Operation get method to track the status of the operation.
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
"https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME"
If the request is successful, the server returns a response in JSON format.
After the de-identification process completes, the response includes "done": true.
{
  "name": "projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.healthcare.v1.OperationMetadata",
    "apiMethodName": "google.cloud.healthcare.v1.dataset.DatasetService.DeidentifyDataset",
    "createTime": "2018-01-01T00:00:00Z",
    "endTime": "2018-01-01T00:00:00Z"
  },
  "done": true,
  "response": {
    "@type": "...",
    "successStoreCount": "SUCCESS_STORE_COUNT"
  }
}
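De-identification runs as a long-running operation, so the status check above usually needs to be repeated until "done": true appears. The following is a rough sketch of such a polling loop. The function names operation_done and wait_for_operation, and the POLL_SECONDS and MAX_ATTEMPTS variables, are hypothetical; the URL and headers mirror the GET request shown earlier.

```shell
# Return success when a JSON operation response (first argument) contains "done": true.
operation_done() {
  printf '%s' "$1" | grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical polling loop: $1 is the OPERATION_NAME returned by the deidentify call.
# Assumes PROJECT_ID, REGION, and SOURCE_DATASET_ID are exported as shown earlier.
wait_for_operation() {
  url="https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/$1"
  attempts=0
  while [ "$attempts" -lt "${MAX_ATTEMPTS:-60}" ]; do
    response="$(curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" "$url")"
    if operation_done "$response"; then
      # Print the final response so the caller can inspect it.
      printf '%s\n' "$response"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "Operation $1 did not finish within the attempt limit" >&2
  return 1
}
```

A finished operation can carry an error instead of a response, so inspect the printed JSON rather than treating "done": true as success on its own.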
You can also use the Google Cloud CLI to call the Cloud Healthcare API, including the de-identification API. For a complete list of available commands, see the Cloud Healthcare API gcloud documentation or run the following command:
gcloud healthcare --help
The following sample shows how to use the gcloud CLI to de-identify a dataset containing DICOM stores and DICOM data in order to remove all metadata and redact all burned-in text from DICOM images.
gcloud healthcare datasets deidentify $SOURCE_DATASET_ID \
--location $REGION \
--dicom-filter-tags=StudyInstanceUID,SOPInstanceUID,TransferSyntaxUID,PixelData,Columns,NumberOfFrames,PixelRepresentation,MediaStorageSOPClassUID,MediaStorageSOPInstanceUID,Rows,SamplesPerPixel,BitsAllocated,HighBit,PhotometricInterpretation,BitsStored \
--text-redaction-mode all \
--destination-dataset projects/$PROJECT_ID/locations/$REGION/datasets/$DESTINATION_DATASET_ID \
--async
If the request is successful, the server returns a response like the following:
Request issued for: [$SOURCE_DATASET_ID]
Check operation [OPERATION_NAME] for status.
name: projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME
To check the operation status, run the following command:
gcloud healthcare operations describe --dataset $SOURCE_DATASET_ID OPERATION_NAME
If the request is successful, the server returns a response like the following.
After the de-identification process completes, the response contains done: true.
done: true
metadata:
  '@type': type.googleapis.com/google.cloud.healthcare.v1.OperationMetadata
  apiMethodName: google.cloud.healthcare.v1.dataset.DatasetService.DeidentifyDataset
  createTime: '2018-01-01T00:00:00Z'
  endTime: '2018-01-01T00:00:00Z'
name: projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME
response:
  '@type': type.googleapis.com/google.cloud.healthcare.v1.dataset.DeidentifySummary
  successResourceCount: 'SUCCESS_RESOURCE_COUNT'
  successStoreCount: 'SUCCESS_STORE_COUNT'
Use case II: Modifying metadata and redacting sensitive burned-in text
This use case shows how to de-identify a dataset containing DICOM stores and DICOM data by using the filterProfile tag filtering method to remove some metadata, modify other metadata, and redact sensitive burned-in text associated with images. The goal is to redact the PERSON_NAME value, replace the PHONE_NUMBER value with asterisks, and shift the DATE and DATE_OF_BIRTH values to dates within 100 days of the original values.
In this use case, the provided crypto key, U2FsdGVkX19bS2oZsdbK9X5zi2utBn22uY+I2Vo0zOU=, is an AES-encrypted 256-bit base64-encoded key generated by using the following command. When prompted, an empty password is provided to the command:
echo -n "test" | openssl enc -e -aes-256-ofb -a -salt
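The key above comes from a fixed, published input, so it is suitable only for this demonstration. As a hedged sketch, a fresh 256-bit key could instead be drawn from OpenSSL's random generator. The helper name new_date_shift_key is hypothetical, and this assumes that a random base64-encoded 32-byte value is acceptable as the cryptoKey for your workflow.

```shell
# Hypothetical helper: print 32 random bytes (256 bits), base64-encoded on one line.
new_date_shift_key() {
  openssl rand -base64 32
}
```

Keep the generated key secret and store it safely if you need the same date shift applied consistently across later de-identification runs.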
The operation involves the following steps:
- Create a POST request and provide the name of the destination dataset and an access token.
- Remove some metadata and modify other metadata in DICOM tags using the DEIDENTIFY_TAG_CONTENTS filter profile with appropriate combinations of info types and primitive transformations.
- Redact burned-in text from a DICOM image by setting image.text_redaction_mode to REDACT_SENSITIVE_TEXT.
You can perform all of these steps in one curl command like the following:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
--data "{
'destinationDataset': 'projects/$PROJECT_ID/locations/$REGION/datasets/$DESTINATION_DATASET_ID',
'config':{
'dicom':{'filterProfile':'DEIDENTIFY_TAG_CONTENTS'},
'text':{
'transformations':[
{'infoTypes':['PERSON_NAME'], 'redactConfig':{}},
{'infoTypes':['PHONE_NUMBER'], 'characterMaskConfig':{'maskingCharacter':'*'}},
{'infoTypes':['DATE', 'DATE_OF_BIRTH'], 'dateShiftConfig':{'cryptoKey':'U2FsdGVkX19bS2oZsdbK9X5zi2utBn22uY+I2Vo0zOU='}}]},
'image':{'textRedactionMode':'REDACT_SENSITIVE_TEXT'}}}" \
"https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID:deidentify"
If the request is successful, the server returns a response in JSON format like the following:
{ "name": "projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME" }
The response contains an operation name. You can use the Operation get method to track the status of the operation:
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
"https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME"
If the request is successful, the server returns the following response in JSON format:
{
  "name": "projects/$PROJECT_ID/locations/$REGION/datasets/$SOURCE_DATASET_ID/operations/OPERATION_NAME",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.healthcare.v1.OperationMetadata",
    "apiMethodName": "google.cloud.healthcare.v1.dataset.DatasetService.DeidentifyDataset",
    "createTime": "2018-01-01T00:00:00Z",
    "endTime": "2018-01-01T00:00:00Z"
  },
  "done": true,
  "response": {
    "@type": "...",
    "successStoreCount": "SUCCESS_STORE_COUNT"
  }
}
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the individual resources
Delete the destination datasets. If needed, add the --location parameter and specify the region for your dataset.
gcloud healthcare datasets delete $DESTINATION_DATASET_ID
What's next
- De-identification of medical images through the Cloud Healthcare API
- De-identifying DICOM Data
- Healthcare API Beta Announcement
- Exporting DICOM Metadata to BigQuery
- For more information about DICOM capabilities, see the DICOM conformance statement.
For more information about the Cloud Healthcare API, including information about support for FHIR and HL7v2, see the Cloud Healthcare API documentation.
Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.