Streaming DICOM metadata to BigQuery

This page explains how to configure a DICOM store to export DICOM instance metadata to a BigQuery table each time a DICOM instance is inserted into or deleted from the DICOM store. DICOM instances can be inserted either by using the Store transaction or by importing from Cloud Storage. DICOM instances can be deleted by using a Deletion request.

You can use BigQuery streaming to synchronize the data in a DICOM store with a BigQuery dataset in near real time. You can perform complex queries on DICOM data without needing to export the latest version of your DICOM store to BigQuery every time you want to analyze your data.

Before configuring streaming, see Exporting DICOM metadata to BigQuery for an overview of how exporting DICOM metadata to BigQuery works.

Setting BigQuery permissions

Before exporting DICOM metadata to BigQuery, you must grant extra permissions to the Cloud Healthcare Service Agent service account. For more information, see DICOM store BigQuery permissions.

Configuring the DICOM store

To enable streaming to BigQuery, configure the streamConfigs field of your DICOM store. Inside each StreamConfig object, set BigQueryDestination to a fully qualified BigQuery table URI where DICOM instance metadata will be streamed. Because streamConfigs is an array, you can specify multiple BigQuery destinations. You can stream metadata from a single DICOM store to up to five BigQuery tables in a BigQuery dataset.
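As a minimal sketch of a request body with more than one destination, the following Python snippet builds the streamConfigs array from a list of table URIs and rejects lists longer than the five-table limit stated above. The function name build_stream_configs and the table placeholders ending in _1 and _2 are hypothetical names chosen for illustration; they are not part of the Cloud Healthcare API.

```python
import json

# Limit stated above: one DICOM store can stream to up to five BigQuery tables.
MAX_DESTINATIONS = 5


def build_stream_configs(table_uris):
    """Return a PATCH request body with one StreamConfig per BigQuery table URI.

    Hypothetical helper for illustration only.
    """
    if len(table_uris) > MAX_DESTINATIONS:
        raise ValueError(f"at most {MAX_DESTINATIONS} destinations are allowed")
    for uri in table_uris:
        # Table URIs in this page use the bq://PROJECT.DATASET.TABLE form.
        if not uri.startswith("bq://"):
            raise ValueError(f"table URI must start with bq://: {uri}")
    return {
        "streamConfigs": [
            {"bigqueryDestination": {"tableUri": uri}} for uri in table_uris
        ]
    }


body = build_stream_configs([
    "bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID_1",  # placeholder
    "bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID_2",  # placeholder
])
print(json.dumps(body, indent=2))
```

The printed JSON can be saved as request.json and sent with the same PATCH request used for a single destination.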

When you delete DICOM instances in the Cloud Healthcare API, BigQuery rows containing the metadata for those instances are not deleted.

The following samples show how to update a DICOM store to enable BigQuery streaming. In these samples, the DICOM store and the BigQuery table are in the same project. To export DICOM metadata to another project, see Exporting DICOM metadata to a different project.

REST & CMD LINE

Before using any of the request data, make the following replacements:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the dataset location
  • DATASET_ID: the DICOM store's parent dataset
  • DICOM_STORE_ID: the DICOM store ID
  • BIGQUERY_DATASET_ID: the name of an existing BigQuery dataset
  • BIGQUERY_TABLE_ID: a unique name for a table in the BigQuery dataset. See Table naming for naming requirements. The BigQuery dataset must exist, but the Cloud Healthcare API can update an existing table or create a new one.

Request JSON body:

{
  "streamConfigs": [{
    "bigqueryDestination": {
      "tableUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID"
    }
  }]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

cat > request.json << 'EOF'
{
  "streamConfigs": [{
    "bigqueryDestination": {
      "tableUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID"
    }
  }]
}
EOF

Then execute the following command to send your REST request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/dicomStores/DICOM_STORE_ID?updateMask=streamConfigs"

PowerShell

Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

@'
{
  "streamConfigs": [{
    "bigqueryDestination": {
      "tableUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID"
    }
  }]
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/dicomStores/DICOM_STORE_ID?updateMask=streamConfigs" | Select-Object -Expand Content

API Explorer

Copy the request body and open the method reference page. The API Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Paste the request body in this tool, complete any other required fields, and click Execute.

If the request succeeds, the server returns the updated DICOM store resource in JSON format, including the streamConfigs field.

Deletion metadata

In previous Cloud Healthcare API versions, DICOM instance metadata was only exported to BigQuery when a DICOM instance was inserted into a DICOM store. When support for writing deletion metadata was added, two new columns, Type and LastUpdated, were added to the generated table containing the DICOM metadata.

Any metadata in the table that existed before the introduction of deletion metadata has a NULL value in the Type and LastUpdated columns. BigQuery treats NULL as the minimum possible value, so these rows appear last when you sort in descending order.
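To make the ordering concrete, here is a small Python sketch that mimics sorting by LastUpdated in descending order with NULL treated as the minimum value. It runs locally and does not query BigQuery. The rows, UID values, and timestamps are invented for illustration, and the SOPInstanceUID column name is an assumption about the generated table.

```python
from datetime import datetime, timezone

# Hypothetical rows; None stands in for a NULL LastUpdated value.
rows = [
    {"SOPInstanceUID": "1.2.3.1", "LastUpdated": None},  # predates deletion metadata
    {"SOPInstanceUID": "1.2.3.2", "LastUpdated": datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {"SOPInstanceUID": "1.2.3.3", "LastUpdated": datetime(2021, 5, 1, tzinfo=timezone.utc)},
]

# Placeholder used only so that two NULL rows can be compared with each other.
_EPOCH_MIN = datetime.min.replace(tzinfo=timezone.utc)


def order_by_last_updated_desc(rows):
    """Sort newest first, treating None (NULL) as the minimum value."""
    # The key is (has_value, value). False sorts below True, so NULL rows rank
    # lowest and end up last once the order is reversed.
    return sorted(
        rows,
        key=lambda r: (r["LastUpdated"] is not None, r["LastUpdated"] or _EPOCH_MIN),
        reverse=True,
    )


for row in order_by_last_updated_desc(rows):
    print(row["SOPInstanceUID"], row["LastUpdated"])
```

In this sketch the row with no LastUpdated value is printed last, after the two rows that carry timestamps.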

Generated BigQuery view

Each time you insert or delete a DICOM instance in a DICOM store, the table in the configured BigQuery dataset is updated. A view of the table is created if the view doesn't exist, or updated if the view does exist.

Limitations and additional behavior

  • If either of the following occurs, DICOM tags will be listed in a separate column (called DroppedTags.TagName) in the destination BigQuery table:

    • The size of the DICOM tag is equal to or greater than 1 MB
    • The DICOM tag does not have a supported type in BigQuery (listed in Excluded VRs)
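The two conditions above can be sketched as a single predicate. This is only an illustration of the rule as stated, not the service's implementation: the function name is hypothetical, the reading of 1 MB as 1,000,000 bytes is an assumption, and the set of VRs in the example is a placeholder rather than the actual Excluded VRs list.

```python
# Assumption: "1 MB" is read here as 1,000,000 bytes. Adjust if the limit is
# defined in binary megabytes.
ONE_MB = 1_000_000


def goes_to_dropped_tags(vr, size_bytes, excluded_vrs, size_limit=ONE_MB):
    """Return True if a tag would be listed under DroppedTags.TagName.

    Hypothetical helper: a tag is dropped when its size is equal to or greater
    than the limit, or when its VR has no supported BigQuery type.
    """
    return size_bytes >= size_limit or vr in excluded_vrs


# Placeholder set for illustration; see Excluded VRs for the real list.
example_excluded_vrs = {"OB", "OW", "UN"}

print(goes_to_dropped_tags("PN", 64, example_excluded_vrs))         # small tag, VR not in the set
print(goes_to_dropped_tags("PN", 1_000_000, example_excluded_vrs))  # size equals the limit
print(goes_to_dropped_tags("OB", 64, example_excluded_vrs))         # VR in the placeholder set
```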

Incorporating deletion metadata into an existing table

The behavior of the view can vary depending on whether the table that the view is based on contains metadata that was added before the introduction of deletion metadata.

Consider the following scenario, where DICOM metadata existed in a table in BigQuery before the introduction of deletion metadata, and then the following occurs:

  1. You insert a DICOM instance into a DICOM store.
  2. You delete the DICOM instance from the DICOM store.
  3. You edit the tags of the original DICOM instance, and insert it into the DICOM store again.

Because the original metadata existed before the introduction of the LastUpdated column, the original DICOM instance and the version with the edited tags each appear in the table with the same study, series, and instance unique identifiers (UIDs). The generated view might contain either the original DICOM instance or the most recent DICOM instance. Because there's no LastUpdated column, the view cannot determine which DICOM instance is the newly inserted one.

To work around this issue, do one of the following:

  • Do not query the view. Instead, query the table containing the DICOM metadata, and ensure that the query contains a search for the edited tags in the newly inserted DICOM instance.
  • Delete the existing table containing DICOM metadata, and then recreate it by exporting the DICOM metadata to BigQuery manually. When you recreate the table, the newly created table contains the LastUpdated column. The column contains a creation value for each instance based on when the instance was added to the DICOM store.

    This option removes historical streaming metadata, but ensures that the table correctly contains the LastUpdated column with valid values.
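Before choosing a workaround, it can help to know which instances are affected. The following Python sketch is one possible heuristic, run over rows already fetched from the table: it flags UID triples that appear in more than one row where at least one of those rows has no LastUpdated value. The function name and sample rows are hypothetical, and the UID column names are an assumption about the generated table.

```python
from collections import defaultdict

# Assumed column names for the study, series, and instance UIDs.
UID_COLUMNS = ("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID")


def find_ambiguous_instances(rows):
    """Return UID triples with several rows, at least one lacking LastUpdated."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in UID_COLUMNS)].append(row)
    return sorted(
        uid
        for uid, group in groups.items()
        # A missing key and an explicit None are both treated as "no value".
        if len(group) > 1 and any(r.get("LastUpdated") is None for r in group)
    )


# Hypothetical rows: the first instance was stored before LastUpdated existed
# and was later re-inserted with edited tags.
sample_rows = [
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "SOPInstanceUID": "1.1.1.1", "LastUpdated": None},
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "SOPInstanceUID": "1.1.1.1", "LastUpdated": "2021-05-01T00:00:00Z"},
    {"StudyInstanceUID": "2.2", "SeriesInstanceUID": "2.2.1",
     "SOPInstanceUID": "2.2.1.1", "LastUpdated": "2021-05-02T00:00:00Z"},
]

print(find_ambiguous_instances(sample_rows))
```

Instances that the sketch flags are candidates for querying the table directly instead of the view.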

Troubleshooting DICOM streaming requests

If errors occur while DICOM metadata is being streamed to BigQuery, the errors are logged to Cloud Logging. For more information, see Viewing error logs in Cloud Logging.

To filter logs related to streaming DICOM metadata, select healthcare.googleapis.com%2Fdicom_stream from the second list under Filter by label or text search.
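The same log can also be selected with a text filter. As a sketch, the following Python snippet builds a Cloud Logging filter on logName for the dicom_stream log, assuming the standard projects/PROJECT_ID/logs/LOG_ID log name format with a URL-encoded log ID. The function name is hypothetical and PROJECT_ID is a placeholder.

```python
from urllib.parse import quote


def dicom_stream_log_filter(project_id):
    """Return a Cloud Logging filter that matches the DICOM streaming log."""
    # The log ID contains a slash, which is percent-encoded as %2F in log names.
    log_id = quote("healthcare.googleapis.com/dicom_stream", safe="")
    return f'logName="projects/{project_id}/logs/{log_id}"'


print(dicom_stream_log_filter("PROJECT_ID"))
# prints: logName="projects/PROJECT_ID/logs/healthcare.googleapis.com%2Fdicom_stream"
```

The resulting string can be pasted into the Logs Explorer query box or passed to gcloud logging read.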