Exporting FHIR resources to BigQuery


This page explains how to export FHIR resources to BigQuery for exploration and analysis. The export operation outputs one BigQuery table for each FHIR resource type in the FHIR store.

To export FHIR resources to partitioned tables in BigQuery, see Configure BigQuery export to partitioned tables on a FHIR store (Preview). Using partitioned tables can improve query performance and reduce costs by reducing the number of bytes read by a query.

Setting BigQuery permissions

Before exporting FHIR resources to BigQuery, you must grant additional permissions to the Cloud Healthcare Service Agent service account. For more information, see FHIR store BigQuery permissions.

Exporting FHIR resources

The following samples show how to export FHIR resources to a BigQuery table.

When setting the BigQuery destination, use the fully qualified URI in the following format:

bq://PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_ID

The behavior of the export operation can vary depending on the following:

  • Whether the destination table already exists.
  • Whether you set the force field.
  • Whether you specify an enum in WriteDisposition. If you specify an enum, do not set the force field.

The behavior in each of these cases is as follows:

  • If the destination table already exists and force is set to true, the export operation overwrites the existing table.
  • If the destination table already exists and force is set to false, an error occurs.
  • If the destination table does not already exist, the export operation creates a new table, regardless of whether you specify the force field.
  • When using WriteDisposition, if the destination table already exists and is empty, the export operation completes successfully instead of returning an error.

The operation outputs one BigQuery table for each resource type in the FHIR store.
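The rules above can be expressed as a small helper that builds the bigqueryDestination portion of the export request body. This is a hedged sketch: the datasetUri, writeDisposition, and force field names match the request body shown later on this page, but the helper function itself is hypothetical.

```python
# Sketch: build the bigqueryDestination part of a FHIR export request.
# Per the rules above, set either writeDisposition or force, never both.
def bigquery_destination(dataset_uri, write_disposition=None, force=None):
    if write_disposition is not None and force is not None:
        raise ValueError("Set writeDisposition or force, not both.")
    dest = {"datasetUri": dataset_uri}
    if write_disposition is not None:
        dest["writeDisposition"] = write_disposition  # e.g. "WRITE_EMPTY"
    elif force is not None:
        dest["force"] = force  # True overwrites an existing table
    return dest

print(bigquery_destination("bq://my-project.fhir_export",
                           write_disposition="WRITE_TRUNCATE"))
```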

Console

To export FHIR resources to BigQuery using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Datasets page.

    Go to Datasets

  2. Click the dataset that has the FHIR store with the data you're exporting.

  3. In the same row as the FHIR store, open the Actions list and select Export.

  4. On the Export FHIR resources page that appears, find the Select a destination section. Select BigQuery table.

  5. In the Destination table write disposition section, select one of the following options to determine the behavior of the export operation:

    • Only export data if the destination tables are empty: this is equivalent to selecting the WRITE_EMPTY enum in WriteDisposition.
    • Append data to the destination tables: this is equivalent to selecting the WRITE_APPEND enum in WriteDisposition.
    • Erase all existing data in destination tables before writing the FHIR resources: this is equivalent to selecting the WRITE_TRUNCATE enum in WriteDisposition.
  6. In the FHIR export configuration section, click Browse to select the BigQuery project and dataset.

  7. In the Schema type dropdown, select the output schema for the BigQuery table. The following schemas are available:

    • Analytics. A schema based on the SQL on FHIR document. Because BigQuery only allows for 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
    • Analytics V2. A schema similar to the Analytics schema, with added support for the following:
      • FHIR extensions that have more than one occurrence with the same extension URL
      • Contained FHIR resources in string-converted JSON.
      The Analytics V2 schema uses more space in the destination table than the Analytics schema.

  8. Select a depth level in the Recursive Structure Depth slider to set the depth for all recursive structures in the output schema. By default, the recursive value is 2.

    For more information, see recursiveStructureDepth.

  9. Click Export to export FHIR resources to BigQuery.

  10. To track the status of the operation, click the Operations tab. After the operation completes, the following indications appear:
    • The Long-running operation status section has a green check mark under the OK heading.
    • The Overview section has a green check mark and an OK indicator in the same row as the operation ID.
    If you encounter any errors, click Actions, and then click View details in Cloud Logging.

gcloud CLI

To export FHIR resources to BigQuery, run the gcloud healthcare fhir-stores export bq command.

  1. Export the FHIR resources.

    Before using any of the command data below, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • LOCATION: the dataset location
    • DATASET_ID: the FHIR store's parent dataset
    • FHIR_STORE_ID: the FHIR store ID
    • BIGQUERY_DATASET_ID: the name of the existing BigQuery dataset where you're exporting FHIR resources
    • SCHEMA_TYPE: a value for SchemaType. Use one of the following values:
      • analytics. A schema based on the SQL on FHIR document. Because BigQuery only allows for 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
      • analytics_v2. A schema similar to analytics, with added support for FHIR extensions that have more than one occurrence with the same extension URL and for contained FHIR resources in string-converted JSON. analytics_v2 uses more space in the destination table than analytics.
    • WRITE_DISPOSITION: a value for WriteDisposition. Use one of the following values:
      • write-empty. Only export data if the destination BigQuery tables are empty.
      • write-truncate. Erase all existing data in the BigQuery tables before writing the FHIR resources.
      • write-append. Append data to the destination BigQuery tables.
    • TYPE: an optional field. Specify one or more FHIR resource types to only export FHIR resources of those types.
    • SINCE: an optional field. Specify a value in the format YYYY-MM-DDTHH:MM:SS+ZZ:ZZ to only export FHIR resources updated after a specific time.
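Before running the command, you can sanity-check a SINCE value against the required YYYY-MM-DDTHH:MM:SS+ZZ:ZZ format. A minimal sketch; the sample values are hypothetical:

```python
from datetime import datetime

# Sketch: verify a --since value parses as an ISO 8601 timestamp with an
# explicit UTC offset, as the export filter requires.
def is_valid_since(value: str) -> bool:
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    return parsed.tzinfo is not None

print(is_valid_since("2015-02-07T13:28:17+02:00"))  # True
print(is_valid_since("2015-02-07"))                 # False: no time or offset
```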

    Execute the following command:

    Linux, macOS, or Cloud Shell

    gcloud healthcare fhir-stores export bq FHIR_STORE_ID \
      --location=LOCATION \
      --dataset=DATASET_ID \
      --bq-dataset=bq://PROJECT_ID.BIGQUERY_DATASET_ID \
      --schema-type=SCHEMA_TYPE \
      --write-disposition=WRITE_DISPOSITION \
      [--type=TYPE] \
      [--since=SINCE]
    

    Windows (PowerShell)

    gcloud healthcare fhir-stores export bq FHIR_STORE_ID `
      --location=LOCATION `
      --dataset=DATASET_ID `
      --bq-dataset=bq://PROJECT_ID.BIGQUERY_DATASET_ID `
      --schema-type=SCHEMA_TYPE `
      --write-disposition=WRITE_DISPOSITION `
      [--type=TYPE] `
      [--since=SINCE]
    

    Windows (cmd.exe)

    gcloud healthcare fhir-stores export bq FHIR_STORE_ID ^
      --location=LOCATION ^
      --dataset=DATASET_ID ^
      --bq-dataset=bq://PROJECT_ID.BIGQUERY_DATASET_ID ^
      --schema-type=SCHEMA_TYPE ^
      --write-disposition=WRITE_DISPOSITION ^
      [--type=TYPE] ^
      [--since=SINCE]
    
    The response is similar to the following and contains an identifier for a long-running operation. Long-running operations are returned when method calls might take a substantial amount of time to complete. The command polls the long-running operation, then prints the name of the operation in the name field after the export finishes. Note the value of OPERATION_ID. You need this value in the next step.

    Response

    Request issued for: [FHIR_STORE_ID]
    Waiting for operation [projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID] to complete...⠏
    name: projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID
    

  2. To view more details about the operation, run the gcloud healthcare operations describe command and provide the OPERATION_ID from the response.

    Before using any of the command data below, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • DATASET_ID: the dataset ID
    • LOCATION: the dataset location
    • OPERATION_ID: the ID returned from the long-running operation

    Execute the following command:

    Linux, macOS, or Cloud Shell

    gcloud healthcare operations describe OPERATION_ID \
        --project=PROJECT_ID \
        --dataset=DATASET_ID \
        --location=LOCATION
    

    Windows (PowerShell)

    gcloud healthcare operations describe OPERATION_ID `
        --project=PROJECT_ID `
        --dataset=DATASET_ID `
        --location=LOCATION
    

    Windows (cmd.exe)

    gcloud healthcare operations describe OPERATION_ID ^
        --project=PROJECT_ID ^
        --dataset=DATASET_ID ^
        --location=LOCATION
    

    You should receive a response similar to the following:

    Response

    done: true
    // If there were any errors, an `error` field displays instead of a `response` field.
    // See Troubleshooting long-running operations for a list of response codes.
    error: ERROR
      code: ERROR_CODE
      message: DESCRIPTION
    metadata:
      '@type': 'type.googleapis.com/google.cloud.healthcare.v1.OperationMetadata'
      apiMethodName: 'google.cloud.healthcare.v1.fhir.FhirStoreService.ExportResources_bq'
      counter:
        success: 'SUCCESS_COUNT'
        // If there were any failures, they display in the `failure` field.
        failure: 'FAILURE_COUNT'
      createTime: 'YYYY-MM-DDTHH:MM:SS+ZZ:ZZ'
      endTime: 'YYYY-MM-DDTHH:MM:SS+ZZ:ZZ'
      logsUrl: https://console.cloud.google.com/CLOUD_LOGGING_URL
    name: projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID
    // The `response` field only displays if there were no errors.
    response:
      '@type': 'type.googleapis.com/google.cloud.healthcare.v1.fhir.ExportResourcesResponse'
    

REST

To export FHIR resources to BigQuery, use the projects.locations.datasets.fhirStores.export method.

  1. Export the FHIR resources:

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • LOCATION: the dataset location
    • DATASET_ID: the FHIR store's parent dataset
    • FHIR_STORE_ID: the FHIR store ID
    • BIGQUERY_DATASET_ID: the name of the existing BigQuery dataset where you're exporting FHIR resources
    • SCHEMA_TYPE: a value for SchemaType. Use one of the following values:
      • ANALYTICS. A schema based on the SQL on FHIR document. Because BigQuery only allows for 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
      • ANALYTICS_V2. A schema similar to ANALYTICS, with added support for FHIR extensions that have more than one occurrence with the same extension URL and for contained FHIR resources in string-converted JSON. ANALYTICS_V2 uses more space in the destination table than ANALYTICS.
    • WRITE_DISPOSITION: a value for WriteDisposition. Use one of the following values:
      • WRITE_EMPTY. Only export data if the destination BigQuery tables are empty.
      • WRITE_TRUNCATE. Erase all existing data in the BigQuery tables before writing the FHIR resources.
      • WRITE_APPEND. Append data to the destination BigQuery tables.
    • TYPE: an optional field. Specify one or more FHIR resource types to only export FHIR resources of those types.
    • SINCE: an optional field. Specify a value in the format YYYY-MM-DDTHH:MM:SS+ZZ:ZZ to only export FHIR resources updated after a specific time.

    Request JSON body:

    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
        },
        "writeDisposition": "WRITE_DISPOSITION"
      },
      "_since": "SINCE",
      "_type": "TYPE"
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    cat > request.json << 'EOF'
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
        },
        "writeDisposition": "WRITE_DISPOSITION"
      },
      "_since": "SINCE",
      "_type": "TYPE"
    }
    EOF

    Then execute the following command to send your REST request:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID:export"

    PowerShell

    Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    @'
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
        },
        "writeDisposition": "WRITE_DISPOSITION"
      },
      "_since": "SINCE",
      "_type": "TYPE"
    }
    '@  | Out-File -FilePath request.json -Encoding utf8

    Then execute the following command to send your REST request:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID:export" | Select-Object -Expand Content

    API Explorer

    Copy the request body and open the method reference page. The API Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Paste the request body in this tool, complete any other required fields, and click Execute.

    The output is the following. The response contains an identifier for a long-running operation. Long-running operations are returned when method calls might take additional time to complete. Note the value of OPERATION_ID. You need this value in the next step.

  2. Use the projects.locations.datasets.operations.get method to get the status of the long-running operation.

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • DATASET_ID: the dataset ID
    • LOCATION: the dataset location
    • OPERATION_ID: the ID returned from the long-running operation

    To send your request, choose one of these options:

    curl

    Execute the following command:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID"

    PowerShell

    Execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID" | Select-Object -Expand Content

    API Explorer

    Open the method reference page. The API Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Complete any required fields and click Execute.

    The output is the following. When the response contains "done": true, the long-running operation has finished.
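The polling step can be sketched as a loop that calls operations.get until the response contains "done": true. In the sketch below, fetch is a stand-in for the authenticated GET request shown above; the function and its parameters are hypothetical:

```python
import time

def wait_for_operation(fetch, poll_interval_s=1.0, max_polls=60):
    """Poll a long-running operation until it completes.

    `fetch` is any callable returning the parsed operations.get response;
    in practice it would issue the authenticated GET request shown above.
    """
    for _ in range(max_polls):
        op = fetch()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"Export failed: {op['error']}")
            return op.get("response", {})
        time.sleep(poll_interval_s)
    raise TimeoutError("Operation did not finish in time.")

# Stubbed example: the operation finishes on the second poll.
responses = iter([{"done": False}, {"done": True, "response": {"ok": True}}])
print(wait_for_operation(lambda: next(responses), poll_interval_s=0))
# {'ok': True}
```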

Export FHIR resources to partitioned tables

To export FHIR resources to BigQuery partitioned tables, configure the TimePartitioning enum in the lastUpdatedPartitionConfig field in your FHIR store.

The partitioned tables behave similarly to BigQuery time-unit partitioned tables. When a partitioned table is created, a new column named lastUpdated is added to the existing schema. The lastUpdated column is a duplicate of the meta.lastUpdated column, and serves as the criteria that BigQuery uses to partition tables.

For example, if you set the TimePartitioning enum to DAY, then tables are partitioned per day. The value in lastUpdated determines which day the FHIR resource is partitioned into.

The lastUpdated column is a TIMESTAMP value, which means that the partitions can have hourly, daily, monthly, or yearly granularity. See Choose daily, hourly, monthly, or yearly partitioning for recommendations on choosing a granularity.
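To see which partition a given lastUpdated timestamp falls into at each granularity, you can truncate it the way BigQuery time-unit partitioning does. A small sketch mirroring BigQuery's partition ID formats; the sample timestamp is hypothetical:

```python
from datetime import datetime, timezone

# Sketch: compute the partition ID a lastUpdated TIMESTAMP maps to for
# each TimePartitioning granularity, following BigQuery's partition ID
# formats (YYYYMMDDHH, YYYYMMDD, YYYYMM, YYYY).
def partition_key(ts: datetime, granularity: str) -> str:
    formats = {"HOUR": "%Y%m%d%H", "DAY": "%Y%m%d",
               "MONTH": "%Y%m", "YEAR": "%Y"}
    if granularity not in formats:
        raise ValueError(f"Unknown granularity: {granularity}")
    return ts.strftime(formats[granularity])

ts = datetime(2024, 3, 7, 13, 28, 17, tzinfo=timezone.utc)
print(partition_key(ts, "DAY"))    # 20240307
print(partition_key(ts, "MONTH"))  # 202403
```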

You can't convert existing, non-partitioned BigQuery tables into partitioned tables. For example, suppose that you've exported Patient resource changes to an existing non-partitioned Patients table in a BigQuery dataset named exported-resources.

Next, suppose that you create a new FHIR store and use the same BigQuery destination dataset, exported-resources. You also configure table partitioning on the new FHIR store during export. In this case, any exported Patient resource data in the new FHIR store is still exported to the non-partitioned Patients table. If you want to start using partitioned tables, either delete the existing Patients table or use a different BigQuery dataset.

If you update a FHIR store configuration to include a TimePartitioning configuration, and you export FHIR resources to existing non-partitioned tables, the FHIR store configuration is still valid. No errors occur, but TimePartitioning doesn't take effect; it only takes effect on new tables.

The following samples show how to export FHIR resources to BigQuery partitioned tables.

Console

The Google Cloud console and the gcloud CLI don't support this action. Instead, use curl, PowerShell, or your preferred language.

gcloud CLI

The Google Cloud console and the gcloud CLI don't support this action. Instead, use curl, PowerShell, or your preferred language.

REST

To export FHIR resources to BigQuery partitioned tables, use the projects.locations.datasets.fhirStores.export method.

  1. Export the FHIR resources:

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • LOCATION: the dataset location
    • DATASET_ID: the FHIR store's parent dataset
    • FHIR_STORE_ID: the FHIR store ID
    • BIGQUERY_DATASET_ID: the name of the existing BigQuery dataset where you're exporting FHIR resources
    • SCHEMA_TYPE: a value for SchemaType. Use one of the following values:
      • LOSSLESS. A data-driven schema generated from the fields in the exported FHIR data without additional simplification.
      • ANALYTICS. A schema based on the SQL on FHIR document. Because BigQuery only allows for 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
      • ANALYTICS_V2. A schema similar to ANALYTICS, with added support for FHIR extensions that have more than one occurrence with the same extension URL and for contained FHIR resources in string-converted JSON.
    • TIME_PARTITION_TYPE: the granularity at which to partition exported FHIR resources. Use one of the following values:
      • HOUR: partition data by hour
      • DAY: partition data by day
      • MONTH: partition data by month
      • YEAR: partition data by year
    • WRITE_DISPOSITION: a value for WriteDisposition. Use one of the following values:
      • WRITE_EMPTY. Only export data if the destination BigQuery tables are empty.
      • WRITE_TRUNCATE. Erase all existing data in the BigQuery tables before writing the FHIR resources.
      • WRITE_APPEND. Append data to the destination BigQuery tables. If you're exporting FHIR resources to BigQuery partitioned tables for the first time, existing tables in the BigQuery dataset remain unpartitioned. The partitioning configuration only takes effect on new tables.

    Request JSON body:

    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    cat > request.json << 'EOF'
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
    EOF

    Then execute the following command to send your REST request:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID:export"

    PowerShell

    Save the request body in a file called request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    @'
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
    '@  | Out-File -FilePath request.json -Encoding utf8

    Then execute the following command to send your REST request:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID:export" | Select-Object -Expand Content

    API Explorer

    Copy the request body and open the method reference page. The API Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Paste the request body in this tool, complete any other required fields, and click Execute.

    You should receive a JSON response similar to the following:

  2. Use the projects.locations.datasets.operations.get method to get the status of the long-running operation.

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project
    • DATASET_ID: the dataset ID
    • LOCATION: the dataset location
    • OPERATION_ID: the ID returned from the long-running operation

    To send your request, choose one of these options:

    curl

    Execute the following command:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID"

    PowerShell

    Execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://healthcare.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID" | Select-Object -Expand Content

    API Explorer

    Open the method reference page. The API Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Complete any required fields and click Execute.

    The output is the following. When the response contains "done": true, the long-running operation has finished.

Query a partitioned table

To reduce query costs when querying partitioned tables, use the WHERE clause to filter by time units.

For example, suppose that you set the TimePartitioning enum to DAY. To query a Patients table for Patient resources that were updated on a specific date, run the following query:

SELECT * FROM `PROJECT_ID.BIGQUERY_DATASET.Patients`
  WHERE DATE(lastUpdated) = 'YYYY-MM-DD'

Querying and analyzing FHIR data in BigQuery

After exporting the FHIR resources to BigQuery, see the Analyzing FHIR data in BigQuery solution for information on how to query and analyze the exported data. The solution uses the Synthea™ Generated Synthetic Data in FHIR public dataset, which hosts over 1 million synthetic patient records generated in the Synthea™ and FHIR formats.

Troubleshooting FHIR export requests

If errors occur during a FHIR export request, the errors are logged to Cloud Logging. For more information, see Viewing error logs in Cloud Logging.

If the entire operation returns an error, see Troubleshooting long-running operations.

What's next