The Cancer Imaging Archive (TCIA) hosts collections of de-identified medical images, primarily in DICOM format. Collections are organized according to disease (such as lung cancer), image modality (such as MRI or CT), or research focus.
The Cloud Healthcare API provides access to these datasets via Google Cloud (GCP), as described in Google Cloud data access.
License and attribution
The TCIA public access datasets are available under the Creative Commons Attribution 3.0 Unported License. Most collections are "freely available to browse, download, and use for commercial, scientific and educational purposes." For details, see the TCIA Data Usage Policies and Restrictions.
Citations
For each collection you use, cite both the TCIA in general and the specific sources for the collection.
General citation
Cite the following general TCIA publication:
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. (paper)
Collection citations
Each TCIA collection has specific citation requirements. These may be data citations, publication citations, or both. Some collections also require attribution for additional data sources.
Details are available in the TCIA Attribution section. You can also refer to the citation and data usage policy on each collection summary page on the TCIA site.
Accessing the TCIA datasets
You can access the TCIA datasets through Cloud Storage, BigQuery, or the Cloud Healthcare API.
Cloud Storage
Each TCIA dataset is available in a Cloud Storage bucket within the Google Cloud project named chc-tcia.
Go to the TCIA datasets in Cloud Storage
Dataset bucket names are in the following format:
gs://gcs-public-data--healthcare-tcia-DATASET_ID
To find the DATASET_ID, refer to the TCIA Attribution section. The last portion of the attribution page URL (immediately preceding .html) corresponds to the dataset ID. For example, the TCGA-BRCA citations page has the following URL:
https://cloud.google.com/healthcare/docs/resources/public-datasets/tcia-attribution/tcga-brca.html
The dataset ID is tcga-brca. The corresponding Cloud Storage bucket is:
gs://gcs-public-data--healthcare-tcia-tcga-brca
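The naming rule above can be sketched in a few lines of Python. This is an illustrative helper, not part of any Google Cloud library: the function names `dataset_id_from_attribution_url` and `bucket_uri` are hypothetical, and the sketch assumes every attribution page URL ends in `DATASET_ID.html` as in the TCGA-BRCA example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Bucket name prefix taken from the format shown above.
BUCKET_PREFIX = "gcs-public-data--healthcare-tcia-"


def dataset_id_from_attribution_url(url: str) -> str:
    """Return the last URL path segment with the '.html' suffix removed."""
    path = PurePosixPath(urlparse(url).path)
    if path.suffix != ".html":
        raise ValueError(f"expected a URL ending in .html, got: {url}")
    return path.stem


def bucket_uri(dataset_id: str) -> str:
    """Build the gs:// URI of the bucket for a dataset ID."""
    return f"gs://{BUCKET_PREFIX}{dataset_id}"


url = (
    "https://cloud.google.com/healthcare/docs/resources/"
    "public-datasets/tcia-attribution/tcga-brca.html"
)
dataset_id = dataset_id_from_attribution_url(url)
print(dataset_id)              # tcga-brca
print(bucket_uri(dataset_id))  # gs://gcs-public-data--healthcare-tcia-tcga-brca
```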
Within each bucket, the data is organized as follows:
gs://gcs-public-data--healthcare-tcia-DATASET_ID/dicom/STUDY_UID/SERIES_UID/INSTANCE_UID.dcm
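As a rough illustration of this layout, the following sketch splits an object name (the part of the path after the bucket) into its three DICOM UIDs and builds the prefix that would select one series. The names `DicomObject`, `parse_object_name`, and `series_prefix` are hypothetical, and the UIDs in the example are placeholders rather than real TCIA identifiers.

```python
from typing import NamedTuple


class DicomObject(NamedTuple):
    study_uid: str
    series_uid: str
    instance_uid: str


def parse_object_name(object_name: str) -> DicomObject:
    """Split 'dicom/STUDY_UID/SERIES_UID/INSTANCE_UID.dcm' into its UIDs."""
    parts = object_name.split("/")
    if len(parts) != 4 or parts[0] != "dicom" or not parts[3].endswith(".dcm"):
        raise ValueError(f"unexpected object name: {object_name}")
    return DicomObject(parts[1], parts[2], parts[3][: -len(".dcm")])


def series_prefix(study_uid: str, series_uid: str) -> str:
    """Return the object-name prefix shared by all instances in a series."""
    return f"dicom/{study_uid}/{series_uid}/"


# Placeholder UIDs for illustration only.
obj = parse_object_name("dicom/1.2.3/4.5.6/7.8.9.dcm")
print(obj.study_uid, obj.series_uid, obj.instance_uid)  # 1.2.3 4.5.6 7.8.9
print(series_prefix(obj.study_uid, obj.series_uid))     # dicom/1.2.3/4.5.6/
```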
Each Cloud Storage bucket uses the "Requester Pays" model for billing. Your Google Cloud project will be billed for the charges associated with accessing the TCIA data. For more information, see Requester Pays.
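With Requester Pays, each request has to name the project that will be billed. A minimal sketch using the `google-cloud-storage` Python client might look like the following. It assumes the library is installed, that application default credentials are configured, and that `billing_project` is a project you own with billing enabled; `tcia_bucket_name` and `list_dicom_objects` are hypothetical helper names.

```python
def tcia_bucket_name(dataset_id: str) -> str:
    """Bucket name for a TCIA dataset ID, following the format shown earlier."""
    return f"gcs-public-data--healthcare-tcia-{dataset_id}"


def list_dicom_objects(
    dataset_id: str, billing_project: str, max_results: int = 10
) -> list[str]:
    """List a few DICOM object names, billing the caller's own project."""
    # Imported lazily so this module loads even if the library is missing.
    from google.cloud import storage

    client = storage.Client(project=billing_project)
    # user_project tells Cloud Storage which project pays for the request.
    bucket = client.bucket(
        tcia_bucket_name(dataset_id), user_project=billing_project
    )
    blobs = bucket.list_blobs(prefix="dicom/", max_results=max_results)
    return [blob.name for blob in blobs]


# Example call (requires credentials; "my-billing-project" is a placeholder):
# for name in list_dicom_objects("tcga-brca", "my-billing-project"):
#     print(name)
```

Keeping `max_results` small while exploring limits the number of billed operations.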
BigQuery
Each TCIA dataset is available in BigQuery in the chc-tcia Google Cloud project.
Go to the TCIA datasets in BigQuery
For information about accessing public data in BigQuery, see BigQuery public datasets.
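Because this page does not list the BigQuery dataset or table names, one way to discover them is to enumerate what the chc-tcia project exposes. The sketch below uses the `google-cloud-bigquery` Python client and assumes the library is installed, that your credentials are permitted to list datasets in chc-tcia, and that `billing_project` is your own project. The helper names `qualified_table_id` and `list_tcia_tables` are hypothetical.

```python
TCIA_PROJECT = "chc-tcia"


def qualified_table_id(dataset_id: str, table_id: str) -> str:
    """Return the fully qualified 'project.dataset.table' identifier."""
    return f"{TCIA_PROJECT}.{dataset_id}.{table_id}"


def list_tcia_tables(billing_project: str) -> dict[str, list[str]]:
    """Map each dataset in chc-tcia to the table IDs it contains."""
    # Imported lazily so this module loads even if the library is missing.
    from google.cloud import bigquery

    # Jobs and API calls are attributed to the caller's own project.
    client = bigquery.Client(project=billing_project)
    tables_by_dataset: dict[str, list[str]] = {}
    for dataset in client.list_datasets(project=TCIA_PROJECT):
        tables = client.list_tables(dataset.reference)
        tables_by_dataset[dataset.dataset_id] = [t.table_id for t in tables]
    return tables_by_dataset


# Example call (requires credentials; "my-billing-project" is a placeholder):
# for ds, tables in list_tcia_tables("my-billing-project").items():
#     print(ds, tables)
```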
Cloud Healthcare API
Each TCIA dataset is available in the Cloud Healthcare API in the chc-tcia project.
To request access to the TCIA datasets, complete this form.
Go to the TCIA datasets in the Cloud Healthcare API
For information about the structure of the data, see the DICOM overview and Using the DICOMweb standard.
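For orientation, DICOMweb requests against the Cloud Healthcare API are addressed to a DICOM store path under a project, location, and dataset. The sketch below only assembles such a URL for a study search. The location, dataset, and DICOM store names for the TCIA data are not given on this page, so the values in the example are placeholders to replace once access has been granted, and `dicomweb_studies_url` is a hypothetical helper name.

```python
HEALTHCARE_API_BASE = "https://healthcare.googleapis.com/v1"


def dicomweb_studies_url(
    project: str, location: str, dataset: str, dicom_store: str
) -> str:
    """Build the DICOMweb 'search for studies' URL for a DICOM store."""
    return (
        f"{HEALTHCARE_API_BASE}/projects/{project}/locations/{location}"
        f"/datasets/{dataset}/dicomStores/{dicom_store}/dicomWeb/studies"
    )


# LOCATION, DATASET, and DICOM_STORE are placeholders, not real TCIA names.
print(dicomweb_studies_url("chc-tcia", "LOCATION", "DATASET", "DICOM_STORE"))
```

An authenticated GET request to this URL, sent with an OAuth 2.0 bearer token, would return study metadata if your account has been granted access.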
External data viewers
You can also use the viewers that are integrated with the Cloud Healthcare API:
eUnity: https://demo.eunity.app
IMS CloudVue: https://cloudvue.imstsvc.com