The Cancer Imaging Archive (TCIA) datasets

The Cancer Imaging Archive (TCIA) hosts collections of de-identified medical images, primarily in DICOM format. Collections are organized according to disease (such as lung cancer), image modality (such as MRI or CT), or research focus.

The Cloud Healthcare API provides access to these datasets via Google Cloud Platform (GCP), as described in GCP data access.

License and attribution

The TCIA public access datasets are available under the Creative Commons Attribution 3.0 Unported License. Most collections are "freely available to browse, download, and use for commercial, scientific and educational purposes." For details, see the TCIA Data Usage Policies and Restrictions.

Citations

For each collection you use, cite both the TCIA in general and the specific sources for the collection.

General citation

Cite the following general TCIA publication:

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. (paper)

Collection citations

Each TCIA collection has specific citation requirements. These may be data citations, publication citations, or both. Some collections also require attribution for additional data sources.

Details are available in the TCIA Attribution section. You can also refer to the citation and data usage policy on each collection summary page on the TCIA site.

GCP data access

You can access the TCIA datasets through Cloud Storage, BigQuery, or the Cloud Healthcare API.

Cloud Storage

Each TCIA dataset is available in a Cloud Storage bucket within the Google Cloud Platform project named chc-tcia.

Go to the TCIA datasets in Cloud Storage

Dataset bucket names are in the following format:

gs://gcs-public-data--healthcare-tcia-DATASET_ID

To find the DATASET_ID, refer to the TCIA Attribution section. The last portion of the attribution page URL (immediately preceding .html) corresponds to the dataset ID. For example, the TCGA-BRCA citations page has the following URL:

https://cloud.google.com/healthcare/docs/resources/public-datasets/tcia-attribution/tcga-brca.html

The dataset ID is tcga-brca. The corresponding Cloud Storage bucket is:

gs://gcs-public-data--healthcare-tcia-tcga-brca
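
The naming rule above can be sketched in Python. This is a minimal illustration, not an official client: the function names `dataset_id_from_attribution_url` and `bucket_uri` are hypothetical, and the sketch assumes every attribution page URL ends in `DATASET_ID.html` as in the TCGA-BRCA example.

```python
import posixpath
from urllib.parse import urlparse

# Bucket name prefix taken from the format shown above.
BUCKET_PREFIX = "gcs-public-data--healthcare-tcia-"


def dataset_id_from_attribution_url(url: str) -> str:
    """Return the last path segment of an attribution URL, without '.html'."""
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".html"):
        raise ValueError(f"expected a URL ending in .html, got: {url}")
    return filename[: -len(".html")]


def bucket_uri(dataset_id: str) -> str:
    """Build the gs:// URI of the bucket for a TCIA dataset ID."""
    return f"gs://{BUCKET_PREFIX}{dataset_id}"


if __name__ == "__main__":
    page = (
        "https://cloud.google.com/healthcare/docs/resources/"
        "public-datasets/tcia-attribution/tcga-brca.html"
    )
    print(bucket_uri(dataset_id_from_attribution_url(page)))
```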

Within each bucket, the data is organized as follows:

gs://gcs-public-data--healthcare-tcia-DATASET_ID/dicom/STUDY_UID/SERIES_UID/INSTANCE_UID.dcm
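
Because the object layout is regular, an object URI can be split back into its DICOM identifiers. The following is a minimal sketch under that assumption; `DicomObjectPath` and `parse_dicom_object_uri` are hypothetical names, and the UIDs in the usage example are made-up placeholders, not real TCIA identifiers.

```python
from typing import NamedTuple


class DicomObjectPath(NamedTuple):
    bucket: str
    study_uid: str
    series_uid: str
    instance_uid: str


def parse_dicom_object_uri(uri: str) -> DicomObjectPath:
    """Split gs://BUCKET/dicom/STUDY_UID/SERIES_UID/INSTANCE_UID.dcm into parts."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a Cloud Storage URI: {uri}")
    bucket, _, object_name = uri[len(scheme):].partition("/")
    parts = object_name.split("/")
    # Expect exactly: dicom / study / series / instance.dcm
    if len(parts) != 4 or parts[0] != "dicom" or not parts[3].endswith(".dcm"):
        raise ValueError(f"unexpected object layout: {uri}")
    return DicomObjectPath(bucket, parts[1], parts[2], parts[3][: -len(".dcm")])


if __name__ == "__main__":
    # Placeholder UIDs for illustration only.
    example = (
        "gs://gcs-public-data--healthcare-tcia-tcga-brca/"
        "dicom/1.2.3/1.2.3.4/1.2.3.4.5.dcm"
    )
    print(parse_dicom_object_uri(example))
```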

Each Cloud Storage bucket uses the "Requester Pays" billing model, so your GCP project is billed for the charges associated with accessing this data. For more information, see Requester Pays.
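
With Requester Pays, each request must name the project to bill. In the Cloud Storage JSON API this is done with the `userProject` query parameter. The sketch below only builds the object-listing URL; it does not send the request, which would also need an `Authorization` header carrying an OAuth 2.0 access token. The function name `list_objects_url` and the project ID `my-billing-project` are hypothetical.

```python
from urllib.parse import quote, urlencode


def list_objects_url(bucket: str, billing_project: str, prefix: str = "dicom/") -> str:
    """Build a JSON API URL that lists objects and bills `billing_project`."""
    query = urlencode({
        "userProject": billing_project,  # project charged under Requester Pays
        "prefix": prefix,                # restrict the listing to the DICOM objects
    })
    return (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o?{query}"
    )


if __name__ == "__main__":
    # "my-billing-project" is a placeholder for your own GCP project ID.
    print(list_objects_url(
        "gcs-public-data--healthcare-tcia-tcga-brca", "my-billing-project"
    ))
```

Client libraries and command-line tools expose the same setting under their own option names; check their documentation for the exact spelling.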

BigQuery

Each TCIA dataset is available in BigQuery in the chc-tcia Google Cloud Platform project.

Go to the TCIA datasets in BigQuery

For information about accessing public data in BigQuery, see BigQuery public datasets.

Cloud Healthcare API

Each TCIA dataset is available in the Cloud Healthcare API in the chc-tcia project.

Go to the TCIA datasets in the Cloud Healthcare API

For information about the structure of the data, see the DICOM overview and Using the DICOMweb standard.
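
As a rough illustration of how a DICOMweb request path is put together, the sketch below builds study-level and series-level search URLs for a DICOM store. Only the project name `chc-tcia` comes from this page; the location, dataset ID, and DICOM store ID are not stated here, so the values in the usage example are placeholders to be replaced with the real ones. The function name `dicomweb_search_url` is hypothetical, and requests to these URLs need an OAuth 2.0 access token.

```python
from typing import Optional

API_ROOT = "https://healthcare.googleapis.com/v1"


def dicomweb_search_url(
    location: str,
    dataset_id: str,
    dicom_store_id: str,
    project: str = "chc-tcia",
    study_uid: Optional[str] = None,
) -> str:
    """Build a DICOMweb search URL: all studies, or the series of one study."""
    base = (
        f"{API_ROOT}/projects/{project}/locations/{location}"
        f"/datasets/{dataset_id}/dicomStores/{dicom_store_id}/dicomWeb/studies"
    )
    if study_uid is None:
        return base
    return f"{base}/{study_uid}/series"


if __name__ == "__main__":
    # LOCATION, DATASET_ID, and DICOM_STORE_ID are placeholders, not real values.
    print(dicomweb_search_url("LOCATION", "DATASET_ID", "DICOM_STORE_ID"))
```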

Data viewers

You can also use the IMS viewer with the Cloud Healthcare API:

https://cloudview.imstsvc.com
