The Cancer Genome Atlas data

The Cancer Genome Atlas (TCGA) program was a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. The program molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types.

The Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC) provides access to TCGA data and metadata in BigQuery tables for ease of access and analysis. These tables consolidate information scattered across tens of thousands of open-access TCGA XML and tabular files into a queryable format organized by data type (for example, clinical, biospecimen, gene expression, and mutation).

Similarly, ISB-CGC has created BigQuery tables for other cancer programs; see the ISB-CGC Programs documentation.

ISB-CGC also provides example notebooks in both R and Python, ranging from simple to complex query building and analysis with ISB-CGC BigQuery tables.

Dataset access

Cloud Storage folders

ISB-CGC stores the Cloud Storage paths to TCGA data hosted by the National Cancer Institute's Genomic Data Commons in the BigQuery dataset isb-cgc-bq.GDC_case_file_metadata. To learn how to access these file locations, see the ISB-CGC TCGA documentation.
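
As a minimal sketch, the tables in that dataset can be listed through BigQuery's INFORMATION_SCHEMA.TABLES view. Only the project and dataset names come from the text above. The helper names are hypothetical, and actually running the query assumes that the google-cloud-bigquery package is installed and that you have credentials and a billing project of your own.

```python
def build_list_tables_sql(project: str = "isb-cgc-bq",
                          dataset: str = "GDC_case_file_metadata") -> str:
    """Return SQL that lists the tables in one BigQuery dataset."""
    return (
        "SELECT table_name\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES`\n"
        "ORDER BY table_name"
    )


def list_metadata_tables(billing_project: str) -> list[str]:
    """Run the query (hypothetical helper).

    Assumes google-cloud-bigquery is installed and that application
    default credentials can bill queries to `billing_project`.
    """
    # Imported lazily so the SQL builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    rows = client.query(build_list_tables_sql()).result()
    return [row.table_name for row in rows]
```

Calling `build_list_tables_sql()` on its own yields a query that can also be pasted into the BigQuery console, which may be the quicker route for a first look.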

BigQuery datasets

You can access the following TCGA datasets in BigQuery for data exploration and querying:

To explore other ISB-CGC cancer datasets, use the ISB-CGC BigQuery Search Tool. You can find this data in the isb-cgc-bq project in Google BigQuery. For more information about ISB-CGC and its data, see ISB-CGC documentation.
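
For programmatic exploration, the datasets in the isb-cgc-bq project can also be enumerated with the BigQuery Python client. The sketch below is illustrative only: the helper names are hypothetical, and filtering on the substring "TCGA" is an assumption about dataset naming rather than a documented rule. It also assumes google-cloud-bigquery is installed and that you supply your own billing project.

```python
def tcga_dataset_ids(dataset_ids):
    """Keep dataset IDs that mention TCGA.

    Matching on the substring "TCGA" is a naming assumption made for
    this sketch, not a rule documented by ISB-CGC.
    """
    return sorted(d for d in dataset_ids if "TCGA" in d.upper())


def list_tcga_datasets(billing_project: str, data_project: str = "isb-cgc-bq"):
    """List TCGA-related datasets in `data_project` (hypothetical helper)."""
    # Imported lazily so the filter above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    ids = [ds.dataset_id for ds in client.list_datasets(project=data_project)]
    return tcga_dataset_ids(ids)
```

Keeping the filter separate from the API call makes it easy to swap in a different program name when exploring the other ISB-CGC cancer datasets.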

About the data

