Cloud Genomics provides a variety of public datasets that you can access for free and integrate into your applications. Google hosts these datasets and makes the data publicly available via the following methods:
Interactive access is available in the BigQuery console. You can explore variant calls in case/control and cohort analyses, and sample queries are provided to help you get started. For information on how to get started with BigQuery, see How to query datasets using BigQuery.
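As a rough illustration of querying a variants table outside the console, the sketch below builds a standard SQL query that counts rows per reference sequence. The table ID `my-project.my_dataset.variants` and the `reference_name` column are hypothetical placeholders, not names taken from this page; substitute the table and schema documented for the dataset you want. The helper `run_query` assumes the `google-cloud-bigquery` package is installed and that credentials are configured.

```python
import re

# Hypothetical placeholder: replace with the table ID listed for the dataset you need.
EXAMPLE_TABLE = "my-project.my_dataset.variants"

# Accepts IDs of the form project.dataset.table (a simplification for this sketch).
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_-]+$")


def build_variant_count_query(table_id: str, limit: int = 25) -> str:
    """Return standard SQL that counts rows per reference sequence.

    Assumes the table has a `reference_name` column; check the real schema first.
    """
    if not _TABLE_ID.match(table_id):
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT reference_name, COUNT(*) AS variant_count\n"
        f"FROM `{table_id}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY variant_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query with the BigQuery client library (needs credentials and network)."""
    # Imported lazily so the query builder above works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]
```

The string returned by `build_variant_count_query(EXAMPLE_TABLE)` can also be pasted directly into the BigQuery console once the placeholder table ID is replaced.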
File access is available from Cloud Storage. Files are available in BAM, VCF, and FASTA formats. Copy the files you need to a local disk or a Compute Engine VM to use them with your preferred bioinformatics tools. For information on how to get started with Cloud Storage, see How to use public datasets on Cloud Storage.
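The copy step described above might look like the following sketch, which validates a `gs://` URI and assembles a `gsutil cp` command. The bucket and object names used in the usage note are hypothetical placeholders; the real paths are listed on each dataset's page. Running `copy_to_local` assumes the `gsutil` tool is installed and on your `PATH`.

```python
import subprocess
from urllib.parse import urlparse


def split_gs_uri(uri: str) -> tuple:
    """Split gs://bucket/object into (bucket, object), rejecting anything else."""
    parsed = urlparse(uri)
    object_name = parsed.path.lstrip("/")
    if parsed.scheme != "gs" or not parsed.netloc or not object_name:
        raise ValueError(f"expected gs://bucket/object, got {uri!r}")
    return parsed.netloc, object_name


def build_copy_command(uri: str, dest_dir: str = ".") -> list:
    """Return the argument list for copying one object with `gsutil cp`."""
    split_gs_uri(uri)  # validate before building the command
    return ["gsutil", "cp", uri, dest_dir]


def copy_to_local(uri: str, dest_dir: str = ".") -> None:
    """Copy the object to dest_dir; raises CalledProcessError if gsutil fails."""
    subprocess.run(build_copy_command(uri, dest_dir), check=True)
```

For example, `build_copy_command("gs://example-bucket/samples/sample.bam", "/data")` returns the command without running it, which is useful for checking the paths before a large download. The same command works on a Compute Engine VM.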
For public data hosted by the community on Google, each data provider determines the modes of access they support.
Cloud Genomics genomic public datasets
- 1000 Cannabis Genomes Project
- 1000 Genomes
- Illumina Platinum Genomes
- MSSNG Database for Autism Researchers
- Reference Genomes
- Simons Genome Diversity Project
- The Cancer Genome Atlas (TCGA)
Cloud Genomics annotation public datasets
List your public dataset on Cloud Storage
If you have questions about listing a public dataset on Cloud Storage, contact us at firstname.lastname@example.org.
List your public dataset on BigQuery
If you have questions about listing a public dataset on BigQuery, contact us at email@example.com.