Cloud Life Sciences provides a variety of public datasets that you can access for free and integrate into your applications. Google hosts these datasets, providing public access to the data through the following methods:
Interactive access is available in the BigQuery console, where you can explore variant calls for case/control and cohort analyses. Sample queries are provided to help you get started. For more information about public datasets in BigQuery, see BigQuery public datasets.
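The same kind of exploration can also be done programmatically. The following is a minimal sketch, assuming the `google-cloud-bigquery` Python client library is installed and application default credentials are configured. The table ID and the `reference_name` column are assumptions for illustration, not names taken from this page; substitute the table and columns of the dataset you want to query. The helper names `build_variant_count_query` and `run_variant_count` are hypothetical.

```python
"""Count variant rows per reference sequence in a public table (sketch)."""

# Assumption: illustrative table ID; replace it with the table you want to query.
DEFAULT_TABLE = (
    "bigquery-public-data.human_genome_variants."
    "1000_genomes_phase_3_variants_20150220_release"
)


def build_variant_count_query(table=DEFAULT_TABLE, limit=10):
    """Return SQL that counts variant rows per reference sequence."""
    # Assumption: the table exposes a `reference_name` column.
    return (
        "SELECT reference_name, COUNT(*) AS variant_count\n"
        f"FROM `{table}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY variant_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_variant_count(project, table=DEFAULT_TABLE):
    """Run the query and return a list of (reference_name, variant_count)."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    rows = client.query(build_variant_count_query(table)).result()
    return [(row["reference_name"], row["variant_count"]) for row in rows]
```

Calling `run_variant_count("your-project-id")` (a placeholder project ID) would run the query in that project, so it needs the BigQuery API enabled and is subject to that project's query quota and billing.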
File access is available from Cloud Storage, with files provided in BAM, VCF, and FASTA formats. Copy the files you need to your local disk or to a Compute Engine VM so that your preferred bioinformatics tools can read them. For more information about getting started with Cloud Storage, see How to use public datasets on Cloud Storage.
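As a rough illustration of the copy step, the sketch below downloads a single object to a local directory. It assumes the `google-cloud-storage` Python client library is installed and that the object allows public reads, which is why it uses an anonymous client. The helper names `parse_gs_uri` and `download_public_file` are hypothetical, and any URI you pass in must point at a real object from the dataset you are interested in.

```python
"""Download one object from a public Cloud Storage bucket (sketch)."""
import os


def parse_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return bucket, object_name


def download_public_file(uri, dest_dir="."):
    """Copy a publicly readable object into dest_dir; return the local path."""
    # Imported lazily so the URI parser works without the client library.
    from google.cloud import storage

    bucket_name, object_name = parse_gs_uri(uri)
    # An anonymous client only works for objects that allow public reads.
    client = storage.Client.create_anonymous_client()
    blob = client.bucket(bucket_name).blob(object_name)
    local_path = os.path.join(dest_dir, os.path.basename(object_name))
    blob.download_to_filename(local_path)
    return local_path
```

For example, `download_public_file("gs://example-public-bucket/vcf/sample.vcf", "/tmp")` would write `/tmp/sample.vcf`; the bucket and object names here are placeholders, not real dataset paths.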
For public data hosted by the community on Google, each data provider determines the modes of access they support.
Cloud Life Sciences genomic public datasets
- 1000 Cannabis Genomes Project
- 1000 Genomes
- Genome Aggregation Database (gnomAD)
- Illumina Platinum Genomes
- MSSNG Database for Autism Researchers
- Reference Genomes
- Simons Genome Diversity Project
- The Cancer Genome Atlas (TCGA)
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
Cloud Life Sciences annotation public datasets
List your public dataset on Cloud Storage
If you have questions about listing a public dataset on Cloud Storage, contact us at firstname.lastname@example.org.
List your public dataset on BigQuery
If you have questions about listing a public dataset on BigQuery, contact us at email@example.com.