Reference genomes, such as GRCh37, GRCh37lite, GRCh38, hg19, hs37d5, and b37, are available on Google Cloud Platform.
Cloud Storage folders
The following files are available in the Cloud Storage bucket:
Cloud Genomics API access
You can use the Cloud Genomics Pipelines API to access the following datasets:
- Cloud Genomics reference sets
About the dataset
GRCh37: Genome Reference Consortium Human Build 37 includes data from the following files:
GRCh37lite: GRCh37lite is a subset of the full GRCh37 reference set plus the human mitochondrial genome reference sequence in one file:
For more information on GRCh37lite data, see the FTP README.
GRCh38: Genome Reference Consortium Human Build 38 includes data from the following files:
Verily's GRCh38: Verily's GRCh38 reference genome is fully compatible with any b38 genome in the autosomes. It has the following features:
- Excludes all patch sequences
- Omits alternate haplotype chromosomes
- Includes decoy sequences
- Masks out duplicate copies of centromeric regions
Verily applied the following modifications to the base assembly:
- Reference segment names are prefixed with "chr". Many of the additional data files are provided by GENCODE, which uses the "chr" naming convention.
- All 74 extended IUPAC codes are converted to the first matching alphabetical base pair as recommended in the VCF 4.3 specification.
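The two modifications above can be illustrated with a minimal Python sketch. It is not Verily's actual tooling: the function names `reduce_iupac` and `add_chr_prefix` are hypothetical, and the sketch assumes that each ambiguity code is replaced by the alphabetically first base it can represent (for example, R, meaning A or G, becomes A), that N is left unchanged, and that lowercase (soft-masked) bases keep their case. It also assumes plain prefixing and does not address how the mitochondrial sequence is named.

```python
# Illustrative sketch only; names and behavior are assumptions, not Verily's code.

# IUPAC ambiguity codes mapped to the alphabetically first base they can represent.
IUPAC_FIRST_BASE = {
    "R": "A",  # A or G
    "Y": "C",  # C or T
    "S": "C",  # C or G
    "W": "A",  # A or T
    "K": "G",  # G or T
    "M": "A",  # A or C
    "B": "C",  # C, G, or T
    "D": "A",  # A, G, or T
    "H": "A",  # A, C, or T
    "V": "A",  # A, C, or G
}

# Translation table covering upper- and lowercase codes so soft-masking is preserved.
_TRANSLATION = str.maketrans(
    {
        **IUPAC_FIRST_BASE,
        **{code.lower(): base.lower() for code, base in IUPAC_FIRST_BASE.items()},
    }
)


def reduce_iupac(sequence: str) -> str:
    """Replace extended IUPAC codes with a concrete base; A, C, G, T, and N pass through."""
    return sequence.translate(_TRANSLATION)


def add_chr_prefix(name: str) -> str:
    """Prefix a reference segment name with "chr" unless it already has the prefix."""
    return name if name.startswith("chr") else "chr" + name
```

For example, `reduce_iupac("ACGTRYN")` returns `"ACGTACN"`, and `add_chr_prefix("1")` returns `"chr1"`.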
This release of the genome reference is named
hg19: Similar to GRCh37, this is the February 2009 assembly of the human genome with a different mitochondrial sequence and additional alternate haplotype assemblies. The hg19 data is hosted by the UCSC FTP site.
For more information on hg19 data, see the FTP README.
For more information on hs37d5 data, see the FTP README.
b37: The b37 reference genome, which is included with some versions of the GATK software, contains data from GRCh37, the rCRS mitochondrial sequence, and Human herpesvirus 4 type 1. The b37 dataset is hosted by the Broad Institute FTP site.
For more information on b37 data, see the GATK FAQs.
Use: These datasets are publicly available for anyone to use under the terms provided by the dataset sources (https://www.ncbi.nlm.nih.gov/, https://cse.ucsc.edu/, http://www.internationalgenome.org/data, https://www.broadinstitute.org/) and are provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the datasets.