The 1000 Genomes dataset comprises roughly 2,500 genomes from 26 populations around the world. See the 1000 Genomes Project website and the following publications for full details:
Pilot publication: A map of human genome variation from population-scale sequencing
Phase 1 publication: An integrated map of genetic variation from 1,092 human genomes
Phase 3 publications:
Dataset access
Cloud Storage folders
The following files are available in the genomics-public-data Cloud Storage bucket:
- 1000 Genomes data: gs://genomics-public-data/1000-genomes
- 1000 Genomes Phase 3 data: gs://genomics-public-data/1000-genomes-phase-3
- A full mirror of https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/ is available in gs://genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/
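As an illustration of how the folders above might be browsed programmatically, here is a minimal Python sketch. It assumes the `google-cloud-storage` package is installed and that the bucket still permits anonymous listing; neither is stated on this page. The helper names `split_gs_uri` and `list_objects` are hypothetical and exist only for this example.

```python
from typing import List, Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' into (bucket name, object prefix)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, prefix


def list_objects(uri: str, max_results: int = 10) -> List[str]:
    """Return up to max_results object names found under a gs:// folder.

    Assumption: google-cloud-storage is installed and the bucket allows
    unauthenticated listing. The import is kept local so the URI helper
    above can be used without the library.
    """
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    bucket, prefix = split_gs_uri(uri)
    blobs = client.list_blobs(bucket, prefix=prefix, max_results=max_results)
    return [blob.name for blob in blobs]
```

For example, `list_objects("gs://genomics-public-data/1000-genomes-phase-3")` would return the first few object names in the Phase 3 folder, provided the assumptions above hold. If anonymous access is not permitted, replacing the anonymous client with `storage.Client()` and configured credentials is one alternative.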
BigQuery datasets
You can access the following datasets in BigQuery for data exploration and querying:
- Phase 3 variants: bigquery-public-data:human_genome_variants.1000_genomes_phase_3_variants_20150220
- Sample information: bigquery-public-data:human_genome_variants.1000_genomes_sample_info
- Pedigree: bigquery-public-data:human_genome_variants.1000_genomes_pedigree
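The table references above use the `project:dataset.table` form, whereas standard SQL queries expect a backtick-quoted `project.dataset.table` path. The following Python sketch shows one way to convert a reference and preview a few rows. It assumes the `google-cloud-bigquery` package is installed and that credentials and a billing project are configured; the helper names `to_standard_sql_table`, `preview_query`, and `run_preview` are hypothetical. No column names are used, because the table schemas are not described on this page.

```python
from typing import Any, Dict, List


def to_standard_sql_table(legacy_ref: str) -> str:
    """Convert 'project:dataset.table' to '`project.dataset.table`'."""
    project, sep, rest = legacy_ref.partition(":")
    if not sep or "." not in rest:
        raise ValueError(f"expected project:dataset.table, got {legacy_ref!r}")
    return f"`{project}.{rest}`"


def preview_query(legacy_ref: str, limit: int = 5) -> str:
    """Build a standard SQL query that returns the first few rows of a table."""
    return f"SELECT * FROM {to_standard_sql_table(legacy_ref)} LIMIT {int(limit)}"


def run_preview(legacy_ref: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Run the preview query and return each row as a dictionary.

    Assumption: google-cloud-bigquery is installed and default credentials
    with a billing project are available in the environment.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(preview_query(legacy_ref, limit)).result()
    return [dict(row.items()) for row in rows]
```

A call such as `run_preview("bigquery-public-data:human_genome_variants.1000_genomes_sample_info")` would fetch a handful of sample records. Previewing the sample information or pedigree table is likely the cheaper choice, since a `LIMIT` clause generally does not reduce the bytes BigQuery scans and the Phase 3 variants table can be expected to be much larger.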
About the dataset
Dataset source:
- The Phase 1 variants dataset is hosted by the EBI FTP site.
- The Phase 3 variants dataset is hosted by the EBI FTP site.
Use: These datasets are publicly available for anyone to use under the terms provided by the dataset source (http://www.internationalgenome.org/data) and are provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the datasets.