Store raw VCF files in Cloud Storage

This page describes how to copy and store raw VCF files in Cloud Storage. After storing raw VCF files, you can use the Variant Transforms tool to load them into BigQuery.

Copy data into Cloud Storage

Cloud Life Sciences hosts a public dataset containing data from Illumina Platinum Genomes. To copy two VCF files from the dataset to your bucket, use the gcloud storage cp command:

gcloud storage cp \
    "gs://genomics-public-data/platinum-genomes/vcf/NA1287*_S1.genome.vcf" \
    gs://BUCKET/platinum-genomes/vcf/

Replace BUCKET with the name of your Cloud Storage bucket.
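If you script this step, a shell function like the following can take the bucket name as an argument instead of a hand-edited placeholder, and then list the destination to confirm what was copied. This is a minimal sketch that assumes bash (or dash) and an installed, authenticated gcloud CLI; copy_platinum_vcfs is a hypothetical name. Quoting the source URL keeps the local shell from trying to expand the * (zsh, for example, stops with "no matches found" on an unmatched pattern), so the wildcard reaches gcloud, which matches it against objects in the public bucket.

```shell
# Sketch: copy the Platinum Genomes sample VCFs and list the result.
# copy_platinum_vcfs is a hypothetical helper name, not a gcloud command.
copy_platinum_vcfs() {
  local bucket="$1"  # bucket name only, without the gs:// prefix
  local dest="gs://${bucket}/platinum-genomes/vcf/"

  # Quote the source URL so the local shell leaves the * alone and
  # gcloud matches it against objects in the public bucket.
  gcloud storage cp \
    "gs://genomics-public-data/platinum-genomes/vcf/NA1287*_S1.genome.vcf" \
    "$dest" || return 1

  # List the destination prefix to confirm which objects were copied.
  gcloud storage ls "$dest"
}

# Example (replace my-bucket with your bucket name):
#   copy_platinum_vcfs my-bucket
```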

Copy VCF files from a local file system

To copy all VCF files in your current directory, use a wildcard with the gcloud storage cp command:

gcloud storage cp *.vcf gs://BUCKET/vcf/

Replace BUCKET with the name of your Cloud Storage bucket.
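When no file in the current directory matches *.vcf, the shell passes the literal pattern on to gcloud, so the command fails without copying anything. The sketch below checks for at least one match before copying. It assumes bash or another POSIX-style shell, and copy_local_vcfs is a hypothetical helper name. Prefixing the pattern with ./ also keeps a file name that begins with - from being read as a flag.

```shell
# Sketch: copy every *.vcf file in the current directory, but only if
# at least one exists. copy_local_vcfs is a hypothetical helper name.
copy_local_vcfs() {
  local bucket="$1"  # bucket name only, without the gs:// prefix

  # Replace the function's arguments with the glob matches. With no
  # match, the shell leaves the literal pattern "./*.vcf" in $1.
  set -- ./*.vcf
  if [ ! -e "$1" ]; then
    echo "copy_local_vcfs: no .vcf files in $(pwd)" >&2
    return 1
  fi

  # "$@" passes each matched file as a separate, correctly quoted argument.
  gcloud storage cp "$@" "gs://${bucket}/vcf/"
}

# Example:
#   copy_local_vcfs my-bucket
```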

To copy a local directory and all of its contents, add the --recursive flag:

gcloud storage cp VCF_FILE_DIRECTORY/ gs://BUCKET/vcf/ --recursive

Replace the following:

  • VCF_FILE_DIRECTORY: the path to the local directory containing VCF files
  • BUCKET: the name of your Cloud Storage bucket
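The --recursive flag uploads every file under the directory, not only the VCF files, so before a large upload it can help to confirm that the path exists and actually contains VCF data. This is a minimal sketch assuming bash (or dash) and the standard find utility; copy_vcf_directory is a hypothetical name, and the check looks only for uncompressed *.vcf files, so adjust the pattern if your files use another extension.

```shell
# Sketch: recursively copy a directory after checking that it exists and
# contains at least one .vcf file. copy_vcf_directory is hypothetical.
copy_vcf_directory() {
  local dir="${1%/}"  # strip one trailing slash; it is re-added below
  local bucket="$2"   # bucket name only, without the gs:// prefix

  if [ ! -d "$dir" ]; then
    echo "copy_vcf_directory: $dir is not a directory" >&2
    return 1
  fi

  # find searches subdirectories too; empty output means no VCF files.
  if [ -z "$(find "$dir" -type f -name '*.vcf' | head -n 1)" ]; then
    echo "copy_vcf_directory: no .vcf files under $dir" >&2
    return 1
  fi

  # --recursive uploads every file under $dir, not only the VCF files.
  gcloud storage cp "$dir/" "gs://${bucket}/vcf/" --recursive
}

# Example:
#   copy_vcf_directory ./vcf-data my-bucket
```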

If a copy fails partway through because of a temporary network issue, re-run the command with the --no-clobber (-n) flag. This flag skips files that already exist at the destination, so only the missing files are copied:

gcloud storage cp VCF_FILE_DIRECTORY/ gs://BUCKET/vcf/ \
    --recursive --no-clobber

Replace the following:

  • VCF_FILE_DIRECTORY: the path to the local directory containing VCF files
  • BUCKET: the name of your Cloud Storage bucket
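To automate that retry, you can wrap the copy in a loop that runs a plain recursive copy first and then repeats it with --no-clobber until it succeeds or an attempt limit is reached. This is a sketch under stated assumptions, not an official tool: copy_with_retries, its default of three attempts, and its fixed delay between attempts are all hypothetical choices, and it assumes bash or dash.

```shell
# Sketch: recursive copy with automatic retries. The first attempt is a
# plain copy; later attempts add --no-clobber so only missing files are
# uploaded. copy_with_retries and its defaults are hypothetical choices.
copy_with_retries() {
  local dir="${1%/}"            # local directory to upload
  local bucket="$2"             # bucket name, without gs://
  local max_attempts="${3:-3}"  # total attempts, including the first
  local delay="${4:-10}"        # seconds to wait before each retry
  local attempt=1

  if gcloud storage cp "$dir/" "gs://${bucket}/vcf/" --recursive; then
    return 0
  fi

  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    echo "Attempt $attempt of $max_attempts (with --no-clobber)" >&2
    sleep "$delay"
    if gcloud storage cp "$dir/" "gs://${bucket}/vcf/" \
        --recursive --no-clobber; then
      return 0
    fi
  done

  echo "copy_with_retries: giving up after $max_attempts attempts" >&2
  return 1
}

# Example: up to 5 attempts, 30 seconds apart.
#   copy_with_retries ./vcf-data my-bucket 5 30
```

Because --no-clobber skips objects that already exist, each retry uploads only what is still missing. The trade-off is that an existing object with the same name is never replaced, so use a plain copy when you intend to overwrite earlier uploads.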

For more information on copying data to Cloud Storage, see Using Cloud Storage with Big Data.