Storing Raw VCF Files in Cloud Storage

This page describes how to copy and store raw VCF files in Cloud Storage. After storing raw VCF files, you can use the Variant Transforms tool to load them into BigQuery.

Copying data into Cloud Storage

Cloud Genomics hosts a public dataset containing data from Illumina Platinum Genomes. To copy two VCF files from the dataset to your own bucket, run the following command, replacing BUCKET with the name of your Cloud Storage bucket:

gsutil cp \
    gs://genomics-public-data/platinum-genomes/vcf/NA1287*_S1.genome.vcf \
    gs://BUCKET/platinum-genomes/vcf/
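gsutil expands wildcards against object names much as the shell expands them against local file names. As a quick local sketch of what the pattern above selects (the sample names here are assumptions modeled on the Platinum Genomes naming scheme):

```shell
# Sketch of NA1287*_S1 wildcard matching; the file names are
# assumptions modeled on the Platinum Genomes samples.
workdir=$(mktemp -d)
cd "$workdir"
touch NA12877_S1.genome.vcf NA12878_S1.genome.vcf NA12882_S1.genome.vcf
# Only the NA1287x samples match the pattern; NA12882 does not.
ls NA1287*_S1.genome.vcf
```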

Copying variants from a local file system

To copy a group of local files, run the following command. The -m flag performs the copies in parallel, and the parallel_composite_upload_threshold option splits any file larger than 150 MB into components that are uploaded in parallel and then composed into a single object:

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp *.vcf \
    gs://BUCKET/vcf/

To copy a local directory of files:

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -R \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/
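The recursive copy mirrors the source directory tree, including the directory name itself, under the destination prefix. A local sketch of the same behavior using plain cp (hypothetical file names, assuming GNU coreutils):

```shell
# Local sketch of a recursive copy; gsutil cp -R mirrors a directory
# tree into the destination the same way. File names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/vcf_src/batch1"
touch "$workdir/vcf_src/sample1.vcf" "$workdir/vcf_src/batch1/sample2.vcf"
mkdir "$workdir/dst"
cp -R "$workdir/vcf_src" "$workdir/dst/"
# The source directory name becomes part of each destination path.
( cd "$workdir/dst" && find . -name '*.vcf' | sort )
```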

If any copies fail due to temporary network issues, you can re-run the previous commands with the no-clobber (-n) flag, which copies only the files that are missing from the destination:

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -n -R \
    VCF_FILE_DIRECTORY \
    gs://BUCKET/vcf/
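The no-clobber behavior can be sketched locally with GNU cp, which has an analogous -n flag (hypothetical file names; gsutil applies the same skip-if-present rule to objects already in the bucket):

```shell
# Local sketch of no-clobber (-n) semantics with GNU cp: files that
# already exist at the destination are skipped, so a re-run only
# copies what is missing. File names are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"
printf 'v2' > "$workdir/src/a.vcf"
printf 'v1' > "$workdir/dst/a.vcf"      # copied in an earlier run
printf 'v2' > "$workdir/src/b.vcf"      # not yet copied
# coreutils >= 9.2 exits nonzero when -n skips a file, hence || true.
cp -n "$workdir/src/"*.vcf "$workdir/dst/" || true
cat "$workdir/dst/a.vcf"                # unchanged: still 'v1'
```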

For more information on copying data to Cloud Storage, see Using Cloud Storage with Big Data.

What's next

Use the Variant Transforms tool to load your VCF files into BigQuery.
