This page describes how to copy and store raw VCF files in Cloud Storage. After storing raw VCF files, you can use the Variant Transforms tool to load them into BigQuery.
Copying data into Cloud Storage
Cloud Life Sciences hosts a public dataset containing data from Illumina Platinum Genomes. To copy two VCF files from the dataset to your bucket (the NA1287* wildcard matches the NA12877 and NA12878 samples), run the following command:
gsutil cp \
    gs://genomics-public-data/platinum-genomes/vcf/NA1287*_S1.genome.vcf \
    gs://BUCKET/platinum-genomes/vcf/
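To confirm that the files arrived, you can list the destination path, which mirrors the copy command above:

gsutil ls gs://BUCKET/platinum-genomes/vcf/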
Copying variants from a local file system
To copy a group of local files, run the following command. The -m flag parallelizes the copies, and the parallel_composite_upload_threshold option uploads files larger than 150 MB in parallel chunks:
gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp *.vcf \
    gs://BUCKET/vcf/
To copy a local directory of files:
gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -R \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/
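If you want a per-file record of a large upload, gsutil can also write a manifest log. The following sketch is identical to the command above except for the -L flag; the log file name upload_manifest.log is illustrative:

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -R \
    -L upload_manifest.log \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/

Re-running the command with the same manifest skips any files that the log already records as copied successfully.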
If any failures occur due to temporary network issues, you can re-run the previous commands using the no-clobber (-n) flag, which copies only the missing files:
gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -n -R \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/
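After the retry completes, a quick completeness check is to compare local and remote file counts. A minimal sketch, assuming the directory contains only .vcf files and that cp -R preserved the directory name under gs://BUCKET/vcf/:

find VCF_FILE_DIRECTORY -name '*.vcf' | wc -l
gsutil ls -r gs://BUCKET/vcf/VCF_FILE_DIRECTORY | grep -c '\.vcf$'

If the counts differ, re-run the no-clobber command above.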
For more information on copying data to Cloud Storage, see Using Cloud Storage with Big Data.
What's next
Use the Variant Transforms tool to load your VCF files into BigQuery.
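As an illustration of that step, the following sketch runs the tool's vcf_to_bq pipeline from its public Docker image. PROJECT, REGION, DATASET, and TABLE are placeholders, and the exact invocation can change between releases, so verify the flags against the Variant Transforms documentation before running:

docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project PROJECT \
    --region REGION \
    --temp_location gs://BUCKET/temp \
    "vcf_to_bq \
        --input_pattern gs://BUCKET/vcf/*.vcf \
        --output_table PROJECT:DATASET.TABLE"

The --temp_location bucket holds intermediate pipeline files and can be any Cloud Storage path you can write to.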