This page describes how to copy and store raw VCF files in Cloud Storage. After storing raw VCF files, you can use the Variant Transforms tool to load them into BigQuery.
Copy data into Cloud Storage
Cloud Life Sciences hosts a public dataset containing data from Illumina Platinum Genomes. To copy two VCF files from the dataset to your bucket, use the gcloud storage cp command:

gcloud storage cp \
    gs://genomics-public-data/platinum-genomes/vcf/NA1287*_S1.genome.vcf \
    gs://BUCKET/platinum-genomes/vcf/
Replace BUCKET with the name of your Cloud Storage bucket.
Copy variants from a local file system
To copy a group of files from your current local directory, run the gcloud storage cp command:

gcloud storage cp *.vcf gs://BUCKET/vcf/
Replace BUCKET with the name of your Cloud Storage bucket.
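Note that the *.vcf pattern is expanded by your shell, not by gcloud, so each matching file in the current directory is passed to gcloud storage cp as a separate argument. The following sketch, which uses throwaway files in a temporary directory, shows which arguments such a glob produces:

```shell
# Sketch: preview which files a *.vcf glob matches before uploading.
# The directory and filenames here are illustrative scratch data.
tmpdir="$(mktemp -d)"
touch "$tmpdir/sample1.vcf" "$tmpdir/sample2.vcf" "$tmpdir/readme.txt"
cd "$tmpdir"

# The shell expands *.vcf to the matching filenames; these are exactly
# the arguments that "gcloud storage cp *.vcf ..." would receive.
printf '%s\n' *.vcf
```

Files that do not match the pattern, such as readme.txt above, are never passed to the copy command.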
To copy a local directory of files, run the following command:
gcloud storage cp VCF_FILE_DIRECTORY/ gs://BUCKET/vcf/ --recursive
Replace the following:
- VCF_FILE_DIRECTORY: the path to the local directory containing VCF files
- BUCKET: the name of your Cloud Storage bucket
If any failures occur due to temporary network issues, you can re-run the previous commands with the --no-clobber (-n) flag, which copies only the missing files:

gcloud storage cp VCF_FILE_DIRECTORY/ gs://BUCKET/vcf/ \
    --recursive --no-clobber
Replace the following:
- VCF_FILE_DIRECTORY: the path to the local directory containing VCF files
- BUCKET: the name of your Cloud Storage bucket
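If you run these uploads from a script, you can automate the re-run by wrapping the no-clobber copy in a small retry loop. The retry_copy function below is a sketch; the function name, attempt count, and backoff delays are illustrative choices, not part of gcloud:

```shell
# Sketch: retry a command up to three times with increasing delay.
# retry_copy is a hypothetical helper; pass it the full copy command.
retry_copy() {
  local attempt
  for attempt in 1 2 3; do
    "$@" && return 0                          # success: stop retrying
    echo "attempt $attempt failed; retrying" >&2
    sleep "$((attempt * 5))"                  # simple linear backoff
  done
  return 1                                    # all attempts failed
}

# Usage (assumes gcloud is installed and authenticated):
# retry_copy gcloud storage cp VCF_FILE_DIRECTORY/ gs://BUCKET/vcf/ \
#     --recursive --no-clobber
```

Because --no-clobber skips objects that already exist in the bucket, each retry transfers only the files that the previous attempt failed to copy.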
For more information on copying data to Cloud Storage, see Using Cloud Storage with Big Data.