Variant Transforms is an open-source tool for use with Cloud Life Sciences. It is built on Apache Beam and runs its pipelines on Dataflow.
You can use Variant Transforms to transform and load VCF data at scale, including:
- Hundreds of thousands of files
- Millions of samples
- Billions of records
You can use the Variant Transforms preprocessor to validate VCF files and identify inconsistencies before you load them.
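One common kind of inconsistency is a field defined differently across files. For example, one file declares `DP` as `Integer` and another declares it as `Float`. The following is a minimal, illustrative sketch of that kind of header check. It is not the preprocessor's implementation. The function name `find_header_conflicts` is hypothetical. The sketch assumes header lines list `ID`, `Number`, and `Type` in that order, as the VCF specification requires for `INFO` and `FORMAT` definitions.

```python
"""Illustrative sketch only; this is NOT the Variant Transforms preprocessor.

Scans the headers of several VCF files and flags INFO/FORMAT fields whose
Number/Type definitions disagree between files.
"""
import re
from collections import defaultdict

# Matches header lines such as:
# ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
_HEADER_RE = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+),Number=([^,>]+),Type=([^,>]+)")


def find_header_conflicts(headers_by_file):
    """Return {(kind, id): {(number, type): [file, ...]}} for conflicting fields.

    headers_by_file maps a file name to the list of its '##' header lines.
    """
    definitions = defaultdict(lambda: defaultdict(list))
    for file_name, lines in headers_by_file.items():
        for line in lines:
            match = _HEADER_RE.match(line)
            if match:
                kind, field_id, number, field_type = match.groups()
                definitions[(kind, field_id)][(number, field_type)].append(file_name)
    # A field is inconsistent if it has more than one distinct definition.
    return {key: dict(defs) for key, defs in definitions.items() if len(defs) > 1}
```

For example, passing headers where `a.vcf` defines `DP` as `Integer` and `b.vcf` defines it as `Float` returns one conflict keyed by `("INFO", "DP")`.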
A typical workflow consists of the following steps:
- Storing raw VCF files in Cloud Storage.
- Using the Variant Transforms tool to load the VCF files from Cloud Storage into BigQuery.
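The load step above amounts to launching a Beam pipeline on Dataflow. It reads a Cloud Storage pattern and writes to a BigQuery table. The following is a hypothetical Python helper that only assembles such an invocation and does not run it. The binary path and the flag names (`--input_pattern`, `--output_table`, `--project`, `--region`, `--temp_location`, `--job_name`, `--runner`) are assumptions based on how the project's Docker image is commonly invoked. Verify them against the documentation for the version you use.

```python
"""Hypothetical helper that assembles a Variant Transforms load command.

Assumptions: the binary path and all flag names below mirror common usage of
the project's Docker image; check them against your version's documentation.
"""


def build_vcf_to_bq_command(input_pattern, output_table, project, region,
                            temp_location, job_name="vcf-to-bigquery"):
    """Return the argument list for loading VCFs from Cloud Storage into BigQuery."""
    if not input_pattern.startswith("gs://"):
        raise ValueError("input_pattern should point at Cloud Storage (gs://...)")
    return [
        "/opt/gcp_variant_transforms/bin/vcf_to_bq",  # assumed path inside the image
        "--input_pattern", input_pattern,    # e.g. gs://my-bucket/vcfs/*.vcf
        "--output_table", output_table,      # e.g. my-project:my_dataset.variants
        "--project", project,
        "--region", region,
        "--temp_location", temp_location,    # gs:// path for Dataflow temp files
        "--job_name", job_name,
        "--runner", "DataflowRunner",        # run the Beam pipeline on Dataflow
    ]
```

To get a single shell string from the returned list (for example, to pass to `docker run`), use `shlex.join(...)` on Python 3.8 or later.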
You can then use BigQuery to analyze the variants.
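A typical first analysis counts variant records per reference sequence (chromosome). The following is a minimal sketch that builds such a Standard SQL query. It assumes the loaded table has a `reference_name` column, as in the BigQuery variants schema. The function name and table name are placeholders.

```python
"""Sketch: build a first-look query over a table loaded by Variant Transforms.

Assumption: the table has a reference_name column (BigQuery variants schema).
"""


def variants_per_reference_sql(table):
    """Return Standard SQL that counts variant rows per reference_name."""
    # Standard SQL quotes tables as `project.dataset.table`; convert the
    # bq-style `project:dataset.table` form if that is what was passed in.
    table = table.replace(":", ".", 1)
    return (
        "SELECT reference_name, COUNT(*) AS num_variants\n"
        f"FROM `{table}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY reference_name"
    )
```

With the `google-cloud-bigquery` client library installed and credentials configured, `bigquery.Client().query(sql).result()` runs the query and returns its rows.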
To understand how the tool maps VCF files into BigQuery tables, familiarize yourself with the BigQuery variants schema.
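To give a rough sense of what that mapping involves, here is a simplified, hypothetical sketch that turns one VCF data line into a row-like dictionary. It is not the tool's implementation, and the schema documentation is authoritative. The sketch makes several assumptions:

- Field names loosely follow the variants schema.
- Positions become 0-based, half-open coordinates.
- A missing allele (`.`) becomes `-1`.
- Each call is keyed by the sample `name`; newer schema versions may identify samples differently.
- `INFO` fields are ignored.

```python
"""Simplified, hypothetical mapping of one VCF data line to a BigQuery-style row.

Assumptions (not the tool's implementation): field names loosely follow the
variants schema; positions are 0-based, half-open; missing alleles are -1;
INFO fields and symbolic/structural END values are ignored.
"""
import re


def vcf_line_to_row(line, sample_names):
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ids, ref, alts, qual, filt = cols[:7]
    start = int(pos) - 1  # VCF POS is 1-based; convert to 0-based
    row = {
        "reference_name": chrom,
        "start_position": start,
        "end_position": start + len(ref),  # half-open end
        "reference_bases": ref,
        "alternate_bases": [] if alts == "." else [{"alt": a} for a in alts.split(",")],
        "names": [] if ids == "." else ids.split(";"),
        "quality": None if qual == "." else float(qual),
        "filter": [] if filt == "." else filt.split(";"),
        "call": [],
    }
    if len(cols) > 9:  # FORMAT column plus at least one sample column
        fmt_keys = cols[8].split(":")
        gt_index = fmt_keys.index("GT") if "GT" in fmt_keys else None
        for name, sample in zip(sample_names, cols[9:]):
            genotype = []
            if gt_index is not None:
                values = sample.split(":")
                gt = values[gt_index] if gt_index < len(values) else "."
                # Split on unphased '/' or phased '|' separators.
                genotype = [-1 if a == "." else int(a) for a in re.split(r"[/|]", gt)]
            row["call"].append({"name": name, "genotype": genotype})
    return row
```

For example, a line at `POS` 100 with `REF` `A` yields `start_position` 99 and `end_position` 100. Each sample column becomes one entry in the repeated `call` field.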