Simple SELECT to BigQuery table from Dataflow causes an error

Problem

Simple SELECT from a BigQuery table in Dataflow causes an error:
Access denied : need bigquery.dataset.create permission.

Error is observed in Cloud logging in Dataflow logs:

Message: Access Denied: Project [some_project]: User does not have bigquery.datasets.create permission in project [some_project].

Environment

  • Any interactive query run from Dataflow job on a BigQuery table or dataset, to which the Service Account running the job has only read permissions.
  • A Dataflow job with code similar to:
    beam.io.BigQuerySource(
    
        query='SELECT * FROM `some_project.some_dataset.some_table` LIMIT 10 ',
    
        use_standard_sql=True)
    

Solution

  1. Pre-create a temporary dataset, using another service account that already has bigquery.dataset.create, to avoid adding bigquery.dataset.create permission to clients who only read data.
  2. Pass the dataset's path in the query parameters.
  3. To pass the pre-created temporary dataset's path to the pipeline use the option named "tempLocation".

Cause

The error is thrown when the Service Account running the job does not have permission to create a temporary dataset for the query. The implementation of beam.io.BigQuerySource requires use of a temporary dataset each time a query is run.

Useful links