Problem
Simple SELECT from a BigQuery table in Dataflow causes an error:
Access denied : need bigquery.dataset.create permission.
Error is observed in Cloud logging in Dataflow logs:
Message: Access Denied: Project [some_project]: User does not have bigquery.datasets.create permission in project [some_project].
Environment
- Any interactive query run from Dataflow job on a BigQuery table or dataset, to which the Service Account running the job has only read permissions.
- A Dataflow job with code similar to:
beam.io.BigQuerySource( query='SELECT * FROM `some_project.some_dataset.some_table` LIMIT 10 ', use_standard_sql=True)
Solution
- Pre-create a temporary dataset, using another service account that already has bigquery.dataset.create, to avoid adding bigquery.dataset.create permission to clients who only read data.
- Pass the dataset's path in the query parameters.
- To pass the pre-created temporary dataset's path to the pipeline use the option named "tempLocation".
Cause
The error is thrown when the Service Account running the job does not have permission to create a temporary dataset for the query. The implementation of beam.io.BigQuerySource requires use of a temporary dataset each time a query is run.
Useful links