Dataflow pipeline fails with "User does not have bigquery.datasets.create permission"

Problem

When a Cloud Dataflow pipeline tries to read query results from BigQuery, it fails with a permission error:

User does not have bigquery.datasets.create permission in project ...

Environment

  • Cloud Dataflow
  • BigQuery as Input

Solution

  1. Grant the service account the necessary BigQuery permissions at the project level (for example, a role that includes bigquery.datasets.create, such as roles/bigquery.dataEditor).
  2. For Python and Java: If you do not want to grant the service account broad project-level BigQuery permissions, you can force Dataflow to use a particular existing dataset to store the temporary query results. In Python, specify the temp_dataset parameter of ReadFromBigQuery or ReadAllFromBigQuery. In Java, call withQueryTempDataset after the fromQuery method.
    You can then create a separate dataset for the temporary data and grant the service account only dataset-level permissions to create tables and modify data inside that dataset.
    A possible implementation in Python may look like:

    from apache_beam.io.gcp.internal.clients import bigquery
    ...
    beam.io.ReadFromBigQuery(
        query=query,
        use_standard_sql=True,
        temp_dataset=bigquery.DatasetReference(
            projectId="PROJECT_ID",
            datasetId="DATASET"))
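For the Java SDK, the equivalent may be sketched as below. This is a minimal, non-authoritative example that assumes the Beam Java SDK's BigQueryIO connector is on the classpath; "DATASET", the query string, and the pipeline setup are placeholders you would replace with your own.

```java
// Sketch only: assumes the Beam Java SDK (BigQueryIO) is available.
// "DATASET" and the query text are placeholders.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import com.google.api.services.bigquery.model.TableRow;

public class QueryWithTempDataset {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Store the query results in an existing dataset that the service
    // account can write to, instead of letting Dataflow create a new
    // temporary dataset (which requires bigquery.datasets.create).
    PCollection<TableRow> rows = pipeline.apply(
        BigQueryIO.readTableRows()
            .fromQuery("SELECT ...")
            .usingStandardSql()
            .withQueryTempDataset("DATASET"));

    pipeline.run();
  }
}
```

With this in place, the service account only needs dataset-level permissions on DATASET rather than project-level dataset-creation rights.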

Cause

When Dataflow reads query results from BigQuery, it first materializes them into a temporary dataset and tables before consuming this data as input. Creating that temporary dataset requires the bigquery.datasets.create permission, which is why the pipeline fails when the service account lacks it.