BigQueryIO fails to load large messages into a nested structure with Beam SDK

Problem

BigQueryIO fails to load large messages into a nested structure.

Environment

  • Dataflow
  • BigQuery
  • Beam SDK 2.35 and 2.36.

Solution

  1. Increase the relevant pipeline option in BigQueryOptions to a larger value, roughly between 1 MB and 5 MB.
  2. gRPC supports request sizes of up to 10 MB, but setting the value as high as 10 MB is not advisable.
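
As a rough illustration of the steps above, the following sketch converts a size in megabytes into a pipeline argument and rejects values at or above the 10 MB gRPC limit. The option name maxStreamingBatchSize and the helper class BatchSizeArg are assumptions made for this example and are not named in this article; confirm the actual option against BigQueryOptions in your Beam version before using it.

```java
// Hypothetical helper, not part of the Beam SDK.
final class BatchSizeArg {
    // Upper bound taken from the note that gRPC supports requests up to 10 MB.
    static final long GRPC_LIMIT_BYTES = 10L * 1024 * 1024;

    // Assumed option name; verify it against BigQueryOptions in your Beam version.
    static final String ASSUMED_OPTION = "maxStreamingBatchSize";

    /** Converts whole megabytes to bytes, rejecting sizes below 1 MB or at/above the gRPC limit. */
    static long toBytes(int megabytes) {
        long bytes = megabytes * 1024L * 1024L;
        if (megabytes < 1 || bytes >= GRPC_LIMIT_BYTES) {
            throw new IllegalArgumentException(
                "Choose a size of at least 1 MB and below 10 MB");
        }
        return bytes;
    }

    /** Builds a command-line style pipeline argument for the assumed option. */
    static String toPipelineArg(int megabytes) {
        return "--" + ASSUMED_OPTION + "=" + toBytes(megabytes);
    }

    public static void main(String[] args) {
        // Example: a 2 MB limit, inside the suggested 1 MB to 5 MB range.
        System.out.println(toPipelineArg(2));
    }
}
```

The resulting string would be passed along with the other pipeline arguments when launching the job. Keeping the value well under 10 MB leaves headroom for request overhead.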
     

Workaround

  1. Upgrade to Beam 2.37.0.

Cause

Because of a bug in Beam 2.36.0 and earlier, the data insert was treated as failed and the affected rows were output to the failed-inserts PCollection, which is part of the return value of BigQueryIO. This issue was fixed in Beam 2.37.0.