Use Arm VMs on Dataflow

This page explains how to use Arm VMs as workers for batch and streaming Dataflow jobs.

You can use the Tau T2A machine series and the C4A machine series (Preview), both of which run on Arm processors, to run Dataflow jobs. Because the Arm architecture is optimized for power efficiency, these VMs can offer better price-performance for some workloads. For more information about Arm VMs, see Arm VMs on Compute.

Requirements

  • The following Apache Beam SDKs support Arm VMs:
    • Apache Beam Java SDK versions 2.50.0 or later
    • Apache Beam Python SDK versions 2.50.0 or later
    • Apache Beam Go SDK versions 2.50.0 or later
  • Select a region where Tau T2A or C4A machines are available. For more information, see Available regions and zones.
  • Use Runner v2 to run the job.
  • Streaming jobs must use Streaming Engine.
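The SDK version requirement above can be checked before a job is launched. The following is a minimal sketch, not part of Dataflow or Apache Beam: the helper names `parse_version` and `supports_arm_workers` are hypothetical, and pre-release suffixes such as `rc1` are assumed to count as the release they precede.

```python
# Minimum Apache Beam SDK version listed in the requirements (same for Java, Python, and Go).
MIN_BEAM_VERSION = (2, 50, 0)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '2.50.0' into a tuple such as (2, 50, 0).

    Only the leading digits of the first three dot-separated pieces are used,
    so '2.50.0rc1' is treated as (2, 50, 0). This is a simplifying assumption.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '2.50' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_arm_workers(beam_version: str) -> bool:
    """Return True if the given SDK version meets the minimum listed above."""
    return parse_version(beam_version) >= MIN_BEAM_VERSION
```

With the Python SDK installed, the check could be called as `supports_arm_workers(apache_beam.__version__)`. It covers only the SDK version; region availability, Runner v2, and Streaming Engine still need to be verified separately.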

Limitations

Run a job using Arm VMs

To use Arm VMs, set the following pipeline option.

Java

Set the workerMachineType pipeline option and specify an Arm machine type.

For more information about setting pipeline options, see Set Dataflow pipeline options.

Python

Set the machine_type pipeline option and specify an Arm machine type.

For more information about setting pipeline options, see Set Dataflow pipeline options.
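As an illustration of the Python option, the following sketch builds the command-line flags for a job that runs on Arm workers. The helper `arm_worker_args` is hypothetical, the project, region, and bucket values in the usage note are placeholders, and `t2a-standard-1` is used only as an example of a Tau T2A machine type. The streaming flags reflect the requirement that streaming jobs use Streaming Engine.

```python
def arm_worker_args(project, region, temp_location,
                    machine_type="t2a-standard-1", streaming=False):
    """Build a list of pipeline flags for a Dataflow job on Arm workers.

    All argument values are supplied by the caller; nothing is validated here,
    so the region must be one where the chosen Arm series is available.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # The machine_type pipeline option selects the worker VM type.
        f"--machine_type={machine_type}",
    ]
    if streaming:
        # Streaming jobs on Arm VMs must use Streaming Engine.
        args += ["--streaming", "--enable_streaming_engine"]
    return args
```

The resulting list can be passed to `PipelineOptions` from `apache_beam.options.pipeline_options`, for example `PipelineOptions(arm_worker_args("my-project", "us-central1", "gs://my-bucket/tmp"))`, where the project and bucket names are placeholders.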

Go

Set the worker_machine_type pipeline option and specify an Arm machine type.

For more information about setting pipeline options, see Set Dataflow pipeline options.

Use multi-architecture container images

If you use a custom container in Dataflow, the container must match the architecture of the worker VMs. If you plan to use a custom container on Arm VMs, we recommend building a multi-architecture image. For more information, see Build a multi-architecture container image.
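To make the architecture match concrete, the following sketch maps a worker machine type to the container platform it would need. It is an illustration only: the helper `container_platform` is hypothetical, it recognizes only the two Arm series named on this page (Tau T2A and C4A), and it assumes every other machine series is x86-based.

```python
# Machine series treated as Arm in this sketch: only the two named on this page.
ARM_SERIES = {"t2a", "c4a"}


def container_platform(machine_type: str) -> str:
    """Return the container platform string that matches the worker VM architecture.

    The series is taken from the text before the first hyphen,
    for example 't2a' in 't2a-standard-4'.
    """
    series = machine_type.split("-", 1)[0].lower()
    return "linux/arm64" if series in ARM_SERIES else "linux/amd64"
```

A multi-architecture image that includes both `linux/amd64` and `linux/arm64` variants satisfies either result, which is why it is the recommended choice when the same container is used on Arm and non-Arm workers.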

Pricing

You are billed for Dataflow compute resources. Dataflow pricing is independent of the machine type family. For more information, see Dataflow pricing.
