This page explains how to use Arm VMs as workers for batch and streaming Dataflow jobs.
You can run Dataflow jobs on Arm-based worker VMs from the Tau T2A machine series and the C4A machine series (Preview). Because the Arm architecture is optimized for power efficiency, these VMs can offer better price-performance for some workloads. For more information about Arm VMs, see Arm VMs on Compute.
Requirements
- The following Apache Beam SDKs support Arm VMs:
  - Apache Beam Java SDK versions 2.50.0 or later
  - Apache Beam Python SDK versions 2.50.0 or later
  - Apache Beam Go SDK versions 2.50.0 or later
- Select a region where Tau T2A or C4A machines are available. For more information, see Available regions and zones.
- Use Runner v2 to run the job.
- Streaming jobs must use Streaming Engine.
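The requirements above can be checked before a job is submitted. The following is a minimal sketch, not part of Dataflow or Apache Beam: the helper names `parse_version` and `check_arm_requirements` are hypothetical, and the sketch assumes that Arm machine type names start with `t2a-` or `c4a-`. It does not check regional availability.

```python
import re

MIN_SDK_VERSION = (2, 50, 0)     # minimum Apache Beam SDK version from the requirements list
ARM_PREFIXES = ("t2a-", "c4a-")  # assumed name prefixes for Tau T2A and C4A machine types


def parse_version(version):
    """Return the leading 'X.Y.Z' of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_arm_requirements(sdk_version, machine_type, streaming=False,
                           streaming_engine=False, runner_v2=True):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if parse_version(sdk_version) < MIN_SDK_VERSION:
        problems.append(f"SDK {sdk_version} is older than 2.50.0")
    if not machine_type.startswith(ARM_PREFIXES):
        problems.append(f"{machine_type} is not a Tau T2A or C4A machine type")
    if not runner_v2:
        problems.append("Runner v2 is required")
    if streaming and not streaming_engine:
        problems.append("streaming jobs must use Streaming Engine")
    return problems


for problem in check_arm_requirements("2.49.0", "t2a-standard-1"):
    print(problem)
```

A check like this only catches configuration mistakes early; the Dataflow service remains the authority on whether a job is accepted.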
Limitations
- All Tau T2A limitations and C4A limitations apply.
- GPUs are not supported.
- Cloud Profiler is not supported.
- Dataflow Prime is not supported.
- Receiving worker VM metrics from Cloud Monitoring is not supported.
- Container image pre-building is not supported.
Run a job using Arm VMs
To use Arm VMs, set the following pipeline option.
Java
Set the workerMachineType pipeline option and specify an Arm machine type.
For more information about setting pipeline options, see Set Dataflow pipeline options.
Python
Set the machine_type pipeline option and specify an Arm machine type.
For more information about setting pipeline options, see Set Dataflow pipeline options.
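As an illustration of the Python option, the following sketch assembles the command-line flags for a job that requests Arm workers. The helper `arm_worker_flags` is hypothetical, and the project ID, bucket, region, and the `t2a-standard-1` machine type are placeholder assumptions; substitute values that are valid for your project and confirm that the machine type is available in the region you choose.

```python
def arm_worker_flags(project, region, temp_location,
                     machine_type="t2a-standard-1", streaming=False):
    """Build Dataflow pipeline flags that request Arm worker VMs."""
    flags = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--machine_type={machine_type}",  # the option that selects the Arm machine type
    ]
    if streaming:
        # Streaming jobs on Arm VMs must use Streaming Engine.
        flags += ["--streaming", "--enable_streaming_engine"]
    return flags


# Placeholder values, for illustration only.
flags = arm_worker_flags("my-project", "us-central1", "gs://my-bucket/temp")
print(" ".join(flags))
```

In a pipeline, a list like this can be passed to `PipelineOptions(flags)` from `apache_beam.options.pipeline_options`, or the same flags can be given on the command line when you launch the pipeline script.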
Go
Set the worker_machine_type pipeline option and specify an Arm machine type.
For more information about setting pipeline options, see Set Dataflow pipeline options.
Use multi-architecture container images
If you use a custom container in Dataflow, the container image must match the architecture of the worker VMs. If you plan to use a custom container on Arm VMs, we recommend building a multi-architecture image so that the same image works on both Arm and x86 workers. For more information, see Build a multi-architecture container image.
Pricing
You are billed for Dataflow compute resources. Dataflow pricing is independent of the machine type family. For more information, see Dataflow pricing.