You can customize the runtime environment of user code in Dataflow pipelines by supplying a custom container image. Custom containers are supported for pipelines that use Dataflow Runner v2.
When Dataflow starts worker VMs, it uses Docker container images to launch containerized SDK processes on the workers. By default, a pipeline uses a prebuilt Apache Beam image. If you specify a custom container image for your Dataflow job, Dataflow launches workers that pull that image instead.
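
As a rough illustration of what specifying a custom image can look like, the following sketch assembles launch flags for a pipeline written with the Apache Beam Python SDK. The flag names (`--sdk_container_image`, `--experiments=use_runner_v2`) are the ones the Beam Python SDK uses to our knowledge; verify them against your SDK version. The helper `dataflow_launch_args` and every project, bucket, and image value are hypothetical placeholders, not part of any Dataflow API.

```python
def dataflow_launch_args(project, region, temp_location, image):
    """Build command-line flags for launching a Beam Python pipeline on
    Dataflow with a custom container image.

    This is a hypothetical helper for illustration only; it just formats
    strings and does not contact Dataflow.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Custom containers require Dataflow Runner v2.
        "--experiments=use_runner_v2",
        # Workers pull this image instead of the prebuilt Apache Beam image.
        f"--sdk_container_image={image}",
    ]


# All values below are placeholders (assumptions), not real resources.
args = dataflow_launch_args(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    image="us-central1-docker.pkg.dev/my-project/my-repo/my-beam-image:latest",
)
print(" ".join(args))
```

You would typically pass flags like these on the command line when you run your pipeline's main module, or hand the list to the SDK's pipeline options object.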
You might use a custom container for the following reasons:
- Preinstall pipeline dependencies to reduce worker start time.
- Preinstall pipeline dependencies that are not available in public repositories.
- Prestage large files to reduce worker start time.
- Launch third-party software in the background.
- Customize the execution environment.
For more information about working with custom containers, see the following pages:
- Build custom container images
- Build multi-architecture container images
- Run a Dataflow job in a custom container
- Troubleshoot custom containers