Build multi-architecture container images for Dataflow

If you use a custom container in Dataflow, the container image must be built for the architecture of the worker VMs. This document describes how to create multi-architecture container images that are compatible with both x86 and Arm VMs.

You can use the Docker CLI or Cloud Build to build the container image.
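
The prose in this document refers to x86 and Arm VMs, while the build commands name the platforms linux/amd64 and linux/arm64. The following sketch shows how the two naming schemes relate, assuming the machine names that `uname -m` typically reports on Linux. The function name platform_for_machine is hypothetical and is not part of Docker or Dataflow.

```shell
#!/bin/sh
# Map a machine architecture name (as reported by `uname -m`) to the
# platform string used with `docker buildx build --platform`.
# platform_for_machine is an illustrative helper, not an existing tool.
platform_for_machine() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `platform_for_machine "$(uname -m)"` prints the platform string for the machine that runs the command.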

Use Docker to build the image

  1. Create a Dockerfile. Use the FROM instruction to specify a multi-architecture base image.

    FROM apache/beam_python3.10_sdk:2.50.0
    
    # Make your customizations here, for example:
    ENV FOO=/bar
    COPY path/to/myfile ./
    
  2. Make sure that the Buildx tool is installed. To check, run the following command:

    docker buildx version
    
  3. Run the following command to create a builder instance that uses the docker-container driver. This driver is required to build multi-architecture images.

    docker buildx create --driver=docker-container --use
    

    The --use flag sets the new builder instance as the current builder.

  4. Run the following command to configure Docker to authenticate requests to Artifact Registry.

    gcloud auth configure-docker REGION-docker.pkg.dev
    

    Replace REGION with the region of the Artifact Registry repository.

  5. Run the following command to build and push the container image to Artifact Registry:

    docker buildx build \
      --platform=linux/amd64,linux/arm64 \
      -t REGISTRY/IMAGE:TAG \
      --push .
    

    Replace the following:

    • REGISTRY: the Docker repository, for example an Artifact Registry path of the form REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY
    • IMAGE: the image name
    • TAG: the image tag
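
Steps 2 through 5 can be combined into one script. The following is a minimal sketch rather than a definitive implementation: the names run, build_multiarch, and DRY_RUN are hypothetical, and the script assumes that it runs from the directory that contains the Dockerfile. When DRY_RUN is set to 1, the commands are printed instead of executed, so you can review them first.

```shell
#!/bin/sh
# Sketch: run steps 2-5 in sequence. All function and variable names here
# are illustrative. Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_multiarch() {
  registry_host="$1"   # for example, REGION-docker.pkg.dev
  image_ref="$2"       # REGISTRY/IMAGE:TAG

  # Step 2: stop early if Buildx is not installed.
  run docker buildx version || return 1
  # Step 3: create a builder that uses the docker-container driver.
  run docker buildx create --driver=docker-container --use || return 1
  # Step 4: configure Docker to authenticate to Artifact Registry.
  run gcloud auth configure-docker "$registry_host" || return 1
  # Step 5: build for both architectures and push the image.
  run docker buildx build \
    --platform=linux/amd64,linux/arm64 \
    -t "$image_ref" \
    --push .
}
```

To use the sketch, call `build_multiarch REGION-docker.pkg.dev REGISTRY/IMAGE:TAG` with the placeholders replaced as described in the preceding steps.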

Use Cloud Build to build the image

  1. Create a Dockerfile. Use the FROM instruction to specify a multi-architecture base image.

    FROM apache/beam_python3.10_sdk:2.50.0
    
    # Make your customizations here, for example:
    ENV FOO=/bar
    COPY path/to/myfile ./
    
  2. In the same directory that contains the Dockerfile, create a file named docker_buildx.yaml. Paste in the following text:

    steps:
    - name: 'docker'
      args: ['buildx', 'create', '--driver', 'docker-container', '--name', 'container', '--use']
    - name: 'docker'
      args: ['buildx', 'build', '--platform', 'linux/amd64,linux/arm64', '-t', 'REGISTRY/IMAGE:TAG', '--push', '.']
    

    Replace the following:

    • REGISTRY: the Docker repository, for example an Artifact Registry path of the form REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY
    • IMAGE: the image name
    • TAG: the image tag

  3. To build and push the image, run the gcloud builds submit command:

    gcloud builds submit --region=REGION --config docker_buildx.yaml
    

    Replace REGION with the region of the Cloud Build service to use.

For more information, see Build and push a Docker image with Cloud Build.
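
Instead of editing the placeholders in docker_buildx.yaml by hand, you can generate the file from a shell function. The following sketch prints the same two build steps shown in the Cloud Build procedure, with the image reference filled in from an argument. The function name write_buildx_config is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Cloud Build config from the Cloud Build procedure, with
# REGISTRY/IMAGE:TAG filled in. write_buildx_config is an illustrative name.
write_buildx_config() {
  image_ref="$1"   # REGISTRY/IMAGE:TAG
  cat <<EOF
steps:
- name: 'docker'
  args: ['buildx', 'create', '--driver', 'docker-container', '--name', 'container', '--use']
- name: 'docker'
  args: ['buildx', 'build', '--platform', 'linux/amd64,linux/arm64', '-t', '${image_ref}', '--push', '.']
EOF
}
```

For example, run `write_buildx_config "REGISTRY/IMAGE:TAG" > docker_buildx.yaml` in the directory that contains the Dockerfile, and then run the gcloud builds submit command.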

Verify the container image

  1. Open the Repositories page in the Google Cloud console.

  2. Click the repository with the container image.

  3. Click the image to see its versions.

  4. Click a version.

  5. Click Manifest.

  6. In the manifest file, verify that the manifests list contains one entry whose platform section specifies amd64 and one whose platform section specifies arm64. For example:

      {
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
        "manifests": [
            {
              "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
              "digest": "sha256:441d5438885049e2b388523a8cb5b77ea829c3c3f53326fb221fe185abd67f07",
              "size": 3074,
              "platform": {
                  "architecture": "amd64",
                  "os": "linux"
              }
            },
            {
              "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
              "digest": "sha256:d3b98b0f8f3f555f5453c79b240bd2b862d4f52d853fe81bae55f01a663de29c",
              "size": 3073,
              "platform": {
                  "architecture": "arm64",
                  "os": "linux"
              }
            }
        ]
      }
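
As an alternative to the console, the same check can be sketched on the command line. The following function reads a manifest in JSON format from standard input and reports whether both architectures are present. It uses a plain text match rather than a JSON parser, so treat it as a rough check. The function name has_both_architectures is hypothetical.

```shell
#!/bin/sh
# Sketch: read a manifest list (JSON) from stdin and check that it mentions
# both amd64 and arm64. has_both_architectures is an illustrative name.
# This is a text match on the "architecture" fields, not a full JSON parse.
has_both_architectures() {
  manifest="$(cat)"
  for arch in amd64 arm64; do
    if ! printf '%s\n' "$manifest" |
        grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"${arch}\""; then
      echo "missing platform: linux/${arch}"
      return 1
    fi
  done
  echo "ok"
}
```

One way to supply the manifest is `docker buildx imagetools inspect --raw REGISTRY/IMAGE:TAG | has_both_architectures`, which assumes that Docker is already authenticated to the repository.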
    

What's next