Migrating containers from a third-party registry

If you pull some container images directly from third-party registries to deploy to Google Cloud environments such as Google Kubernetes Engine or Cloud Run, rate limits on image pulls or third-party outages can disrupt your builds and deployments. This page describes how to identify and copy those images to a registry in Google Cloud for consolidated, consistent container image management.

Additionally, you can take advantage of other capabilities, including securing your software supply chain with vulnerability scanning and enforcement of deployment policies with Binary Authorization.

Choosing a registry

Artifact Registry is the recommended service for storing and managing container images and other build artifacts in Google Cloud.

  • If you aren't currently using Container Registry, migrate images to Artifact Registry instead. Artifact Registry provides greater flexibility and control, including storing images in a region rather than a multi-region, more granular access control, and support for other artifact formats.
  • If you are currently using Container Registry, use the instructions in this document to migrate your images to a Container Registry host.

Migration overview

Migration of your container images includes the following steps:

  1. Set up prerequisites.
  2. Identify images to migrate.
    • Search your Dockerfiles and deployment manifests for references to third-party registries.
    • Determine pull frequency of images from third-party registries using Cloud Logging and BigQuery.
  3. Copy identified images to Container Registry.
  4. Verify that permissions to the registry are correctly configured, particularly if Container Registry and your Google Cloud deployment environment are in different projects.
  5. Update manifests for your deployments.
  6. Re-deploy your workloads.
  7. (Optional) Block deployments of images from third-party sources.

Container Registry does not monitor third-party registries for updates to images you copy to Container Registry. If you want to incorporate a newer version of an image into your pipeline, you must push it to Container Registry.

Before you begin

  1. Verify your permissions. You must have the Owner or Editor IAM role in the projects where you are migrating images to Container Registry.
  2. Go to the project selector page.

    1. Select the Google Cloud project where you want to use Container Registry.
    2. In the Google Cloud console, go to Cloud Shell.
    3. Find your project ID and set it in Cloud Shell. Replace YOUR_PROJECT_ID with your project ID.

      gcloud config set project YOUR_PROJECT_ID
      
  3. Export the following environment variable:

      export PROJECT=$(gcloud config get-value project)
    
  4. Enable the BigQuery, Container Registry, Cloud Logging, and Cloud Monitoring APIs with the following command:

    gcloud services enable \
        bigquery.googleapis.com \
        containerregistry.googleapis.com \
        stackdriver.googleapis.com \
        logging.googleapis.com \
        monitoring.googleapis.com
    
  5. Verify that Go version 1.13 or newer is installed.

    • Check the version of an existing Go installation with the command:

      go version
      
    • If you need to install or update Go, see the Go installation documentation.
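
The version check can also be scripted. The following is a minimal sketch, assuming that go version prints a line of the form "go version go1.21.3 linux/amd64". The function name go_version_at_least_1_13 is hypothetical.

```shell
# Succeeds when the given `go version` output reports Go 1.13 or newer.
# Returns 2 when the output cannot be parsed.
go_version_at_least_1_13() {
  local out="$1" ver major minor
  # Extract "MAJOR MINOR" from text such as "go version go1.21.3 linux/amd64".
  ver=$(printf '%s\n' "${out}" | sed -n -E 's/^go version go([0-9]+)\.([0-9]+).*/\1 \2/p')
  [ -n "${ver}" ] || return 2
  major=${ver% *}
  minor=${ver#* }
  [ "${major}" -gt 1 ] || { [ "${major}" -eq 1 ] && [ "${minor}" -ge 13 ]; }
}

# Usage (requires Go on PATH):
#   go_version_at_least_1_13 "$(go version)" && echo "Go is new enough"
```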

Costs

This guide uses the following billable components of Google Cloud:

Identify images to migrate

Search the files you use to build and deploy your container images for references to third-party registries, then check how often you pull the images.

Identify references in Dockerfiles

Perform this step in a location where your Dockerfiles are stored. This might be where your code is locally checked out or in Cloud Shell if the files are available in a VM.

In the directory with your Dockerfiles, run the command:

grep -inr -H --include Dockerfile\* "FROM" . | grep -i -v -E 'docker.pkg.dev|gcr.io'

The output looks like the following example:

./code/build/baseimage/Dockerfile:1:FROM debian:stretch
./code/build/ubuntubase/Dockerfile:1:FROM ubuntu:latest
./code/build/pythonbase/Dockerfile:1:FROM python:3.5-buster

This command searches all the Dockerfiles in your directory and identifies the "FROM" line. Adjust the command as needed to match the way you store your Dockerfiles.
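
To turn the matching lines into a deduplicated list of image names, a sketch along the following lines might help. It assumes GNU-style grep options and that each FROM instruction fits on one line; the helper name list_third_party_bases is hypothetical.

```shell
# Print the unique base images referenced in Dockerfiles under a directory,
# skipping images already hosted on gcr.io or docker.pkg.dev.
list_third_party_bases() {
  local dir="${1:-.}"
  grep -ihr --include='Dockerfile*' -E '^[[:space:]]*FROM[[:space:]]' "${dir}" \
    | grep -i -v -E 'docker\.pkg\.dev|gcr\.io' \
    | awk '{ img = $2; if (img ~ /^--/) img = $3; print img }' \
    | sort -u
}

# Usage: list_third_party_bases ./code
```

The awk step skips a leading flag such as --platform=linux/amd64. Names of earlier build stages (for example, FROM builder) also appear in the output and need to be filtered out by hand.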

Identify references in manifests

Perform this step in a location where your GKE or Cloud Run manifests are stored. This might be where your code is locally checked out or in Cloud Shell if the files are available in a VM.

  1. In the directory with your GKE or Cloud Run manifests, run the command:

    grep -inr -H --include \*.yaml "image:" . | grep -i -v -E 'docker.pkg.dev|gcr.io'
    

    Sample output:

    ./code/deploy/k8s/ubuntu16-04.yaml:63: image: busybox:1.31.1-uclibc
    ./code/deploy/k8s/master.yaml:26:      image: kubernetes/redis:v1
    

    This command searches all YAML files in your directory and identifies the image: line. Adjust the command as needed to match the way you store your manifests.

  2. To list images currently running on a cluster, run the command:

      kubectl get all --all-namespaces -o yaml | grep image: | grep -i -v -E 'docker.pkg.dev|gcr.io'
    

    This command returns all objects running in the currently selected Kubernetes cluster and gets their image names.

    Sample output:

    - image: nginx
      image: nginx:latest
        - image: nginx
        - image: nginx
    

Run this command for all GKE clusters across all Google Cloud projects for total coverage.
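
The sample output repeats the same image several times. As a rough sketch, the lines can be condensed into a count per image; the helper name unique_images is hypothetical, and the sketch assumes one image: entry per line.

```shell
# Read "image:" lines on stdin and print "<count> <image>", most frequent first.
unique_images() {
  tr -d "\"'" \
    | sed -E 's/^[[:space:]-]*image:[[:space:]]*//' \
    | awk 'NF { count[$1]++ } END { for (img in count) print count[img], img }' \
    | sort -k1,1nr -k2,2
}

# Usage:
#   kubectl get all --all-namespaces -o yaml | grep image: \
#     | grep -i -v -E 'docker.pkg.dev|gcr.io' | unique_images
```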

Identify pull frequency from a third-party registry

In projects that pull from third-party registries, use information about image pull frequency to determine whether your usage is near or over any rate limits that the third-party registry enforces.

Collect log data

Create a log sink to export data to BigQuery. A log sink includes a destination and a query that selects the log entries to export. You can create a sink by querying individual projects, or you can use a script to collect data across projects.

To create a sink for a single project:

These instructions are for the Logging Preview interface.

  1. Go to the Logs Explorer.

  2. Choose a Google Cloud project.

  3. On the Query builder tab, enter the following query:

      resource.type="k8s_pod"
      jsonPayload.reason="Pulling"
    
  4. Change the history filter from Last 1 hour to Last 7 Days.


  5. Click Run Query.

  6. After verifying that results show up correctly, click Actions > Create Sink.

  7. In the list of sinks, select BigQuery dataset, then click Next.

  8. In the Edit Sink panel, perform the following steps:

    • In the Sink Name field, enter image_pull_logs.
    • In the Sink Destination field, create a new dataset or choose a destination dataset in another project.
  9. Click Create Sink.

To create a sink for multiple projects:

  1. Open Cloud Shell.

  2. Run the following commands in Cloud Shell:

    PROJECTS="PROJECT-LIST"
    DESTINATION_PROJECT="DATASET-PROJECT"
    DATASET="DATASET-NAME"
    
    for source_project in $PROJECTS
    do
      gcloud logging --project="${source_project}" sinks create image_pull_logs \
        "bigquery.googleapis.com/projects/${DESTINATION_PROJECT}/datasets/${DATASET}" \
        --log-filter='resource.type="k8s_pod" jsonPayload.reason="Pulling"'
    done
    

    where

    • PROJECT-LIST is a list of Google Cloud project IDs, separated with spaces. For example, project1 project2 project3.
    • DATASET-PROJECT is the project where you want to store your dataset.
    • DATASET-NAME is the name for the dataset, for example image_pull_logs.

After you create a sink, it takes time for data to flow to BigQuery tables, depending on how frequently images are pulled.
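
A sink writes to its destination as a service account called the writer identity, and for a sink created with gcloud that identity typically still needs write access to the dataset (for example, the BigQuery Data Editor role). The following sketch lists the writer identity for each source project so that you can grant access. The helper name print_sink_writers is hypothetical.

```shell
# Print "<project> <writer identity>" for the named sink in each given project.
print_sink_writers() {
  local sink="$1"
  shift
  local source_project
  for source_project in "$@"
  do
    printf '%s %s\n' "${source_project}" \
      "$(gcloud logging sinks describe "${sink}" \
           --project="${source_project}" \
           --format='value(writerIdentity)')"
  done
}

# Usage (PROJECTS is the space-separated list from the previous step):
#   print_sink_writers image_pull_logs ${PROJECTS}
```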

Query for pull frequency

Once you have a representative sample of image pulls that your builds make, run a query for pull frequency.

  1. Go to the BigQuery console.

  2. Run the following query:

    SELECT
      REGEXP_EXTRACT(jsonPayload.message, r'"(.*?)"') AS imageName,
      COUNT(*) AS numberOfPulls
    FROM
      `DATASET-PROJECT.DATASET-NAME.events_*`
    GROUP BY
      imageName
    ORDER BY
      numberOfPulls DESC
    

    where

    • DATASET-PROJECT is the project that contains your dataset.
    • DATASET-NAME is the name of the dataset.

The following example shows output from the query. In the imageName column, you can review the pull frequency for images that are not stored in Container Registry or Artifact Registry.

[Image: example output of the pull frequency query]
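
The REGEXP_EXTRACT call in the query takes the first double-quoted string in each event message as the image name. The sketch below mirrors that extraction in the shell so that you can try it on sample text. It assumes messages of the form Pulling image "NAME"; the helper name extract_image is hypothetical.

```shell
# Print the first double-quoted string found on each input line.
# Lines without a quoted string produce no output.
extract_image() {
  sed -n -E 's/^[^"]*"([^"]*)".*$/\1/p'
}

# Usage:
#   printf '%s\n' 'Pulling image "nginx:1.14.2"' | extract_image
```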

Copy images to Container Registry

After you have identified images from third-party registries, you are ready to copy them to Container Registry. The gcrane tool helps you with the copying process.

  1. Create a text file images.txt in Cloud Shell with the names of the images you identified. For example:

    ubuntu:18.04
    debian:buster
    hello-world:latest
    redis:buster
    jupyter/tensorflow-notebook
    
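  2. Download gcrane. The following command works with Go versions earlier than 1.18. With Go 1.18 or newer, go get no longer installs binaries; run go install github.com/google/go-containerregistry/cmd/gcrane@latest instead.

      GO111MODULE=on go get github.com/google/go-containerregistry/cmd/gcrane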
    
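  3. Create a script named copy_images.sh to copy the images in your list.

    #!/bin/bash
    # Copy every image listed in images.txt to Container Registry with gcrane.

    if [ -z "${GCR_PROJECT}" ]
    then
        echo "ERROR: GCR_PROJECT must be set before running this" >&2
        exit 1
    fi

    # images.txt holds one image name per line; names contain no whitespace.
    for img in $(cat images.txt)
    do
        gcrane cp "${img}" "gcr.io/${GCR_PROJECT}/${img}"
    done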
    

    Make the script executable:

      chmod +x copy_images.sh
    
  4. Run the script to copy the images. Export GCR_PROJECT so that the script can read it:

    export GCR_PROJECT=${PROJECT}
    ./copy_images.sh
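
Before copying, it can be useful to preview where each image will land. The following sketch prints the source and destination for every entry in the list without contacting any registry. It assumes the same gcr.io/PROJECT/IMAGE naming as copy_images.sh; the helper name plan_copies is hypothetical.

```shell
# Print "<source> -> <destination>" for each image listed in a file.
plan_copies() {
  local project="$1" list="${2:-images.txt}"
  local img
  # Read one image per line; also handle a final line with no trailing newline.
  while IFS= read -r img || [ -n "${img}" ]
  do
    [ -z "${img}" ] && continue   # skip blank lines
    printf '%s -> gcr.io/%s/%s\n' "${img}" "${project}" "${img}"
  done < "${list}"
}

# Usage: plan_copies "${PROJECT}" images.txt
```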
    

Verify permissions

By default, Google Cloud CI/CD services have access to Container Registry in the same Google Cloud project.

  • Cloud Build can push and pull images.
  • Runtime environments such as GKE, Cloud Run, the App Engine flexible environment, and Compute Engine can pull images.

If you need to push or pull images across projects, or if you are using third-party tools in your pipeline that need to access Container Registry, make sure that permissions are configured correctly before you update and re-deploy your workloads.

For more information, see the access control documentation.

Update manifests to reference Container Registry

Update your Dockerfiles and your manifests to refer to Container Registry instead of the third-party registry.

The following example shows a manifest that references a third-party registry:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

This updated version of the manifest points to the image in Container Registry:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: gcr.io/<GCR_PROJECT>/nginx:1.14.2
        ports:
        - containerPort: 80

For a large number of manifests, use sed or another tool that can handle updates across many text files.
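
As one possible approach with sed, the following sketch prefixes every image: value that does not already point to gcr.io or docker.pkg.dev. It assumes unquoted image names with one image: entry per line; the helper name rewrite_images is hypothetical.

```shell
# Read a manifest on stdin and write it to stdout with third-party image
# references rewritten to gcr.io/<project>/<original image>.
rewrite_images() {
  local project="$1"
  sed -E '/gcr\.io|docker\.pkg\.dev/!s#^([[:space:]-]*image:[[:space:]]*)([^[:space:]]+)#\1gcr.io/'"${project}"'/\2#'
}

# Usage: rewrite_images my-project < deployment.yaml > deployment.updated.yaml
```

Writing to a new file rather than editing in place lets you review the changes with diff before you deploy.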

Re-deploy workloads

Re-deploy workloads with your updated manifests.

Keep track of new image pulls by running the following query in the BigQuery console:

SELECT
  FORMAT_TIMESTAMP("%D %R", timestamp) AS timeOfImagePull,
  REGEXP_EXTRACT(jsonPayload.message, r'"(.*?)"') AS imageName,
  COUNT(*) AS numberOfPulls
FROM
  `image_pull_logs.events_*`
GROUP BY
  timeOfImagePull,
  imageName
ORDER BY
  timeOfImagePull DESC,
  numberOfPulls DESC

All new image pulls should be from Container Registry and contain the string gcr.io.
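
If you save the imageName column to a text file, a small filter can flag anything that still comes from another registry. This is a sketch that assumes one image name per line; the helper name remaining_third_party is hypothetical.

```shell
# Print image names on stdin that are not hosted on gcr.io or a regional
# gcr.io host such as us.gcr.io or eu.gcr.io.
# The exit status is non-zero when nothing remains.
remaining_third_party() {
  grep -v -E '^([a-z0-9-]+\.)?gcr\.io/'
}

# Usage: remaining_third_party < pulled_images.txt
```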

(Optional) Block image pulls from third-party registries

For GKE clusters that use Binary Authorization, the policy you define automatically blocks pulls from untrusted sources. Ensure that your migrated images are not blocked by the policy by adding them to the list of exemptions. These instructions describe how to specify an exemption for all images stored in Container Registry within your project.

When you initially update the policy, consider enabling dry run mode. Instead of blocking images, Binary Authorization creates audit log entries so that you can identify outstanding images from third-party registries that you need to migrate to Container Registry.

For more information about configuring deployment policies, see the Binary Authorization documentation.

  1. Go to the Binary Authorization page.
  2. Click Edit Policy.
  3. Under Project default rule, enable Dry run mode.
  4. Under Images exempt from deployment rules, leave Trust all Google-provided system images selected.
  5. Expand Image paths.
  6. Add the path to your images as an exemption to the default project rule:
    1. At the bottom of the image list, click Add images.
    2. Enter the image path for your Google Cloud project. For example, gcr.io/my-project/* exempts all images in the project my-project.
  7. Repeat the previous step for other projects containing images that you want to deploy.

Review dry run events in Logging for your deployments. Migrate any remaining images that you regularly pull from third-party registries. When all your images are migrated, you can edit the policy to disable dry run mode and block images from untrusted sources.
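
The same policy can also be managed as a YAML file from the command line. The sketch below writes a dry run policy that exempts one project's Container Registry images. It assumes a default rule that denies all other images (ALWAYS_DENY), which the console steps above do not specify, so compare it with your exported policy before importing. The helper name write_dry_run_policy is hypothetical.

```shell
# Write a Binary Authorization policy file that exempts gcr.io/<project>/*
# and evaluates everything else in dry run mode.
write_dry_run_policy() {
  local project="$1" out="${2:-policy.yaml}"
  cat > "${out}" <<EOF
admissionWhitelistPatterns:
- namePattern: gcr.io/${project}/*
defaultAdmissionRule:
  evaluationMode: ALWAYS_DENY          # assumption: deny images that are not exempt
  enforcementMode: DRYRUN_AUDIT_LOG_ONLY
globalPolicyEvaluationMode: ENABLE
EOF
}

# Usage:
#   gcloud container binauthz policy export > current-policy.yaml   # keep a backup
#   write_dry_run_policy "${PROJECT}" policy.yaml
#   gcloud container binauthz policy import policy.yaml
```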