Custom training job fails to pull a container

Problem

A Vertex AI custom training job fails with the following error message:
Pulling container image failed.

Environment

  • Vertex AI in projects created before June 2021.
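
To check whether a project falls into this environment, its creation timestamp can be compared with the cutoff. The sketch below is illustrative only: it assumes "before June 2021" means before 2021-06-01, and that the timestamp is an RFC 3339 string such as the one printed by gcloud projects describe PROJECT_ID --format="value(createTime)". The function name created_before_cutoff and the sample timestamp are hypothetical.

```python
from datetime import date

# Assumption: "before June 2021" is read as "before 2021-06-01".
CUTOFF = date(2021, 6, 1)


def created_before_cutoff(create_time: str) -> bool:
    """Return True if an RFC 3339 timestamp falls before the cutoff date.

    Only the leading YYYY-MM-DD part is compared; the time of day and
    the time zone are ignored, which is enough for a rough check.
    """
    return date.fromisoformat(create_time[:10]) < CUTOFF


# Hypothetical sample value in the shape printed by:
#   gcloud projects describe PROJECT_ID --format="value(createTime)"
print(created_before_cutoff("2020-11-03T09:15:42.123Z"))  # True
```

A True result only means the project is old enough to be affected; it does not confirm that the DNS entry is actually missing.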

Solution

The pkg.dev DNS entry must be added to your project. To do this, use one of the following options:

  1. Create a new project. The DNS entry for pkg.dev will be added automatically.
  2. Contact Google Cloud Support or your dedicated Account Team and request a backfill of the DNS entry for pkg.dev in your existing project's Vertex AI environment.

Cause

In projects created before June 2021, the DNS entry for the pkg.dev domain may be missing from the Vertex AI setup.
Note: The same error message can also appear in the following cases, so rule them out as well:
  • The image, or the required image version, does not exist.
  • Networking, for example a firewall setting, blocks the connection.
  • The container image is pulled from a private image repository. Vertex AI supports custom training with container images on Artifact Registry, Container Registry, or Docker Hub; see: Create a custom container image for training.
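
As a first triage step, it can help to see which registry the image URI points to. The sketch below guesses this from the hostname alone, assuming the usual conventions: Artifact Registry hosts end in pkg.dev, Container Registry hosts end in gcr.io, and a URI without a registry host refers to Docker Hub. The function name registry_of and the sample URIs are hypothetical, and the check does not verify that the image exists or is reachable.

```python
def registry_of(image_uri: str) -> str:
    """Guess which registry an image URI points to from its host part."""
    first = image_uri.split("/", 1)[0]
    # Docker convention: the first path segment is a registry host only if
    # it contains "." or ":" or is "localhost"; otherwise it is a Docker Hub
    # namespace or image name.
    has_host = "/" in image_uri and (
        "." in first or ":" in first or first == "localhost"
    )
    if not has_host:
        return "Docker Hub"

    host = first.split(":", 1)[0].lower()  # drop an optional port
    if host == "pkg.dev" or host.endswith(".pkg.dev"):
        return "Artifact Registry"
    if host == "gcr.io" or host.endswith(".gcr.io"):
        return "Container Registry"
    if host in ("docker.io", "index.docker.io", "registry-1.docker.io"):
        return "Docker Hub"
    return "other"


# Hypothetical image URIs, for illustration only.
for uri in (
    "us-docker.pkg.dev/my-project/my-repo/trainer:v1",
    "gcr.io/my-project/trainer:v1",
    "myorg/trainer:latest",
    "registry.example.com/team/trainer:v1",
):
    print(uri, "->", registry_of(uri))
```

If the result is "Artifact Registry" and the project predates June 2021, the missing pkg.dev DNS entry is a plausible cause; a result of "other" points toward the unsupported-repository case instead.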