This page describes known issues that you might run into while using Batch.
You might experience latency when listing jobs
If your project contains more than 10,000 jobs, you might experience latency when you list jobs using the Batch Job list page in the Google Cloud console, the gcloud CLI, or the Batch API. This issue doesn't affect viewing a specific job.
To work around this issue, either reduce the number of jobs in your project or view and query job information that you have stored in BigQuery. To store your job information in BigQuery, use one or more of the following options:
- To automatically stream status information for a job to BigQuery, enable Pub/Sub notifications during job creation. For more information, see Monitor jobs using notifications.
- To export all the information for a finished job to BigQuery, run the export-to-bigquery-delete-batch-jobs sample script. For more information, see Delete and export jobs.
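As a rough illustration of the first option above, the following Python sketch adds a job state notification to a job's JSON configuration. It assumes that the configuration uses the `notifications`, `pubsubTopic`, and `message.type` fields with the `JOB_STATE_CHANGED` value. The helper name and the project and topic IDs are hypothetical placeholders. Routing the published messages into BigQuery is a separate step that isn't shown here.

```python
import json


def with_state_notifications(job_config: dict, project_id: str, topic_id: str) -> dict:
    """Return a copy of job_config that publishes job state changes to Pub/Sub.

    Assumption: the field names below match the Batch job JSON configuration.
    """
    updated = dict(job_config)  # shallow copy so the caller's dict is not mutated
    notification = {
        "pubsubTopic": f"projects/{project_id}/topics/{topic_id}",
        "message": {"type": "JOB_STATE_CHANGED"},
    }
    # Build a new list so any existing notifications are kept and not modified in place.
    updated["notifications"] = list(job_config.get("notifications", [])) + [notification]
    return updated


if __name__ == "__main__":
    # Placeholder IDs; replace them with a real project and topic.
    config = with_state_notifications({"taskGroups": []}, "example-project", "example-topic")
    print(json.dumps(config, indent=2))
```

You could write the resulting JSON to a file and pass it to whichever tool you use to create the job.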
Jobs might fail when specifying Compute Engine (or custom) VM OS images with outdated kernels
A job might fail if it specifies a Compute Engine VM OS image that does not have the latest kernel version. This issue also impacts any custom images based on Compute Engine VM OS images. The Compute Engine public images that cause this issue are not easily identified and are subject to change at any time.
This issue is not indicated by a specific error message. Instead, consider this issue if you have a job that fails unexpectedly and specifies a Compute Engine VM OS image or similar custom image.
To prevent or resolve this issue, you can do the following:
- Whenever possible, use Batch images or custom images based on Batch images, which aren't affected by this issue.
- If you can't use a Batch image, try the latest version of your preferred Compute Engine image. Generally, newer versions of Compute Engine images are more likely to have the latest kernel version than older versions.
- If the latest version of a specific image doesn't work, you might need to try a different OS or create a custom image. For example, if the latest version of Debian 11 doesn't work, you can try to create a custom image from a Compute Engine VM that runs Debian 11 and that you've updated to use the latest kernel version.
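As a minimal sketch of the first recommendation above, the following Python helper points every VM in a job's JSON configuration at a Batch image. It assumes that the boot disk image is set in `allocationPolicy.instances[].policy.bootDisk.image` and that `batch-debian` is an accepted short name for a Batch image; verify both against the current Batch API reference. The helper name is hypothetical.

```python
import copy


def use_batch_image(job_config: dict, image: str = "batch-debian") -> dict:
    """Return a copy of job_config whose VMs boot from the given image.

    Assumptions: the boot disk image lives at
    allocationPolicy.instances[].policy.bootDisk.image, and "batch-debian"
    is a valid short name for a Batch image.
    """
    updated = copy.deepcopy(job_config)  # leave the caller's config untouched
    allocation = updated.setdefault("allocationPolicy", {})
    instances = allocation.setdefault("instances", [{}])
    for instance in instances:
        # Entries that reference an instance template define their image
        # elsewhere, so they are skipped rather than given a policy.
        if "instanceTemplate" in instance:
            continue
        policy = instance.setdefault("policy", {})
        policy.setdefault("bootDisk", {})["image"] = image
    return updated


if __name__ == "__main__":
    job = {"allocationPolicy": {"instances": [{"policy": {"machineType": "e2-standard-4"}}]}}
    print(use_batch_image(job))
```

To try a specific Compute Engine or custom image instead, pass its full image path as the `image` argument.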
This issue occurs when an outdated kernel version in the VM OS image forces the VM to reboot. When a job specifies any VM OS image that is not from Batch or based on a Batch image, Batch installs required packages on the job's VMs after they start. The required packages can vary for different jobs and change over time, and they might require your VM OS image to have the latest kernel version. This issue appears when updating the kernel version requires the VM to reboot, which causes the package installation and the job to fail.
For more information about VM OS images, see Overview of the OS environment for a job's VMs.
Jobs using GPUs and VM OS images with outdated kernels might fail only when automatically installing drivers
This issue is closely related to Jobs might fail when specifying Compute Engine (or custom) VM OS images with outdated kernels. Specifically, jobs that both specify a Compute Engine (or custom) VM OS image without the latest kernel and use GPUs might fail only if you try to install GPU drivers automatically. For these jobs, you might also resolve the failures just by installing GPU drivers manually.
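To make the distinction concrete, the following Python sketch inspects a job's JSON configuration for automatic GPU driver installation and returns a copy with it turned off. It assumes that automatic installation is controlled by the `installGpuDrivers` field on each `allocationPolicy.instances[]` entry; the function names are hypothetical. Turning the field off doesn't install any drivers, so the manual installation itself still has to be handled separately.

```python
import copy


def auto_driver_install_entries(job_config: dict) -> list:
    """Return indexes of instances entries that request automatic GPU driver installation.

    Assumption: the request is expressed as installGpuDrivers == True.
    """
    instances = job_config.get("allocationPolicy", {}).get("instances", [])
    return [i for i, inst in enumerate(instances) if inst.get("installGpuDrivers") is True]


def disable_auto_driver_install(job_config: dict) -> dict:
    """Return a copy of job_config with automatic GPU driver installation turned off."""
    updated = copy.deepcopy(job_config)  # leave the caller's config untouched
    for inst in updated.get("allocationPolicy", {}).get("instances", []):
        inst["installGpuDrivers"] = False
    return updated


if __name__ == "__main__":
    job = {
        "allocationPolicy": {
            "instances": [
                {
                    "installGpuDrivers": True,
                    "policy": {"accelerators": [{"type": "nvidia-tesla-t4", "count": 1}]},
                }
            ]
        }
    }
    print(auto_driver_install_entries(job))
    print(disable_auto_driver_install(job))
```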
For more information about GPUs, see Create and run a job that uses GPUs.