This page shows you how to resolve issues with Batch.
If you are trying to troubleshoot a job that you do not have an error message for, check if the status events for the job contain any error messages before reviewing this document:
- Describe the job.
- Review the status events field:
- If you are using the Google Cloud console, click the Events tab, and then review the Events list section.
- If you are using the gcloud CLI or Batch API, review the statusEvents field.
For more information about troubleshooting a job, also see Batch quotas and limits.
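If you use the gcloud CLI, the status events check can be scripted. The following Python sketch assumes that the gcloud CLI is installed and authenticated. The helper names describe_job and status_event_descriptions are illustrative and are not part of any Batch library. The sketch reads the job description as JSON and collects the description of each entry in the statusEvents field, in the order returned.

```python
import json
import subprocess


def describe_job(job_name: str, location: str) -> dict:
    """Return the job description as a dict by calling the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "batch", "jobs", "describe", job_name,
         "--location", location, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def status_event_descriptions(job: dict) -> list:
    """Collect the description of every status event in the job description."""
    events = job.get("status", {}).get("statusEvents", [])
    return [event.get("description", "") for event in events]
```

To use the sketch, pass the result of describe_job("JOB_NAME", "LOCATION") to status_event_descriptions and look for error messages in the returned list.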
Job creation errors
If you can't create a job, it might be due to one of the errors in this section.
Insufficient quota
Issue
One of the following issues occurs when you try to create a job:
RESOURCE_NAME creation failed: Quota QUOTA_NAME exceeded. Limit: QUOTA_LIMIT in region REGION
RESOURCE_NAME creation failed: Quota QUOTA_NAME exceeded. Limit: QUOTA_LIMIT in zone ZONE
Solution
This issue indicates that a resource request exceeds your quota.
To resolve the issue, wait for more quota to be released or request a higher quota limit. For more information, see Batch quotas and limits and Requesting a higher quota.
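To decide which quota to request an increase for, it can help to pull the quota name, limit, and location out of the message. The following Python sketch assumes the message follows the two formats shown in this section, optionally with quotation marks around the quota name. The function name parse_quota_error is hypothetical.

```python
import re
from typing import Optional

# Matches "Quota QUOTA_NAME exceeded. Limit: QUOTA_LIMIT in region|zone LOCATION".
QUOTA_ERROR = re.compile(
    r"Quota '?(?P<quota>[^'\s]+)'? exceeded\.\s+"
    r"Limit: (?P<limit>[\d.]+) in (?P<scope>region|zone) (?P<location>[\w-]+)"
)


def parse_quota_error(message: str) -> Optional[dict]:
    """Return the quota name, limit, scope, and location, or None if no match."""
    match = QUOTA_ERROR.search(message)
    return match.groupdict() if match else None
```

The returned dictionary tells you whether the quota is regional or zonal, which determines the location to name in a quota increase request.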
Insufficient permissions to act as the service account
Issue
The following issue occurs when you try to create a job:
If the job does not use an instance template, the issue appears as the following:
caller does not have access to act as the specified service account: SERVICE_ACCOUNT_NAME
If the job uses an instance template, the issue appears as the following:
Error: code - CODE_SERVICE_ACCOUNT_MISMATCH, description - The service account specified in the instance template INSTANCE_TEMPLATE_SERVICE_ACCOUNT doesn't match the service account specified in the job JOB_SERVICE_ACCOUNT for JOB_UID, project PROJECT_NUMBER
Solution
This issue usually occurs because the user creating the job does not have sufficient permissions to act as the service account used by the job, which is controlled by the iam.serviceAccounts.actAs permission.
To resolve the issue, do the following:
- If the job uses an instance template, verify that the service account specified in the instance template matches the service account specified in the job's definition.
- Make sure that the user who is creating the job has been granted the Service Account User role (roles/iam.serviceAccountUser) on the service account specified for the job. For more information, see Manage access.
- Recreate the job.
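One way to check the role grant is to inspect the service account's IAM policy. The following Python sketch assumes that you have saved the policy as JSON, for example from gcloud iam service-accounts get-iam-policy SERVICE_ACCOUNT_EMAIL --format=json. The function names are illustrative. The sketch only finds direct bindings on the service account. It does not detect access that is inherited from the project, granted through a group, or provided by a different role that includes the iam.serviceAccounts.actAs permission.

```python
def members_with_role(policy: dict, role: str) -> set:
    """Return every member that is directly bound to the given role."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            members.update(binding.get("members", []))
    return members


def has_service_account_user(policy: dict, member: str) -> bool:
    """Check for a direct Service Account User binding, for example 'user:EMAIL'."""
    return member in members_with_role(policy, "roles/iam.serviceAccountUser")
```

A False result does not prove that the user lacks the permission, so treat it as a prompt to review the bindings rather than as a definitive answer.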
Repeated networks
Issue
The following issue occurs when you try to create a job:
Networks must be distinct for NICs in the same InstanceTemplate
Solution
This issue occurs because you specified the network for a job more than once. To resolve the issue, recreate the job and specify the network by using one of the following options:
- VM instance template: If you want to use a VM instance template while creating this job, you must specify the network in the VM instance template.
- network and subnetwork fields: These fields can be used in the request body when you create a job using the Batch API or in the JSON configuration file when you create a job using the gcloud CLI.
- --network and --subnetwork flags: These flags can be used with the gcloud batch jobs submit command when you create a job using the gcloud CLI.
For more information, see Specify the network for a job.
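Before you recreate the job, you can check whether its JSON configuration specifies the network in more than one place. The following Python sketch assumes that the configuration uses the allocationPolicy.instances[].instanceTemplate field and the allocationPolicy.network.networkInterfaces[] fields. The function name network_sources is hypothetical. The sketch cannot see the --network and --subnetwork flags, so count those separately if you pass them on the command line.

```python
def network_sources(job_config: dict) -> list:
    """List the places in a job configuration that specify the network."""
    policy = job_config.get("allocationPolicy", {})
    sources = []
    # An instance template carries its own network settings.
    if any("instanceTemplate" in inst for inst in policy.get("instances", [])):
        sources.append("instanceTemplate")
    # The network and subnetwork fields of the job configuration.
    interfaces = policy.get("network", {}).get("networkInterfaces", [])
    if any("network" in nic or "subnetwork" in nic for nic in interfaces):
        sources.append("networkInterfaces")
    return sources
```

If the returned list has more than one entry, keep only one of the options and remove the others.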
Job failure errors
If you have issues with a job that is not running correctly or failed for unclear reasons, it might be due to one of the errors in this section.
No logs in Cloud Logging
Issue
You need to debug a job, but no logs appear for the job in Cloud Logging.
Solution
This issue often occurs for the following reasons:
- The job was not configured to produce logs. To produce logs in Cloud Logging, a job needs to have Cloud Logging enabled. The job's runnables should also be configured to write any information that you want to appear in logs to the standard output (stdout) and standard error (stderr) streams. For more information, see Analyze a job by using logs.
- Tasks did not run. Logs cannot be produced until tasks have been assigned resources and start running.
To resolve the issue, describe the job using the gcloud CLI or Batch API. Specifically, the job's status field provides information that you can use to debug the job and to understand why the job did not produce logs.
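The two causes listed above can be checked against a job description. The following Python sketch assumes a description in JSON form, for example from gcloud batch jobs describe with --format=json, and assumes that the QUEUED and SCHEDULED states mean that tasks have not started running. The function name logging_problems is illustrative.

```python
def logging_problems(job: dict) -> list:
    """Return likely reasons that a job has no logs in Cloud Logging."""
    problems = []
    # Logs reach Cloud Logging only when the logs policy sends them there.
    destination = job.get("logsPolicy", {}).get("destination")
    if destination != "CLOUD_LOGGING":
        problems.append("logsPolicy.destination is not CLOUD_LOGGING")
    # Assumption: tasks in these states have not started running yet.
    state = job.get("status", {}).get("state")
    if state in ("QUEUED", "SCHEDULED"):
        problems.append("tasks have not started running (state: %s)" % state)
    return problems
```

An empty list means that neither cause applies, so check next whether the runnables write to stdout and stderr.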
No agent reporting
Issue
The following issue appears in the statusEvents field for a job that is not running properly or failed before VMs were created:
No VM has agent reporting correctly within time window NUMBER_OF_SECONDS seconds, VM state for instance VM_NAME is TIMESTAMP,agent,start
The issue indicates that none of a job's VMs are reporting to the Batch service agent.
Solution
This issue often occurs for the following reasons:
- The job's VMs do not have sufficient permissions. Specifically, this issue suggests that the job's VMs do not have the permissions to report their state to the Batch service agent. You can give a job's VMs these permissions by granting the Batch Agent Reporter role (roles/batch.agentReporter) to the job's service account.
- The job's VMs have network issues. The job's VMs cannot report to the Batch service agent due to a network issue.
To resolve the issue, do the following:
Verify that the job's VMs have the permissions required to report their state to the Batch service agent.
- To identify the job's service account, describe the job using the gcloud CLI or Batch API. If no service account is listed, the job uses the Compute Engine default service account by default.
- Confirm that the job's service account has permissions for the Batch Agent Reporter role (roles/batch.agentReporter). For more information, see Manage access and Restricting service account usage. For example, to grant the Compute Engine default service account the required permissions, use the following command:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --role roles/batch.agentReporter \
  --member serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com
Replace PROJECT_ID with your project ID and PROJECT_NUMBER with your project number.
If the job's VMs already had sufficient permissions, verify that the VMs have proper network access. For more information, see Batch networking overview and Troubleshoot common networking issues.
Recreate the job.
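To identify which service account needs the Batch Agent Reporter role, you can read it from the job description. The following Python sketch assumes that the description stores the account in the allocationPolicy.serviceAccount.email field and falls back to the Compute Engine default service account when the field is empty. The function names are hypothetical.

```python
def job_service_account(job: dict, project_number: str) -> str:
    """Return the job's service account email, or the Compute Engine default."""
    account = job.get("allocationPolicy", {}).get("serviceAccount", {})
    email = account.get("email")
    if not email:
        # No service account listed: the job uses the Compute Engine default.
        email = "%s-compute@developer.gserviceaccount.com" % project_number
    return email


def iam_member(email: str) -> str:
    """Format a service account email as an IAM member string."""
    return "serviceAccount:" + email
```

The value returned by iam_member is the form that IAM policy bindings and the --member flag use for service accounts.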
Constraint violated for VM external IP addresses
Issue
The following issue appears in the statusEvents field for a failed job:
Instance VM_NAME creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project PROJECT_NUMBER. Add instance VM_NAME to the constraint to use external IP with it.
Solution
This issue occurs because your project or organization has set the compute.vmExternalIpAccess organization policy constraint so that only allowlisted VMs can use external IP addresses.
To resolve the issue, recreate the job and do one of the following:
- Use a project that is exempt from the constraint.
- Create a job that blocks external access for all VMs.
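If you choose to block external access, the job's JSON configuration needs to turn off external IP addresses on its network interfaces. The following Python sketch assumes that this is done with the noExternalIpAddress field in allocationPolicy.network.networkInterfaces[] and that the configuration already specifies at least one network interface. The function name block_external_ips is illustrative.

```python
import copy


def block_external_ips(job_config: dict) -> dict:
    """Return a copy of the configuration with external IPs turned off."""
    config = copy.deepcopy(job_config)
    network = config.get("allocationPolicy", {}).get("network", {})
    interfaces = network.get("networkInterfaces", [])
    if not interfaces:
        # The sketch does not guess a network; specify one first.
        raise ValueError("specify a network interface before blocking external IPs")
    for nic in interfaces:
        nic["noExternalIpAddress"] = True
    return config
```

The function leaves the original configuration untouched, so you can compare the two versions before you recreate the job.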
Job failed while using an instance template
Issue
The following issue appears in the statusEvents field for a failed job that uses an instance template:
INVALID_FIELD_VALUE,BACKEND_ERROR
Solution
This issue occurs due to unclear problems with the job's instance template.
To debug the issue further, do the following:
- Create a managed instance group (MIG) using the instance template and check whether any errors appear with more details.
- Optional: To try to find more information, view the long-running operation that is creating the MIG in the Google Cloud console.