Troubleshooting


This page shows you how to resolve issues with Batch.

If you are troubleshooting a job for which you don't have an error message, describe the job by using the gcloud CLI or Batch API and check whether the statusEvents field contains any error messages before you review this document.
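For example, the following sketch wraps that check in a shell function. The function name `show_status_events` is hypothetical, and the `--format` projection assumes that the events appear under `status.statusEvents` in the job's description, matching the Batch API's Job resource.

```shell
# Hypothetical helper: print only the status events of a Batch job.
# Pass your job's name and location (for example, us-central1).
show_status_events() {
  local job_name="$1"
  local location="$2"
  gcloud batch jobs describe "$job_name" \
    --location="$location" \
    --format="yaml(status.statusEvents)"
}

# Usage (replace JOB_NAME and LOCATION):
# show_status_events JOB_NAME LOCATION
```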

For more information about troubleshooting a job, also see Batch quotas and limits.

Job creation errors

If you can't create a job, it might be due to one of the errors in this section.

Insufficient quota

Issue

One of the following issues occurs when you try to create a job:

RESOURCE_NAME creation failed:
Quota QUOTA_NAME exceeded. Limit: QUOTA_LIMIT in region REGION
RESOURCE_NAME creation failed:
Quota QUOTA_NAME exceeded. Limit: QUOTA_LIMIT in zone ZONE

Solution

This issue indicates that a resource request exceeds your quota.

To resolve the issue, wait for more quota to be released or request a higher quota limit. For more information, see Batch quotas and limits and Requesting a higher quota.
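To see how close you are to a regional Compute Engine quota before you request more, you can list the region's quotas with their limits and current usage. This is a minimal sketch that assumes the quota named in the error is a regional Compute Engine quota (for example, CPUS); the helper name `show_region_quotas` is hypothetical.

```shell
# Hypothetical helper: list each Compute Engine quota in a region with its
# limit and current usage, so you can find the quota named in the error.
show_region_quotas() {
  local region="$1"
  gcloud compute regions describe "$region" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
}

# Usage (replace REGION, for example with us-central1):
# show_region_quotas REGION
```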

Insufficient permissions to act as the service account

Issue

The following issue occurs when you try to create a job:

  • If the job does not use an instance template, the following error appears:

    caller does not have access to act as the specified service account: SERVICE_ACCOUNT_NAME
    
  • If the job uses an instance template, the following error appears:

    Error: code - CODE_SERVICE_ACCOUNT_MISMATCH, description - The service account specified in the instance template INSTANCE_TEMPLATE_SERVICE_ACCOUNT doesn't match the service account specified in the job JOB_SERVICE_ACCOUNT for JOB_UID, project PROJECT_NUMBER
    

Solution

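This issue usually occurs because the user creating the job does not have sufficient permissions to act as the service account used by the job, which is controlled by the iam.serviceAccounts.actAs permission. For a job that uses an instance template, it can also occur because the service account in the instance template doesn't match the service account in the job.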

To resolve the issue, do the following:

  1. If the job uses an instance template, verify that the service account specified in the instance template matches the service account specified in the job's definition.
  2. Make sure that the user who is creating the job has been granted the Service Account User role (roles/iam.serviceAccountUser) on the service account specified for the job. For more information, see Manage access.
  3. Recreate the job.
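The first two steps can be sketched as two shell functions. Because job creation failed, compare the template's service account with the allocationPolicy.serviceAccount.email value in your job's configuration file; if the configuration sets no service account, the job uses the Compute Engine default service account. The helper names are hypothetical, `template_service_account` assumes a global instance template, and `grant_service_account_user` must be run by someone who is allowed to change the service account's IAM policy.

```shell
# Hypothetical helper: print the service account email(s) set in a
# global instance template.
template_service_account() {
  local template="$1"
  gcloud compute instance-templates describe "$template" \
    --format="value(properties.serviceAccounts[].email)"
}

# Hypothetical helper: grant a user the Service Account User role on the
# job's service account so that the user can act as it.
grant_service_account_user() {
  local sa_email="$1"
  local user_email="$2"
  gcloud iam service-accounts add-iam-policy-binding "$sa_email" \
    --member="user:$user_email" \
    --role="roles/iam.serviceAccountUser"
}

# Usage (replace the placeholders):
# template_service_account INSTANCE_TEMPLATE
# grant_service_account_user SERVICE_ACCOUNT_EMAIL USER_EMAIL
```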

Job failure errors

If you have issues with a job that is not running correctly or failed for unclear reasons, it might be due to one of the errors in this section.

No logs in Cloud Logging

Issue

You need to debug a job, but no logs appear for the job in Cloud Logging.

Solution

This issue often occurs for the following reasons:

  • The job was not configured to produce logs. To produce logs in Cloud Logging, a job needs to have Cloud Logging enabled. The job's runnables should also be configured to write any information that you want to appear in logs to the standard output (stdout) and standard error (stderr) streams. For more information, see Analyze a job by using logs.
  • Tasks did not run. Logs cannot be produced until tasks have been assigned resources and start running.
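As a minimal sketch, a job configuration like the following enables Cloud Logging through logsPolicy and uses a script runnable that writes one line to stdout and one to stderr. The file name `job.json` and the script text are illustrative placeholders.

```shell
# Write a minimal job configuration that sends the runnable's stdout and
# stderr to Cloud Logging. The quoted heredoc keeps ${BATCH_TASK_INDEX}
# literal so that it is expanded on the job's VM, not locally.
cat > job.json <<'EOF'
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello from task ${BATCH_TASK_INDEX}; echo This line goes to stderr >&2"
            }
          }
        ]
      },
      "taskCount": 1
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
EOF

# Submit the job (replace JOB_NAME and LOCATION):
# gcloud batch jobs submit JOB_NAME --location=LOCATION --config=job.json
```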

To resolve the issue, describe the job by using the gcloud CLI or Batch API. The job's status field provides information that you can use to debug the job, and it can also help you understand why the job did not produce logs.
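For example, the following hypothetical helper prints the job's status field and then lists the job's tasks. If no task ever reached the RUNNING state, the job could not have produced task logs yet.

```shell
# Hypothetical helper: show the job's status (state, status events, and
# per-task-group counts), then list the job's tasks and their states.
show_job_progress() {
  local job_name="$1"
  local location="$2"
  gcloud batch jobs describe "$job_name" \
    --location="$location" \
    --format="yaml(status)"
  gcloud batch tasks list \
    --job="$job_name" \
    --location="$location"
}

# Usage (replace JOB_NAME and LOCATION):
# show_job_progress JOB_NAME LOCATION
```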

No agent reporting

Issue

The following issue appears in the statusEvents field for a job that is not running properly or failed before VMs were created:

No VM has agent reporting correctly within time window NUMBER_OF_SECONDS seconds, VM state for instance VM_NAME is TIMESTAMP,agent,start

The issue indicates that none of a job's VMs are reporting to the Batch service agent.

Solution

This issue often occurs for the following reasons:

  • The job's VMs do not have sufficient permissions. Specifically, this issue suggests that the job's VMs do not have the permissions to report their state to the Batch service agent. You can give a job's VMs these permissions by granting the Batch Agent Reporter role (roles/batch.agentReporter) to the job's service account.
  • The job's VMs have network issues. A network issue prevents the job's VMs from reporting to the Batch service agent.

To resolve the issue, do the following:

  1. Verify that the job's VMs have the permissions required to report their state to the Batch service agent.

    1. To identify the job's service account, describe the job by using the gcloud CLI or Batch API. If no service account is listed, the job uses the Compute Engine default service account.
    2. Confirm that the job's service account has the permissions of the Batch Agent Reporter role (roles/batch.agentReporter). For more information, see Manage access and Restricting service account usage.

      For example, to grant the Compute Engine default service account the required permissions, use the following command:

      gcloud projects add-iam-policy-binding PROJECT_ID \
        --role roles/batch.agentReporter \
        --member serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com

      Replace the following:

        • PROJECT_ID: your project ID.
        • PROJECT_NUMBER: your project number.

  2. If the job's VMs already have sufficient permissions, verify that the VMs have proper network access. For more information, see Troubleshoot common networking issues.

  3. Recreate the job.
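The checks in step 1 can be sketched as two shell functions. `job_vm_service_account` falls back to the Compute Engine default service account when the job doesn't specify one, and `service_account_roles` lists the roles bound to that service account directly at the project level. Both names are hypothetical, and roles granted on a folder or organization, or through a group, don't appear in this output.

```shell
# Hypothetical helper: print the service account that a job's VMs use.
# If the job's definition sets none, fall back to the Compute Engine
# default service account, PROJECT_NUMBER-compute@developer.gserviceaccount.com.
job_vm_service_account() {
  local job_name="$1" location="$2" project_id="$3"
  local sa project_number
  sa="$(gcloud batch jobs describe "$job_name" \
    --location="$location" --project="$project_id" \
    --format="value(allocationPolicy.serviceAccount.email)")"
  if [ -z "$sa" ]; then
    project_number="$(gcloud projects describe "$project_id" \
      --format="value(projectNumber)")"
    sa="${project_number}-compute@developer.gserviceaccount.com"
  fi
  printf '%s\n' "$sa"
}

# Hypothetical helper: list project-level roles bound directly to a service
# account, so you can look for roles/batch.agentReporter.
service_account_roles() {
  local project_id="$1" sa_email="$2"
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$sa_email" \
    --format="value(bindings.role)"
}

# Usage (replace the placeholders):
# sa="$(job_vm_service_account JOB_NAME LOCATION PROJECT_ID)"
# service_account_roles PROJECT_ID "$sa"
```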

Constraint violated for VM external IP addresses

Issue

The following issue appears in the statusEvents field for a failed job:

Instance VM_NAME creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project PROJECT_NUMBER.
Add instance VM_NAME to the constraint to use external IP with it.

Solution

This issue occurs because your project or organization has set the compute.vmExternalIpAccess organization policy constraint to allow only allowlisted VMs to use external IP addresses.
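To confirm which values of this constraint apply to your project, including any inherited from a folder or the organization, you can describe the effective policy. This is a minimal sketch; the helper name `show_external_ip_policy` is hypothetical.

```shell
# Hypothetical helper: show the effective compute.vmExternalIpAccess policy
# for a project, including values inherited from parent resources.
show_external_ip_policy() {
  local project_id="$1"
  gcloud resource-manager org-policies describe \
    compute.vmExternalIpAccess \
    --project="$project_id" \
    --effective
}

# Usage (replace PROJECT_ID):
# show_external_ip_policy PROJECT_ID
```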

To resolve the issue, recreate the job and do one of the following:
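If the job's VMs don't need external IP addresses, one option is to create them without any by setting noExternalIpAddress to true on the job's network interface. This is a minimal sketch that assumes the job doesn't use an instance template. HOST_PROJECT_ID, REGION, NETWORK, and SUBNET are placeholders, and the subnet is assumed to already give VMs without external IP addresses a route to Google APIs, for example through Private Google Access.

```shell
# Write a job configuration whose VMs are created without external IP
# addresses. Replace the placeholders in the network and subnetwork paths.
cat > job-no-external-ip.json <<'EOF'
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          { "script": { "text": "echo Hello from task ${BATCH_TASK_INDEX}" } }
        ]
      },
      "taskCount": 1
    }
  ],
  "allocationPolicy": {
    "network": {
      "networkInterfaces": [
        {
          "network": "projects/HOST_PROJECT_ID/global/networks/NETWORK",
          "subnetwork": "projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET",
          "noExternalIpAddress": true
        }
      ]
    }
  },
  "logsPolicy": { "destination": "CLOUD_LOGGING" }
}
EOF

# Recreate the job (replace JOB_NAME and LOCATION):
# gcloud batch jobs submit JOB_NAME --location=LOCATION --config=job-no-external-ip.json
```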

Job failed while using an instance template

Issue

The following issue appears in the statusEvents field for a failed job that uses an instance template:

INVALID_FIELD_VALUE,BACKEND_ERROR

Solution

This issue indicates a problem with the job's instance template, but the error does not identify the cause.

To debug the issue further, do the following:

  1. Create a managed instance group (MIG) by using the instance template, and check whether the MIG reports errors with more details.
  2. Optional: To try to find more information, view the long-running operation that creates the MIG on the Compute Engine Operations page of the Google Cloud console.
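Step 1 can be sketched as follows. The helper name `debug_template_with_mig` is hypothetical, and the sketch assumes a global instance template and a zonal MIG. Because the MIG creates its VM asynchronously, `list-errors` might show nothing until you rerun it a little later.

```shell
# Hypothetical helper: create a one-VM managed instance group (MIG) from the
# job's instance template so that Compute Engine reports template errors.
debug_template_with_mig() {
  local mig_name="$1" template="$2" zone="$3"
  gcloud compute instance-groups managed create "$mig_name" \
    --template="$template" \
    --size=1 \
    --zone="$zone"
  # Show errors that the MIG hit while creating its VM, if any.
  gcloud compute instance-groups managed list-errors "$mig_name" \
    --zone="$zone"
}

# Usage, then delete the test MIG when you are done:
# debug_template_with_mig MIG_NAME INSTANCE_TEMPLATE ZONE
# gcloud compute instance-groups managed delete MIG_NAME --zone=ZONE
```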

What's next