Collect additional resource metrics using the Ops Agent

This document describes how to create and run a Batch job that automatically installs the Ops Agent. Install the Ops Agent to provide additional metrics in Cloud Monitoring about the performance of a job's resources. To learn more about using resource performance metrics for a job, see Monitor and optimize job resources by viewing metrics.

Before you begin

  1. If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
  2. If the Cloud Monitoring and Cloud Logging APIs aren't already enabled for your project, enable them:

    Enable the APIs

  3. To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:

    For more information about granting roles, see Manage access to projects, folders, and organizations.

    You might also be able to get the required permissions through custom roles or other predefined roles.

  4. Unless you are using the default configuration for the job's service account, ensure that it has the necessary permissions.

    To ensure that the job's service account has the necessary permissions to write Ops Agent metrics to Monitoring, ask your administrator to grant the job's service account the following IAM roles:

  5. Ensure that your planned job configuration meets the Ops Agent requirements.

Ops Agent requirements

To create and run a job that uses the Ops Agent, your job must comply with all the following requirements:

For more information about the features and requirements of the Ops Agent, see Ops Agent overview in the Google Cloud Observability documentation.

Create a job that automatically installs the Ops Agent

Use the Google Cloud CLI or the REST API to create a job whose JSON configuration file sets the installOpsAgent field to true in each entry of the allocationPolicy.instances array, which is a top-level field of the job:

"allocationPolicy": {
  "instances": [
    {
      "installOpsAgent": true
    }
  ]
}
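
As a minimal sketch, assuming the job configuration is held as a Python dictionary before it is written to a JSON file, a helper along the following lines could set the field on every entry of allocationPolicy.instances. The function name enable_ops_agent is hypothetical and is not part of any Batch client library.

```python
import copy
import json


def enable_ops_agent(config: dict) -> dict:
    """Return a copy of a job config with installOpsAgent set to true.

    Hypothetical helper for illustration; not part of any Batch library.
    """
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    policy = updated.setdefault("allocationPolicy", {})
    instances = policy.setdefault("instances", [])
    if not instances:
        # No instance entries yet: add one that only requests the Ops Agent.
        instances.append({})
    for entry in instances:
        entry["installOpsAgent"] = True
    return updated


if __name__ == "__main__":
    # Starting from an empty config yields just the fragment shown above.
    print(json.dumps(enable_ops_agent({}), indent=2))
```

The helper copies its input so that an existing configuration can be reused for jobs that do not need the Ops Agent.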

For example, a job that automatically installs the Ops Agent can have a JSON configuration file that is similar to the following:

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello World! This is task $BATCH_TASK_INDEX."
            }
          }
        ]
      },
      "taskCount": 3
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installOpsAgent": true
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
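
Before submitting a configuration file, it can be useful to confirm that it parses as JSON and that every allocationPolicy.instances entry requests the Ops Agent. The following is a rough sketch of such a check using only the Python standard library; the function name ops_agent_enabled is hypothetical, and the check does not verify any other Batch or Ops Agent requirement.

```python
import json


def ops_agent_enabled(config_text: str) -> bool:
    """Return True if every allocationPolicy.instances entry sets
    installOpsAgent to true. Hypothetical helper for illustration only.
    """
    # json.loads raises json.JSONDecodeError on malformed input,
    # for example a trailing comma after the last field of an object.
    config = json.loads(config_text)
    instances = config.get("allocationPolicy", {}).get("instances", [])
    # An empty or missing instances list means the agent is not requested.
    return bool(instances) and all(
        entry.get("installOpsAgent") is True for entry in instances
    )


if __name__ == "__main__":
    sample = '{"allocationPolicy": {"instances": [{"installOpsAgent": true}]}}'
    print(ops_agent_enabled(sample))
```

The function takes the file contents as a string, so it can be applied to the text of a configuration file read from disk or to a configuration generated in memory.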

After the job's VMs start running, you can view the Ops Agent metrics in the same way as any other resource metrics. For more information, see Monitor and optimize job resources by viewing metrics.

What's next