Use agent policies

Agent policies enable automated installation and maintenance of the Google Cloud Observability agents across a fleet of Compute Engine VMs that match user-specified criteria. You can create a policy for a Google Cloud project that governs existing and new VMs associated with that Google Cloud project, ensuring proper installation, uninstallation, and optional auto-upgrade of all Google Cloud Observability agents on those VMs.

You create and manage agent policies by using the gcloud beta compute instances ops-agents policies command group in the Google Cloud CLI or the agent-policy Terraform module. Agent policies use the VM Manager suite of tools in Compute Engine to manage OS policies, which can automate the deployment and maintenance of software configurations like the Google Cloud Observability agents: the Ops Agent, the legacy Monitoring agent, and the legacy Logging agent.

Supported operating systems

You can apply an agent policy to Compute Engine VM instances running the operating systems shown in the following table:

Operating system Ops Agent
(GA & beta policies)
Logging agent
(beta policies only)
Monitoring agent
(beta policies only)
CentOS 8
Rocky Linux 8
RHEL 6
RHEL 7:
rhel-7, rhel-7-6-sap-ha, rhel-7-7-sap-ha, rhel-7-9-sap-ha
RHEL 8:
rhel-8, rhel-8-4-sap-ha, rhel-8-6-sap-ha, rhel-8-8-sap-ha
Debian 9 (Stretch)
Debian 11 (Bullseye)
Deep Learning VM Images based on Debian 11 (Bullseye)
Ubuntu LTS 18.04 (Bionic Beaver):
ubuntu-1804-lts, ubuntu-minimal-1804-lts
Ubuntu LTS 20.04 (Focal Fossa):
ubuntu-2004-lts, ubuntu-minimal-2004-lts
Ubuntu LTS 22.04 (Jammy Jellyfish):
ubuntu-2204-lts, ubuntu-minimal-2204-lts
SLES 12:
sles-12, sles-12-sp5-sap
SLES 15:
sles-15, sles-15-sp2-sap, sles-15-sp3-sap, sles-15-sp4-sap, sles-15-sp5-sap, sles-15-sp6-sap
OpenSUSE Leap 15:
opensuse-leap (opensuse-leap-15-3-*, opensuse-leap-15-4-*)
Windows Server:
2016, 2019, 2022, Core 2016, Core 2019, Core 2022
  In beta agent policies, the agent columns map to an agent type specified to the gcloud beta compute instances ops-agents policies create invocation:
  • Ops Agent maps to agent type ops-agent.
  • Logging agent maps to agent type logging.
  • Monitoring agent maps to agent type metrics.
 The Monitoring agent is not supported on rhel-7-9-sap-ha, rhel-8-2-sap-ha, or rhel-8-4-sap-ha.
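When a script assembles agent rules for a beta policy, the column-to-type mapping above can be captured in a small helper. The following is a minimal sketch; the function name agent_type_for is hypothetical and isn't part of the Google Cloud CLI.

```shell
# Hypothetical helper: map an agent column name from the table to the
# agent type used by beta agent policies.
agent_type_for() {
  case "$1" in
    "Ops Agent")        echo "ops-agent" ;;
    "Logging agent")    echo "logging" ;;
    "Monitoring agent") echo "metrics" ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

agent_type_for "Monitoring agent"   # prints: metrics
```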

Create an agent policy

This section describes how to use the Google Cloud CLI to manage agent policies. For information about using Terraform, see Terraform integration.

To create an agent policy by using the Google Cloud CLI, complete the following steps:

  1. If you haven't done so already, install the Google Cloud CLI.

    The agent policies described in this document use the beta command group.

  2. If you haven't done so already, install the beta component of the gcloud CLI:

    gcloud components install beta
    

    To check whether the beta component is installed, run:

    gcloud components list
    

    If you previously installed the beta component, ensure you have the latest version:

    gcloud components update
    
  3. Download and use the following script to enable the APIs and to set the proper permissions for using the Google Cloud CLI: set-permissions.sh.

    For information about the script, refer to The set-permissions.sh script.

  4. Use the gcloud beta compute instances ops-agents policies create command to create a policy. For the syntax of the command, see the gcloud beta compute instances ops-agents policies create documentation.

    For examples showing how to format the command, see the Examples section in the Google Cloud CLI documentation.

    For more information about the other commands in the command group and the available options, see the gcloud beta compute instances ops-agents policies documentation.

Best practices for using agent policies

To control the impact to production systems during rollout, we recommend that you use instance labels and zones to filter the instances that the policy applies to.

Here is an example of a phased rollout plan for Debian 11 VMs in a project called my_project:

Phase 1: Create a policy named ops-agents-policy-safe-rollout to install the legacy Logging agent and Monitoring agent on all VMs with the labels env=test and app=myproduct.

gcloud beta compute instances \
    ops-agents policies create ops-agents-policy-safe-rollout \
    --agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
    --os-types=short-name=debian,version=11 \
    --group-labels=env=test,app=myproduct \
    --project=my_project

For more information about specifying the operating system, see gcloud beta compute instances ops-agents policies create.
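The --agent-rules value in the Phase 1 command is a semicolon-separated list with one rule per agent type. As a rough sketch, a script could build that value from a list of agent types. The helper name build_agent_rules is hypothetical, and the per-rule settings are copied from the Phase 1 example.

```shell
# Hypothetical helper: build an --agent-rules value for the given agent types.
# Each rule reuses the settings from the Phase 1 example.
build_agent_rules() {
  rules=""
  for agent_type in "$@"; do
    rule="type=${agent_type},version=current-major,package-state=installed,enable-autoupgrade=true"
    # Separate rules with a semicolon, but don't add a leading one.
    rules="${rules:+${rules};}${rule}"
  done
  printf '%s\n' "$rules"
}

# Prints the same rules string that the Phase 1 command uses.
build_agent_rules logging metrics
```

You could then pass the result as --agent-rules="$(build_agent_rules logging metrics)".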

Phase 2: Update that policy to target VMs in a single zone that have the labels env=prod and app=myproduct.

gcloud beta compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --group-labels=env=prod,app=myproduct \
    --zones=us-central1-c

Phase 3: Update that policy to clear the zones filter so that it applies to matching VMs in all zones.

gcloud beta compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --clear-zones
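Before you run a phased rollout against production VMs, it can help to review the exact commands first. The following sketch wraps the Phase 2 and Phase 3 commands in a dry-run helper that prints each command instead of running it. The names run and DRY_RUN are hypothetical and exist only in this sketch.

```shell
# Print commands by default; set DRY_RUN=0 to actually run them.
# DRY_RUN and run are hypothetical names used only in this sketch.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'DRY RUN: %s\n' "$*"
  else
    "$@"
  fi
}

POLICY=ops-agents-policy-safe-rollout

# Phase 2: target production VMs in a single zone.
run gcloud beta compute instances ops-agents policies update "$POLICY" \
    --group-labels=env=prod,app=myproduct \
    --zones=us-central1-c

# Phase 3: clear the zones filter.
run gcloud beta compute instances ops-agents policies update "$POLICY" \
    --clear-zones
```

Keeping the dry run as the default means that a mistyped label or zone shows up in the printed command before any VM is affected.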

Policies on VMs that predate OS Config

You might need to manually install and configure the OS Config agent on VMs that predate OS Config. For information about manually installing and verifying the OS Config agent, see the VM Manager verification checklist.

Troubleshoot beta agent policies

This section provides information to help you resolve problems with beta agent policies for the Ops Agent, the legacy Monitoring agent, and the legacy Logging agent.

The ops-agents policy commands fail

When a gcloud beta compute instances ops-agents policies command fails, the response shows a validation error. Correct the errors by fixing the command arguments and flags as suggested by the error message.

In addition to the validation errors, you might see errors that indicate the following conditions:

  • Insufficient IAM permission
  • OS Config API isn't enabled
  • The policy does not exist

The following sections describe these conditions in more detail.

Insufficient IAM permission

If a gcloud beta compute instances ops-agents policies command fails with a permission error, then make sure you have run the set-permissions.sh script as described in Create an agent policy to set up the OS Config policy roles.

For more information about the set-permissions.sh script, see The set-permissions.sh script.

OS Config API isn't enabled

A sample error looks like the following:

API [osconfig.googleapis.com] not enabled on project PROJECT_ID.
Would you like to enable and retry (this will take a few minutes)?
(y/N)?

You can enter y to enable the API, or you can run the set-permissions.sh script, described in Create an agent policy, to grant all the necessary permissions. If you enter y at the prompt in the error message, then you still need to run the set-permissions.sh script to set the needed permissions.

To verify that the OS Config API is enabled for the project, run the following command:

gcloud services list --project PROJECT_ID | grep osconfig.googleapis.com

The expected output follows:

osconfig.googleapis.com    Cloud OS Config API
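In a script, the same check can return an exit status instead of relying on someone reading the grep output. The following is a minimal sketch that reads the gcloud services list output on standard input; the function name api_enabled is hypothetical, and the sample line is the expected output shown above.

```shell
# Hypothetical helper: read `gcloud services list` output on stdin and
# succeed only if the given API name appears in the first column.
api_enabled() {
  awk -v api="$1" '$1 == api { found = 1 } END { exit !found }'
}

# Example that uses the expected output instead of a live gcloud call.
if printf '%s\n' 'osconfig.googleapis.com    Cloud OS Config API' \
    | api_enabled osconfig.googleapis.com; then
  echo "OS Config API is enabled"
else
  echo "OS Config API is not enabled"
fi
```

Against a real project, you would pipe gcloud services list --project PROJECT_ID into api_enabled osconfig.googleapis.com.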

The policy does not exist

A sample error looks like the following:

NOT_FOUND: Requested entity was not found

This error might mean that the policy was never created, that the policy has been deleted, or that the specified policy ID is incorrect. Make sure the POLICY_ID used in a gcloud beta compute instances ops-agents policies describe, update, or delete command corresponds to an existing policy. To get a list of agent policies, use the gcloud beta compute instances ops-agents policies list command.

The policy is created, but seems to have no effect

OS Config agents are deployed to each Compute Engine instance to manage the packages for the Logging and Monitoring agents. The policy may seem to have no effect if the underlying OS Config agent isn't installed.

Linux

To verify that the OS Config agent is installed, run the following command:

gcloud compute ssh INSTANCE_ID \
    --project PROJECT_ID \
    -- sudo systemctl status google-osconfig-agent

A sample output is:

    google-osconfig-agent.service - Google OSConfig Agent
    Loaded: loaded (/lib/systemd/system/google-osconfig-agent.service; enabled; vendor preset:
    Active: active (running) since Wed 2020-01-15 00:14:22 UTC; 6min ago
    Main PID: 369 (google_osconfig)
     Tasks: 8 (limit: 4374)
    Memory: 102.7M
    CGroup: /system.slice/google-osconfig-agent.service
            └─369 /usr/bin/google_osconfig_agent
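To check many Linux VMs, you might prefer a yes-or-no answer over reading the full status output. The following sketch looks for the Active line in the systemctl status output that it reads on standard input. The function name agent_is_running is hypothetical, and the sample line is taken from the output above.

```shell
# Hypothetical helper: read `systemctl status` output on stdin and succeed
# only if the unit is reported as active and running.
agent_is_running() {
  grep -q 'Active: active (running)'
}

# Example that uses a line from the sample output instead of a live VM.
status_sample='    Active: active (running) since Wed 2020-01-15 00:14:22 UTC; 6min ago'
if printf '%s\n' "$status_sample" | agent_is_running; then
  echo "OS Config agent is running"
else
  echo "OS Config agent is not running"
fi
```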

Windows

To verify that the OS Config agent is installed, complete the following steps:

  1. Connect to your instance using RDP or a similar tool and log in to Windows.

  2. Open a PowerShell terminal, then run the following PowerShell command. You don't need administrator privileges.

    Get-Service google_osconfig_agent
    

A sample output is:

    Status   Name               DisplayName
    ------   ----               -----------
    Running  google_osconfig_a… Google OSConfig Agent

If the OS Config agent is not installed, then you might be using an operating system that doesn't support VM Manager. The Compute Engine Operating system details document indicates which VM Manager features are supported for each Compute Engine operating system.

If the operating system supports VM Manager, then you can install the OS Config agent manually.

The OS Config agent is installed, but it does not install the Logging agent

To check whether errors occur when the OS Config agent applies policies, review the OS Config agent's log. You can do this either by using Logs Explorer or by using SSH or RDP to check individual Compute Engine instances.

To view OS Config agent logs in Logs Explorer, use the following filter:

resource.type="gce_instance"
logId(OSConfigAgent)
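The same filter can be used outside the console, for example with the gcloud logging read command. The sketch below builds the filter with an added severity clause and prints the resulting command instead of running it. The function name osconfig_log_filter, the severity threshold, and the --limit value are assumptions made for illustration.

```shell
# Hypothetical helper: build the OS Config agent log filter, restricted to
# entries at or above a minimum severity (default: ERROR).
osconfig_log_filter() {
  min_severity="${1:-ERROR}"
  printf 'resource.type="gce_instance" AND logId(OSConfigAgent) AND severity>=%s\n' "$min_severity"
}

# Print the command rather than running it; PROJECT_ID is a placeholder.
printf 'gcloud logging read %s --project=PROJECT_ID --limit=20\n' \
    "'$(osconfig_log_filter WARNING)'"
```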

To view OS Config agent logs on an individual Compute Engine instance, do the following:

CentOS, RHEL, SLES, SUSE

Run the following command:

gcloud compute ssh INSTANCE_ID \
    --project PROJECT_ID \
    -- sudo cat /var/log/messages \
       | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"

Debian, Ubuntu

Run the following command:

gcloud compute ssh INSTANCE_ID \
    --project PROJECT_ID \
    -- sudo cat /var/log/syslog \
       | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"

Windows

  1. Connect to your instance using RDP or a similar tool and log in to Windows.

  2. Open the Event Viewer app, then select Windows Logs > Application, and search for logs with Source equal to OSConfigAgent.
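For Linux VMs, the only difference between the two commands above is the log file path. A script that handles a mixed fleet could select the path from the distribution family, as in the following sketch. The function name osconfig_log_path and the lowercase family names are hypothetical.

```shell
# Hypothetical helper: choose the system log file that holds OS Config agent
# messages for a given Linux distribution family.
osconfig_log_path() {
  case "$1" in
    centos|rhel|sles|suse) echo /var/log/messages ;;
    debian|ubuntu)         echo /var/log/syslog ;;
    *) echo "unsupported OS family: $1" >&2; return 1 ;;
  esac
}

osconfig_log_path debian   # prints: /var/log/syslog
```

The result can replace the hard-coded path, for example sudo cat "$(osconfig_log_path rhel)" in the gcloud compute ssh command.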

If there is an error connecting to the OS Config service, make sure you run the set-permissions.sh script as described in Create an agent policy to set up the OS Config metadata.

To verify that the OS Config metadata is enabled, you can run the following command:

gcloud compute project-info describe \
    --project PROJECT_ID \
    | grep "enable-osconfig\|enable-guest-attributes" -A 1

The expected output follows:

- key: enable-guest-attributes
  value: 'TRUE'
- key: enable-osconfig
  value: 'TRUE'
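To automate this verification, a script can parse the key and value pairs instead of comparing the output by eye. The following sketch reads the filtered output on standard input and reports whether a given key is set to TRUE. The function name metadata_is_true is hypothetical, and the sample text is the expected output shown above.

```shell
# Hypothetical helper: read key/value metadata lines on stdin and succeed
# only if the given key has the value TRUE (with or without quotes).
metadata_is_true() {
  awk -v key="$1" '
    $1 == "-" && $2 == "key:" { current = $3 }
    $1 == "value:" && current == key {
      if (toupper($2) ~ /^.?TRUE.?$/) ok = 1
    }
    END { exit !ok }
  '
}

# Example that uses the expected output instead of a live gcloud call.
sample="- key: enable-guest-attributes
  value: 'TRUE'
- key: enable-osconfig
  value: 'TRUE'"

for key in enable-osconfig enable-guest-attributes; do
  if printf '%s\n' "$sample" | metadata_is_true "$key"; then
    echo "$key is enabled"
  else
    echo "$key is missing or not TRUE"
  fi
done
```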

Observability agents are installed, but not functioning properly

For information about debugging specific agents, see the troubleshooting documentation for the Ops Agent, the legacy Logging agent, or the legacy Monitoring agent.

Enable debug-level logs for the OS Config agent

It can be useful to enable debug-level logging in the OS Config agent when reporting an issue.

You can set the osconfig-log-level: debug metadata to enable debug-level logging for the OS Config agent. The collected logs have more information to help with the investigation.

To enable debug-level logging for the entire project, run the following command:

gcloud compute project-info add-metadata \
    --project PROJECT_ID \
    --metadata osconfig-log-level=debug

To enable debug-level logging for one VM, run the following command:

gcloud compute instances add-metadata INSTANCE_ID \
    --project PROJECT_ID \
    --metadata osconfig-log-level=debug

Helper scripts

This section provides additional information about the helper scripts described in this document: set-permissions.sh and diagnose.sh.

The set-permissions.sh script

After you download the set-permissions.sh script, you can use it to perform different setup actions, depending on the arguments that you provide.

The following examples show some common invocations for the script. For more information, see the comments in the script itself.

To enable the APIs, grant the necessary roles to the default service account, and enable the OS Config metadata for a project, run the script as follows:

bash set-permissions.sh --project=PROJECT_ID

To additionally grant one of the OS Config roles to a user who does not have the Owner (roles/owner) role on the project, run the script as follows:

bash set-permissions.sh --project=PROJECT_ID \
  --iam-user=USER_EMAIL \
  --iam-permission-role=guestPolicy[Admin|Editor|Viewer]

To additionally grant one of the OS Config roles to a non-default service account, run the script as follows:

bash set-permissions.sh --project=PROJECT_ID \
  --iam-service-account=SERVICE_ACCT_EMAIL \
  --iam-permission-role=guestPolicy[Admin|Editor|Viewer]
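The guestPolicy[Admin|Editor|Viewer] notation in these commands stands for one of three role names. If you wrap the script in your own tooling, you could validate the role before calling it, as in this sketch. The function name valid_guest_policy_role is hypothetical, and the example prints the command instead of running it.

```shell
# Hypothetical helper: succeed only for the three role names that
# guestPolicy[Admin|Editor|Viewer] expands to.
valid_guest_policy_role() {
  case "$1" in
    guestPolicyAdmin|guestPolicyEditor|guestPolicyViewer) return 0 ;;
    *) return 1 ;;
  esac
}

role=guestPolicyEditor
if valid_guest_policy_role "$role"; then
  # PROJECT_ID and USER_EMAIL are placeholders.
  echo "bash set-permissions.sh --project=PROJECT_ID --iam-user=USER_EMAIL --iam-permission-role=$role"
else
  echo "invalid role: $role" >&2
fi
```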

The diagnose.sh script

Given a project ID, a Compute Engine instance ID, and the agent policy ID, the diagnose.sh script automatically collects the necessary information to help diagnose issues with the policy:

  • The OS Config agent version
  • The underlying OS Config guest policy
  • The policies that are applicable to this Compute Engine instance
  • The agent package repositories that are pulled on to this Compute Engine instance

To invoke the script, run the following command:

bash diagnose.sh --project-id=PROJECT_ID \
  --gce-instance-id=INSTANCE_ID \
  --policy-id=POLICY_ID

Terraform integration

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For information about how Terraform works, see Using Terraform.

Terraform support for agent policies is built on top of the Google Cloud CLI commands. To create an agent policy by using Terraform, follow the Terraform module agent-policy instructions. You can also find example policies in the examples directory.