Manage agent policies

Agent policies enable automated installation and maintenance of the Google Cloud Observability agents across a fleet of Compute Engine VMs that match user-specified criteria. With one command, you can create a policy for a Google Cloud project that governs both existing and new VMs in that project, ensuring that the Google Cloud Observability agents specified in the policy are installed and, optionally, auto-upgraded on those VMs.

You create and manage agent policies by using the gcloud beta compute instances ops-agents policies command group in the Google Cloud CLI. The commands in this group use the VM Manager suite of tools in Compute Engine to manage OS policies, which can automate the deployment and maintenance of software configurations like the Google Cloud Observability agents: the Ops Agent, the legacy Monitoring agent, and the legacy Logging agent.

Supported operating systems

You can apply an agent policy to Compute Engine VM instances that run the operating systems listed in the following table. Each agent column corresponds to the agent type that you pass to the gcloud beta compute instances ops-agents policies create command:

  • Logging agent maps to policies with agent type logging.
  • Monitoring agent maps to policies with agent type metrics.
  • Ops Agent maps to policies with agent type ops-agent.
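If you generate policy commands from a script, the mapping above can be expressed as a small lookup. In this sketch, the function name agent_type_for is hypothetical; only the three agent type values (logging, metrics, ops-agent) come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a table column name to the agent type value
# passed to "gcloud beta compute instances ops-agents policies create".
agent_type_for() {
  case "$1" in
    "Logging agent")    echo "logging" ;;
    "Monitoring agent") echo "metrics" ;;
    "Ops Agent")        echo "ops-agent" ;;
    *)
      echo "unknown agent column: $1" >&2
      return 1
      ;;
  esac
}
```

For example, agent_type_for "Ops Agent" prints ops-agent, which is the value used in the type= field of the --agent-rules flag.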

Operating system Logging agent Monitoring agent Ops Agent
CentOS 7
CentOS 8
Rocky Linux 8
RHEL 6
RHEL 7: rhel-7, rhel-7-6-sap-ha, rhel-7-7-sap-ha, rhel-7-9-sap-ha [1]
RHEL 8: rhel-8, rhel-8-4-sap-ha, rhel-8-6-sap-ha, rhel-8-8-sap-ha [1]
Debian 9 (Stretch)
Debian 10 (Buster)
Debian 11 (Bullseye)
Ubuntu LTS 18.04 (Bionic Beaver): ubuntu-1804-lts, ubuntu-minimal-1804-lts
Ubuntu LTS 20.04 (Focal Fossa): ubuntu-2004-lts, ubuntu-minimal-2004-lts
Ubuntu LTS 22.04 (Jammy Jellyfish): ubuntu-2204-lts, ubuntu-minimal-2204-lts
SLES 12: sles-12, sles-12-sp5-sap
SLES 15: sles-15, sles-15-sp2-sap, sles-15-sp3-sap, sles-15-sp4-sap, sles-15-sp5-sap
OpenSUSE Leap 15: opensuse-leap (opensuse-leap-15-3-*, opensuse-leap-15-4-*)
Windows Server: 2016, 2019, 2022, Core 2016, Core 2019, Core 2022

[1] The Monitoring agent is not supported on rhel-7-9-sap-ha, rhel-8-2-sap-ha, or rhel-8-4-sap-ha.

Create an agent policy

To create an agent policy by using the Google Cloud CLI, complete the following steps:

  1. If you haven't done so already, install the Google Cloud CLI.

    This document describes the beta command group for managing agent policies.

  2. If you haven't done so already, install the beta component of the gcloud CLI:

    gcloud components install beta
    

    To check whether the beta component is installed, run:

    gcloud components list
    

    If you previously installed the beta component, ensure you have the latest version:

    gcloud components update
    
  3. Run the set-permissions.sh script to enable the required APIs and grant the permissions needed to manage agent policies with the Google Cloud CLI.

    For information about what the script does, see What's the set-permissions.sh script doing?

  4. Use the gcloud beta compute instances ops-agents policies create command to create a policy. For the syntax of the command, see the gcloud beta compute instances ops-agents policies create documentation.

    For examples showing how to format the command, see the Examples section in the Google Cloud CLI documentation.

    For more information about the other commands in the command group and the available options, see the gcloud beta compute instances ops-agents policies documentation.

Best practices for using agent policies

To control the impact on production systems during rollout, we recommend that you use instance labels and zones to filter the instances that the policy applies to.

If you're creating a policy for the Ops Agent, ensure that your VMs don't have the legacy Logging agent or Monitoring agent installed on them. Running the Ops Agent and the legacy agents on the same VM can cause ingestion of duplicate logs or a conflict in metrics ingestion. If necessary, uninstall the Monitoring agent and uninstall the Logging agent before creating a policy to install the Ops Agent.

Here is an example of a phased rollout plan for CentOS 7 VMs in a project called my-project (project IDs can contain only lowercase letters, digits, and hyphens):

Phase 1: Create a policy named ops-agents-policy-safe-rollout to install the Ops Agent on all VMs with the labels env=test and app=myproduct.

gcloud beta compute instances \
    ops-agents policies create ops-agents-policy-safe-rollout \
    --agent-rules="type=ops-agent,version=current-major,package-state=installed,enable-autoupgrade=true" \
    --os-types=short-name=centos,version=7 \
    --group-labels=env=test,app=myproduct \
    --project=my-project

For more information about specifying the operating system, see gcloud beta compute instances ops-agents policies create.

Phase 2: Update that policy to target VMs in a single zone that have the labels env=prod and app=myproduct.

gcloud beta compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --group-labels=env=prod,app=myproduct \
    --zones=us-central1-c

Phase 3: Update that policy to clear the zones filter so that it applies to matching VMs in all zones.

gcloud beta compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --clear-zones
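The three phases can be wrapped in a small shell function so that each step runs with the same policy ID and project. This is a sketch, not part of the gcloud CLI: the function name rollout_phase and the POLICY_ID variable are hypothetical, and phases 2 and 3 pass --project explicitly (a global gcloud flag) instead of relying on the configured default project.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three rollout phases shown above.
# Usage: rollout_phase 1|2|3 PROJECT_ID
POLICY_ID="ops-agents-policy-safe-rollout"

rollout_phase() {
  local phase="$1" project="$2"
  # Common prefix of every agent policy command.
  local -a policies=(beta compute instances ops-agents policies)

  case "$phase" in
    1)  # Install the Ops Agent on CentOS 7 test VMs only.
      gcloud "${policies[@]}" create "$POLICY_ID" \
        --agent-rules="type=ops-agent,version=current-major,package-state=installed,enable-autoupgrade=true" \
        --os-types=short-name=centos,version=7 \
        --group-labels=env=test,app=myproduct \
        --project="$project"
      ;;
    2)  # Retarget the policy to production VMs in a single zone.
      gcloud "${policies[@]}" update "$POLICY_ID" \
        --group-labels=env=prod,app=myproduct \
        --zones=us-central1-c \
        --project="$project"
      ;;
    3)  # Drop the zone filter so that matching VMs in every zone are covered.
      gcloud "${policies[@]}" update "$POLICY_ID" \
        --clear-zones \
        --project="$project"
      ;;
    *)
      echo "usage: rollout_phase 1|2|3 PROJECT_ID" >&2
      return 2
      ;;
  esac
}
```

For example, rollout_phase 1 my-project starts the rollout; checking the test VMs before you run rollout_phase 2 is left to you.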

Limitations

For a policy to take effect on VMs that predate OS Config, you must first make sure that the OS Config agent, which the policy relies on, is installed on those VMs. To install the OS Config agent on a fleet of VMs, complete the following steps:

  1. Ensure you have run the set-permissions.sh script in the Create an agent policy section.

  2. Identify the VMs on which you want to install the OS Config agent and list them in a CSV file. For example, to get a list of VMs that aren't managed by Google Kubernetes Engine, App Engine, or other Google Cloud services and then save it in a file called instances.csv, run the following command:

      gcloud compute instances list \
          --filter='-labels.list(show="keys"):goog-' \
          --format="csv(name,zone)" \
          | grep -v -x -F -f <(gcloud compute instances os-inventory list-instances \
              --format="csv(name,zone)") \
          | sed 's/$/,update/' > instances.csv
    

    The grep step removes VMs that already have the OS Config agent installed and enabled, because those are the VMs that report OS inventory data. The filter on label keys with the goog- prefix removes Compute Engine VMs managed by GKE, App Engine, and other services.

    To further filter the instances by zones or labels, change the value of the --filter flag to something similar to the following:

      '-labels.list(show="keys"):goog- AND zone:(ZONE_1,ZONE_2) AND labels.KEY_1:VALUE_1 AND labels.KEY_2=VALUE_2'
    
  3. To install the OS Config agent on Linux VMs, download and run the mass-install-osconfig-agent.sh script.

    The following command installs the OS Config agent on the VMs specified in the instances.csv file in the specified project:

       bash mass-install-osconfig-agent.sh --project PROJECT_ID --input-file instances.csv
    

    For more information about using the script, see the comments in the script.
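To see what the grep and sed stages of the instances.csv pipeline do, you can reproduce them on sample files without calling gcloud. This sketch assumes two CSV files shaped like the csv(name,zone) output used above; the function names to_update_csv and build_instance_filter are hypothetical.

```shell
#!/usr/bin/env bash
# Offline sketch of the grep/sed stages used to build instances.csv.

# to_update_csv ALL_CSV INVENTORY_CSV
# Prints every line of ALL_CSV that does not appear verbatim (-x, -F) in
# INVENTORY_CSV, then appends ",update" to each remaining line.
# Because both inputs start with the same "name,zone" header, the header
# is removed by the same comparison.
to_update_csv() {
  local all_csv="$1" inventory_csv="$2"
  grep -v -x -F -f "$inventory_csv" "$all_csv" | sed 's/$/,update/'
}

# build_instance_filter ZONES [LABEL_EXPR ...]
# Hypothetical helper that assembles a --filter value like the one above.
# ZONES is a comma-separated list (or empty); each LABEL_EXPR is a key/value
# expression such as env=prod or app:myproduct.
build_instance_filter() {
  local zones="$1"
  shift
  local filter='-labels.list(show="keys"):goog-'
  if [ -n "$zones" ]; then
    filter+=" AND zone:(${zones})"
  fi
  local expr
  for expr in "$@"; do
    filter+=" AND labels.${expr}"
  done
  printf '%s\n' "$filter"
}
```

For example, you could pass --filter="$(build_instance_filter us-central1-a env=prod)" to gcloud compute instances list instead of typing the filter by hand.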

Troubleshooting

The ops-agents policy commands fail

When a gcloud beta compute instances ops-agents policies command fails, the response shows a validation error. Correct the errors by fixing the command arguments and flags as suggested by the error message.

In addition to the validation errors, you might see the following errors:

  • Insufficient IAM permission

    A sample error looks like:

    ERROR: (gcloud.beta.compute.instances.ops-agents.policies.command) PERMISSION_DENIED: Caller does not have required permission to command
    

    Make sure that you ran the set-permissions.sh script in the Create an agent policy section to grant the required osconfig.guestPolicy* IAM role.

    To verify that you have a sufficient OS Config guest policy role on the project, run the following command. In this example, the command checks whether the member has the roles/osconfig.guestPolicyAdmin role. The GCLOUD_MEMBER value must be in the format user:USER_EMAIL or serviceAccount:SERVICE_ACCOUNT_EMAIL.

    gcloud projects get-iam-policy PROJECT_ID \
        --flatten="bindings[].members" \
        --filter="bindings.members:GCLOUD_MEMBER" \
        --format="value(bindings.role)" \
        | grep "roles/osconfig.guestPolicyAdmin"
    

    The expected output is:

    roles/osconfig.guestPolicyAdmin
    
  • OS Config API is not enabled

    A sample error looks like:

    API [osconfig.googleapis.com] not enabled on project PROJECT_ID.
    Would you like to enable and retry (this will take a few minutes)?
    (y/N)?
    

    Make sure that you ran the set-permissions.sh script in the Create an agent policy section, which enables the required APIs.

    To verify whether the OS Config API is enabled for the project, run the following command:

    gcloud services list --project PROJECT_ID \
        | grep osconfig.googleapis.com
    

    The expected output is:

    osconfig.googleapis.com    Cloud OS Config API
    
  • The policy does not exist

    A sample error looks like:

    NOT_FOUND: Requested entity was not found
    

    This error usually means that the policy doesn't exist, for example because it was already deleted. Make sure that the policy ID in the describe, update, or delete command matches an existing policy.
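The IAM and API verification commands in this section can also be wrapped as reusable checks that return a shell exit status. This is a sketch: has_project_role and api_enabled are hypothetical names, and has_project_role only sees roles granted directly to the member in the project's IAM policy, not roles inherited through groups, folders, or the organization.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the IAM role and API checks above.

# has_project_role PROJECT_ID MEMBER ROLE
# MEMBER uses the IAM prefix form, for example user:alice@example.com.
# Returns 0 if MEMBER holds ROLE directly in the project's IAM policy.
has_project_role() {
  local project="$1" member="$2" role="$3"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${member}" \
    --format="value(bindings.role)" \
    | grep -q -x -F "$role"
}

# api_enabled PROJECT_ID SERVICE_NAME
# Returns 0 if SERVICE_NAME (for example osconfig.googleapis.com) is enabled.
api_enabled() {
  local project="$1" service="$2"
  gcloud services list --enabled --project "$project" \
    --format="value(config.name)" \
    | grep -q -x -F "$service"
}
```

For example, api_enabled PROJECT_ID osconfig.googleapis.com || echo "OS Config API is not enabled" gives a one-line check that can run before any ops-agents policies command.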

The policy is created, but seems to have no effect

OS Config agents are deployed to each Compute Engine instance to manage the packages for the Ops Agent and the legacy Logging and Monitoring agents. The policy might seem to have no effect if the underlying OS Config agent isn't installed.

LINUX

To verify that the OS Config agent is installed, run the following command:

gcloud compute ssh INSTANCE_ID \
    --project PROJECT_ID \
    -- sudo systemctl status google-osconfig-agent

A sample output is:

    google-osconfig-agent.service - Google OSConfig Agent
    Loaded: loaded (/lib/systemd/system/google-osconfig-agent.service; enabled; vendor preset:
    Active: active (running) since Wed 2020-01-15 00:14:22 UTC; 6min ago
    Main PID: 369 (google_osconfig)
     Tasks: 8 (limit: 4374)
    Memory: 102.7M
    CGroup: /system.slice/google-osconfig-agent.service
            └─369 /usr/bin/google_osconfig_agent
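To check many Linux VMs at once, you could loop the same SSH check over a CSV file like instances.csv from the Limitations section. This sketch assumes that the first two columns are name and zone, and it uses systemctl is-active instead of systemctl status because it prints a single word (such as active or inactive) that is easy to collect. The function name osconfig_agent_states is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the google-osconfig-agent service state on each
# Linux VM listed in a CSV file whose first two columns are name,zone.
# Usage: osconfig_agent_states PROJECT_ID CSV_FILE
osconfig_agent_states() {
  local project="$1" csv="$2" name zone rest state
  while IFS=, read -r name zone rest; do
    # Skip blank lines and the "name,zone" header row.
    if [ -z "$name" ] || [ "$name" = "name" ]; then
      continue
    fi
    # Keep only the last path segment in case the zone is printed as a URL.
    zone="${zone##*/}"
    # </dev/null stops ssh from consuming the rest of the CSV on stdin;
    # tr removes carriage returns that appear if a TTY is allocated.
    state="$(gcloud compute ssh "$name" --zone "$zone" --project "$project" \
      -- sudo systemctl is-active google-osconfig-agent </dev/null 2>/dev/null \
      | tr -d '\r')"
    printf '%s,%s,%s\n' "$name" "$zone" "${state:-unknown}"
  done < "$csv"
}
```

osconfig_agent_states PROJECT_ID instances.csv prints one name,zone,state line per VM; unknown means that no output came back, for example because the SSH connection failed.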

WINDOWS

To verify that the OS Config agent is installed, complete the following steps:

  1. Connect to your instance using RDP or a similar tool and log in to Windows.

  2. Open a PowerShell terminal, then run the following PowerShell command. You don't need administrator privileges.

    Get-Service google_osconfig_agent
    

A sample output is:

    Status   Name               DisplayName
    ------   ----               -----------
    Running  google_osconfig_a… Google OSConfig Agent

SUSE and Ubuntu Compute Engine instances don't have the OS Config agent preinstalled, so you need to follow the OS Config agent installation instructions to get the OS Config agent installed on those Compute Engine instances.

The OS Config agent is installed, but it does not install the Ops agents

To check whether the OS Config agent encountered errors while applying policies, check the OS Config agent's logs. You can do this either in Logs Explorer or by using SSH or RDP to inspect individual Compute Engine instances.

To view OS Config agent logs in Logs Explorer, use the following filter:

resource.type="gce_instance"
logName="projects/PROJECT_ID/logs/OSConfigAgent"
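To read the same log entries from a terminal instead of Logs Explorer, you can pass the filter above to gcloud logging read. In this sketch, the function name read_osconfig_logs, the one-day lookback (--freshness=1d), and the 50-entry cap (--limit=50) are arbitrary choices; the optional instance ID is matched against the resource.labels.instance_id label of the gce_instance resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read OS Config agent log entries from Cloud Logging
# with the same filter as above, optionally narrowed to one VM.
# Usage: read_osconfig_logs PROJECT_ID [INSTANCE_ID]
read_osconfig_logs() {
  local project="$1" instance_id="${2:-}"
  local filter="resource.type=\"gce_instance\" AND logName=\"projects/${project}/logs/OSConfigAgent\""
  if [ -n "$instance_id" ]; then
    # gce_instance resources carry the numeric VM ID in this label.
    filter+=" AND resource.labels.instance_id=\"${instance_id}\""
  fi
  # Look back one day and cap the output at 50 entries.
  gcloud logging read "$filter" --project "$project" --freshness=1d --limit=50
}
```

The explicit AND is equivalent to the line break between the two conditions in the Logs Explorer filter.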

To view OS Config agent logs on individual Compute Engine Linux instances by using SSH, run the command for your distribution:

  • CentOS / RHEL / SLES / SUSE

    gcloud compute ssh INSTANCE_ID \
        --project PROJECT_ID \
        -- sudo cat /var/log/messages \
           | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
    
  • Debian / Ubuntu

    gcloud compute ssh INSTANCE_ID \
        --project PROJECT_ID \
        -- sudo cat /var/log/syslog \
           | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
    

To view OS Config agent logs on individual Compute Engine Windows instances by using RDP, complete the following steps:

  1. Connect to your instance using RDP or a similar tool and log in to Windows.

  2. Open the Event Viewer app. Under Windows Logs > Application, look for entries whose Source is OSConfigAgent.

If there is an error connecting to the OS Config service, make sure that you ran the set-permissions.sh script in the Create an agent policy section to enable the OS Config metadata.

To verify that the OS Config metadata is enabled, you can run the following command:

gcloud compute project-info describe \
    --project PROJECT_ID \
    | grep "enable-osconfig\|enable-guest-attributes" -A 1

The expected output is:

- key: enable-guest-attributes
  value: 'TRUE'
- key: enable-osconfig
  value: 'TRUE'
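If you want a scripted pass/fail check instead of reading the output by eye, the following sketch reads the project-info describe output on standard input and succeeds only when both enable-guest-attributes and enable-osconfig are set to TRUE (compared case-insensitively). It assumes the YAML layout shown above, where each key line is directly followed by its value line; the function name osconfig_metadata_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if both OS Config metadata keys are TRUE.
# Usage: gcloud compute project-info describe --project PROJECT_ID | osconfig_metadata_ok
osconfig_metadata_ok() {
  awk -v q="'" '
    # Remember which metadata key the next "value:" line belongs to.
    /- key: enable-osconfig$/         { want = "osconfig"; next }
    /- key: enable-guest-attributes$/ { want = "guest"; next }
    want != "" && $1 == "value:" {
      v = $2
      gsub(q, "", v)                  # drop the YAML single quotes
      if (toupper(v) == "TRUE") ok[want] = 1
      want = ""
    }
    END { exit !(ok["osconfig"] && ok["guest"]) }
  '
}
```

For example, gcloud compute project-info describe --project PROJECT_ID | osconfig_metadata_ok && echo "OS Config metadata is enabled" prints the message only when both keys are set.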

Observability agents are installed, but not functioning properly

For information about debugging specific agents, see the following documents:

Enable debug-level logs for the OS Config agent

It can be useful to enable debug-level logging in the OS Config agent when reporting an issue.

Set the osconfig-log-level metadata key to debug to enable debug-level logging for the OS Config agent. The resulting logs contain more detail to help with the investigation.

To enable debug-level logging for the entire project, run the following command:

gcloud compute project-info add-metadata \
    --project PROJECT_ID \
    --metadata osconfig-log-level=debug

To enable debug-level logging for one VM, run the following command:

gcloud compute instances add-metadata INSTANCE_ID \
    --project PROJECT_ID \
    --metadata osconfig-log-level=debug
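Because debug-level logging is mostly needed while an issue is being investigated, it can help to pair the two commands above with the matching remove-metadata commands, so that debug logging is easy to turn off afterwards. The function name osconfig_debug is hypothetical; as in the commands above, the instance variant relies on your default zone configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn OS Config debug logging on or off.
# Usage: osconfig_debug on|off PROJECT_ID [INSTANCE_ID]
# Without INSTANCE_ID, the project-wide metadata is changed.
osconfig_debug() {
  local mode="$1" project="$2" instance="${3:-}"
  local -a target=(compute project-info)
  if [ -n "$instance" ]; then
    target=(compute instances)
  fi

  case "$mode" in
    on)
      gcloud "${target[@]}" add-metadata ${instance:+"$instance"} \
        --project "$project" --metadata osconfig-log-level=debug
      ;;
    off)
      # Removing the key undoes the setting added by "on".
      gcloud "${target[@]}" remove-metadata ${instance:+"$instance"} \
        --project "$project" --keys osconfig-log-level
      ;;
    *)
      echo "usage: osconfig_debug on|off PROJECT_ID [INSTANCE_ID]" >&2
      return 2
      ;;
  esac
}
```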

What's the set-permissions.sh script doing?

Given a project ID, an Identity and Access Management (IAM) role, and an email or a service account, the set-permissions.sh script performs the following actions:

  • Enables the Cloud Logging API, the Cloud Monitoring API, and the OS Config API for the project.

  • Grants the roles/logging.logWriter and the roles/monitoring.metricWriter roles to the Compute Engine default service account so that the agents can write logs and metrics to the Cloud Logging and Cloud Monitoring APIs.

  • Enables the OS Config metadata for the project so that OS Config agents get activated on the VMs.

  • Grants the specified IAM role to the gcloud user or the service account. Project owners have full access to create and manage a policy. For all other users or service accounts, project owners must grant one of the following roles:

    • roles/osconfig.guestPolicyAdmin: Provides full access to a policy.

    • roles/osconfig.guestPolicyEditor: Allows users to get, update, and list a policy.

    • roles/osconfig.guestPolicyViewer: Provides read-only access to get and list a policy.

    When running the script, you only need to specify the guestPolicy* part of the role name. The script supplies the roles/osconfig. part of the name.
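The role-name handling described above can be approximated in a few lines of shell. These helpers are a sketch that mirrors the described behavior; they are not taken from set-permissions.sh, and the names expand_guest_policy_role and grant_guest_policy_role are hypothetical. The grant uses the standard gcloud projects add-iam-policy-binding command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the role handling described above;
# they are not copied from set-permissions.sh.

# expand_guest_policy_role SHORT_NAME
# Turns guestPolicyAdmin/Editor/Viewer into the full role name.
expand_guest_policy_role() {
  case "$1" in
    guestPolicyAdmin | guestPolicyEditor | guestPolicyViewer)
      printf 'roles/osconfig.%s\n' "$1"
      ;;
    *)
      echo "unsupported role: $1" >&2
      return 1
      ;;
  esac
}

# grant_guest_policy_role PROJECT_ID MEMBER SHORT_NAME
# MEMBER uses the IAM prefix form, for example user:alice@example.com or
# serviceAccount:sa@my-project.iam.gserviceaccount.com.
grant_guest_policy_role() {
  local project="$1" member="$2" role
  role="$(expand_guest_policy_role "$3")" || return 1
  gcloud projects add-iam-policy-binding "$project" \
    --member="$member" --role="$role"
}
```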

The following invocation of the script enables the APIs, grants the necessary roles to the default service account, and enables the OS Config metadata:

bash set-permissions.sh --project=PROJECT_ID

To use the script to also grant one of the OS Config roles to a user who does not have the roles/owner (Owner) role on the project, run the script as follows:

bash set-permissions.sh --project=PROJECT_ID \
  --iam-user=USER_EMAIL \
  --iam-permission-role=guestPolicy[Admin|Editor|Viewer]

To use the script to also grant one of the OS Config roles to a non-default service account, run the script as follows:

bash set-permissions.sh --project=PROJECT_ID \
  --iam-service-account=SERVICE_ACCT_EMAIL \
  --iam-permission-role=guestPolicy[Admin|Editor|Viewer]

For more information, see the contents of the script.

What's the diagnose.sh script doing?

Given a project, a Compute Engine instance ID, and an Ops agent policy ID, the diagnose.sh script automatically collects the necessary information to help diagnose issues with the policy:

  • The OS Config agent version

  • The underlying OS Config guest policy

  • The policies that are applicable to this Compute Engine instance

  • The agent package repos that are pulled onto a Compute Engine instance

Terraform integration

Terraform support is built on top of the Google Cloud CLI commands. To create an agent policy using Terraform, follow the Terraform module instructions.