Managing Agent Policies

Contact us at ops-agent-policy-feedback@google.com if you have any questions, need support or would like to offer feedback.

Agent Policies enable automated installation and maintenance of Google Cloud's operations suite agents across a fleet of VMs that match user-specified criteria. With one command, you can create a Policy that governs both existing and new VMs, ensuring that all agents are installed correctly and, optionally, upgraded automatically.

Supported operating systems

You can apply an Agent Policy to Compute Engine instances with the following operating systems.

Logging agent maps to policies with agent type logging. Monitoring agent maps to policies with agent type metrics. Ops Agent maps to policies with agent type ops-agent.

Operating system | Logging Agent | Monitoring Agent version below 6.0.0 | Monitoring Agent version 6.0.0 and higher | Ops Agent
CentOS 6
CentOS 7
CentOS 8
RHEL 6
RHEL 7: rhel-7, rhel-7-6-sap-ha, rhel-7-4-sap
RHEL 8: rhel-8
Debian 9 (Stretch)
Debian 10 (Buster)
SLES 12: sles-12, sles-12-sp2-sap, sles-12-sp3-sap, sles-12-sp4-sap, sles-12-sp5-sap
SLES 15: sles-15, sles-15-sp1-sap, sles-15-sap
Ubuntu LTS 16.04 (Xenial Xerus): ubuntu-1604-lts, ubuntu-minimal-1604-lts
Ubuntu LTS 18.04 (Bionic Beaver): ubuntu-1804-lts, ubuntu-minimal-1804-lts
Ubuntu 19.10 (Eoan Ermine): ubuntu-1910, ubuntu-minimal-1910
Ubuntu LTS 20.04 (Focal Fossa): ubuntu-2004-lts, ubuntu-minimal-2004-lts

Creating an Agent Policy

To create an Agent Policy using the gcloud command-line tool, complete the following steps:

  1. If you haven't done so already, install the Cloud SDK.

    In Cloud SDK, the command group for managing Agent Policies is in alpha release.

    1. To check if you have the alpha component for the gcloud tool installed, run this command:

      gcloud components list
      
    2. If you don't have the alpha component installed, run this command to install it:

      gcloud components install alpha
      
  2. Use the following script to enable the APIs and to set the proper permissions for using the gcloud command-line tool: set-permissions.sh.

    For information about the script, refer to What's the set-permissions.sh script doing?.

  3. Use the gcloud alpha compute instances ops-agents policies create command to create a Policy. For the syntax of the command, refer to the gcloud alpha compute instances ops-agents policies create documentation.

    For examples of how to format the command, refer to the Examples section in the documentation.

    For more information about the available gcloud tool commands and the available options, refer to the gcloud alpha compute instances ops-agents policies documentation.
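The alpha component check in step 1 can be scripted. The following is a minimal sketch, not part of the official tooling: the function names are hypothetical, and it assumes that `gcloud components list --only-local-state --format="value(id)"` prints one installed component ID per line.

```shell
# alpha_installed: succeed if "alpha" is among the locally installed components.
# Assumption: --only-local-state restricts the list to installed components.
alpha_installed() {
    gcloud components list --only-local-state --format="value(id)" 2>/dev/null \
        | grep -q -x "alpha"
}

# ensure_alpha_component: install the alpha component only when it is missing.
ensure_alpha_component() {
    if alpha_installed; then
        echo "alpha component already installed"
    else
        gcloud components install alpha
    fi
}
```

Call `ensure_alpha_component` once before running any `gcloud alpha compute instances ops-agents policies` command.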

Best practices for using Agent Policies

To control the impact on production systems during rollout, use instance labels and zones to filter the instances that the Policy applies to.

Here is an example of a phased rollout plan for CentOS 7 VMs:

Phase 1: Create a policy to target all VMs with the labels env=test and app=myproduct.

gcloud alpha compute instances \
    ops-agents policies create ops-agents-policy-safe-rollout \
    --agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
    --os-types=short-name=centos,version=7 \
    --group-labels=env=test,app=myproduct

Phase 2: Update that policy to target env=prod and app=myproduct and only a single zone.

gcloud alpha compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --group-labels=env=prod,app=myproduct \
    --zones=us-central1-c

Phase 3: Update that policy to clear the zones filter so that it rolls out to all zones.

gcloud alpha compute instances \
    ops-agents policies update ops-agents-policy-safe-rollout \
    --clear-zones
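The --agent-rules value in Phase 1 is a semicolon-separated list with one rule per agent type. As a rough sketch, the helper functions below (hypothetical names, not part of the gcloud tool) build that string from a list of agent types using the same settings as the example, which can make longer rollout scripts easier to read.

```shell
# agent_rule TYPE: print one rule with the settings used in the rollout example.
agent_rule() {
    printf 'type=%s,version=current-major,package-state=installed,enable-autoupgrade=true' "$1"
}

# join_agent_rules TYPE...: join one rule per agent type with ";".
join_agent_rules() {
    rules=""
    for agent_type in "$@"; do
        rules="${rules:+$rules;}$(agent_rule "$agent_type")"
    done
    printf '%s\n' "$rules"
}

# Example (not executed here):
#   gcloud alpha compute instances \
#       ops-agents policies create ops-agents-policy-safe-rollout \
#       --agent-rules="$(join_agent_rules logging metrics)" \
#       --os-types=short-name=centos,version=7 \
#       --group-labels=env=test,app=myproduct
```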

Limitations

For a Policy to take effect on Ubuntu and SLES distributions, or on VMs that predate OS Config, additional setup is needed to ensure that the OS Config Agent that the Policy relies on is installed on the VMs. To install the OS Config Agent on a fleet of VMs, complete the following steps:

  1. Ensure you have run the set-permissions.sh script in the Creating an Agent Policy section.

  2. Decide which VMs need the OS Config Agent and list them in a CSV file.

    To list all instances that are not Google-managed (for example, by Google Kubernetes Engine or App Engine), run:

      gcloud compute instances list \
          --filter='-labels.list(show="keys"):goog-' \
          --format="csv(name,zone)" \
          | grep -v -x -F -f  <(gcloud compute instances os-inventory list-instances \
              --format="csv(name,zone)")
    

    The grep section filters out the VMs that already have the OS Config Agent installed and enabled.

    To further filter the instances by zones or labels, change the --filter value to something similar to the following:

      '-labels.list(show="keys"):goog- AND zone:(ZONE_1,ZONE_2) AND labels.KEY_1:VALUE_1 AND labels.KEY_2=VALUE_2'
    
  3. Run the mass-install-osconfig-agent.sh script by following the instructions in the script. This script automates the Installing the OS Config agent instructions.
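The grep step in the listing command is a set difference: it keeps the lines of the full instance list that do not appear verbatim in the OS inventory list. The sketch below isolates that step so you can try it on saved CSV files. The function name and file arguments are hypothetical.

```shell
# missing_osconfig ALL_CSV INVENTORY_CSV
# Print the lines of ALL_CSV that do not appear verbatim in INVENTORY_CSV.
missing_osconfig() {
    # -v inverts the match, -x matches whole lines only, -F treats patterns
    # as fixed strings, -f reads the patterns from a file.
    # grep exits with 1 when it prints nothing; treat that as success too.
    grep -v -x -F -f "$2" "$1" || [ $? -eq 1 ]
}

# Example (not executed here):
#   gcloud compute instances list --format="csv(name,zone)" > all.csv
#   gcloud compute instances os-inventory list-instances \
#       --format="csv(name,zone)" > inventory.csv
#   missing_osconfig all.csv inventory.csv > needs-agent.csv
```

Because both files start with the same name,zone header line, the header is filtered out along with the instances that already report inventory.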

Troubleshooting

The ops-agents policy commands fail

If ops-agents policy commands fail, they show a corresponding validation error. Correct those errors by fixing the command arguments and flags as suggested by the error message.

In addition to the validation errors, you might see the following errors:

  • Insufficient IAM permission

    A sample error looks like:

    ERROR: (gcloud.alpha.compute.instances.ops-agents.policies.XXX) PERMISSION_DENIED: Caller does not have required permission to XXX
    

    Make sure you run the set-permissions.sh script in the Creating an Agent Policy section to set up the osconfig.guestPolicy specific IAM role.

    To verify that you have a sufficient OS Config guest policy role for the project, run the following command. In this example, the command checks whether the member has the roles/osconfig.guestPolicyAdmin role. Replace gcloud-member with a value in the format user:USER_EMAIL or serviceAccount:SERVICE_ACCOUNT_EMAIL.

    gcloud projects get-iam-policy project-id \
        --filter=--member=gcloud-member \
        | grep "roles/osconfig.guestPolicyAdmin" -B 2
    

    The expected output is:

    - members:
      - gcloud-member
      role: roles/osconfig.guestPolicyAdmin
    
  • OS Config API is not enabled

    A sample error looks like:

    API [osconfig.googleapis.com] not enabled on project [XXX].
    Would you like to enable and retry (this will take a few minutes)?
    (y/N)?
    

    Make sure you run the set-permissions.sh script in the Creating an Agent Policy section to enable the required APIs.

    To verify that the OS Config API is enabled for the project, run the following command:

    gcloud services list --project project-id \
        | grep osconfig.googleapis.com
    

    The expected output is:

    osconfig.googleapis.com    Cloud OS Config API
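The API check can be wrapped in a small function for use in a pre-flight script. This is a minimal sketch that reuses the gcloud services list command shown above. The function names are hypothetical.

```shell
# osconfig_api_enabled PROJECT_ID: succeed only if the API appears in the list.
osconfig_api_enabled() {
    gcloud services list --project "$1" | grep -q "osconfig.googleapis.com"
}

# require_osconfig_api PROJECT_ID: print a status line; return non-zero on failure.
require_osconfig_api() {
    if osconfig_api_enabled "$1"; then
        echo "OS Config API is enabled for $1"
    else
        echo "OS Config API is NOT enabled for $1" >&2
        return 1
    fi
}
```

For example, `require_osconfig_api project-id || exit 1` stops a script before it attempts to create a Policy.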
    

The policy is created, but seems to have no effect

OS Config agents are deployed to each Compute Engine instance to manage the packages for the Logging and Monitoring agents. The policy may seem to have no effect if the underlying OS Config agent is not installed. To verify that the OS Config agent is installed, run the following command:

gcloud compute ssh instance-id \
    --project project-id \
    -- sudo systemctl status google-osconfig-agent

A sample output is:

google-osconfig-agent.service - Google OSConfig Agent
Loaded: loaded (/lib/systemd/system/google-osconfig-agent.service; enabled; vendor preset:
Active: active (running) since Wed 2020-01-15 00:14:22 UTC; 6min ago
Main PID: 369 (google_osconfig)
 Tasks: 8 (limit: 4374)
Memory: 102.7M
CGroup: /system.slice/google-osconfig-agent.service
        └─369 /usr/bin/google_osconfig_agent

SUSE and Ubuntu Compute Engine instances don't have the OS Config agent preinstalled, so you need to follow the OS Config agent installation instructions to get the OS Config agent installed on those Compute Engine instances.
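For scripted checks, a one-word status is easier to parse than the full systemctl status output. The sketch below assumes the instance uses systemd and relies on systemctl is-active, which prints a state such as active or inactive. The function names are hypothetical.

```shell
# osconfig_agent_state INSTANCE PROJECT: print the systemd state of the agent.
osconfig_agent_state() {
    gcloud compute ssh "$1" --project "$2" \
        -- sudo systemctl is-active google-osconfig-agent
}

# report_osconfig_agent INSTANCE PROJECT: print a one-line summary.
report_osconfig_agent() {
    # is-active exits non-zero when the unit is not active, so ignore the status
    # and decide based on the printed state.
    state=$(osconfig_agent_state "$1" "$2") || true
    if [ "$state" = "active" ]; then
        echo "$1: OS Config agent is running"
    else
        echo "$1: OS Config agent is not running (state: ${state:-unknown})"
    fi
}
```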

The OS Config agent is installed, but it does not install the Ops agents

To check for errors that occur when the OS Config agent applies Policies, inspect the OS Config agent's log. You can do this either in Logs Viewer or by connecting over SSH to individual Compute Engine instances.

To view OS Config agent logs in Logs Viewer, use the following filter:

resource.type="gce_instance"
logName="projects/project-id/logs/OSConfigAgent"
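The same filter can also be used from the command line. The following sketch builds the filter as a single string for gcloud logging read. The function name is hypothetical, and the explicit AND is an assumption about how you would join the two lines into one expression.

```shell
# osconfig_log_filter PROJECT_ID: print the Logs Viewer filter as one expression.
osconfig_log_filter() {
    printf 'resource.type="gce_instance" AND logName="projects/%s/logs/OSConfigAgent"\n' "$1"
}

# Example (not executed here): read the 20 most recent matching entries.
#   gcloud logging read "$(osconfig_log_filter project-id)" \
#       --project project-id \
#       --limit 20
```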

To view OS Config agent logs via SSH for an individual Compute Engine instance, run the command for the instance's operating system:

  • CentOS / RHEL / SLES / SUSE

    gcloud compute ssh instance-id \
        --project project-id \
        -- sudo cat /var/log/messages \
           | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
    
  • Debian / Ubuntu

    gcloud compute ssh instance-id \
        --project project-id \
        -- sudo cat /var/log/syslog \
           | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
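The two commands differ only in the system log path. As a sketch, the helper below selects the path from an OS family name so that one script can handle both cases. The function name and the family names it accepts are hypothetical.

```shell
# osconfig_log_path OS_FAMILY: print the system log file used in the commands above.
osconfig_log_path() {
    case "$1" in
        centos|rhel|sles) echo /var/log/messages ;;
        debian|ubuntu)    echo /var/log/syslog ;;
        *) echo "unsupported OS family: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here):
#   gcloud compute ssh instance-id \
#       --project project-id \
#       -- sudo cat "$(osconfig_log_path ubuntu)" \
#          | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
```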
    

If there is an error connecting to the OS Config Service, make sure you run the set-permissions.sh script in the Creating an Agent Policy section to set up the metadata.

To verify that the OS Config metadata is enabled, you can run the following command:

gcloud compute project-info describe \
    --project project-id \
    | grep "enable-osconfig\|enable-guest-attributes" -A 1

The expected output is:

- key: enable-guest-attributes
  value: 'TRUE'
- key: enable-osconfig
  value: 'TRUE'
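To turn this check into a pass or fail result, you can parse the output instead of reading it. The sketch below assumes the default YAML output, in which each key line is immediately followed by its value line, as in the expected output above. The function name is hypothetical.

```shell
# osconfig_metadata_enabled PROJECT_ID: succeed only if both metadata keys are TRUE.
osconfig_metadata_enabled() {
    gcloud compute project-info describe --project "$1" | awk '
        # Remember which key was just seen; its value is on the next line.
        /key: enable-osconfig[ \t]*$/         { want = "osconfig"; next }
        /key: enable-guest-attributes[ \t]*$/ { want = "guest"; next }
        want != "" {
            if (toupper($0) ~ /VALUE:.*TRUE/) seen[want] = 1
            want = ""
        }
        END { exit !(seen["osconfig"] && seen["guest"]) }
    '
}
```

The comparison is case-insensitive, so both TRUE and true are accepted.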

Ops agents are installed, but not functioning properly

Refer to the Logging agent and the Monitoring agent troubleshooting pages to debug specific issues.

Providing feedback

Contact us at ops-agent-policy-feedback@google.com if you have any questions, need support or would like to offer feedback. You can also use the gcloud feedback command to report bugs and issues that you run into when using the commands.

Use the following script to gather information that can help troubleshoot the issue: diagnose.sh.

For information about the script, refer to What's the diagnose.sh script doing?.

Enabling debug-level logs

Enabling debug-level logging for the OS Config agent is helpful when you report an issue, because the collected logs contain more information for the investigation.

To enable debug-level logging, set the osconfig-log-level metadata key to debug.

To enable debug-level logging for the entire project, run the following command:

gcloud compute project-info add-metadata \
    --project project-id \
    --metadata osconfig-log-level=debug

To enable debug-level logging for one VM, run the following command:

gcloud compute instances add-metadata instance-id \
    --project project-id \
    --metadata osconfig-log-level=debug
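Both commands can be combined into one helper, together with a counterpart that removes the key once you have collected the logs. This is a sketch with hypothetical function names. The cleanup step is an assumption and uses the remove-metadata commands with the --keys flag.

```shell
# set_osconfig_log_level PROJECT LEVEL [INSTANCE]
# Set the log level for one VM when INSTANCE is given, else for the whole project.
set_osconfig_log_level() {
    if [ -n "${3:-}" ]; then
        gcloud compute instances add-metadata "$3" \
            --project "$1" --metadata "osconfig-log-level=$2"
    else
        gcloud compute project-info add-metadata \
            --project "$1" --metadata "osconfig-log-level=$2"
    fi
}

# clear_osconfig_log_level PROJECT [INSTANCE]
# Remove the metadata key again so the agent returns to its default log level.
clear_osconfig_log_level() {
    if [ -n "${2:-}" ]; then
        gcloud compute instances remove-metadata "$2" \
            --project "$1" --keys osconfig-log-level
    else
        gcloud compute project-info remove-metadata \
            --project "$1" --keys osconfig-log-level
    fi
}
```

For example, `set_osconfig_log_level project-id debug instance-id` enables debug logging for a single VM.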

Additional information

What's the set-permissions.sh script doing?

Given a project ID, an Identity and Access Management (IAM) role, and an email or a service account, the set-permissions.sh script performs the following actions:

  • Enables the Cloud Logging API, the Cloud Monitoring API, and the OS Config API for the project.

  • Grants the roles/logging.logWriter and the roles/monitoring.metricWriter roles to the Compute Engine default service account so that the agents can write logs and metrics to the Cloud Logging and Cloud Monitoring APIs.

  • Enables the OS Config metadata for the project so that OS Config agents get activated on the VMs.

  • Grants the specified IAM role to the gcloud user or the service account. Project owners have full access to create and manage a Policy. For all other users or service accounts, project owners must grant one of the following roles:

    • roles/osconfig.guestPolicyAdmin: Provides full access to a Policy.

    • roles/osconfig.guestPolicyEditor: Allows users to get, update, and list a Policy.

    • roles/osconfig.guestPolicyViewer: Provides read-only access to get and list a Policy.

See example usage in the comments of the script.
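The actions listed above correspond roughly to the gcloud commands in the following sketch. This is not the content of set-permissions.sh. The function name is hypothetical, and the sketch assumes that the Compute Engine default service account is named PROJECT_NUMBER-compute@developer.gserviceaccount.com and that the metadata keys are enable-osconfig and enable-guest-attributes.

```shell
# grant_agent_policy_access PROJECT_ID MEMBER ROLE
# MEMBER is user:USER_EMAIL or serviceAccount:SERVICE_ACCOUNT_EMAIL.
grant_agent_policy_access() {
    project=$1; member=$2; role=$3

    # Enable the Cloud Logging, Cloud Monitoring, and OS Config APIs.
    gcloud services enable logging.googleapis.com monitoring.googleapis.com \
        osconfig.googleapis.com --project "$project" || return 1

    # Let the default service account write logs and metrics.
    number=$(gcloud projects describe "$project" \
        --format="value(projectNumber)") || return 1
    sa="serviceAccount:${number}-compute@developer.gserviceaccount.com"
    for agent_role in roles/logging.logWriter roles/monitoring.metricWriter; do
        gcloud projects add-iam-policy-binding "$project" \
            --member "$sa" --role "$agent_role" || return 1
    done

    # Enable the OS Config metadata for the project.
    gcloud compute project-info add-metadata --project "$project" \
        --metadata enable-osconfig=TRUE,enable-guest-attributes=TRUE || return 1

    # Grant the requested guest policy role to the user or service account.
    gcloud projects add-iam-policy-binding "$project" \
        --member "$member" --role "$role"
}
```

For example, `grant_agent_policy_access project-id user:USER_EMAIL roles/osconfig.guestPolicyAdmin` grants full access to a Policy.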

What's the diagnose.sh script doing?

Given a project, a Compute Engine instance ID, and an Ops agent Policy ID, the diagnose.sh script automatically collects the information needed to help diagnose issues with the policy:

  • The OS Config agent version

  • The underlying OS Config guest policy

  • The Policies that are applicable to this Compute Engine instance

  • The agent package repositories that are pulled onto the Compute Engine instance