Troubleshooting the Agent

Contact us at google-cloud-ops-agent@google.com if you have any questions, need support, or would like to offer feedback.

This page helps you diagnose problems with installing or running the Ops Agent.

Agent fails to install

You may encounter the following errors when running the installation script.

  • The operating system is not supported. The error message might look similar to the following:

    Linux

    https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-el6-x86_64-all/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
    Trying other mirror.
    To address this issue please refer to the below wiki article
    
    https://wiki.centos.org/yum-errors
    
    If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
    
    Error: Cannot retrieve repository metadata (repomd.xml) for repository: google-cloud-ops-agent. Please verify its path and try again
    
  • The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and they conflict with the new agent. The error message might look similar to the following:

    Linux

    Error:
     Problem: problem with installed package stackdriver-agent-6.0.5-1.el8.x86_64
      - package google-cloud-ops-agent-0.1.0-1.el8.x86_64 conflicts with stackdriver-agent provided by stackdriver-agent-6.0.5-1.el8.x86_64
    

    The Ops Agent uses new configuration files that are not compatible with the old agents. For more information, refer to the Configuring the agent guide.

    To fix this error, save the custom configuration files for the Cloud Monitoring agent and the Cloud Logging agent. Then, uninstall the old Cloud Monitoring agent and Cloud Logging agent.
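
The "save the custom configuration files" step above can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function name `backup_agent_configs` is hypothetical, and the source directories in the usage comment (`/etc/google-fluentd` and `/etc/stackdriver`) are assumed default locations for the legacy agents that you should verify on your VM.

```shell
# backup_agent_configs DEST SRC...
# Copies each existing SRC directory into DEST and skips missing ones.
# Hypothetical helper for illustration; it does not uninstall anything.
backup_agent_configs() {
  local dest="$1"
  shift
  mkdir -p "$dest" || return 1
  local src
  for src in "$@"; do
    if [ -d "$src" ]; then
      # -a preserves permissions and timestamps of the saved files.
      cp -a "$src" "$dest/" && echo "saved $src"
    else
      echo "skipped $src (not found)"
    fi
  done
}

# Example call. The two source paths are assumptions, not taken from this page:
#   backup_agent_configs "$HOME/legacy-agent-config" /etc/google-fluentd /etc/stackdriver
```

Run the helper before uninstalling the old agents so that the saved files are still available when you write the Ops Agent configuration. Reading some configuration files might require root privileges.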

Agent is installed but not running

Agent services not running

If the agent service is not running, you might see the following status:

Linux

$ sudo systemctl status google-cloud-ops-agent.target
● google-cloud-ops-agent.target - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.target; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2020-10-12 19:49:05 UTC; 3s ago

Windows

Get-Service google-cloud-ops-agent

Status   Name                    DisplayName
------   ----                    -----------
Stopped  google-cloud-ops-agent  Google Cloud Ops Agent

To fix this error, run the following command to start the service:

Linux

$ sudo systemctl start google-cloud-ops-agent.target

Windows

Start-Service google-cloud-ops-agent

If the service fails to start, the configuration might be invalid.
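
On Linux, the check-then-start sequence above can be combined into one helper. The following is a sketch under stated assumptions: the function name `ensure_agent_running` is hypothetical, and it relies only on `systemctl is-active` and `systemctl start` with the unit name shown on this page.

```shell
# ensure_agent_running [UNIT]
# Starts UNIT if it is not active, then reports the resulting state.
# Hypothetical helper; UNIT defaults to the target name used on this page.
ensure_agent_running() {
  local unit="${1:-google-cloud-ops-agent.target}"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is already active"
    return 0
  fi
  echo "$unit is not active; starting it"
  if sudo systemctl start "$unit" && systemctl is-active --quiet "$unit"; then
    echo "$unit started"
  else
    # A failed start often points at an invalid configuration.
    echo "$unit failed to start; check the configuration" >&2
    return 1
  fi
}

# Example call:
#   ensure_agent_running
```

A non-zero return value suggests that the configuration should be checked next.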

Conflict with currently installed agents

  • The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and their configuration conflicts with the new agent's configuration. The error message might look similar to the following:

    Windows

    We detected an existing Windows service for the StackdriverLogging agent,
    which is not compatible with the Ops Agent when the Ops Agent configuration
    has a non-empty logging section. Please either remove the logging section
    from the Ops Agent configuration, or disable the StackdriverLogging agent,
    and then retry enabling the Ops Agent.
    

    To fix this error, you have two options:

    1. Disable the conflicting section of the Ops Agent configuration file. For more information, refer to the Configuring the agent guide.

    2. Disable the conflicting Cloud Logging agent or the Cloud Monitoring agent.

      1. Save any custom configuration files for the Cloud Logging agent.
      2. Uninstall the old Cloud Monitoring agent and Cloud Logging agent.

Invalid configuration

If the configuration is invalid, you might see the following error when trying to start the agent service:

Linux

$ sudo systemctl restart google-cloud-ops-agent.target
A dependency job for google-cloud-ops-agent.target failed. See 'journalctl -xe' for details.

Use journalctl to get the exact error message:

$ journalctl -xe | grep generate_config
Oct 12 19:31:19 prod-vm generate_config[16132]: 2020/10/12 19:31:19 can't parse configuration: &{[%!w(string=line 15: field frogging not found in type confgenerator.unifiedConfig)]}

Windows

failed to generate config files: can't parse configuration: yaml: line 20: could not find expected ':'

To fix the error, correct the invalid configuration and restart the agent. For reference, refer to the Configuring the agent guide.
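
On Linux, the journal output can be noisy, so it can help to print only the parse error itself. The following sketch assumes that the error lines contain the text `can't parse configuration:` as in the example above; the function name `config_errors` is hypothetical.

```shell
# config_errors
# Reads journal lines on stdin and prints only the message that follows
# "can't parse configuration:" on generate_config lines.
# Hypothetical helper; the matched text is taken from the sample output above.
config_errors() {
  grep 'generate_config' | sed -n "s/.*can't parse configuration: //p"
}

# Example call on the VM:
#   journalctl -xe | config_errors
```

The printed message usually names the line or field of the configuration file that needs to be corrected.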

Agent is running, but data is not ingested

Is the agent sending logs to Cloud Logging?

Check the local metrics

These steps require you to SSH into the VM.

  • Is the logging module running? Check the local uptime metrics to ensure that the logging module is running. For example:

    $ curl -s localhost:2020/api/v1/uptime | jq | grep uptime_sec
      "uptime_sec": 4132,
    
  • Is the logging module reading the logs? Check the local input metrics to ensure logs are coming into the input.

    If logs are not coming into the input, then the log sources may not be generating logs. Check the log sources to ensure they are generating logs. Also ensure the file path is correct and not excluded by accident.

    $ curl -s localhost:2020/api/v1/metrics | jq
    {
      "input": {
        "tail.0": {
          "records": 210,
          "bytes": 17134,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.1": {
          "records": 1016,
          "bytes": 102460,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.2": {
          "records": 1918,
          "bytes": 245475,
          "files_opened": 2,
          "files_closed": 0,
          "files_rotated": 0
        },
        "storage_backlog.3": {
          "records": 0,
          "bytes": 0
        }
      },
      "filter": ...
      "output": ...
    }
    
  • Is the logging module sending logs to Cloud Logging? Check the local output metrics and look for an output entry whose name contains google or stackdriver. For example:

    $ curl -s localhost:2020/api/v1/metrics | jq
    {
      "input": ...
      "filter": ...
      "output": {
        "stackdriver.0": {
          "proc_records": 1918,
          "proc_bytes": 245475,
          "errors": 0,
          "retries": 0,
          "retries_failed": 0
        }
      }
    }
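
The input and output checks above can be condensed into one line per plugin. This is a sketch, assuming that the metrics endpoint returns JSON shaped like the examples above and that `jq` is installed; the function name `summarize_fluent_metrics` is hypothetical.

```shell
# summarize_fluent_metrics
# Reads the JSON from the local metrics endpoint on stdin and prints one
# line per input plugin and one line per output plugin.
# Hypothetical helper; the field names are taken from the sample output above.
summarize_fluent_metrics() {
  jq -r '
    (.input  | to_entries[] | "input \(.key) records=\(.value.records)"),
    (.output | to_entries[] | "output \(.key) proc_records=\(.value.proc_records) errors=\(.value.errors) retries_failed=\(.value.retries_failed)")
  '
}

# Example call on the VM:
#   curl -s localhost:2020/api/v1/metrics | summarize_fluent_metrics
```

An input whose record count stays at 0 suggests that the log source is not producing logs or that the file path does not match. A growing error count on the output suggests a problem writing to Cloud Logging.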
    

Check the logging module log

This step requires you to SSH into the VM.

You can find the logging module logs at /var/log/google-cloud-ops-agent/subagents/*.log. If there are no logs, the agent service is not running properly. In that case, first work through the "Agent is installed but not running" section.
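
To check whether any log files exist and to surface only the problem lines, a helper along the following lines might be useful. It is a sketch: the function name `agent_log_problems` is hypothetical, and matching on the words "warn" and "error" is an assumption based on the sample log lines on this page.

```shell
# agent_log_problems [DIR]
# Reports when DIR has no *.log files; otherwise prints warning and error
# lines from each file, prefixed with the file name.
# Hypothetical helper; DIR defaults to the log directory named on this page.
agent_log_problems() {
  local dir="${1:-/var/log/google-cloud-ops-agent/subagents}"
  local found=0 f
  for f in "$dir"/*.log; do
    [ -f "$f" ] || continue
    found=1
    # -H prints the file name, -i ignores case, -E enables alternation.
    grep -H -i -E 'warn|error' "$f"
  done
  if [ "$found" -eq 0 ]; then
    echo "no log files in $dir; the agent service may not be running"
    return 1
  fi
}

# Example call on the VM (reading the logs might require sudo):
#   agent_log_problems
```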

  • You might see 403 permission errors when writing to the Logging API. For example:

    [2020/10/13 18:55:09] [ warn] [output:stackdriver:stackdriver.0] error
    {
      "error": {
        "code": 403,
        "message": "Cloud Logging API has not been used in project 147627806769 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
        "status": "PERMISSION_DENIED",
        "details": [
          {
            "@type": "type.googleapis.com/google.rpc.Help",
            "links": [
              {
                "description": "Google developers console API activation",
                "url": "https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769"
              }
            ]
          }
        ]
      }
    }
    

    To fix this error, enable the Logging API and grant the Logs Writer role to the service account that the agent uses.

  • You might see a quota issue for the Logging API. For example:

    error="8:Insufficient tokens for quota 'logging.googleapis.com/write_requests' and limit 'WriteRequestsPerMinutePerProject' of service 'logging.googleapis.com' for consumer 'project_number:648320274015'." error_code="8"
    

    To fix this error, raise the quota or reduce the log throughput.
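
For the 403 case, enabling the API and granting the role can be done with the gcloud CLI. The following is a sketch, assuming that `roles/logging.logWriter` is the role ID for Logs Writer and that you know the project ID and the email of the service account attached to the VM; the function name `enable_logging_writes` and both arguments are placeholders. It does not address the quota case.

```shell
# enable_logging_writes PROJECT_ID SERVICE_ACCOUNT_EMAIL
# Enables the Logging API in the project, then grants the Logs Writer role
# to the service account. Hypothetical helper; both arguments are
# placeholders that you must supply.
enable_logging_writes() {
  local project="$1" sa_email="$2"
  gcloud services enable logging.googleapis.com --project="$project" &&
    gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:$sa_email" \
      --role="roles/logging.logWriter"
}

# Example call (values are illustrative):
#   enable_logging_writes my-project my-vm-sa@my-project.iam.gserviceaccount.com
```

As the error message notes, a newly enabled API can take a few minutes to propagate, so wait before retrying.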

Is the agent sending metrics to Cloud Monitoring?

Check the metrics module log

This step requires you to SSH into the VM.

You can find the metrics module logs at /var/log/google-cloud-ops-agent/subagents/*.log. If there are no logs, the agent service is not running properly. In that case, first work through the "Agent is installed but not running" section.

  • You might see 403 permission errors when writing to the Monitoring API. This error occurs if the permissions for the Ops Agent are not properly configured. For example:

    write_gcm: Unsuccessful HTTP request 403: {#012  "error": {#012    "code": 403,#012    "message": "Permission denied (or the resource may not exist).",#012    "status": "PERMISSION_DENIED",#012    "details": [#012      {#012        "@type": "type.googleapis.com/google.rpc.DebugInfo",#012        "detail": "Permission monitoring.timeSeries.create denied (or the resource may not exist)."#012      }#012    ]#012  }#012}}

    To fix this error, grant the Monitoring Metric Writer role to the service account that the agent uses.

  • You might see a quota issue for the Monitoring API. For example:

    write_gcm: Unsuccessful HTTP request 429: {
    write_gcm: Error -2 from wg_curl_get_or_post
    write_gcm: wg_transmit_unique_segment failed.
    write_gcm: wg_transmit_unique_segments failed. Flushing.
    

    To fix this error, raise the quota or reduce the metrics throughput.
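
To tell the permission case from the quota case at a glance, you can count the HTTP status codes in the write_gcm lines. This sketch assumes that the log lines have the `write_gcm: Unsuccessful HTTP request <status>` shape shown above; the function name `gcm_status_counts` is hypothetical.

```shell
# gcm_status_counts
# Reads log lines on stdin and prints "STATUS COUNT" for each HTTP status
# found in "write_gcm: Unsuccessful HTTP request" lines.
# Hypothetical helper; the matched text is taken from the samples above.
gcm_status_counts() {
  sed -n 's/.*write_gcm: Unsuccessful HTTP request \([0-9][0-9]*\).*/\1/p' |
    sort | uniq -c | awk '{print $2, $1}'
}

# Example call on the VM (reading the logs might require sudo):
#   cat /var/log/google-cloud-ops-agent/subagents/*.log | gcm_status_counts
```

In the examples on this page, status 403 corresponds to the permission error and status 429 to the quota error.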

For other known issues with the Cloud Monitoring agent, refer to the Cloud Monitoring agent Troubleshooting guide.