Troubleshooting the Agent

Contact us at google-cloud-ops-agent@google.com if you have any questions, need support, or would like to offer feedback.

This page helps you diagnose problems with installing or running the Ops Agent.

Agent fails to install

You may encounter the following errors when running the installation script.

  • The operating system is not supported. The error message might look similar to the following:

    https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-el6-x86_64-all/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
    Trying other mirror.
    To address this issue please refer to the below wiki article
    
    https://wiki.centos.org/yum-errors
    
    If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
    
    Error: Cannot retrieve repository metadata (repomd.xml) for repository: google-cloud-ops-agent. Please verify its path and try again
    

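    To fix this error, make sure that you are using a supported operating system. In this example, the 404 Not Found response for the el6 repository indicates that no Ops Agent packages are published for that operating system version.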

  • The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and they conflict with the new agent. The error message might look similar to the following:

    Error:
     Problem: problem with installed package stackdriver-agent-6.0.5-1.el8.x86_64
      - package google-cloud-ops-agent-0.1.0-1.el8.x86_64 conflicts with stackdriver-agent provided by stackdriver-agent-6.0.5-1.el8.x86_64
    

    The Ops Agent uses new configuration files that are not compatible with the old agents. For more information, refer to the Configuring the agent guide.

    To fix this error, save the custom configuration files for the Cloud Monitoring agent and the Cloud Logging agent. Then, uninstall the old Cloud Monitoring agent and Cloud Logging agent.
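
Before you uninstall, a small function can copy the legacy configuration into a backup directory. This is a minimal sketch. It assumes that the Cloud Monitoring agent keeps its configuration under /etc/stackdriver and the Cloud Logging agent keeps its configuration under /etc/google-fluentd. The function name backup_legacy_configs and its ROOT argument, which lets you point it at a scratch directory tree, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up legacy agent configuration before removal.
# ASSUMPTION: the Cloud Monitoring agent config lives in /etc/stackdriver and
# the Cloud Logging agent config lives in /etc/google-fluentd.
# ROOT defaults to / and is a parameter only so the function can be tried
# against a scratch directory tree.
backup_legacy_configs() {
  local root="${1:-/}" dest="$2" dir
  root="${root%/}"   # avoid "//etc/..." when ROOT is /
  if [ -z "$dest" ]; then
    echo "usage: backup_legacy_configs ROOT DEST" >&2
    return 2
  fi
  for dir in etc/stackdriver etc/google-fluentd; do
    # Skip agents whose configuration directory does not exist.
    [ -d "$root/$dir" ] || continue
    mkdir -p "$dest/$dir"
    # Copy the directory contents, preserving modes and timestamps.
    cp -a "$root/$dir/." "$dest/$dir/"
    echo "saved $root/$dir to $dest/$dir"
  done
}
```

For example, run `backup_legacy_configs / "$HOME/legacy-agent-config"` with enough privileges to read /etc. Then remove the old packages with your package manager, for example `sudo yum remove stackdriver-agent google-fluentd` on RHEL or CentOS, or `sudo apt-get remove stackdriver-agent google-fluentd` on Debian or Ubuntu.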

Agent is installed but not running

systemd services not running

If the agent service is not running, you might see the following status:

$ sudo systemctl status google-cloud-ops-agent.target
● google-cloud-ops-agent.target - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.target; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2020-10-12 19:49:05 UTC; 3s ago

To fix this, run the following command to start the service:

$ sudo systemctl start google-cloud-ops-agent.target

If the service fails to start, the configuration might be invalid.
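
The check-then-start steps above can be wrapped in a small function. This is a sketch: the function name ensure_ops_agent_running is hypothetical. It only uses `systemctl is-active --quiet`, which exits 0 when the unit is active, and `systemctl start`. A failed start is reported as a hint to look at the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the Ops Agent target if it is not active.
# Mirrors the manual steps: check the status, start the target, and treat
# a failed start as a hint that the configuration might be invalid.
OPS_AGENT_TARGET="google-cloud-ops-agent.target"

ensure_ops_agent_running() {
  # `systemctl is-active --quiet` exits 0 only when the unit is active.
  if systemctl is-active --quiet "$OPS_AGENT_TARGET"; then
    echo "$OPS_AGENT_TARGET is active"
    return 0
  fi
  echo "$OPS_AGENT_TARGET is not active; starting it" >&2
  if ! sudo systemctl start "$OPS_AGENT_TARGET"; then
    echo "start failed; check the configuration with journalctl" >&2
    return 1
  fi
  echo "$OPS_AGENT_TARGET started"
}
```

To see which of the units behind the target failed, `systemctl list-dependencies google-cloud-ops-agent.target` lists the units that the target pulls in.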

Invalid configuration

If the configuration is invalid, you might see the following error when trying to start the agent service:

$ sudo systemctl restart google-cloud-ops-agent.target
A dependency job for google-cloud-ops-agent.target failed. See 'journalctl -xe' for details.

Use journalctl to get the exact error message:

$ journalctl -xe | grep generate_config
Oct 12 19:31:19 prod-vm generate_config[16132]: 2020/10/12 19:31:19 can't parse configuration: &{[%!w(string=line 15: field frogging not found in type confgenerator.unifiedConfig)]}

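To fix the error, correct the configuration problem reported in the journalctl output. In this example, line 15 of the configuration contains a field, frogging, that the agent does not recognize. For details, see the Configuring the agent guide.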
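
To read only the configuration errors, you can strip the syslog prefix from the generate_config lines. This is a minimal sketch. The function names config_errors and config_error_lines are hypothetical. It assumes the prefix format shown in the example above (`<date> <host> generate_config[<pid>]: `).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read journal text on stdin and print only the
# messages logged by generate_config, without the syslog prefix
# "<date> <host> generate_config[<pid>]: ".
config_errors() {
  sed -n 's/.*generate_config\[[0-9]*]: //p'
}

# Print each distinct "line N" reference found in those messages, which
# points at the offending line of the configuration file.
config_error_lines() {
  config_errors | grep -o 'line [0-9][0-9]*' | sort -u
}
```

For example, `journalctl -xe | config_errors` prints the parse error from the example above. `journalctl -xe | config_error_lines` prints `line 15`.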

Agent is running, but data is not ingested

Is the agent sending logs to Cloud Logging?

Check the local metrics

These steps require you to SSH into the VM.

  • Is the logging module running? Check the local uptime metrics to ensure that the logging module is running. For example:

    $ curl -s localhost:2020/api/v1/uptime | jq | grep uptime_sec
      "uptime_sec": 4132,
    
  • Is the logging module reading the logs? Check the input section of the local metrics, for example the records count of each tail input, to ensure that logs are coming into the input.

    If logs are not coming into the input, the log sources might not be generating logs. Check that the log sources are generating logs, that the file paths are correct, and that the files are not excluded by accident.

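  • Is the logging module sending logs to Cloud Logging? Check the output section of the local metrics and look for an entry named google or stackdriver, such as stackdriver.0. A growing errors or retries_failed count suggests that writes to Cloud Logging are failing. For example: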

    $ curl -s localhost:2020/api/v1/metrics | jq
    {
      "input": {
        "tail.0": {
          "records": 210,
          "bytes": 17134,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.1": {
          "records": 1016,
          "bytes": 102460,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.2": {
          "records": 1918,
          "bytes": 245475,
          "files_opened": 2,
          "files_closed": 0,
          "files_rotated": 0
        },
        "storage_backlog.3": {
          "records": 0,
          "bytes": 0
        }
      },
      "filter": {},
      "output": {
        "stackdriver.0": {
          "proc_records": 1918,
          "proc_bytes": 245475,
          "errors": 0,
          "retries": 0,
          "retries_failed": 0
        }
      }
    }
    
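
The three checks above can be condensed into one-line summaries with jq, which the examples already use. This is a sketch. The function names uptime_seconds and summarize_logging_metrics are hypothetical, and the functions assume the response shapes shown above: an uptime_sec field from /api/v1/uptime, and input and output objects from /api/v1/metrics.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that summarize the logging module's local endpoints.
# Both read JSON on stdin and require jq.
# ASSUMPTION: response shapes match the examples above.

# Print the uptime in seconds. Empty output means the field is missing,
# which suggests the logging module is not responding normally.
uptime_seconds() {
  jq -r '.uptime_sec // empty'
}

# Print the total records read by all inputs, then one line per output
# with its processed records and error counters.
summarize_logging_metrics() {
  jq -r '
    "input_records=\([.input[]? | .records // 0] | add // 0)",
    (.output // {} | to_entries[]
      | "\(.key) proc_records=\(.value.proc_records) errors=\(.value.errors) retries_failed=\(.value.retries_failed)")
  '
}
```

For example, run `curl -s localhost:2020/api/v1/metrics | summarize_logging_metrics` twice, a minute apart. If input_records grows but proc_records does not, or if errors or retries_failed increase, check the logging module log next.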

Check the logging module log

This step requires you to SSH into the VM.

You can find the logging module logs at /var/log/google-cloud-ops-agent/subagents/*.log. If there are no logs, the agent service is not running properly; fix that first by following the Agent is installed but not running section.

  • You might see 403 permission errors when writing to the Logging API. For example:

    [2020/10/13 18:55:09] [ warn] [output:stackdriver:stackdriver.0] error
    {
      "error": {
        "code": 403,
        "message": "Cloud Logging API has not been used in project 147627806769 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
        "status": "PERMISSION_DENIED",
        "details": [
          {
            "@type": "type.googleapis.com/google.rpc.Help",
            "links": [
              {
                "description": "Google developers console API activation",
                "url": "https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769"
              }
            ]
          }
        ]
      }
    }
    

    To fix this error, enable the Logging API and grant the Logs Writer role (roles/logging.logWriter) to the service account that the VM uses.

  • You might see a quota issue for the Logging API. For example:

    error="8:Insufficient tokens for quota 'logging.googleapis.com/write_requests' and limit 'WriteRequestsPerMinutePerProject' of service 'logging.googleapis.com' for consumer 'project_number:648320274015'." error_code="8"
    

    To fix this error, raise the quota or reduce the log throughput.
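
You can script the fix for the 403 error with the gcloud CLI. This is a sketch, not an official procedure. PROJECT_ID and SA_EMAIL (the email of the service account attached to the VM) are placeholders you must supply, and the function name fix_logging_permissions is hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the Cloud Logging API and grant the Logs Writer
# role (roles/logging.logWriter) to the VM's service account.
# Arguments: PROJECT_ID SA_EMAIL. Set DRY_RUN=1 to print the gcloud
# commands instead of executing them.
fix_logging_permissions() {
  local project="$1" sa="$2"
  if [ -z "$project" ] || [ -z "$sa" ]; then
    echo "usage: fix_logging_permissions PROJECT_ID SA_EMAIL" >&2
    return 2
  fi
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
  ${DRY_RUN:+echo} gcloud services enable logging.googleapis.com \
    --project="$project" || return
  ${DRY_RUN:+echo} gcloud projects add-iam-policy-binding "$project" \
    --member="serviceAccount:$sa" --role="roles/logging.logWriter"
}
```

Review the dry-run output first, for example `DRY_RUN=1 fix_logging_permissions my-project agent@my-project.iam.gserviceaccount.com`. Then run it again without DRY_RUN. The same binding pattern works for other roles by changing the `--role` value.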

Is the agent sending metrics to Cloud Monitoring?

Check the metrics module log

This step requires you to SSH into the VM.

You can find the metrics module logs at /var/log/google-cloud-ops-agent/subagents/*.log. If there are no logs, the agent service is not running properly; fix that first by following the Agent is installed but not running section.

  • You might see 403 permission errors when writing to the Monitoring API. This error occurs if the permissions for the Ops Agent are not configured correctly. For example:

    write_gcm: Unsuccessful HTTP request 403: {#012  "error": {#012    "code": 403,#012    "message": "Permission denied (or the resource may not exist).",#012    "status": "PERMISSION_DENIED",#012    "details": [#012      {#012        "@type": "type.googleapis.com/google.rpc.DebugInfo",#012        "detail": "Permission monitoring.timeSeries.create denied (or the resource may not exist)."#012      }#012    ]#012  }#012}}

    To fix this error, grant the Monitoring Metric Writer role (roles/monitoring.metricWriter) to the service account that the VM uses.

  • You might see a quota issue for the Monitoring API. For example:

    write_gcm: Unsuccessful HTTP request 429: {
    write_gcm: Error -2 from wg_curl_get_or_post
    write_gcm: wg_transmit_unique_segment failed.
    write_gcm: wg_transmit_unique_segments failed. Flushing.
    

    To fix this error, raise the quota or reduce the metrics throughput.
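
To check both modules at once, you can scan the subagent logs for the error messages shown on this page. This is a sketch. The function name scan_agent_logs and its hint texts are hypothetical. The search strings are copied from the example messages above, so they might not match every agent version or every variant of a 403 error. Reading /var/log usually requires sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan the Ops Agent subagent logs for the error
# signatures shown on this page and print a hint for each one found.
# The search strings come from the example messages and may not match
# every agent version. Pass another directory to scan a copy of the logs.
scan_agent_logs() {
  local dir="${1:-/var/log/google-cloud-ops-agent/subagents}"
  local -a logs=("$dir"/*.log)
  local found=0 i
  # If the glob matched nothing, logs[0] is the literal pattern (or empty).
  if [ ! -e "${logs[0]}" ]; then
    echo "no logs in $dir; check that the agent service is running"
    return 1
  fi
  local -a patterns=(
    "Cloud Logging API has not been used"
    "Insufficient tokens for quota 'logging.googleapis.com"
    "write_gcm: Unsuccessful HTTP request 403"
    "write_gcm: Unsuccessful HTTP request 429"
  )
  local -a hints=(
    "Logging API disabled (403): enable the Logging API and grant the Logs Writer role"
    "Logging API quota exceeded: raise the quota or reduce the log throughput"
    "Monitoring API permission denied (403): grant the Monitoring Metric Writer role"
    "Monitoring API quota exceeded (429): raise the quota or reduce the metrics throughput"
  )
  for i in "${!patterns[@]}"; do
    # -F: fixed-string match; -q: stop at the first match.
    if grep -qF -- "${patterns[$i]}" "${logs[@]}"; then
      echo "${hints[$i]}"
      found=1
    fi
  done
  [ "$found" -eq 0 ] && echo "no known error signatures found"
  return 0
}
```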

For other known issues with the Cloud Monitoring agent, refer to the Cloud Monitoring agent Troubleshooting guide.