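Troubleshoot the Ops Agent

This page helps you diagnose problems with installing or running the Ops Agent.

Agent fails to install

You might encounter the following errors when you run the installation script.

  • The operating system isn't supported. The error message might look like the following: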

    Linux

    https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-el6-x86_64-all/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
    Trying other mirror.
    To address this issue please refer to the below wiki article
    
    https://wiki.centos.org/yum-errors
    
    If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
    
    Error: Cannot retrieve repository metadata (repomd.xml) for repository: google-cloud-ops-agent. Please verify its path and try again
    
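  • The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and they conflict with the new agent. The error message might look like the following: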

    Linux

    Error:
    Problem: problem with installed package stackdriver-agent-6.0.5-1.el8.x86_64 - package google-cloud-ops-agent-0.1.0-1.el8.x86_64 conflicts with stackdriver-agent provided by stackdriver-agent-6.0.5-1.el8.x86_64
    

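    The Ops Agent uses a new configuration file that is incompatible with the legacy agents. For details, see the guide to configuring the agent.

    To resolve this error, do the following:

    1. Save any custom configuration files for the Cloud Monitoring agent and the Cloud Logging agent.

    2. Uninstall the legacy Cloud Monitoring agent and Cloud Logging agent.

      After you uninstall an agent, it might take up to one hour for the Google Cloud Console to report the change.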

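Agent is installed but not running

Agent service isn't running

When the agent service is running as expected, you might see the following status:

For Linux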

computer@debian9:~$ sudo systemctl status google-cloud-ops-agent"*"
● google-cloud-ops-agent.service - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
   Active: active (exited) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2240 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
  Process: 2214 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/config.yaml (code=exited, status=0/SUCCESS)
 Main PID: 2240 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent.service

Aug 05 20:33:44 debian9 systemd[1]: Starting Google Cloud Ops Agent...
Aug 05 20:33:44 debian9 systemd[1]: Started Google Cloud Ops Agent.

● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/google-cloud-ops-agent-fluent-bit.service.d
           └─directories.conf
   Active: active (running) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2234 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
  Process: 2216 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} -state ${STATE_DIRECTORY} (code=exited, status=0/SUCCESS)
 Main PID: 2247 (fluent-bit)
    Tasks: 22 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent-fluent-bit.service
           └─2247 /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config /run/google-cloud-ops-agent-fluent-bit/fluent_bit_main.conf --parser /run/google-cloud-ops-agent-fluent-bit/fluent_bit_parser.conf --log_file /var/log/google-cloud-ops-agent/subagents/logging-module.log --storage_path /var/lib/google-cloud-ops-agent/fluent-bit/buffers

Aug 05 20:33:44 debian9 systemd[1]: Starting Google Cloud Ops Agent - Logging Agent...
Aug 05 20:33:44 debian9 systemd[1]: Started Google Cloud Ops Agent - Logging Agent.
Aug 05 20:33:44 debian9 fluent-bit[2247]: Fluent Bit v1.7.8
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Copyright (C) 2019-2021 The Fluent Bit Authors
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Copyright (C) 2015-2018 Treasure Data
Aug 05 20:33:44 debian9 fluent-bit[2247]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Aug 05 20:33:44 debian9 fluent-bit[2247]: * https://fluentbit.io

● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d
           └─directories.conf
   Active: active (running) since Thu 2021-08-05 20:33:44 UTC; 7s ago
  Process: 2237 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
  Process: 2215 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
 Main PID: 2251 (otelopscol)
    Tasks: 6 (limit: 4915)
   CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
           └─2251 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --add-instance-id=false --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml

Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z        info        builder/pipelines_builder.go:51        Pipeline is starting...        {"pipeline_name": "metrics/system", "pipeline_datatype": "metrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z        info        builder/pipelines_builder.go:62        Pipeline is started.        {"pipeline_name": "metrics/system", "pipeline_datatype": "metrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.234Z        info        service/service.go:192        Starting receivers...
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.235Z        info        builder/receivers_builder.go:70        Receiver is starting...        {"kind": "receiver", "name": "hostmetrics/hostmetrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.235Z        info        builder/receivers_builder.go:75        Receiver started.        {"kind": "receiver", "name": "hostmetrics/hostmetrics"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z        info        builder/receivers_builder.go:70        Receiver is starting...        {"kind": "receiver", "name": "prometheus/agent"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z        info        discovery/manager.go:195        Starting provider        {"kind": "receiver", "name": "prometheus/agent", "level": "debug", "provider": "static/0", "subs": "[otel-collector]"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z        info        builder/receivers_builder.go:75        Receiver started.        {"kind": "receiver", "name": "prometheus/agent"}
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.236Z        info        service/collector.go:182        Everything is ready. Begin running and processing data.
Aug 05 20:33:45 debian9 otelopscol[2251]: 2021-08-05T20:33:45.256Z        info        discovery/manager.go:213        Discoverer channel closed        {"kind": "receiver", "name": "prometheus/agent", "level": "debug", "provider": "static/0"}

For Windows

Get-Service google-cloud-ops-agent*

Status   Name               DisplayName
------   ----               -----------
Running  google-cloud-op... Google Cloud Ops Agent
Running  google-cloud-op... Google Cloud Ops Agent - Logging Agent
Running  google-cloud-op... Google Cloud Ops Agent - Metrics Agent

If the agent service fails to run, you might see the following status:

Linux

$ sudo service google-cloud-ops-agent status
● google-cloud-ops-agent.service - Google Cloud Ops Agent
   Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2021-06-30 21:20:43 UTC; 6s ago

Windows

Get-Service google-cloud-ops-agent

Status   Name                    DisplayName
------   ----                    -----------
Stopped  google-cloud-ops-agent  Google Cloud Ops Agent

To resolve this error, run the following command to start the service:

Linux

$ sudo service google-cloud-ops-agent start

Windows

Start-Service google-cloud-ops-agent

If the service fails to start, the configuration might be invalid.
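
The status outputs above lend themselves to a quick scripted check. The following is a minimal Python sketch, not part of the Ops Agent, that pulls the Active state of each unit out of captured `systemctl status` text. The function name `active_states` is hypothetical, and the parsing assumes the `●` unit marker and the `Active:` line layout shown in the examples on this page; other systemd versions might print a different marker.

```python
import re

def active_states(status_text):
    """Map each unit name in `systemctl status` text to (state, detail)."""
    states = {}
    unit = None
    for line in status_text.splitlines():
        # A unit header looks like "● name.service - description".
        header = re.match(r"^\s*●\s+(\S+)", line)
        if header:
            unit = header.group(1)
            continue
        # The state line looks like "Active: inactive (dead) since ...".
        active = re.match(r"^\s*Active:\s+(\S+)\s+\(([^)]*)\)", line)
        if active and unit:
            states[unit] = (active.group(1), active.group(2))
    return states

# Sample text assembled from the status examples on this page.
sample = (
    "● google-cloud-ops-agent.service - Google Cloud Ops Agent\n"
    "   Active: inactive (dead) since Wed 2021-06-30 21:20:43 UTC; 6s ago\n"
    "● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent\n"
    "   Active: failed (Result: exit-code) since Wed 2021-06-30 22:21:08 UTC; 2s ago\n"
)

for name, (state, detail) in active_states(sample).items():
    print(name, state, detail)
```

Any unit whose state is not `active` is a candidate for the start command above or, if it still fails, for the configuration checks later on this page.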

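Conflict with currently installed agents

  • The VM already has the Cloud Logging agent or the Cloud Monitoring agent installed, and its configuration conflicts with the configuration of the new agent. The error message might look like the following: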

    Windows

    We detected an existing Windows service for the StackdriverLogging agent,
    which is not compatible with the Ops Agent when the Ops Agent configuration
    has a non-empty logging section. Please either remove the logging section
    from the Ops Agent configuration, or disable the StackdriverLogging agent,
    and then retry enabling the Ops Agent.
    

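    To fix this error, you have two options:

    1. Disable the conflicting section of the Ops Agent configuration file. For details, see the guide to configuring the agent.

    2. Disable the conflicting Cloud Logging agent or Cloud Monitoring agent:

      1. Save any custom configuration files for the Cloud Logging agent.
      2. Uninstall the legacy Cloud Monitoring agent or Cloud Logging agent.

      After you uninstall an agent, it might take up to one hour for the Google Cloud Console to report the change.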

Invalid configuration

If the configuration is invalid, you might see the following error when you try to restart the agent service:

Linux

$ sudo service google-cloud-ops-agent restart \
    && sudo service google-cloud-ops-agent status
● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
   Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service.d
           └─directories.conf
   Active: failed (Result: exit-code) since Wed 2021-06-30 22:21:08 UTC; 2s ago
  Process: 1141421 ExecStart=/opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config ${RUNTIME_DIRECTORY}/fluent_bit_main.conf --parser ${RUNTIME_DIRECTORY}/fluent_bit_parser.conf --log_>
  Process: 1141847 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} -state ${STATE_DIR>
 Main PID: 1141421 (code=exited, status=0/SUCCESS)

Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Control process exited, code=exited status=1
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Failed with result 'exit-code'.
Jun 30 22:21:08 centos8-2 systemd[1]: Failed to start Google Cloud Ops Agent - Logging Agent.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Service RestartSec=100ms expired, scheduling restart.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Scheduled restart job, restart counter is at 5.
Jun 30 22:21:08 centos8-2 systemd[1]: Stopped Google Cloud Ops Agent - Logging Agent.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Start request repeated too quickly.
Jun 30 22:21:08 centos8-2 systemd[1]: google-cloud-ops-agent-fluent-bit.service: Failed with result 'exit-code'.
Jun 30 22:21:08 centos8-2 systemd[1]: Failed to start Google Cloud Ops Agent - Logging Agent.

Use journalctl to get the exact error message:

$ sudo journalctl -xe | grep "google_cloud_ops_agent_engine"

You might see a message similar to the following:

Jun 30 22:00:26 centos8-2 google_cloud_ops_agent_engine[1141491]: 2021/06/30 22:00:26 the agent config file is not valid YAML. detailed error: yaml: line 21: did not find expected key

Windows

failed to generate config files: can't parse configuration: yaml: line 20: could not find expected ':'

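To fix the error, correct the invalid configuration and restart the agent. For reference information, see the guide to configuring the agent.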
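
Both error messages above end with a `yaml: line N: ...` fragment that points at the reported location in the configuration file. As a rough sketch, assuming the messages keep that shape, the following Python snippet extracts the line number and the parser's description. The name `yaml_error_location` is hypothetical and not part of any agent tooling.

```python
import re

# Matches the trailing "yaml: line N: description" fragment.
YAML_ERROR = re.compile(r"yaml: line (\d+): (.+)$")

def yaml_error_location(message):
    """Return (line_number, description), or None if the fragment is absent."""
    match = YAML_ERROR.search(message.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

linux_msg = (
    "Jun 30 22:00:26 centos8-2 google_cloud_ops_agent_engine[1141491]: "
    "2021/06/30 22:00:26 the agent config file is not valid YAML. "
    "detailed error: yaml: line 21: did not find expected key"
)
windows_msg = (
    "failed to generate config files: can't parse configuration: "
    "yaml: line 20: could not find expected ':'"
)

print(yaml_error_location(linux_msg))
print(yaml_error_location(windows_msg))
```

The reported line is where the parser gave up, so the actual mistake (for example, bad indentation) might sit on that line or slightly before it.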

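Agent is running but data isn't ingested

Is the agent sending logs to Cloud Logging?

Check local metrics

The following steps require you to connect to the VM over SSH.

  • Is the logging module running? Check the local uptime metric to make sure the logging module is running. For example: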

    $ curl -s localhost:2020/api/v1/uptime | jq | grep uptime_sec
    

    You might see a message similar to the following:

    "uptime_sec": 4132,
    
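  • Is the logging module reading logs? Check the local input metrics to make sure that logs are entering the input.

    If no logs are entering the input, the log sources might not be producing logs. Check the log sources to make sure that they are generating logs. Also make sure that the file paths are correct and that no files are excluded by accident.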

    $ curl -s localhost:2020/api/v1/metrics | jq
    {
      "input": {
        "tail.0": {
          "records": 210,
          "bytes": 17134,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.1": {
          "records": 1016,
          "bytes": 102460,
          "files_opened": 1,
          "files_closed": 0,
          "files_rotated": 0
        },
        "tail.2": {
          "records": 1918,
          "bytes": 245475,
          "files_opened": 2,
          "files_closed": 0,
          "files_rotated": 0
        },
        "storage_backlog.3": {
          "records": 0,
          "bytes": 0
        }
      },
      "filter": ...
      "output": ...
    }
    
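  • Is the logging module sending logs to Cloud Logging? Check the local output metrics and look for the stackdriver output (stackdriver.0 in the following example). For example: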

    $ curl -s localhost:2020/api/v1/metrics | jq
    {
      "input": ...
      "filter": ...
      "output": {
        "stackdriver.0": {
          "proc_records": 1918,
          "proc_bytes": 245475,
          "errors": 0,
          "retries": 0,
          "retries_failed": 0
        }
      }
    }
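
The three checks above can be combined into one pass over the metrics JSON. The following Python sketch takes the already parsed response of `localhost:2020/api/v1/metrics` and reports file inputs that have read nothing and outputs that report errors. The function name `summarize_metrics` and the rules it applies (zero `records` on a `tail.*` input, non-zero `errors` or `retries_failed` on an output) are assumptions made for illustration, not thresholds defined by the agent.

```python
def summarize_metrics(metrics):
    """Flag idle tail inputs and failing outputs in logging module metrics."""
    inputs = metrics.get("input", {})
    outputs = metrics.get("output", {})
    # Only tail.* inputs are checked; storage_backlog is normally empty.
    idle_inputs = sorted(
        name for name, m in inputs.items()
        if name.startswith("tail.") and m.get("records", 0) == 0
    )
    failing_outputs = sorted(
        name for name, m in outputs.items()
        if m.get("errors", 0) > 0 or m.get("retries_failed", 0) > 0
    )
    return {"idle_inputs": idle_inputs, "failing_outputs": failing_outputs}

# Values copied from the example responses on this page.
sample = {
    "input": {
        "tail.0": {"records": 210, "bytes": 17134},
        "tail.1": {"records": 1016, "bytes": 102460},
        "tail.2": {"records": 1918, "bytes": 245475},
        "storage_backlog.3": {"records": 0, "bytes": 0},
    },
    "output": {
        "stackdriver.0": {
            "proc_records": 1918, "proc_bytes": 245475,
            "errors": 0, "retries": 0, "retries_failed": 0,
        }
    },
}

print(summarize_metrics(sample))
```

Two empty lists suggest that logs are both read and sent; a non-empty list names the input or output to investigate first.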
    

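Check the logging module log

This step requires you to connect to the VM over SSH.

You can find the logging module log at /var/log/google-cloud-ops-agent/subagents/*.log on Linux and at C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log on Windows. If there are no logs, the agent service isn't running properly. Go to the "Agent is installed but not running" section first to fix that condition.

  • You might see a 403 permission error when writing to the Logging API. For example: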

    [2020/10/13 18:55:09] [ warn] [output:stackdriver:stackdriver.0] error
    {
    "error": {
      "code": 403,
      "message": "Cloud Logging API has not been used in project 147627806769 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
      "status": "PERMISSION_DENIED",
      "details": [
        {
          "@type": "type.googleapis.com/google.rpc.Help",
          "links": [
            {
              "description": "Google developers console API activation",
              "url": "https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769"
            }
          ]
        }
      ]
    }
    }
    

    To resolve this error, enable the Logging API and grant the Logs Writer role.

  • You might see a quota issue for the Logging API. For example:

    error="8:Insufficient tokens for quota 'logging.googleapis.com/write_requests' and limit 'WriteRequestsPerMinutePerProject' of service 'logging.googleapis.com' for consumer 'project_number:648320274015'." error_code="8"
    

    To resolve this error, increase the quota or reduce the log throughput.
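
The two logging module errors above can be told apart by a few fixed strings. The following Python sketch classifies a chunk of log text accordingly. The function name `diagnose_logging_error` and the category labels are hypothetical, and the marker strings are taken only from the two examples on this page, so other error variants fall through to "unknown".

```python
def diagnose_logging_error(log_text):
    """Classify logging module log text as 'permission', 'quota', or 'unknown'."""
    # 403 responses carry both a numeric code and a PERMISSION_DENIED status.
    if '"code": 403' in log_text or "PERMISSION_DENIED" in log_text:
        return "permission"
    # Quota errors mention the exhausted quota by name.
    if "Insufficient tokens for quota" in log_text:
        return "quota"
    return "unknown"

permission_sample = '"error": {\n  "code": 403,\n  "status": "PERMISSION_DENIED"\n}'
quota_sample = (
    "error=\"8:Insufficient tokens for quota "
    "'logging.googleapis.com/write_requests'\" error_code=\"8\""
)

print(diagnose_logging_error(permission_sample))
print(diagnose_logging_error(quota_sample))
```

A "permission" result points to enabling the Logging API and granting the Logs Writer role; a "quota" result points to raising the quota or reducing log throughput.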

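Is the agent sending metrics to Cloud Monitoring?

Check the metrics module log

This step requires you to connect to the VM over SSH.

You can find the metrics module log in syslog. If there are no logs, the agent service isn't running properly. Go to the "Agent is installed but not running" section first to fix that condition.

  • You might see a PermissionDenied error when writing to the Monitoring API. This error occurs if the permissions for the Ops Agent aren't configured correctly. For example: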

    Nov  2 14:51:27 test-ops-agent-error otelopscol[412]: 2021-11-02T14:51:27.343Z#011info#011exporterhelper/queued_retry.go:231#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist).]", "interval": "6.934781228s"}
    

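    To resolve this error, grant the Monitoring Metric Writer role.

  • You might see a ResourceExhausted error when writing to the Monitoring API. This error occurs if the project reaches the limit of any Monitoring API quota. For example: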

    Nov  2 18:48:32 test-ops-agent-error otelopscol[441]: 2021-11-02T18:48:32.175Z#011info#011exporterhelper/queued_retry.go:231#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Total requests' and limit 'Total requests per minute per user' of service 'monitoring.googleapis.com' for consumer 'project_number:8563942476'.\nerror details: name = ErrorInfo reason = RATE_LIMIT_EXCEEDED domain = googleapis.com metadata = map[consumer:projects/8563942476 quota_limit:DefaultRequestsPerMinutePerUser quota_metric:monitoring.googleapis.com/default_requests service:monitoring.googleapis.com]", "interval": "2.641515416s"}
    

    To resolve this error, increase the quota or reduce the metrics throughput.
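
Both metrics module errors above share the same "Exporting failed" line shape, with the gRPC status code and the retry interval embedded in the message. The following Python sketch extracts those two fields from a single log line. The function name `exporter_failure` is hypothetical, and the regular expressions assume the field layout seen in the examples on this page.

```python
import re

def exporter_failure(line):
    """Return the gRPC code and retry interval of an 'Exporting failed' line."""
    if "Exporting failed" not in line:
        return None
    # The first "rpc error: code = X" fragment names the gRPC status.
    code = re.search(r"rpc error: code = (\w+)", line)
    # The retry delay is logged as "interval": "<seconds>s".
    interval = re.search(r'"interval": "([0-9.]+)s"', line)
    return {
        "code": code.group(1) if code else None,
        "retry_seconds": float(interval.group(1)) if interval else None,
    }

# Shortened version of the PermissionDenied example on this page.
sample = (
    'otelopscol[412]: Exporting failed. Will retry the request after interval. '
    '{"kind": "exporter", "name": "googlecloud", "error": "[rpc error: code = '
    'PermissionDenied desc = Permission monitoring.timeSeries.create denied '
    '(or the resource may not exist).]", "interval": "6.934781228s"}'
)

print(exporter_failure(sample))
```

A `PermissionDenied` code maps to the role fix above and `ResourceExhausted` maps to the quota fix; other codes need a closer look at the full message.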

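Non-harmful logs

The following are examples of non-harmful log spam that you can safely ignore.

  • Errors when scraping metrics from pseudo-processes or restricted processes: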

    Jul 13 17:28:55 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:55.848Z        error        scraperhelper/scrapercontroller.go:205        Error scraping metrics        {"kind"
    : "receiver", "name": "hostmetrics/hostmetrics", "error": "[error reading process name for pid 2: readlink /proc/2/exe: no such file or directory; error reading process name for
    pid 3: readlink /proc/3/exe: no such file or directory; error reading process name for pid 4: readlink /proc/4/exe: no such file or directory; error reading process name for pid
    5: readlink /proc/5/exe: no such file or directory; error reading process name for pid 6: readlink /proc/6/exe: no such file or directory; error reading process name for pid 7: r
    eadlink /proc/7/exe: no such file or directory; error reading process name for pid 8: readlink /proc/8/exe: no such file or directory; error reading process name for pid 9: readl
    ink /proc/9/exe: no such file or directory; error reading process name for pid 10: readlink /proc/10/exe: no such file or directory; error reading process name for pid 11: readli
    nk /proc/11/exe: no such file or directory; error reading process name for pid 12: readlink /proc/12/exe: no such file or directory; error reading process name for pid 13: readli
    nk /proc/13/exe: no such file or directory; error reading process name for pid 14: readlink /proc/14/exe: no such file or directory; error reading process name for pid 15: readli
    nk /proc/15/exe: no such file or directory; error reading process name for pid 16: readlink /proc/16/exe: no such file or directory; error reading process name for pid 17: readli
    nk /proc/17/exe: no such file or directory; error reading process name for pid 18: readlink /proc/18/exe: no such file or directory; error reading process name for pid 19: readli
    nk /proc/19/exe: no such file or directory; error reading process name for pid 20: readlink /proc/20/exe: no such file or directory; error reading process name for pid 21: readli
    nk /proc/21/exe: no such file or directory; error reading process name for pid 22: readlink /proc/22/exe: no such file or directory; error reading process name for pid
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]: 23: readlink /proc/23/exe: no such file or directory; error reading process name for pid 24: readlink /proc/24/exe: no such file
    or directory; error reading process name for pid 25: readlink /proc/25/exe: no such file or directory; error reading process name for pid 26: readlink /proc/26/exe: no such file
    or directory; error reading process name for pid 27: readlink /proc/27/exe: no such file or directory; error reading process name for pid 28: readlink /proc/28/exe: no such file
    or directory; error reading process name for pid 30: readlink /proc/30/exe: no such file or directory; error reading process name for pid 31: readlink /proc/31/exe: no such file
    or directory; error reading process name for pid 43: readlink /proc/43/exe: no such file or directory; error reading process name for pid 44: readlink /proc/44/exe: no such file
    or directory; error reading process name for pid 45: readlink /proc/45/exe: no such file or directory; error reading process name for pid 90: readlink /proc/90/exe: no such file
    or directory; error reading process name for pid 92: readlink /proc/92/exe: no such file or directory; error reading process name for pid 106: readlink /proc/106/exe: no such fi
    le or directory; error reading process name for pid 360: readlink /proc/360/exe: no such file or directory; error reading process name for pid 375: readlink /proc/375/exe: no suc
    h file or directory; error reading process name for pid 384: readlink /proc/384/exe: no such file or directory; error reading process name for pid 386: readlink /proc/386/exe: no
    such file or directory; error reading process name for pid 387: readlink /proc/387/exe: no such file or directory; error reading process name for pid 422: readlink /proc/422/exe
    : no such file or directory; error reading process name for pid 491: readlink /proc/491/exe: no such file or directory; error reading process name for pid 500: readlink /proc/500
    /exe: no such file or directory; error reading process name for pid 2121: readlink /proc/2121/exe: no such file or directory; error reading
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]: process name for pid 2127: readlink /proc/2127/exe: no such file or directory]"}
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.29.0/receiver/scraperhelper/scrapercontroller.go:205
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    Jul 13 17:28:55 debian9-trouble otelopscol[2134]:         /root/go/pkg/mod/go.opentelemetry.io/collector@v0.29.0/receiver/scraperhelper/scrapercontroller.go:186
    
  • Errors when dropping the first data point of a cumulative metric:

    Jul 13 17:28:03 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:03.092Z        info        exporterhelper/queued_retry.go:316        Exporting failed. Will retry the request a
    fter interval.        {"kind": "exporter", "name": "googlecloud/agent", "error": "rpc error: code = InvalidArgument desc = Field timeSeries[1].points[0].interval.start_time had a
    n invalid value of \"2021-07-13T10:25:18.061-07:00\": The start time must be before the end time (2021-07-13T10:25:18.061-07:00) for the non-gauge metric 'agent.googleapis.com/ag
    ent/uptime'.", "interval": "23.491024535s"}
    Jul 13 17:28:41 debian9-trouble otelopscol[2134]: 2021-07-13T17:28:41.269Z        info        exporterhelper/queued_retry.go:316        Exporting failed. Will retry the request a
    fter interval.        {"kind": "exporter", "name": "googlecloud/agent", "error": "rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time had a
    n invalid value of \"2021-07-13T10:26:18.061-07:00\": The start time must be before the end time (2021-07-13T10:26:18.061-07:00) for the non-gauge metric 'agent.googleapis.com/ag
    ent/monitoring/point_count'.", "interval": "21.556591578s"}
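
When reading syslog, it can help to filter out the two non-harmful patterns above so that real errors stand out. The following Python sketch is one way to do that. The name `is_benign` and the marker strings are assumptions drawn from the examples on this page; because journal output can wrap one message across several lines, a continuation line that lacks a marker is not recognized.

```python
# Substrings that identify the two non-harmful messages shown above.
BENIGN_MARKERS = (
    "error reading process name for pid",
    "The start time must be before the end time",
)

def is_benign(line):
    """Return True if the log line matches a known non-harmful pattern."""
    return any(marker in line for marker in BENIGN_MARKERS)

lines = [
    "otelopscol[2134]: error reading process name for pid 2: "
    "readlink /proc/2/exe: no such file or directory",
    "otelopscol[2134]: rpc error: code = InvalidArgument desc = "
    "The start time must be before the end time",
    "otelopscol[412]: rpc error: code = PermissionDenied desc = "
    "Permission monitoring.timeSeries.create denied",
]

# Keep only the lines that deserve attention.
for line in lines:
    if not is_benign(line):
        print(line)
```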
    

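For other known issues with the Cloud Monitoring agent, see the troubleshooting guide for the Cloud Monitoring agent.

Some metrics are missing or inconsistent

A small number of metrics are handled differently by Ops Agent version 2.0.0 and later than by the "preview" versions of the Ops Agent (earlier than 2.0.0) or by the Monitoring agent.

The following table describes the differences in the data ingested by the Ops Agent and the Monitoring agent.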
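Metric type (agent.googleapis.com omitted) | Ops Agent (GA) | Ops Agent (Preview) | Monitoring agent
disk/bytes_used, disk/percent_used | Ingested with the full path in the device label; for example, /dev/sda15. Not ingested for virtual devices such as tmpfs and udev. | Ingested without /dev in the path in the device label; for example, sda15. Not ingested for virtual devices such as tmpfs and udev. | Ingested without /dev in the path in the device label; for example, sda15. Not ingested for virtual devices such as tmpfs and udev.
processes/count_by_state | Not ingested. | Ingested. | Ingested.

The GA column refers to Ops Agent version 2.0.0 and later. The Preview column refers to Ops Agent versions earlier than 2.0.0.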
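
Because the device label differs between agent versions only by the /dev prefix, queries or dashboards that span both forms might need to normalize it. The following Python sketch shows one such normalization. The helper name `normalize_device` is hypothetical and not part of any agent or Monitoring API.

```python
def normalize_device(label):
    """Strip a leading /dev/ so labels from different agents compare equal."""
    prefix = "/dev/"
    return label[len(prefix):] if label.startswith(prefix) else label

# The GA form and the Preview/Monitoring agent form map to the same value.
print(normalize_device("/dev/sda15"))
print(normalize_device("sda15"))
```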

Removed agents are reported by the Google Cloud Console as installed

After you uninstall an agent, it might take up to one hour for the Google Cloud Console to report the change.