Configuring the Ops Agent

This document provides details about the Ops Agent's default and custom configurations. Read this document if any of the following applies to you:

  • You want to change the configuration of the Ops Agent, for example to change a collection interval or to turn off the collection of logs or metrics.

  • You're interested in learning the technical details of the Ops Agent's configuration.

Configuration model

The Ops Agent uses a built-in default configuration; you can't directly modify this built-in configuration. Instead, you create a file of overrides that are merged with the built-in configuration when the agent restarts.

The building blocks of the configuration are as follows:

  • receivers: This element describes what is collected by the agent.
  • processors: This element describes how the agent can modify the collected information.
  • service: This element links receivers and processors together to create data flows, called pipelines. The service element contains a pipelines element, which can contain multiple pipelines.

The built-in configuration is made up of these elements, and you use the same elements to override that built-in configuration.

Built-in configuration

The built-in configuration for the Ops Agent defines the default collection for logs and metrics. The following shows the built-in configuration for Linux and for Windows:

Linux

By default, the Ops Agent collects file-based syslog logs and host metrics.

For more information about the metrics collected, see Metrics ingested by the receiver types.

logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/messages
      - /var/log/syslog
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern: []
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]

Windows

By default, the Ops Agent collects Windows event logs from System, Application, and Security channels, as well as host metrics, IIS metrics, and SQL Server metrics.

For more information about the metrics collected, see Metrics ingested by the receiver types.

logging:
  receivers:
    windows_event_log:
      type: windows_event_log
      channels: [System, Application, Security]
  service:
    pipelines:
      default_pipeline:
        receivers: [windows_event_log]
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
    iis:
      type: iis
      collection_interval: 60s
    mssql:
      type: mssql
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern: []
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics, iis, mssql]
        processors: [metrics_filter]

These configurations are discussed in more detail in Logging configuration and Metrics configuration.

User-specified configuration

To override the built-in configuration, you add new configuration elements to the user configuration file. Put your configuration for the Ops Agent in the following files:

  • For Linux: /etc/google-cloud-ops-agent/config.yaml.

  • For Windows: C:\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml.

Any user-specified configuration is merged with the built-in configuration when the agent restarts.

To override a built-in receiver, processor, or pipeline, redefine it in your config.yaml file by declaring it with the same identifier.

For example, the built-in configuration for metrics includes a hostmetrics receiver that specifies a 60-second collection interval. To change the collection interval for host metrics to 30 seconds, include a metrics receiver called hostmetrics in your config.yaml file that sets the collection_interval value to 30 seconds, as shown in the following example:

metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 30s
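
The same override technique applies to logging receivers. As a minimal sketch, the following config.yaml fragment redeclares the built-in Linux syslog receiver under the same ID so that it tails one more file. The extra path /var/log/auth.log is only an illustrative assumption, and the two built-in paths are repeated on the assumption that a redefined receiver replaces the built-in definition rather than extending its list:

```yaml
logging:
  receivers:
    # Same ID as the built-in receiver, so this definition overrides it.
    syslog:
      type: files
      include_paths:
      - /var/log/messages
      - /var/log/syslog
      # Hypothetical extra file, used here only for illustration.
      - /var/log/auth.log
```

No service element is needed in this sketch, because the built-in default_pipeline already refers to the syslog receiver by ID.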

For other examples of changing the built-in configurations, see Logging configuration and Metrics configuration.

You can also turn off the collection of logging or metric data. These changes are described in the example logging service configurations and metrics service configurations.

Logging configurations

The logging configuration uses the configuration model described previously:

  • receivers: This element describes the data to collect from log files; this data is mapped into a <timestamp, record> model.
  • processors: This optional element describes how the agent can modify the collected information.
  • service: This element links receivers and processors together to create data flows, called pipelines. The service element contains a pipelines element, which can include multiple pipeline definitions.

Each receiver and each processor can be used in multiple pipelines.
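
As an illustration of this sharing, the following sketch lets two pipelines use the same parse_json processor. The receiver IDs, the processor ID, the pipeline IDs, and the file paths are all hypothetical names chosen for the example:

```yaml
logging:
  receivers:
    # Hypothetical receivers for two applications.
    app_a_logs:
      type: files
      include_paths: [/var/log/app-a/*.log]
    app_b_logs:
      type: files
      include_paths: [/var/log/app-b/*.log]
  processors:
    # Defined once, referenced by both pipelines below.
    json_parser:
      type: parse_json
  service:
    pipelines:
      app_a_pipeline:
        receivers: [app_a_logs]
        processors: [json_parser]
      app_b_pipeline:
        receivers: [app_b_logs]
        processors: [json_parser]
```

Keeping the pipelines separate is a design choice rather than a requirement; it makes it easier to add a processor later that applies to only one of the two applications.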

The following sections describe each of these elements.

Logging receivers

The receivers element contains a set of receivers, each identified by a RECEIVER_ID. A receiver describes how to retrieve the logs; for example, by tailing files, by using a TCP port, or from the Windows Event Log.

Structure of logging receivers

Each receiver must have an identifier, RECEIVER_ID, and include a type element. The valid types are:

  • files
  • syslog
  • tcp (available with Ops Agent versions 2.3.0 and later)
  • windows_event_log (Windows only)

The receivers structure looks like the following:

receivers:
  RECEIVER_ID:
    type: files
    ...
  RECEIVER_ID_2:
    type: syslog
    ...

Depending on the value of the type element, there might be other configuration options, as follows:

  • files receivers:

    • include_paths: Required. A list of filesystem paths to read by tailing each file. A wild card (*) can be used in the paths; for example, /var/log/*.log.

      For a list of common application log files, see Common log files.

    • exclude_paths: Optional. A list of filesystem path patterns to exclude from the set matched by include_paths.

  • syslog receivers:

    • transport_protocol: Optional. Supported values: tcp, udp. The default is tcp.

      If the value of transport_protocol is tcp, the following additional options can be used:

      • listen_host: Optional. An IP address to listen on. The default value is 0.0.0.0.

      • listen_port: Optional. A port to listen on. The default value is 5140.

  • tcp receivers:

    • format: Required. Log format. Supported value: json.

    • listen_host: Optional. An IP address to listen on. The default value is 127.0.0.1.

    • listen_port: Optional. A port to listen on. The default value is 5170.

  • windows_event_log receivers (for Windows only):

    • channels: Required. A list of Windows Event Log channels from which to read logs.

Examples of logging receivers

Sample files receiver:

receivers:
  RECEIVER_ID:
    type: files

    include_paths: [/var/log/*.log]
    exclude_paths: [/var/log/not-this-one.log]

Sample syslog receiver:

receivers:
  RECEIVER_ID:
    type: syslog

    transport_protocol: tcp
    listen_host: 0.0.0.0
    listen_port: 5140

Sample tcp receiver:

receivers:
  RECEIVER_ID:
    type: tcp

    format: json
    listen_host: 127.0.0.1
    listen_port: 5170

Sample windows_event_log receiver (Windows only):

receivers:
  RECEIVER_ID:
    type: windows_event_log

    channels: [System,Application,Security]

Logging processors

The optional processors element contains a set of processing directives, each identified by a PROCESSOR_ID. A processor describes how to manipulate the information collected by a receiver.

Structure of logging processors

Each processor must have a unique identifier and include a type element. The valid types are:

  • parse_json
  • parse_regex

The processors structure looks like the following:

processors:
  PROCESSOR_ID:
    type: parse_json
    ...
  PROCESSOR_ID_2:
    type: parse_regex
    ...

Depending on the value of the type element, there are other configuration options, as follows:

  • Both parse_json and parse_regex processors:

    • field: Optional. The name of the field in the record to parse. If the field option isn't specified, then the processor parses the message field.

    • time_key: Optional. If the log entry provides a field with a timestamp, this option specifies the name of that field. The extracted value is used to set the timestamp field of the resulting LogEntry and is removed from the payload.

      If the time_key option is specified, you must also specify the following:

      • time_format: Required if time_key is used. This option specifies the format of the time_key field so it can be recognized and analyzed properly. For details of the format, see the strptime(3) guide.
  • parse_regex processors:

    • regex: Required. The regular expression for parsing the field. The expression must include key names for the matched subexpressions; for example, "^(?<time>[^ ]*) (?<severity>[^ ]*) (?<msg>.*)$".

      For a set of regular expressions for extracting information from log files, see Common log files.

Examples of logging processors

Sample parse_json processor:

processors:
  PROCESSOR_ID:
    type: parse_json

    field:       message
    time_key:    time
    time_format: "%Y-%m-%dT%H:%M:%S.%L%Z"

Sample parse_regex processor:

processors:
  PROCESSOR_ID:
    type: parse_regex

    field:       message
    regex:       "^(?<time>[^ ]*) (?<severity>[^ ]*) (?<msg>.*)$"
    time_key:    time
    time_format: "%Y-%m-%dT%H:%M:%S.%L%Z"

Special fields in structured payloads

You can set specific fields in the LogEntry object that the agent writes to the Logging API. For structured log records, the Ops Agent strips the fields listed in the following table from the jsonPayload structure:

Record field                               LogEntry field
timestamp (Option 1 or Option 2, below)    timestamp
receiver_id (not a record field)           logName
logging.googleapis.com/httpRequest         httpRequest
logging.googleapis.com/severity            severity
logging.googleapis.com/labels              labels
logging.googleapis.com/operation           operation
logging.googleapis.com/sourceLocation      sourceLocation

The timestamp record field can take either of the following forms:

Option 1

"timestamp": {
  "seconds": CURRENT_SECONDS,
  "nanos": CURRENT_NANOS,
}

Option 2

{
  "timestampSeconds": CURRENT_SECONDS,
  "timestampNanos": CURRENT_NANOS,
}

Any remaining structured record fields remain part of the jsonPayload structure.

Logging service

The logging service links logging receivers and processors together into pipelines.

The service section has a single element, pipelines, which can contain multiple pipeline IDs and definitions. Each pipeline definition consists of the following elements:

  • receivers: Required for new pipelines. A list of receiver IDs, as described in Logging receivers. The order of the receiver IDs in the list doesn't matter. The pipeline collects data from all of the listed receivers.

  • processors: Optional. A list of processor IDs, as described in Logging processors. The order of the processor IDs in the list does matter. Each record is run through the processors in the listed order.

A service configuration has the following structure:

service:
  pipelines:
    PIPELINE_ID:
      receivers:  [...]
      processors: [...]
    PIPELINE_ID_2:
      receivers:  [...]
      processors: [...]
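
Because processors run in the listed order, a later processor can work on a field that an earlier one produced. The following is a sketch under that assumption: a parse_regex processor first splits each line into time and payload fields, and a parse_json processor then parses the payload field. All IDs, the file path, and the log line layout (a timestamp, a space, then a JSON object) are hypothetical:

```yaml
logging:
  receivers:
    app_logs:
      type: files
      include_paths: [/var/log/myapp/*.log]
  processors:
    split_line:
      type: parse_regex
      # Captures the first token as "time" and the rest of the line as "payload".
      regex: "^(?<time>[^ ]*) (?<payload>.*)$"
    parse_payload:
      type: parse_json
      # Parse the field created by split_line instead of the default message field.
      field: payload
  service:
    pipelines:
      app_pipeline:
        receivers: [app_logs]
        # Order matters: split_line must run before parse_payload.
        processors: [split_line, parse_payload]
```

Reversing the two processor IDs in the list would make parse_json look for a payload field before any processor has created it.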

Example logging service configurations

To turn off the built-in logging ingestion, redefine the default pipeline with an empty receivers list and no processors. The entire logging configuration looks like the following:

logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []

The following service configuration defines a pipeline with the ID custom_pipeline:

service:
  pipelines:
    custom_pipeline:
      receivers:
      - RECEIVER_ID
      processors:
      - PROCESSOR_ID
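
To make the placeholders concrete, here is a sketch of a complete logging configuration that defines one receiver and one processor and links them in custom_pipeline. The receiver ID app_logs, the processor ID line_parser, and the path /var/log/myapp/app.log are hypothetical; the regular expression and time format are the sample values used earlier in this document and would need to match your actual log format:

```yaml
logging:
  receivers:
    # Hypothetical application log file.
    app_logs:
      type: files
      include_paths:
      - /var/log/myapp/app.log
  processors:
    line_parser:
      type: parse_regex
      regex: "^(?<time>[^ ]*) (?<severity>[^ ]*) (?<msg>.*)$"
      # Use the captured "time" group as the LogEntry timestamp.
      time_key: time
      time_format: "%Y-%m-%dT%H:%M:%S.%L%Z"
  service:
    pipelines:
      custom_pipeline:
        receivers:
        - app_logs
        processors:
        - line_parser
```

Because this file doesn't redefine default_pipeline, the built-in pipeline should continue to run alongside custom_pipeline after the merge.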

Common log files

The following table lists common log files for frequently used applications:

Application Common log files
apache /var/log/apache*/access.log
/var/log/httpd/access_log
/var/log/apache*/error.log
/var/log/httpd/error_log
cassandra /var/log/cassandra/cassandra.log
/var/log/cassandra/output.log
/var/log/cassandra/system.log
chef /var/log/chef-server/bookshelf/current
/var/log/chef-server/chef-expander/current
/var/log/chef-server/chef-pedant/http-traffic.log
/var/log/chef-server/chef-server-webui/current
/var/log/chef-server/chef-solr/current
/var/log/chef-server/erchef/current
/var/log/chef-server/erchef/erchef.log.1
/var/log/chef-server/nginx/access.log
/var/log/chef-server/nginx/error.log
/var/log/chef-server/nginx/rewrite-port-80.log
/var/log/chef-server/postgresql/current
gitlab /home/git/gitlab/log/application.log
/home/git/gitlab/log/githost.log
/home/git/gitlab/log/production.log
/home/git/gitlab/log/satellites.log
/home/git/gitlab/log/sidekiq.log
/home/git/gitlab/log/unicorn.stderr.log
/home/git/gitlab/log/unicorn.stdout.log
/home/git/gitlab-shell/gitlab-shell.log
jenkins /var/log/jenkins/jenkins.log
jetty /var/log/jetty/out.log
/var/log/jetty/*.request.log
/var/log/jetty/*.stderrout.log
joomla /var/www/joomla/logs/*.log
magento /var/www/magento/var/log/exception.log
/var/www/magento/var/log/system.log
/var/www/magento/var/report/*
mediawiki /var/log/mediawiki/*.log
memcached /var/log/memcached.log
mongodb /var/log/mongodb/*.log
mysql /var/log/mysql.log
/var/log/mysql/mysql.log
/var/log/mysql/mysql-slow.log
nginx /var/log/nginx/access.log
/var/log/nginx/error.log
postgres /var/log/postgres*/*.log
/var/log/pgsql/*.log
puppet /var/log/puppet/http.log
/var/log/puppet/masterhttp.log
puppet-enterprise /var/log/pe-activemq/activemq.log
/var/log/pe-activemq/wrapper.log
/var/log/pe-console-auth/auth.log
/var/log/pe-console-auth/cas_client.log
/var/log/pe-console-auth/cas.log
/var/log/pe-httpd/access.log
/var/log/pe-httpd/error.log
/var/log/pe-httpd/other_vhosts_access.log
/var/log/pe-httpd/puppetdashboard.access.log
/var/log/pe-httpd/puppetdashboard.error.log
/var/log/pe-httpd/puppetmasteraccess.log
/var/log/pe-mcollective/mcollective_audit.log
/var/log/pe-mcollective/mcollective.log
/var/log/pe-puppet-dashboard/certificate_manager.log
/var/log/pe-puppet-dashboard/event-inspector.log
/var/log/pe-puppet-dashboard/failed_reports.log
/var/log/pe-puppet-dashboard/live-management.log
/var/log/pe-puppet-dashboard/mcollective_client.log
/var/log/pe-puppet-dashboard/production.log
/var/log/pe-puppetdb/pe-puppetdb.log
/var/log/pe-puppet/masterhttp.log
/var/log/pe-puppet/rails.log
rabbitmq /var/log/rabbitmq/*.log
/var/log/rabbitmq/*-sasl.log
/var/log/rabbitmq/startup_err
/var/log/rabbitmq/startup_log
redis /var/log/redis*.log
/var/log/redis/*.log
redmine /var/log/redmine/*.log
salt /var/log/salt/key
/var/log/salt/master
/var/log/salt/minion
/var/log/salt/syndic.loc
solr /var/log/solr/*.log
sugarcrm /var/www/*/sugarcrm.log
syslog /var/log/syslog
/var/log/messages
tomcat /var/log/tomcat*/catalina.out
/var/log/tomcat*/localhost.*.log
/var/log/tomcat*/localhost_access_log.%Y-%m-%d.txt
zookeeper /var/log/zookeeper/zookeeper.log
/var/log/zookeeper/zookeeper_trace.log

The following table lists some regular expressions that are useful for parsing logs:

Application Regular expression
apache ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
apache2 ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>.*)")?$
apache_error ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
mongodb ^(?<time>[^ ]*)\s+(?<severity>\w)\s+(?<component>[^ ]+)\s+\[(?<context>[^\]]+)]\s+(?<message>.*?) *(?<ms>(\d+))?(:?ms)?$
nginx ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")

Metrics configurations

The metrics configuration uses the configuration model described previously:

  • receivers: a list of receiver definitions. A receiver describes the source of the metrics; for example, system metrics like cpu or memory. The receivers in this list can be shared among multiple pipelines.
  • processors: a list of processor definitions. A processor describes how to modify the metrics collected by a receiver.
  • service: contains a pipelines section that is a list of pipeline definitions. A pipeline connects a list of receivers and a list of processors to form the data flow.

The following sections describe each of these elements.

Metrics receivers

The receivers element contains a set of receiver definitions. A receiver describes the source of the metrics, such as cpu and memory. A receiver can be shared among multiple pipelines.

Structure of metrics receivers

Each receiver must have an identifier, RECEIVER_ID, and include a type element. Valid types are:

  • hostmetrics
  • iis (Windows only)
  • mssql (Windows only)

A receiver can also specify the optional collection_interval option. The value is a duration; for example, 30s or 2m. The default value is 60s.

Each of these receiver types collects a set of metrics; for information about the specific metrics included, see Metrics ingested by the receiver types.

You can create only one receiver for each type. For example, you can't define two receivers of type hostmetrics.

Changing the collection interval in the metrics receivers

Some critical workloads might require fast alerting. By reducing the collection interval for the metrics, you can configure more sensitive alerts. For information on how alerts are evaluated, see Alerting behavior.

For example, the following receiver changes the collection interval for host metrics (the receiver ID is hostmetrics) from the default of 60 seconds to 10 seconds:

metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 10s

You can also override the collection interval for the Windows iis and mssql metrics receivers using the same technique.
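
As a sketch of that technique on Windows, the following fragment redeclares the built-in iis and mssql receivers under their built-in IDs with a shorter interval. The 30-second value is an arbitrary choice for illustration:

```yaml
metrics:
  receivers:
    # Same IDs as the built-in Windows receivers, so these definitions override them.
    iis:
      type: iis
      collection_interval: 30s
    mssql:
      type: mssql
      collection_interval: 30s
```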

Metrics ingested by the receivers

The metrics ingested by the Ops Agent have identifiers that begin with the following pattern: agent.googleapis.com/GROUP. The GROUP component identifies a set of related metrics; it has values like cpu, network, and others.

The hostmetrics receiver ingests the following metric groups. For more information, see the linked section for each group on the Ops Agent metrics page.

Group Metric
cpu CPU load at 1 minute intervals
CPU load at 5 minute intervals
CPU load at 15 minute intervals
CPU usage, with labels for CPU number and CPU state
CPU usage percent, with labels for CPU number and CPU state
disk Disk bytes read, with label for device
Disk bytes written, with label for device
Disk I/O time, with label for device
Disk weighted I/O time, with label for device
Disk pending operations, with label for device
Disk merged operations, with labels for device and direction
Disk operations, with labels for device and direction
Disk operation time, with labels for device and direction
Disk usage, with labels for device and state
Disk utilization, with labels for device and state
interface (Linux only) Total count of network errors
Total count of packets sent over the network
Total number of bytes sent over the network
memory Memory usage, with label for state (buffered, cached, free, slab, used)
Memory usage percent, with label for state (buffered, cached, free, slab, used)
network TCP connection count, with labels for port and TCP state
swap Swap I/O operations, with label for direction
Swap bytes used, with labels for device and state
Swap percent used, with labels for device and state
pagefile (Windows only) Current percentage of pagefile used by state
processes Processes count, with label for state
Processes forked count
Per-process disk read I/O, with labels for process name + others
Per-process disk write I/O, with labels for process name + others
Per-process RSS usage, with labels for process name + others
Per-process VM usage, with labels for process name + others

The iis receiver (Windows only) ingests the metrics of the iis group. For more information, see the Ops Agent metrics page.

Group Metric
iis (Windows only) Currently open connections to IIS
Network bytes transferred by IIS
Connections opened to IIS
Requests made to IIS

The mssql receiver (Windows only) ingests metrics of the mssql group. For more information, see the Ops Agent metrics page.

Group Metric
mssql (Windows only) Currently open connections to SQL server
SQL server total transactions per second
SQL server write transactions per second

Metrics processors

The processors element contains a set of processor definitions. A processor describes which of the metrics collected by a receiver to exclude. The only supported type is exclude_metrics, which takes a metrics_pattern option. The value is a list of globs that match the metric types you want to exclude from the group collected by a receiver; for example, agent.googleapis.com/cpu/* or agent.googleapis.com/processes/*. To find the fully qualified names of individual metrics, see the group's table on the Ops Agent metrics page.

Sample metrics processor

The following example shows the exclude_metrics processor supplied in the built-in configurations. This processor supplies an empty metrics_pattern value, so it doesn't exclude any metrics.

processors:
  metrics_filter:
    type: exclude_metrics
    metrics_pattern: []

To disable the collection of all process metrics by the Ops Agent, add the following to your config.yaml file:

metrics:
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/processes/*

This excludes process metrics from collection in the metrics_filter processor that applies to the default pipeline in the metrics service.
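
Because metrics_pattern is a list, one processor can exclude several groups at once. The following sketch combines the two glob examples mentioned earlier in this section; choosing to drop both the cpu and processes groups is purely illustrative:

```yaml
metrics:
  processors:
    # Same ID as the built-in processor, so the default pipeline picks up this definition.
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/cpu/*
      - agent.googleapis.com/processes/*
```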

Metrics service

The metrics service links metrics receivers and processors together into pipelines.

The service section has a single element, pipelines, which can contain multiple pipeline IDs and definitions. Each pipeline definition consists of the following elements:

  • receivers: Required for new pipelines. A list of receiver IDs, as described in Metrics receivers. The order of the receiver IDs in the list doesn't matter. The pipeline collects data from all of the listed receivers.

  • processors: Optional. A list of processor IDs, as described in Metrics processors. The order of the processor IDs in the list does matter. Each metric point is run through the processors in the listed order.

A service configuration has the following structure:

service:
  pipelines:
    PIPELINE_ID:
      receivers:  [...]
      processors: [...]
    PIPELINE_ID_2:
      receivers:  [...]
      processors: [...]

Example metrics service configurations

To turn off the built-in ingestion of host metrics, redefine the default pipeline with an empty receivers list and no processors. The entire metrics configuration looks like the following:

metrics:
  service:
    pipelines:
      default_pipeline:
        receivers: []
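
The logging and metrics overrides can live in the same config.yaml file. As a sketch, the following file combines the two turn-off examples from this document so that neither built-in default pipeline has any receivers:

```yaml
# Turn off the built-in logging ingestion.
logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []
# Turn off the built-in metrics ingestion.
metrics:
  service:
    pipelines:
      default_pipeline:
        receivers: []
```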

The following example shows the built-in service configuration for Windows:

metrics:
  service:
    pipelines:
      default_pipeline:
        receivers:
        - hostmetrics
        - iis
        - mssql
        processors:
        - metrics_filter
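
Starting from that built-in definition, you can narrow the default pipeline instead of emptying it. The following sketch keeps host metrics on Windows but leaves out the iis and mssql receivers; it assumes that a redefined default_pipeline replaces the built-in receivers list rather than being appended to it:

```yaml
metrics:
  service:
    pipelines:
      default_pipeline:
        # Only host metrics; iis and mssql are omitted on purpose.
        receivers:
        - hostmetrics
        processors:
        - metrics_filter
```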