Troubleshooting

This page provides troubleshooting information for common scenarios when using logs-based metrics in Cloud Logging.

Metric is missing logs data

There are several possible reasons for missing data in logs-based metrics:

  • New log entries might not match your metric's logs query. A logs-based metric gets data from matching log entries that are received after the metric is created. Logging does not backfill the metric from previous log entries.

  • New log entries might not contain the correct field, or the data might not be in the correct format for extraction by your distribution metric. Check that your field names and regular expressions are correct.

  • Your metric counts might be delayed. Even though countable log entries appear in the Logs Explorer, it may take up to 10 minutes to update the logs-based metrics in Cloud Monitoring.

  • The log entries that are displayed might be counted late or might not be counted at all, because they are time-stamped too far in the past or future. If Cloud Logging receives a log entry whose timestamp is more than 24 hours in the past or more than 10 minutes in the future, then the log entry won't be counted in the logs-based metric.

    The number of late-arriving entries is recorded for each log in the system logs-based metric logging.googleapis.com/logs_based_metrics_error_count.

    Example: A log entry matching a logs-based metric arrives late. It has a timestamp of 2:30 PM on February 20, 2020 and a receiveTimestamp of 2:45 PM on February 21, 2020. Because the entry was received more than 24 hours after its timestamp, it won't be counted in the logs-based metric.
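The counting window described above can be sketched in Python as follows. This is a minimal illustration, not Logging's implementation: the function name is hypothetical, and treating an entry that is exactly 24 hours old (or exactly 10 minutes ahead) as still countable is an assumption based on the "more than" wording.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from the text above.
MAX_PAST = timedelta(hours=24)
MAX_FUTURE = timedelta(minutes=10)


def is_within_counting_window(timestamp: datetime, receive_timestamp: datetime) -> bool:
    """Return True if the timestamp is close enough to the receive time to be counted.

    Assumption: the boundaries are inclusive, because the text says "more than".
    """
    delta = receive_timestamp - timestamp  # positive when the timestamp is in the past
    return -MAX_FUTURE <= delta <= MAX_PAST


# The example above: stamped 2:30 PM on February 20, received 2:45 PM on February 21.
ts = datetime(2020, 2, 20, 14, 30, tzinfo=timezone.utc)
received = datetime(2020, 2, 21, 14, 45, tzinfo=timezone.utc)
print(is_within_counting_window(ts, received))  # False: received 24 h 15 min late
```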

False-positive alerts or alerts that aren't triggered

Alerts that are based on logs-based metrics can fire falsely, or fail to fire, when the alignment period for the alert is too short. This problem most often affects alerts that use less than logic and alerts that are based on a percentile condition for a distribution metric.

False-positive alerts can occur because log entries can be sent to Logging late. For example, the log fields timestamp and receiveTimestamp can differ by minutes in some cases. Also, when Logging ingests logs, there is an inherent delay between when the log entries are generated and when Logging receives them. This means that Logging might not have the total count of log entries for a particular interval until some time after those log entries were generated. This is why an alert that uses less than logic, or that is based on a percentile condition for a distribution metric, can produce a false positive: not all the log entries have been accounted for yet.

However, logs-based metrics are always eventually consistent, because a log entry that matches a logs-based metric can be sent to Logging with a timestamp that is significantly older or newer than the log entry's receiveTimestamp.

This means that the logs-based metric can receive log entries with older timestamps after existing log entries with the same timestamp have already been received by Logging. Thus, the metric value must be updated.

To ensure that alerts are accurate even for on-time data, alert policies for logs-based metrics should use alert conditions with alignment periods of at least two minutes. For log entries that are sent to Logging with delays measured in minutes, an alignment period of ten minutes is recommended to balance timeliness and accuracy.
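As a rough illustration of the alignment-period guidance, the following Python sketch builds a less than condition as a plain dictionary and rejects alignment periods shorter than two minutes. The function name, the metric name checkout_success_count, and the resource type gce_instance are hypothetical. The field names are assumed to follow the JSON form of a Cloud Monitoring alert policy condition; verify them against the API reference before you use them.

```python
MIN_ALIGNMENT_SECONDS = 120              # two-minute minimum from the text above
DELAYED_ENTRIES_ALIGNMENT_SECONDS = 600  # ten minutes, for entries delayed by minutes


def build_less_than_condition(metric_name, resource_type, threshold,
                              alignment_seconds=DELAYED_ENTRIES_ALIGNMENT_SECONDS):
    """Build a threshold-condition dict; reject alignment periods under two minutes."""
    if alignment_seconds < MIN_ALIGNMENT_SECONDS:
        raise ValueError(
            f"alignment period {alignment_seconds}s is shorter than "
            f"{MIN_ALIGNMENT_SECONDS}s"
        )
    # Assumed filter shape for a user-defined logs-based metric.
    metric_filter = (
        f'metric.type="logging.googleapis.com/user/{metric_name}" '
        f'AND resource.type="{resource_type}"'
    )
    return {
        "displayName": f"{metric_name} below {threshold}",
        "conditionThreshold": {
            "filter": metric_filter,
            "comparison": "COMPARISON_LT",
            "thresholdValue": threshold,
            "duration": "0s",
            "aggregations": [
                {
                    "alignmentPeriod": f"{alignment_seconds}s",
                    "perSeriesAligner": "ALIGN_SUM",
                }
            ],
        },
    }


condition = build_less_than_condition("checkout_success_count", "gce_instance", 1)
print(condition["conditionThreshold"]["aggregations"][0]["alignmentPeriod"])  # 600s
```

Validating the period in one place keeps a too-short alignment period from reaching an alert policy unnoticed.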

Metric has too many time series

The number of time series in a metric depends on the number of different combinations of label values. The number of time series is called the cardinality of the metric, and it must not exceed 30,000.

Because a time series can be generated for every combination of label values, it is not difficult to exceed 30,000 time series if one or more of your labels has a high number of values. Avoid high-cardinality metrics.
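To see how quickly label combinations multiply, the following Python sketch computes an upper bound on cardinality as the product of the number of distinct values per label. The label names and counts are hypothetical, and the real cardinality can be lower because not every combination necessarily occurs.

```python
from math import prod
from typing import Mapping

MAX_TIME_SERIES = 30_000  # limit stated in the text above


def max_cardinality(label_value_counts: Mapping[str, int]) -> int:
    """Upper bound on time series: one per combination of label values."""
    return prod(label_value_counts.values())


# Hypothetical labels: two bounded labels stay far below the limit.
bounded = {"status_code": 5, "method": 4}
print(max_cardinality(bounded))  # 20

# Adding one label with many values pushes the bound past the limit.
unbounded = {**bounded, "user_id": 2000}
print(max_cardinality(unbounded))                    # 40000
print(max_cardinality(unbounded) > MAX_TIME_SERIES)  # True
```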

As the cardinality of a metric increases, the metric can get throttled and some data points might not be written to the metric. Charts that display the metric can be slow to load due to the large number of time series that the chart has to process. You might also incur costs for API calls to query time series data; review Cloud Monitoring costs for details.

To avoid creating high-cardinality metrics:

  • Check that your label fields and extractor regular expressions match values that have a limited cardinality.

  • Avoid extracting text messages that can change, without bounds, as label values.

  • Avoid extracting numerical values with unbounded cardinality.

  • Only extract values from labels of known cardinality; for instance, status codes with a set of known values.
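One way to check an extractor before you rely on it is to run its regular expression over a sample of log messages and count the distinct captured values. The following Python sketch does that with hypothetical sample messages and patterns. It uses Python's re module, whose syntax differs in places from the regular-expression syntax that Logging accepts, so treat it as an approximation.

```python
import re


def distinct_label_values(messages, pattern):
    """Collect the distinct values captured by the first group of pattern."""
    regex = re.compile(pattern)
    values = set()
    for message in messages:
        match = regex.search(message)
        if match:
            values.add(match.group(1))
    return values


# Hypothetical sample of log messages.
sample = [
    "GET /users/1041 returned 200",
    "GET /users/2717 returned 200",
    "GET /users/9 returned 404",
]

# Status codes come from a small, known set of values.
print(sorted(distinct_label_values(sample, r"returned (\d{3})")))  # ['200', '404']

# User IDs grow without bound: every new user adds a label value.
print(sorted(distinct_label_values(sample, r"/users/(\d+)")))  # ['1041', '2717', '9']
```

A count that keeps growing as the sample grows is a sign that the extracted value is unbounded and is a poor choice for a label.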

These two system logs-based metrics can help you measure the effect that adding or removing labels has on the cardinality of your metric:

When you inspect these metrics, you can further filter your results by metric name. For details, go to Selecting metrics: filtering.

Metric name is invalid

When you create a counter or distribution metric, choose a metric name that is unique among the logs-based metrics in your project.

Metric-name strings must not exceed 100 characters and can include only the following characters:

  • A-Z
  • a-z
  • 0-9
  • The special characters _-.,+!*',()%\/.

    The forward slash character / denotes a hierarchy of pieces within the metric name and cannot be the first character of the name.
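The naming rules above can be checked before you create a metric. The following Python sketch is one possible validator; the function name and the example metric names are hypothetical, and it assumes that the special-character list is read literally, including both the backslash and the forward slash. It does not check uniqueness within the project.

```python
import string

MAX_NAME_LENGTH = 100
# Letters, digits, and the special characters listed above, read literally.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "_-.,+!*',()%\\/")


def validate_metric_name(name: str) -> list:
    """Return a list of problems with name; an empty list means no problem was found."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    if name.startswith("/"):
        problems.append("name starts with a forward slash")
    bad_chars = sorted({ch for ch in name if ch not in ALLOWED_CHARS})
    if bad_chars:
        problems.append(f"name contains unsupported characters: {bad_chars}")
    return problems


print(validate_metric_name("errors/http_5xx"))  # []
print(validate_metric_name("/bad name"))        # two problems: leading slash, space
```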

Label values are truncated

Values for user-defined labels must not exceed 1,024 bytes.
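Because the limit is expressed in bytes rather than characters, a value that contains non-ASCII text can exceed it with fewer than 1,024 characters. The following Python sketch measures a value's size, assuming that the limit applies to the UTF-8 encoding of the value; the function names are hypothetical.

```python
MAX_LABEL_VALUE_BYTES = 1024  # limit stated in the text above


def label_value_size(value: str) -> int:
    """Size of the value in bytes, assuming UTF-8 encoding."""
    return len(value.encode("utf-8"))


def exceeds_label_limit(value: str) -> bool:
    """Return True if the value is larger than the label-value limit."""
    return label_value_size(value) > MAX_LABEL_VALUE_BYTES


print(exceeds_label_limit("a" * 1024))  # False: 1,024 ASCII characters are 1,024 bytes
print(exceeds_label_limit("é" * 600))   # True: 600 characters, but 1,200 bytes
```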