Constructs in the API

The Service Monitoring API is used to set up service-level objectives that can be used to monitor the health of your services.

Service Monitoring adds resources for services and service-level objectives (SLOs) to the Monitoring API.

This page introduces the structures used to represent services and SLOs in the Service Monitoring API and maps them to the concepts described generally in Concepts in service monitoring.

For information on invoking the API, see Working with the API.

Services

A service is represented by a Service object. This object includes the following fields:

  • A name: A fully-qualified resource name for this service
  • A display name: A label for use in UI components
  • A telemetry-configuration object
  • An indicator of the type of service:
    • An App Engine service
    • A Cloud Endpoints service
    • An Istio on Google Kubernetes Engine service
    • A custom service

The only type of service you create manually is a custom service. The other types are automatically detected based on your environment.

For example, the following shows the JSON representation of an Istio service:

{
  "name": "projects/[PROJECT_NUMBER]/services/[PROJECT_ID]-zone-us-central1-c-csm-main-default-currencyservice",
  "displayName": "[PROJECT_ID]/us-central1-c/csm-main/default/currencyservice",
  "clusterIstio": {
    "location": "us-central1-c",
    "clusterName": "csm-main",
    "serviceNamespace": "default",
    "serviceName": "currencyservice"
  },
  "telemetry": {
    "resourceName": "//container.googleapis.com/projects/[PROJECT_ID]/zones/us-central1-c/clusters/csm-main/k8s/namespaces/default/services/currencyservice"
  }
}

This example shows the JSON representation of a custom service:

{
  "name": "projects/[PROJECT_NUMBER]/services/-I6P_NufSzKiuvX1AYHE6Q",
  "displayName": "My Test Service",
  "custom": {},
  "telemetry": {}
}

Telemetry is under development. The only meaningful value is the full name of the resource that defines the service. The format is described in Resource names.
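Because a Service carries exactly one type indicator, the type can be read off the JSON by checking which type field is present. The following is a minimal Python sketch of that check, not part of the API. The field names clusterIstio and custom come from the examples above; appEngine and cloudEndpoints are assumed names for the two types not shown on this page, and service_type is a hypothetical helper.

```python
import json

# Type field -> description. "clusterIstio" and "custom" appear in the
# examples on this page; "appEngine" and "cloudEndpoints" are assumed names.
TYPE_FIELDS = {
    "appEngine": "App Engine service",
    "cloudEndpoints": "Cloud Endpoints service",
    "clusterIstio": "Istio on Google Kubernetes Engine service",
    "custom": "custom service",
}


def service_type(service: dict) -> str:
    """Return the service type based on which type field is present."""
    present = [field for field in TYPE_FIELDS if field in service]
    if len(present) != 1:
        raise ValueError(f"expected exactly one type field, found {present}")
    return TYPE_FIELDS[present[0]]


custom = json.loads(
    '{"displayName": "My Test Service", "custom": {}, "telemetry": {}}'
)
print(service_type(custom))  # custom service
```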

Service-level indicators

A service-level indicator (SLI) provides a measure of the performance of a service. An SLI is based on a metric captured by the service. Exactly how the SLI is defined depends on the type of metric used as the indicator metric, but it is generally some comparison between acceptable results and total results.

An SLI is represented by the ServiceLevelIndicator object. This object is a collective way to refer to the three supported types of SLIs:

  • A basic SLI, which is created automatically for a well-known service type. This type of SLI is described in Service-level-objectives; it is represented by a BasicSli object and measures availability or latency.

  • A request-based SLI, which you can use to count events that represent acceptable service. Use of this type of SLI is described in Request-based SLOs; it is represented by a RequestBasedSli object.

  • A windows-based SLI, which you can use to count periods of time that meet some goodness criterion. Use of this type of SLI is described in Windows-based SLOs; it is represented by a WindowsBasedSli object.

For example, the following shows a basic availability SLI:

{
  "basicSli": {
    "availability": {},
    "location": [
      "us-central1-c"
    ]
  }
}

Structures for request-based SLIs

A request-based SLI is based on a metric that counts units of service as a ratio between a particular outcome and the total. For example, if you use a metric that counts requests, you can build the ratio between the number of requests that return success and the total number of requests.

There are two ways to build a request-based SLI:

  • As a TimeSeriesRatio, when the ratio of good service to total service is computed from two time series whose values have a metric kind of DELTA or CUMULATIVE.
  • As a DistributionCut, when the ratio is computed from a single time series that has a value type of DISTRIBUTION and a metric kind of DELTA or CUMULATIVE. The good-service value is the count of items that fall into the histogram buckets in a specified range, and the total is the count of all values in the distribution.

The following shows the JSON representation of an SLI that uses a time-series ratio:

{
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/request_count\"",
      "goodServiceFilter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/request_count\" metric.label.response_code_class=200"
    }
  }
}

The time series in this ratio are identified by the pair of monitored-resource type and metric type:

  • Resource: https_lb_rule
  • Metric type: loadbalancing.googleapis.com/https/request_count

The value for the totalServiceFilter is represented by just this pair. The value for the goodServiceFilter is represented by this pair where a label has a particular value; in this case, the response_code_class value is 200.

This ratio measures the number of requests that return a 2xx HTTP status over the total number of requests.
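The good-service filter is the total-service filter narrowed by one label, so both strings can be derived from the same pair. The following Python sketch builds the structure that way; ratio_sli is a hypothetical helper, not an API call. Serializing with json.dumps also takes care of escaping the double quotes that surround the metric type inside each filter string.

```python
import json


def ratio_sli(resource_type, metric_type, good_label, good_value):
    """Build a request-based SLI whose good filter narrows the total filter."""
    total_filter = f'resource.type={resource_type} metric.type="{metric_type}"'
    good_filter = f"{total_filter} metric.label.{good_label}={good_value}"
    return {
        "requestBased": {
            "goodTotalRatio": {
                "totalServiceFilter": total_filter,
                "goodServiceFilter": good_filter,
            }
        }
    }


sli = ratio_sli(
    "https_lb_rule",
    "loadbalancing.googleapis.com/https/request_count",
    "response_code_class",
    "200",
)
# The inner quotes around the metric type are escaped in the JSON output.
print(json.dumps(sli, indent=2))
```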

The following shows the JSON representation of an SLI that uses a distribution cut:

{
  "requestBased": {
    "distributionCut": {
      "distributionFilter": "resource.type=https_lb_rule metric.type=\"loadbalancing.googleapis.com/https/backend_latencies\" metric.label.response_code_class=200",
      "range": {
        "min": "-Infinity",
        "max": 500.0
      }
    }
  }
}

The time series is identified by the monitored-resource type, metric type, and value for a metric label:

  • Resource: https_lb_rule
  • Metric type: loadbalancing.googleapis.com/https/backend_latencies
  • Label-value pair: response_code_class = 200

The range of latencies considered good is designated by the range field.

This SLI computes the ratio of the number of 2xx-class responses with a latency value below 500 to the number of all 2xx-class responses.
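The arithmetic behind a distribution cut can be sketched as follows. This is an illustration only, under stated assumptions: the histogram is modeled as a hypothetical list of (lower bound, upper bound, count) tuples, and a bucket counts as good only when it lies entirely inside the range. How the service itself treats a bucket that straddles a range boundary is not described on this page.

```python
import math


def distribution_cut_ratio(buckets, range_min=-math.inf, range_max=math.inf):
    """Return good/total for a histogram given as (lower, upper, count) tuples.

    Simplifying assumption: only buckets that fall entirely inside
    [range_min, range_max] contribute to the good count.
    """
    total = sum(count for _, _, count in buckets)
    if total == 0:
        raise ValueError("distribution is empty")
    good = sum(
        count
        for lower, upper, count in buckets
        if lower >= range_min and upper <= range_max
    )
    return good / total


# Hypothetical latency histogram with 100 samples in total.
latencies = [
    (-math.inf, 100.0, 40),
    (100.0, 500.0, 50),
    (500.0, math.inf, 10),
]
print(distribution_cut_ratio(latencies, range_max=500.0))  # 0.9
```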

Structures for windows-based SLIs

A windows-based SLI counts time windows in which the provided service is considered good. The criterion for determining whether service is good is part of the SLI definition.

All windows-based SLIs include a window period, between 60 seconds and 86,400 seconds (1 day).

There are two ways to specify the good-service criterion for a windows-based SLI:

  • Create a filter string, described in Monitoring filters, that returns a time series with boolean values. A window is good if the value for that window is true. This is called the goodBadMetricFilter.
  • Create a PerformanceThreshold object that represents a threshold for acceptable performance. This object is specified as the value of the goodTotalRatioThreshold.

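    A PerformanceThreshold object specifies a threshold value and a performance SLI. If the value of the performance SLI meets or exceeds the threshold value, then the time window counts as good.

    There are two ways to specify the performance SLI:

      • As a basic SLI, given as the value of the basicSliPerformance field.
      • As a request-based SLI, embedded in place of the basic SLI.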

The following shows the JSON representation of a windows-based SLI built on a performance threshold for a basic availability SLI:

{
  "windowsBased": {
     "goodTotalRatioThreshold": {
       "threshold": 0.9,
       "basicSliPerformance": {
         "availability": {},
         "location": [
           "us-central1-c"
         ]
       }
     },
     "windowPeriod": "300s"
   }
}

This SLI specifies good performance as a 5-minute window in which availability reaches 90% or better. The structure of a basic SLI is shown in Service-level indicators.
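To make the window counting concrete, the following Python sketch checks a window period against the documented bounds and counts the windows whose performance meets or exceeds the threshold. Both function names are hypothetical, and the per-window performance values are made-up sample data rather than output from the API.

```python
def parse_window_period(period: str) -> int:
    """Parse a duration such as "300s" and check the documented bounds."""
    if not period.endswith("s"):
        raise ValueError(f"expected a value like '300s', got {period!r}")
    seconds = int(period[:-1])
    if not 60 <= seconds <= 86_400:
        raise ValueError("window period must be between 60 and 86,400 seconds")
    return seconds


def good_window_fraction(performance_per_window, threshold):
    """Return the fraction of windows that meet or exceed the threshold."""
    if not performance_per_window:
        raise ValueError("no windows to evaluate")
    good = sum(1 for value in performance_per_window if value >= threshold)
    return good / len(performance_per_window)


print(parse_window_period("300s"))  # 300
# Four 5-minute windows; the third falls short of the 0.9 threshold.
print(good_window_fraction([0.95, 0.99, 0.85, 0.90], threshold=0.9))  # 0.75
```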

You can also embed a request-based SLI in the windows-based SLI. For more information on the embedded structures, see Structures for request-based SLIs.

Service-level objectives

A service-level objective (SLO) is represented by a ServiceLevelObjective object. This object includes the following fields:

  • A name
  • A display name
  • The target SLI; an embedded ServiceLevelIndicator object
  • The performance goal for the SLI
  • The compliance period for the SLI

The following shows the JSON representation of an SLO that uses a basic availability SLI as the value of the serviceLevelIndicator field:

{
   "name": "projects/[PROJECT_NUMBER]/services/[PROJECT_ID]-zone-us-central1-c-csm-main-default-currencyservice/serviceLevelObjectives/3kavNVTtTMuzL7KcXAxqCQ",
   "serviceLevelIndicator": {
     "basicSli": {
       "availability": {},
       "location": [
         "us-central1-c"
       ]
     }
   },
   "goal": 0.98,
   "calendarPeriod": "WEEK",
   "displayName": "98% Availability in Calendar Week"
}

This SLO sets the performance goal at 98 percent availability over a period of a week.
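As a rough illustration of how the goal is applied, the following Python sketch assembles the fields shown above and compares a measured good/total ratio against the goal. build_slo and meets_goal are hypothetical helpers, the counts are invented sample data, and the check that the goal lies strictly between 0 and 1 is an assumption rather than a documented limit. The name field is left out of the sketch.

```python
def build_slo(display_name, sli, goal, calendar_period):
    """Assemble an SLO structure from an SLI, a goal, and a calendar period."""
    # Assumption: the goal is a fraction strictly between 0 and 1.
    if not 0.0 < goal < 1.0:
        raise ValueError("goal must be a fraction between 0 and 1")
    return {
        "displayName": display_name,
        "serviceLevelIndicator": sli,
        "goal": goal,
        "calendarPeriod": calendar_period,
    }


def meets_goal(slo, good, total):
    """Return True if good/total reaches the SLO's performance goal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total >= slo["goal"]


basic_sli = {"basicSli": {"availability": {}, "location": ["us-central1-c"]}}
slo = build_slo("98% Availability in Calendar Week", basic_sli, 0.98, "WEEK")
print(meets_goal(slo, good=985, total=1000))  # True
print(meets_goal(slo, good=970, total=1000))  # False
```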