Migrating to Cloud Operations for GKE

There are two options for monitoring and logging support in Google Kubernetes Engine (GKE): Cloud Operations for GKE and Legacy Logging and Monitoring.

This page explains the differences between these two options and what you must change to migrate from Legacy Logging and Monitoring to Cloud Operations for GKE.

When do I need to migrate?

You can migrate your existing Cloud Monitoring and Cloud Logging configurations from Legacy Logging and Monitoring to Cloud Operations for GKE at any time. However, keep in mind that Legacy Logging and Monitoring is not supported in GKE version 1.20.

The following table summarizes which monitoring and logging options are available in each GKE release version:

GKE version Legacy Logging and Monitoring Cloud Operations for GKE
1.14 Available Default
1.15 Available Default
1.16 Available Default
1.17 Available Default
1.18 Available Default
1.19 Available Default
1.20 Not Available Default

For information on the deprecation of Legacy Logging and Monitoring, refer to the Legacy support for GKE deprecation guide.

What are the benefits of using Cloud Operations for GKE?

Cloud Operations for GKE provides important benefits, including the following:

What is changing?

Cloud Operations for GKE uses a different resource model than Legacy Logging and Monitoring to organize its metrics, logs, and metadata. Here are some specific changes for your clusters using Cloud Operations for GKE:

  • Navigation change: The Cloud Monitoring dashboard is named GKE. This dashboard appears only if you have clusters using Cloud Operations for GKE.

  • Monitored resource type name changes: For example, your Kubernetes nodes are listed under the monitored resource type k8s_node (Kubernetes Node) rather than gce_instance (Compute Engine VM instance).

  • Kubernetes metric name changes: In Cloud Operations for GKE, metric type names start with the prefix kubernetes.io/ rather than container.googleapis.com/.

  • logEntry metadata changes: Cloud Operations for GKE log entries changed the names of some resource.labels and labels fields. For example, the field resource.labels.namespace_id has changed to resource.labels.namespace_name, while the value has not changed.

  • logName changes: Cloud Operations for GKE log entries use stdout or stderr in their log names whereas Legacy Logging and Monitoring uses a wider variety of names, including the container name. The container name is still available in Cloud Operations for GKE as a resource label under resource.labels.container_name.
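For example, here are logName values you might see for the same container before and after the migration. The values are illustrative and follow the example log entries later on this page, where the container is named server:

  Legacy Logging and Monitoring:
    logName="projects/my-test-project/logs/server"

  Cloud Operations for GKE:
    logName="projects/my-test-project/logs/stdout"
    resource.labels.container_name="server"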

The following list summarizes the preceding changes, with the (Old) Legacy Logging and Monitoring value followed by the (New) Cloud Operations for GKE value:

  • Dashboard menus: Dashboards > GKE Clusters → Dashboards > GKE
  • Metric prefixes: container.googleapis.com → kubernetes.io
  • Metrics resource types: gke_container, gce_instance, (none) → k8s_container, k8s_node, k8s_pod
  • Log resource types: container, gke_cluster, gce_instance, gke_nodepool → k8s_container, k8s_cluster, gke_cluster (audit logs only), k8s_node, k8s_pod

Resource type changes

Cloud Operations for GKE has new resource type names, new resource type display names, and new names for the labels that identify specific resources. These changes are listed below, pairing each (Old) Legacy Logging and Monitoring resource type with its (New) Cloud Operations for GKE resource type.

  • Old (Monitoring only): gke_container (GKE Container)
    Labels: cluster_name, container_name, instance_id1, namespace_id, pod_id, project_id, zone2
    Old (Logging only): container (GKE Container)
    Labels: cluster_name, container_name, instance_id1, namespace_id, pod_id, project_id, zone2
    New (Monitoring and Logging): k8s_container (Kubernetes Container)
    Labels: cluster_name, container_name, metadata.system_labels.node_name3, namespace_name, pod_name, project_id, location2

  • Old (Logging only): gce_instance (Compute Engine VM Instance)4
    Labels: cluster_name, instance_id, project_id, zone2
    New (Monitoring and Logging): k8s_node4 (Kubernetes Node)
    Labels: cluster_name, node_name, project_id, location2

  • Old: (none)
    New (Monitoring and Logging): k8s_pod5 (Kubernetes Pod)
    Labels: cluster_name, namespace_name, pod_name, project_id, location2

  • Old (Logging only): gke_cluster (GKE Cluster)
    Labels: cluster_name, project_id, location
    New (Monitoring and Logging): k8s_cluster5 (Kubernetes Cluster)
    Labels: cluster_name, project_id, location

Table footnotes:
1 In the new resource type used for monitoring (only), instance_id becomes node_name in metadata.system_labels.
2 zone refers to the location of this container or instance. location refers to the location of the cluster master node.
3 metadata.system_labels.node_name is not available in k8s_container resource types used for logging. You cannot search by node name for logs.
4 The gce_instance resource type can represent Kubernetes nodes as well as non-Kubernetes VM instances. When upgrading to Cloud Operations for GKE, node-related uses are changed to use the new resource type, k8s_node, including node-level logs with the following names: kubelet, docker, kube-proxy, startupscript, and node-problem-detector.
5 The k8s_pod and k8s_cluster resource types might include logs not present in the Legacy Logging and Monitoring support.

What do I need to do?

This section contains more specific information on the resource model changes in Cloud Operations for GKE and their impact on your existing monitoring and logging configurations.

You should perform the following steps to migrate your cluster to Cloud Operations for GKE:

  1. Identify your Logging and Monitoring configurations: Identify any Logging and Monitoring configurations that might be using values that have changed between Legacy Logging and Monitoring and Cloud Operations for GKE.

  2. Update your Logging and Monitoring configurations: Update any Logging and Monitoring configurations to reflect the changes present in Cloud Operations for GKE.

  3. Update your GKE cluster configuration: Update your GKE cluster to use the Cloud Operations for GKE setting.

Since the resource models and logNames have changed between Legacy Logging and Monitoring and Cloud Operations for GKE, any Logging or Monitoring configurations that reference the changes in the resource models must also be updated. The migration might require you to update Logging and Monitoring configurations including, but not limited to:

  • custom dashboards
  • charts
  • group filters
  • alerting policies
  • log sinks
  • log exclusions
  • log-based metrics in Cloud Logging and Cloud Monitoring

Identifying clusters using Legacy Logging and Monitoring

Use Cloud Monitoring's GKE Clusters dashboard to identify which clusters within a project are still using Legacy Logging and Monitoring:

  1. Click on the Cloud Monitoring GKE Clusters dashboard.
  2. Ensure the "Metrics Scope" selected includes the Google Cloud project that you want to review for clusters running Legacy Logging and Monitoring.
  3. View the list of clusters in the dashboard. Only clusters using Legacy Logging and Monitoring appear in the dashboard.

    For example, in the following screenshot, there are 4 clusters using Legacy Logging and Monitoring.

    Display of clusters using the legacy solutions.
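You can also approximate this check from the command line. The following sketch assumes the Google Cloud CLI is installed and authenticated; clusters still using Legacy Logging and Monitoring report the legacy service names, while clusters using Cloud Operations for GKE report the Kubernetes-native variants (for example, logging.googleapis.com/kubernetes):

  gcloud container clusters list \
      --format="table(name, location, loggingService, monitoringService)"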

Migrating your monitoring resources

If you are using Legacy Logging and Monitoring with a GKE cluster whose control plane version is 1.15 or newer, then your cluster's metrics are available in both the Legacy Monitoring and Cloud Operations for GKE resource models. This means that even before you migrate your clusters to Cloud Operations for GKE, your clusters start generating metrics using the new data model at no additional cost.

Starting in January 2021, your custom dashboards and alerts will be updated automatically to reference the new resource model metrics. If you want to migrate your own Cloud Monitoring configurations (charts in custom dashboards, alerts, groups) you need to update each configuration to reflect the new resource model.

You also need to migrate your configurations if you maintain your configuration in Terraform or another deployment manager and automatically sync changes.

Identifying configurations for the old data model

To identify the Cloud Monitoring configurations that you must update as part of the migration to Cloud Operations for GKE, view the Kubernetes Migration Status dashboard:

  1. In the Google Cloud console, go to Monitoring:

    Go to Monitoring

  2. In the Monitoring navigation pane, click Settings and then select the tab Kubernetes Migration Status.

The following sample dashboard shows that 1 alerting policy needs to be updated:

Display of the migration dashboard.

Updating Cloud Monitoring configurations

If your cluster is using GKE version 1.15 or later and is using Legacy Monitoring, then it is publishing to both data models. In this case, you have two options for how to migrate your configurations.

  • Clone the configurations and update the clones. With this option, you create a copy of your existing dashboards, alerting policies, and groups and migrate the copies to the new resource model. That way, you can continue to use Monitoring for your cluster with the old data model and the new data model simultaneously. For example, with this option, you would have 2 dashboards: the original one that continues to use the original resource model and a clone of the original dashboard that uses the new resource model.

  • Upgrade the affected configurations in place. This option switches to the new data model in Cloud Monitoring immediately.

The following sections provide instructions for migrating your configurations for dashboards, alerting policies, and groups.

One consideration for deciding which option to choose is how much monitoring history you want to have available. Currently, Cloud Monitoring offers 6 weeks of historical data for your clusters. After the GKE cluster upgrade that starts writing to both data models, the old data model still has the historical metrics for the cluster, while the new data model has only metrics that begin at the time of the upgrade.

If you don't need the historical data, you can upgrade the configurations in place to the new data model at any time. If the historical data is important, you can clone the configurations and update the clones to use the new resource model types.

Alternatively, you can wait for 6 weeks after your cluster starts double writing to both of the data models. After six weeks, both data models have the same historical data, so you can upgrade the configurations in place and switch to the new data model.

Updating dashboards

To view your dashboards, complete the following steps:

  1. From the Google Cloud console, go to Monitoring:

    Go to Monitoring

  2. Select Dashboards.

To clone a dashboard and update the clone, complete the following steps:

  1. Find the dashboard you want to clone.

  2. Click Copy Dashboard () and enter a name for the cloned dashboard.

  3. Update the new dashboard's configurations as needed.

To update the chart definitions in the dashboard, complete the following steps:

  1. Click More chart options (⋮) of the chart you want to edit.

  2. Select Edit to open the Edit chart panel.

  3. Change the resource type and metric name to translate to the new data model. You can also update the Filter and Group by fields as necessary.
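For example, a chart that plotted container memory usage against the old data model might be translated as follows. This is a sketch based on the metric mappings later on this page; the metric shown is one example and your charts might use others:

  Old chart definition:
    Resource type: gke_container
    Metric: container.googleapis.com/container/memory/bytes_used

  New chart definition:
    Resource type: k8s_container
    Metric: kubernetes.io/container/memory/used_bytes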

Updating alerting policies

To view your alerting policies, complete the following steps:

  1. From the Google Cloud console, go to Monitoring:

    Go to Monitoring

  2. Select Alerting.

To clone and update an alerting policy, complete the following steps:

  1. Select the policy you want to clone from the Policies table.

  2. Click Copy to begin the creation flow for the copy of the alerting policy.

  3. Edit any conditions that refer to the old data model to update the resource type and metric name.

    The last step of the flow lets you enter a name for the cloned policy.

To edit an alerting policy in place, complete the following steps:

  1. Select the policy you want to edit from the Policies table.

  2. Click Edit to update the policy.

  3. Update any conditions that refer to the old data model.
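For example, a condition filter written against the old data model might be updated as follows. This is a sketch based on the metric mappings later on this page:

  Old condition filter:
    metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.type="gke_container"

  New condition filter:
    metric.type="kubernetes.io/container/cpu/core_usage_time" AND resource.type="k8s_container"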

Updating groups

You can't clone a group through the Google Cloud console, so if you want to duplicate a group, you must create a new group with the same filter.

A group filter can reference the old data model in several ways.

  • Resource type - A group might define a filter resource.type="gke_container". Because the gke_container type can be used to refer to several different types of GKE entities, you must update the filter to the type of resource that you actually intend to match: k8s_container, k8s_pod, or k8s_node. If you want to match multiple types, define a filter with multiple clauses combined with the OR operator, as shown in the example after this list.

  • Label cloud_account - A group might define a filter resource.metadata.cloud_account="CLOUD_ACCOUNT_ID". As part of a separate deprecation, the cloud_account metadata field is no longer available. Consider using the resource.labels.project_id label.

  • Label region - A group might define a filter resource.metadata.region="REGION_NAME". The region metadata field is no longer available in the new data model. If you want to match GKE entities based on geographic location, consider using the resource.labels.location label.
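For example, group filters written against the old data model might be updated as follows. These are sketches: include only the resource types you actually need, and note that the location label holds the cluster location (a zone or region), so its value can differ from the old region value:

  Old group filter:
    resource.type="gke_container"

  New group filter:
    resource.type="k8s_container" OR resource.type="k8s_pod" OR resource.type="k8s_node"

  Old group filter:
    resource.metadata.region="us-central1"

  New group filter:
    resource.labels.location="us-central1-c"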

Mapping metrics between data models

This section describes how to map metrics from the old data model to the metrics in the new data model. The old data model published 17 different metrics, listed in the tables below. Some of these metrics were published against multiple GKE entity types, which results in more than 17 mappings to translate all metrics.

When mapping metrics, remember the following:

  • The prefix for the old metrics is container.googleapis.com/. The prefix for the new metrics is kubernetes.io/.

  • In the old data model, the only resource type is gke_container. Depending on how you defined the resource labels, this resource type might refer to GKE Containers, Pods, System Daemons, and Machines, which correspond to GKE Nodes.

  • You can query the Monitoring API using combinations of pod_id and container_name that don't match those listed in the following table. The data returned by such queries is undefined, and no mapping from these undefined states is provided.

    GKE Entity Type   Filter
    Container         pod_id != '' and container_name != ''
                      (pod_id is not the empty string and container_name is not the empty string)
    Pod               pod_id != '' and container_name == ''
                      (pod_id is not the empty string and container_name is the empty string)
    System daemon     pod_id == '' and container_name != 'machine'
                      (pod_id is the empty string and container_name is one of docker-daemon, kubelets, or pods)
    Machine           pod_id == '' and container_name == 'machine'
                      (pod_id is the empty string and container_name is the string machine)
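For example, a Monitoring filter that selects Pod-level data in the old model combines an old metric type with the gke_container resource type and the entity-type constraints above. This is a sketch; the metric shown is one example:

  metric.type="container.googleapis.com/container/memory/bytes_used"
  AND resource.type="gke_container"
  AND resource.labels.pod_id!=""
  AND resource.labels.container_name=""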

The tables list three types of mappings:

  • Direct mapping between the old and new data models.

  • Mappings that require configuration.

  • Mappings of old metrics that don't have a direct equivalent in the new model.

Direct mapping

The following metrics translate directly between the old and new data models.

Each entry shows the old metric name and GKE entity type, followed by the new metric name and new resource type; notes follow where relevant.

  • container/accelerator/duty_cycle (Container) → container/accelerator/duty_cycle (k8s_container)
  • container/accelerator/memory_total (Container) → container/accelerator/memory_total (k8s_container)
  • container/accelerator/memory_used (Container) → container/accelerator/memory_used (k8s_container)
  • container/accelerator/request (Container) → container/accelerator/request (k8s_container)
  • container/cpu/reserved_cores (Container) → container/cpu/limit_cores (k8s_container). See Mappings that require configuration for the mapping when the resource is a Pod.
  • container/cpu/usage_time (Container) → container/cpu/core_usage_time (k8s_container). See Mappings that require configuration for the mapping when the resource is a Pod.
  • container/cpu/usage_time (System Daemon) → node_daemon/cpu/core_usage_time (k8s_node). In the old data model, gke_container.container_name is one of docker-daemon, kubelets, or pods. These filter values match with the values in the new data model field metric.component.
  • container/cpu/utilization (Container) → container/cpu/limit_utilization (k8s_container)
  • container/disk/bytes_total (Pod) → pod/volume/total_bytes (k8s_pod). gke_container.device_name (Volume:config-volume) is translated to k8s_pod.volume_name (config-volume) by removing the prepended Volume:.
  • container/disk/bytes_used (Pod) → pod/volume/used_bytes (k8s_pod). gke_container.device_name (Volume:config-volume) is translated to k8s_pod.volume_name (config-volume) by removing the prepended Volume:.
  • container/memory/bytes_total (Container) → container/memory/limit_bytes (k8s_container)
  • container/memory/bytes_used (Container) → container/memory/used_bytes (k8s_container)
  • container/memory/bytes_used (System Daemon) → node_daemon/memory/used_bytes (k8s_node). In the old data model, gke_container.container_name is one of docker-daemon, kubelets, or pods. These filter values match with the values in the new data model field metric.component.
  • container/disk/inodes_free (Machine) → node/ephemeral_storage/inodes_free (k8s_node). The old data model has the instance_id field, a random number ID. The new data model has node_name, a human-readable name.
  • container/disk/inodes_total (Machine) → node/ephemeral_storage/inodes_total (k8s_node). The old data model has the instance_id field, a random number ID. The new data model has node_name, a human-readable name.
  • container/pid_limit (Machine) → node/pid_limit (k8s_node). The old data model has the instance_id field, a random number ID. The new data model has node_name, a human-readable name.
  • container/pid_used (Machine) → node/pid_used (k8s_node). The old data model has the instance_id field, a random number ID. The new data model has node_name, a human-readable name.

Mappings that require configuration

The following metrics translate from the old data model to the new data model with some basic manipulation.

Each entry shows the old metric name and GKE entity type, followed by the aggregation needed in the new data model and the new resource type.

  • container/cpu/reserved_cores (Pod) → SUM container/cpu/limit_cores GROUP BY pod_name (k8s_container). The old data model has a pod_id field, a UUID. The new data model has pod_name, a human-readable name.
  • container/cpu/usage_time (Pod) → SUM container/cpu/core_usage_time GROUP BY pod_name (k8s_container). The old data model has a pod_id field, a UUID. The new data model has pod_name, a human-readable name.
  • container/disk/bytes_total (Container) → node/ephemeral_storage/total_bytes (k8s_container). gke_container.device_name is one of / or logs. Each of these values is equal to the new value.
  • container/disk/bytes_used (Container) → container/ephemeral_storage/used_bytes (k8s_container). gke_container.device_name is one of / or logs. These 2 values must be added together to get the new value. In the new data model, you cannot get the value for / and logs separately.
  • container/memory/bytes_total (Pod) → SUM container/memory/limit_bytes GROUP BY pod_name (k8s_container). The old data model has a pod_id field, a UUID. The new data model has pod_name, a human-readable name.
  • container/memory/bytes_used (Pod) → SUM container/memory/used_bytes GROUP BY pod_name (k8s_container). The old data model has a pod_id field, a UUID. The new data model has pod_name, a human-readable name.

Mappings that don't have a direct equivalent in the new model

The following metrics don't have an equivalent in the new data model.

CPU utilization for Pod
In the old data model, this metric, based on the CPU limit for each container, is a weighted average of CPU utilization across all containers in a pod.
In the new data model, this value doesn't exist and must be calculated on the client-side based on the limit and utilization of each container.
Uptime
In the old data model, this metric is a cumulative metric that represents the fraction of time that a container is available in units ms/s. For a container that is always available, the value is ~1000ms/s.
In the new data model, this metric is a gauge metric in hours that reports how long each part of the system has been running without interruption.

Resource group changes

If you define your own resource groups and use any of the Legacy Logging and Monitoring resource types shown in the preceding Resource type changes table, then change those types to be the corresponding Cloud Operations for GKE resource types. If your resource group includes custom charts, you might have to change them.

Migrating your logging resources

To migrate your logging resources, complete the steps in the following sections.

Changes in log entry contents

When you update to Cloud Operations for GKE, you might find that certain information in log entries has moved to differently-named fields. This information can appear in logs queries used in logs-based metrics, log sinks, and log exclusions.

The following table, Log entry changes, lists the new fields and labels. Here's a brief summary:

  • Check the logName field in your filters. Cloud Operations for GKE log entries use stdout or stderr in their log names whereas Legacy Logging and Monitoring used a wider variety of names, including the container name. The container name is still available as a resource label.
  • Check the labels field in the log entries. This field might contain information formerly in the metadata log entry fields.
  • Check the resource.labels field in the log entries. The new resource types have additional label values.
Log entry changes

(Old) Legacy Logging and Monitoring log entries:

  Log entry resources: resource.labels (Resource labels1)
  Log entry metadata: labels (Log entry labels2)

  labels (Examples):
    compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-d4d9p"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-d4pq2"
    container.googleapis.com/stream: "stdout"

(New) Cloud Operations for GKE log entries:

  Log entry resources: resource.labels (Resource labels1)
  Log entry metadata: labels (Log entry labels2)

  labels (Examples):
    k8s-pod/app: "currencyservice"
    k8s-pod/pod-template-hash: "5a67f17c"

Table footnotes:
1 Resource labels identify specific resources that yield metrics, such as specific clusters and nodes.
2 The labels field appears in new log entries that are part of Cloud Operations for GKE and occasionally in some Legacy Logging and Monitoring log entries. In Cloud Operations for GKE, it is used to hold some information formerly in the metadata log entry fields.

Example logs:

Container resource type changes:

The differences between the Legacy Logging and Monitoring and Cloud Operations for GKE resource models appear in the resource, labels, and logName fields of the following example log entries.

Legacy Logging and Monitoring:
{
  "insertId": "fji4tsf1a8o5h",
  "jsonPayload": {
    "pid": 1,
    "name": "currencyservice-server",
    "v": 1,
    "message": "conversion request successful",
    "hostname": "currencyservice-6995d74b95-zjkmj"
  },
  "resource": {
    "type": "container",
    "labels": {
      "project_id": "my-test-project",
      "cluster_name": "my-test-cluster",
      "pod_id": "currencyservice-6995d74b95-zjkmj",
      "zone": "us-central1-c",
      "container_name": "server",
      "namespace_id": "default",
      "instance_id": "1234567890"
    }
  },
  "timestamp": "2020-10-02T19:02:47.575434759Z",
  "severity": "INFO",
  "labels": {
    "container.googleapis.com/pod_name": "currencyservice-6995d74b95-zjkmj",
    "compute.googleapis.com/resource_name": "gke-legacy-cluster-default-pool-c534acb8-hvxk",
    "container.googleapis.com/stream": "stdout",
    "container.googleapis.com/namespace_name": "default"
  },
  "logName": "projects/my-test-project/logs/server",
  "receiveTimestamp": "2020-10-02T19:02:50.972304596Z"
}
Cloud Operations for GKE:
{
  "insertId": "mye361s5zfcl55amj",
  "jsonPayload": {
    "v": 1,
    "name": "currencyservice-server",
    "pid": 1,
    "hostname": "currencyservice-5b69f47d-wg4zl",
    "message": "conversion request successful"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "container_name": "server",
      "project_id": "my-test-project",
      "pod_name": "currencyservice-5b69f47d-wg4zl",
      "namespace_name": "onlineboutique",
      "location": "us-central1-c",
      "cluster_name": "my-prod-cluster"

    }
  },
  "timestamp": "2020-10-02T18:41:55.359669767Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/app": "currencyservice",
    "k8s-pod/pod-template-hash": "5b69f47d",
    "compute.googleapis.com/resource_name": "gke-legacy-cluster-default-pool-c534acb8-hvxk"
  },
  "logName": "projects/my-test-project/logs/stdout",
  "receiveTimestamp": "2020-10-02T18:41:57.930654427Z"
}

Cluster resource type changes:

The differences between the Legacy Logging and Monitoring and Cloud Operations for GKE resource models appear in the resource field of the following example log entries.

Legacy Logging and Monitoring:
{
  "insertId": "962szqg9uiyalt",
  "jsonPayload": {
    "type": "Normal",
    "involvedObject": {
      "apiVersion": "policy/v1beta1",
      "uid": "a1bc2345-12ab-12ab-1234-123456a123456",
      "resourceVersion": "50968",
      "kind": "PodDisruptionBudget",
      "namespace": "knative-serving",
      "name": "activator-pdb"
    },
    "apiVersion": "v1",
    "reason": "NoPods",
    "source": {
      "component": "controllermanager"
    },
    "message": "No matching pods found",
    "kind": "Event",
    "metadata": {
      "selfLink": "/api/v1/namespaces/knative-serving/events/activator-pdb.163a42fcb707c1fe",
      "namespace": "knative-serving",
      "name": "activator-pdb.163a42fcb707c1fe",
      "uid": "a1bc2345-12ab-12ab-1234-123456a123456",
      "creationTimestamp": "2020-10-02T19:17:50Z",
      "resourceVersion": "1917"
    }
  },
  "resource": {
    "type": "gke_cluster",
    "labels": {
      "project_id": "my-test-project",
      "location": "us-central1-c",
      "cluster_name": "my-prod-cluster"
    }
  },
  "timestamp": "2020-10-02T21:33:20Z",
  "severity": "INFO",
  "logName": "projects/my-test-project/logs/events",
  "receiveTimestamp": "2020-10-02T21:33:25.510671123Z"
}
Cloud Operations for GKE:
{
  "insertId": "1qzipokg6ydoesp",
  "jsonPayload": {
    "involvedObject": {
      "uid": "a1bc2345-12ab-12ab-1234-123456a123456",
      "name": "istio-telemetry",
      "apiVersion": "autoscaling/v2beta2",
      "resourceVersion": "90505937",
      "kind": "HorizontalPodAutoscaler",
      "namespace": "istio-system"
    },
    "source": {
      "component": "horizontal-pod-autoscaler"
    },
    "kind": "Event",
    "type": "Warning",
    "message": "missing request for cpu",
    "metadata": {
      "resourceVersion": "3071416",
      "creationTimestamp": "2020-08-22T14:18:59Z",
      "name": "istio-telemetry.162d9ce2894d6642",
      "selfLink": "/api/v1/namespaces/istio-system/events/istio-telemetry.162d9ce2894d6642",
      "namespace": "istio-system",
      "uid": "a1bc2345-12ab-12ab-1234-123456a123456"
    },
    "apiVersion": "v1",
    "reason": "FailedGetResourceMetric"
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "project_id": "my-test-project"
      "location": "us-central1-a",
      "cluster_name": "my-prod-cluster1",
    }
  },
  "timestamp": "2020-10-02T21:39:07Z",
  "severity": "WARNING",
  "logName": "projects/my-test-project/logs/events",
  "receiveTimestamp": "2020-10-02T21:39:12.182820672Z"
}

Node resource type changes:

The differences between the Legacy Logging and Monitoring and Cloud Operations for GKE resource models appear in the resource and labels fields of the following example log entries.

Legacy Logging and Monitoring:
{
  "insertId": "16qdegyg9t3n2u5",
  "jsonPayload": {
    "SYSLOG_IDENTIFIER": "kubelet",
    [...]
    "PRIORITY": "6",
    "_COMM": "kubelet",
    "_GID": "0",
    "_MACHINE_ID": "9565f7c82afd94ca22612c765ceb1042",
    "_SYSTEMD_UNIT": "kubelet.service",
    "_EXE": "/home/kubernetes/bin/kubelet"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "instance_id": "1234567890",
      "zone": "us-central1-a",
      "project_id": "my-test-project"
    }
  },
  "timestamp": "2020-10-02T21:43:14.390150Z",
  "labels": {
    "compute.googleapis.com/resource_name": "gke-legacy-monitoring-default-pool-b58ff790-29rr"
  },
  "logName": "projects/my-test-project/logs/kubelet",
  "receiveTimestamp": "2020-10-02T21:43:20.433270911Z"
}
Cloud Operations for GKE:
{
  "insertId": "kkbgd6e5tmkpmvjji",
  "jsonPayload": {
    "SYSLOG_IDENTIFIER": "kubelet",
   [...]
    "_CAP_EFFECTIVE": "3fffffffff",
    "_HOSTNAME": "gke-standard-cluster-1-default-pool-f3929440-f4dy",
    "PRIORITY": "6",
    "_COMM": "kubelet",
    "_TRANSPORT": "stdout",
    "_GID": "0",
    "MESSAGE": "E1002 21:43:14.870346    1294 pod_workers.go:190] Error syncing pod 99ba1919-d633-11ea-a5ea-42010a800113 (\"stackdriver-metadata-agent-cluster-level-65655bdbbf-v5vjv_kube-system(99ba1919-d633-11ea-a5ea-42010a800113)\"), skipping: failed to \"StartContainer\" for \"metadata-agent\" with CrashLoopBackOff: \"Back-off 5m0s restarting failed container=metadata-agent pod=stackdriver-metadata-agent-cluster-level-65655bdbbf-v5vjv_kube-system(99ba1919-d633-11ea-a5ea-42010a800113)\""
  },
  "resource": {
    "type": "k8s_node",
    "labels": {
      "cluster_name": "my-prod-cluster-1",
      "location": "us-central1-a",
      "node_name": "gke-standard-cluster-1-default-pool-f3929440-f4dy"
       "project_id": "my-test-project",
    }
  },
  "timestamp": "2020-10-02T21:43:14.870426Z",
  "logName": "projects/my-test-project/logs/kubelet",
  "receiveTimestamp": "2020-10-02T21:43:20.788933199Z"
}

Logging configuration updates

This section describes changes you might need to make to your Cloud Logging configuration as part of a migration to Cloud Operations for GKE. You also need to migrate your configurations if you maintain your configuration in Terraform or another deployment manager and automatically sync changes.

Logging queries

If you use queries to find and filter your logs in Cloud Logging, and you use any of the Legacy Logging and Monitoring resource types shown in the preceding Resource type changes table, then change those types to the corresponding Cloud Operations for GKE types.

For example, in Legacy Logging and Monitoring, you query for container logs using the container resource type, while in Cloud Operations for GKE you use the k8s_container resource type to query container logs:

  resource.type="k8s_container"

As another example, in Legacy Logging and Monitoring, you query for container-specific log names using the name of the container, while in Cloud Operations for GKE you use the stdout and stderr log names to query container logs:

  resource.type="k8s_container"
  log_name="projects/YOUR_PROJECT_NAME/logs/stdout"
  resource.labels.container_name="CONTAINER_NAME"
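For comparison, the equivalent Legacy Logging and Monitoring query scoped container logs by the legacy resource type and a container-specific log name. This sketch uses the same placeholder style as above:

  resource.type="container"
  log_name="projects/YOUR_PROJECT_NAME/logs/CONTAINER_NAME"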

Logs-based metrics

If you define your own logs-based metrics and use Legacy Logging and Monitoring metrics or resource types shown in the previous Metric name changes or Resource type changes tables, then change those metrics and resource types to the corresponding Cloud Operations for GKE ones.

You can use the following gcloud CLI commands to find your logs-based metrics:

  gcloud logging metrics list --filter='filter~resource.type=\"container\" OR filter~resource.type=container'

  gcloud logging metrics list --filter='filter~resource.labels.namespace_id'

  gcloud logging metrics list --filter='filter~resource.labels.pod_id'

  gcloud logging metrics list --filter='filter~resource.labels.zone'
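To review the full filter of a particular metric before changing it, you can describe it. This is a sketch using the same placeholder name as the update commands that follow:

  gcloud logging metrics describe YOUR_LOGS_BASED_METRIC_NAME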

You can use the following gcloud CLI commands to update your logs-based metrics.

  gcloud logging metrics update YOUR_LOGS_BASED_METRIC_NAME --log-filter='resource.type=\"container\" OR resource.type=\"k8s_container\"'

  gcloud logging metrics update YOUR_LOGS_BASED_METRIC_NAME --log-filter='resource.labels.namespace_id=\"YOUR_NAMESPACE\" OR resource.labels.namespace_name=\"YOUR_NAMESPACE\"'

  gcloud logging metrics update YOUR_LOGS_BASED_METRIC_NAME --log-filter='resource.labels.pod_id=\"YOUR_POD_NAME\" OR resource.labels.pod_name=\"YOUR_NAME\"'

  gcloud logging metrics update YOUR_LOGS_BASED_METRIC_NAME --log-filter='resource.labels.zone=\"YOUR_ZONE\" OR resource.labels.location=\"YOUR_ZONE\"'

Alternatively, you can update your logs-based metrics in the Google Cloud console.

Logs exports

If you export any of your logs, and if your export uses Legacy Logging and Monitoring resource types shown in the previous Resource type changes table, then change your export to use the corresponding Cloud Operations for GKE resource types. Cloud Operations for GKE log entries use stdout or stderr in their log names whereas Legacy Logging and Monitoring uses the container name.

There are 2 important considerations for the log name change:

  1. logName values – The log name values in Cloud Operations for GKE include stdout or stderr rather than the container name. The container name is still available as a resource label. Processing of the log name in Cloud Storage exports or queries against the BigQuery tables needs to be changed to use the stdout and stderr log names.
  2. Changes to export destination file locations and tables – The log name values are used to determine the exported file structure in Cloud Storage and the table structure in BigQuery. Usage of the Cloud Storage files and BigQuery tables should be adjusted to account for the changed folder structure in Cloud Storage and table structure in BigQuery.

You can use the following Google Cloud CLI commands to find your affected Logging sinks:

  gcloud logging sinks list --filter='filter~resource.type=\"container\" OR filter~resource.type=container'

  gcloud logging sinks list --filter='filter~resource.labels.namespace_id'

  gcloud logging sinks list --filter='filter~resource.labels.pod_id'

  gcloud logging sinks list --filter='filter~resource.labels.zone'
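To review a sink's current filter and destination before changing it, you can describe it. This is a sketch using the same placeholder name as the update commands that follow:

  gcloud logging sinks describe YOUR_SINK_NAME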

You can use the following gcloud CLI commands to update your Logging sinks.

  gcloud logging sinks update YOUR_SINK_NAME --log-filter='resource.type=\"container\" OR resource.type=\"k8s_container\"'

  gcloud logging sinks update YOUR_SINK_NAME --log-filter='resource.labels.namespace_id=\"YOUR_NAMESPACE\" OR resource.labels.namespace_name=\"YOUR_NAMESPACE\"'

  gcloud logging sinks update YOUR_SINK_NAME --log-filter='resource.labels.pod_id=\"YOUR_POD_NAME\" OR resource.labels.pod_name=\"YOUR_NAME\"'

  gcloud logging sinks update YOUR_SINK_NAME --log-filter='resource.labels.zone=\"YOUR_ZONE\" OR resource.labels.location=\"YOUR_ZONE\"'

Alternatively, you can update your log sinks in the Google Cloud console.

Logs exclusions

If you exclude any of your logs, and if your exclusion filters use Legacy Logging and Monitoring resource types shown in the previous Resource type changes table, then change your exclusion filters to use the corresponding Cloud Operations for GKE resource types.

For information on viewing your logs exclusions, refer to the Viewing exclusion filters guide.

Changes in log locations

In Cloud Logging, your logs are stored with the resource type that generated them. Since these types have changed in Cloud Operations for GKE, be sure to look for your logs in the new resource types like Kubernetes Container, not in the Legacy Logging and Monitoring types such as GKE Container.

Update your cluster's configuration

After you have migrated any logging and monitoring resources to use the Cloud Operations for GKE data format, the last step is to update your GKE cluster to use Cloud Operations for GKE.

To update your GKE cluster's logging and monitoring configuration, follow these steps:

CONSOLE

  1. Go to the GKE Clusters page for your project. The following button takes you there:

    Go to Kubernetes clusters

  2. Click on the cluster you want to update to use Cloud Operations for GKE.

  3. In the row labeled Cloud Operations for GKE, click the Edit icon.

  4. In the dialog box that appears, confirm Enable Cloud Operations for GKE is selected.

  5. In the dropdown menu within that dialog box, select which logs and metrics you want collected. The default (recommended) setting for Cloud Operations for GKE is System and workload logging and monitoring. Selecting any value in this dropdown other than "Legacy Logging and Monitoring" will update the cluster to start using Cloud Operations for GKE rather than Legacy Logging and Monitoring.

  6. Click Save Changes.

GCLOUD

  1. Run this command:

    gcloud container clusters update [CLUSTER_NAME] \
      --zone=[ZONE] \
      --project=[PROJECT_ID] \
      --logging=SYSTEM,WORKLOAD \
      --monitoring=SYSTEM
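    After the update completes, you can verify the cluster's setting by describing it. This is a sketch; for clusters configured with the newer --logging and --monitoring flags, the equivalent information might instead appear under loggingConfig and monitoringConfig in the describe output:

      gcloud container clusters describe [CLUSTER_NAME] \
        --zone=[ZONE] \
        --project=[PROJECT_ID] \
        --format="value(loggingService, monitoringService)"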
    

What's next