Impact of temporary disconnection from Google Cloud

Anthos is the Google Cloud application modernization platform. It is based on Kubernetes and can be deployed on Google Cloud, on other clouds, on VMware, and on bare metal servers. Even when an Anthos cluster runs on-premises, it is designed to have a permanent connection to Google Cloud for a number of reasons, including monitoring and management. However, you might need to know what would happen if, for any reason, the connection to Google Cloud is lost (for example, because of a technical problem). This document outlines the impact of a loss of connectivity for Anthos clusters on bare metal and Anthos clusters on VMware, and which workarounds you can use in this event.

This information is useful for architects who need to prepare for an unplanned or forced disconnection from Google Cloud and understand its consequences. However, you should not plan to use Anthos disconnected from Google Cloud as a nominal working mode. Remember that we design Anthos to take advantage of the scalability and availability of Google Cloud services. This document is based on the design and architecture of the various Anthos components, and describes how they are expected to behave during a temporary interruption. We cannot guarantee that this document is exhaustive.

This document assumes that you are familiar with Anthos. If that's not the case, we recommend that you first read the Anthos technical overview.

Anthos license validation and metering

If the Anthos API (anthos.googleapis.com) is enabled in your Google Cloud project, the Anthos metering controller, running in the cluster, generates and refreshes the Anthos entitlement periodically. The tolerance for disconnection is 12 hours. Additionally, metering and billing are managed through the connection.

This table lists the behavior of features related to licensing and metering in case of temporary disconnection from Google Cloud.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Anthos license validation | The Anthos metering controller generates and refreshes the Anthos entitlement custom resource periodically, as long as anthos.googleapis.com is enabled in the Cloud project. | The components that consume the entitlement custom resource support a grace period: they continue to function as long as the entitlement custom resource is refreshed within the grace period. | Currently unlimited. After the grace period expires, components start to log errors. You cannot upgrade your cluster anymore. | None |
| Metering and billing | The Anthos metering controller reports the vCPU capacity of the cluster to the Google Cloud Service Control API for billing purposes. | There is an in-cluster agent that persists billing records in the cluster when disconnected, and the records are retrieved once the cluster re-connects to Google Cloud. | Unlimited. However, Anthos metering information is required for compliance as stated in the Service Specific Terms for "Premium Software". | None |
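
The 12-hour tolerance described in this section can be reasoned about with a small calculation. The following is a minimal sketch only: it assumes you have obtained the time of the last successful entitlement refresh by some means of your own (this document does not describe how), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Tolerance for disconnection stated in this document.
DISCONNECTION_TOLERANCE = timedelta(hours=12)


def entitlement_is_fresh(last_refresh, now, tolerance=DISCONNECTION_TOLERANCE):
    """Return True if the last refresh is still within the tolerance window."""
    return now - last_refresh <= tolerance


def time_remaining(last_refresh, now, tolerance=DISCONNECTION_TOLERANCE):
    """Return how much of the tolerance window is left (never negative)."""
    remaining = last_refresh + tolerance - now
    return max(remaining, timedelta(0))


# Hypothetical example values, not taken from a real cluster.
last_refresh = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
now = last_refresh + timedelta(hours=5)
print(entitlement_is_fresh(last_refresh, now))
print(time_remaining(last_refresh, now))
```

A check like this could feed a local alert so that operators know how long a disconnection has lasted relative to the tolerance.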

Cluster lifecycle

This section covers scenarios such as creating, updating, deleting, and resizing clusters, as well as monitoring the status of these activities.

For most scenarios, you can use CLI tools such as bmctl, gkectl, and kubectl to perform operations during a temporary disconnection. You can also monitor the status of these operations with these tools. Upon reconnection, the Google Cloud console updates to display the results of operations performed during the disconnected period.

| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Cluster creation | You use the bmctl or gkectl CLI tools to create clusters. This operation requires a connection to Google Cloud. | You cannot create clusters. | Zero | None |
| Cluster upgrade | You use the bmctl or gkectl CLI tools to upgrade clusters. This operation requires a connection to Google Cloud. | You cannot upgrade clusters. | Zero | None |
| Cluster deletion | You use the bmctl or gkectl CLI tools to delete clusters. This operation does not require a connection to Google Cloud. | You can delete clusters. | Unlimited | - |
| Viewing cluster status | You can see information about your clusters in the console, in the list of Google Kubernetes Engine clusters. | Cluster information is not shown in the console. | Unlimited | Use kubectl to directly query your clusters and get the information you need. |
| Removing nodes from a cluster | You do not need a connection to Google Cloud to remove nodes from a cluster. | You can remove nodes from a cluster. | Unlimited | - |
| Adding nodes to a cluster | The new node pulls container images from Container Registry to work properly. A preflight check runs to validate that there is connectivity to Google Cloud. | The preflight checks that run when adding a new node validate that there is connectivity to Google Cloud. Therefore, you cannot add a new node to a cluster when disconnected. | Zero | None |
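
When the console no longer shows cluster information, the kubectl workaround can be scripted. The sketch below is one possible approach, not an official tool: it assumes you save the output of `kubectl get nodes -o json` and pass it to a hypothetical helper, `summarize_nodes`, which reports whether each node has a `Ready` condition with status `True`.

```python
import json


def summarize_nodes(node_list_json):
    """Map each node name to True if its Ready condition is "True"."""
    doc = json.loads(node_list_json)
    summary = {}
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        summary[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return summary


# Illustrative sample shaped like `kubectl get nodes -o json` output;
# the node names are made up.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
})
print(summarize_nodes(sample))
```

Because the helper only parses JSON, it works the same way whether the output comes from a live cluster or from a file captured earlier.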

Application lifecycle

Managing your applications running in an Anthos cluster is mostly unaffected by a temporary disconnection from Google Cloud. Only the Connect Gateway is impacted. If you are using Container Registry, Artifact Registry, Cloud Build, or Google Cloud Deploy to manage your container images or CI/CD pipelines in Google Cloud, they are not available anymore in case of disconnection. Strategies to deal with disconnection for those products are outside of the scope of this document.

| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Application deployment | Done locally using kubectl, through CI/CD tooling, or using the Connect Gateway. | The Connect Gateway is not available. All other methods of deployments still work as long as they connect directly to the Kubernetes API. | Unlimited | If you were using the Connect Gateway, switch to using kubectl locally. |
| Application removal | Done locally using kubectl, through CI/CD tooling, or using the Connect Gateway. | The Connect Gateway is not available. All other methods of deployments still work as long as they connect directly to the Kubernetes API. | Unlimited | If you were using the Connect Gateway, switch to using kubectl locally. |
| Application scale-out | Done locally using kubectl, through CI/CD tooling, or using the Connect Gateway. | The Connect Gateway is not available. All other methods of deployments still work as long as they connect directly to the Kubernetes API. | Unlimited | If you were using the Connect Gateway, switch to using kubectl locally. |
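
To prepare for the workaround of switching to local kubectl access, it can help to know in advance which kubeconfig entries depend on the Connect Gateway. The following sketch assumes that Connect Gateway entries use the host `connectgateway.googleapis.com` in their server URL (an assumption you should verify against your own kubeconfig), and that you pass in the output of `kubectl config view -o json`. The helper name is hypothetical.

```python
import json
from urllib.parse import urlparse

# Assumption: Connect Gateway kubeconfig entries point at this host.
CONNECT_GATEWAY_HOST = "connectgateway.googleapis.com"


def clusters_using_connect_gateway(kubeconfig_json):
    """Return names of kubeconfig clusters whose server is the gateway host."""
    config = json.loads(kubeconfig_json)
    names = []
    for entry in config.get("clusters") or []:
        server = (entry.get("cluster") or {}).get("server", "")
        if urlparse(server).hostname == CONNECT_GATEWAY_HOST:
            names.append(entry["name"])
    return names


# Illustrative input; cluster names and URLs are made up.
sample = json.dumps({
    "clusters": [
        {"name": "via-gateway",
         "cluster": {"server": "https://connectgateway.googleapis.com/v1/example"}},
        {"name": "direct",
         "cluster": {"server": "https://10.0.0.10:6443"}},
    ]
})
print(clusters_using_connect_gateway(sample))
```

Any cluster listed by this check would need a kubeconfig entry that reaches the Kubernetes API directly in order to remain manageable during a disconnection.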

Logging and monitoring

Auditability helps your organization meet its regulatory requirements and compliance policies. Anthos helps with auditability by offering application logging, Kubernetes logging, and audit logging. Many customers choose to leverage Google's Cloud Logging and Cloud Monitoring to avoid managing a logging and monitoring infrastructure on-prem. Other customers prefer to centralize their logs into an on-prem system for aggregation. To support these customers, Anthos provides direct integration to services such as Prometheus, Elastic, Splunk, or Datadog. In this mode, during temporary disconnection from Google Cloud, there is no impact on logging or monitoring functionality.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Application logging using Cloud Logging | Logs are written to Cloud Logging. | Logs are buffered to the local disk. | 4h or 4GB, whichever comes first. When the buffer fills, the oldest entries are dropped. | Use a local logging solution. |
| System/Kubernetes logging using Cloud Logging | Logs are written to Cloud Logging. | Logs are buffered to the local disk. | 4h or 4GB, whichever comes first. When the buffer fills, the oldest entries are dropped. | Use a local logging solution. |
| Access logging using Cloud Logging | Logs are written to Cloud Logging. | Logs are buffered to the local disk. | 4h. When the buffer fills, the oldest entries are dropped. | Use a local logging solution. |
| Audit logging using Cloud Audit Logs | Logs are written to Cloud Logging. | Logs are buffered to the local disk. | 10GiB of local buffer. When the buffer fills, the oldest entries are dropped. | Set up log forwarding to a local logging solution. |
| Application logging using other provider | You can use different third-party providers like Elastic, Splunk, Datadog, or Loki. | No impact | Unlimited | - |
| System/Kubernetes logging using other provider | You can use different third-party providers like Elastic, Splunk, or Datadog. | No impact | Unlimited | - |
| Application and Kubernetes metrics written to Cloud Monitoring | The metrics are written to Cloud Monitoring. | Metrics are buffered to the local disk. | 6GiB for system metrics, and 1GiB for application metrics. When the buffer fills, the oldest entries are dropped. | Use a local monitoring solution. |
| Accessing and reading monitoring data from Kubernetes and application workloads | All metrics are available in the console and through the Cloud Monitoring API. | Metrics are not updated in Cloud Monitoring during the disconnection. | 6GiB for system metrics, and 1GiB for application metrics. When the buffer fills, the oldest entries are dropped. | Use a local monitoring solution. |
| Alerting rules and paging for metrics | Cloud Monitoring supports alerting. You can create alerts for any metric. Alerts can be sent through different channels. | Alerts are not triggered while disconnected. | 6GiB for system metrics, and 1GiB for application metrics. When the buffer fills, the oldest entries are dropped. | Use a local monitoring and alerting solution. |
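
The buffer limits in this table translate into a time budget that depends on how fast your workloads produce data. The following sketch estimates that budget. It is only an approximation under stated assumptions: the write rate is constant, "4GB" is read as 4 × 10^9 bytes, and the function names are hypothetical.

```python
GB = 1000 ** 3   # assumption: "GB" in the table means 10^9 bytes
GIB = 1024 ** 3


def hours_until_log_loss(bytes_per_second, max_bytes=4 * GB, max_hours=4.0):
    """Hours before buffered logs start to be dropped (time or size limit)."""
    if bytes_per_second <= 0:
        return max_hours
    hours_to_fill = max_bytes / bytes_per_second / 3600
    return min(hours_to_fill, max_hours)


def hours_until_buffer_full(bytes_per_second, buffer_bytes):
    """Hours before a size-limited buffer fills at a constant write rate."""
    if bytes_per_second <= 0:
        return float("inf")
    return buffer_bytes / bytes_per_second / 3600


# Hypothetical rates: 1 MB/s of logs, 1 MiB/s of system metrics.
print(hours_until_log_loss(1_000_000))
print(hours_until_buffer_full(1024 ** 2, 6 * GIB))
```

At low write rates the 4-hour limit applies first. At high write rates the size limit is reached sooner, which shortens the effective tolerance.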

Config and policy management

Anthos Config Management lets you manage configuration and policies at scale, across all of your clusters. You store configurations and policies in a Git repository, and they are synchronized automatically to your clusters.

Config Sync

Config Sync is a component of Anthos Config Management that uses in-cluster agents to connect directly to a Git repository. You can manage changes to the repository URL or the synchronization parameters with the gcloud or kubectl tools.

During temporary disconnection, the synchronization is unaffected if the in-cluster agents can still reach the Git repository. However, if you change the synchronization parameters with the Google Cloud CLI or the console, they are not applied to the cluster during the disconnection. You can temporarily overwrite them locally using kubectl. Any local changes are overwritten on reconnection.

Policy Controller

Policy Controller enables the enforcement of fully programmable policies for your clusters. These policies act as "guardrails" and prevent any changes that violate security, operational, or compliance controls that you have defined.

| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Syncing configuration from a Git repository | In-cluster agents connect directly to the Git repository. You can change the repository URL or synchronization parameters with the Google Cloud API. | Syncing of configurations is unaffected. If you change the synchronization parameters with gcloud or in the console, they are not applied to the cluster during the disconnection. You can temporarily overwrite them locally using kubectl. Any local changes are overwritten on reconnection. | Unlimited | Never use the Fleet API for Config Sync, and only configure it manually. |
| Enforcing policies on requests to the Kubernetes API | The in-cluster agent enforces constraints through its integration with the Kubernetes API. You manage policies using the local Kubernetes API. You manage the system configuration of Policy Controller with a Google Cloud API. | Policy enforcement is unaffected. Policies are still managed using the local Kubernetes API. Changes to the Policy Controller system configuration using the Google Cloud API are not propagated to the cluster, but you can temporarily overwrite them locally. Any local changes are overwritten on reconnection. | Unlimited | Never use the Fleet API for Policy Controller, and only configure it manually. |
| Installing, configuring, or upgrading Anthos Config Management using the Google Cloud API | You use the Google Cloud API to manage the installation and upgrade of in-cluster agents. You also use this API (or gcloud, or the console) to manage the configuration of these agents. | In-cluster agents continue to operate normally. You cannot install, upgrade, or configure in-cluster agents using the Google Cloud API. Any pending installations, upgrades, or configurations done using the API proceed upon reconnection. | Zero | Never use the Fleet API for Policy Controller, and only configure it manually. |
| Viewing system or sync status in the console | You can view the health of the in-cluster agents and the synchronization status using the Google Cloud API or the console. | Status information in the Google Cloud API or console becomes stale. The API shows a connection error. All the information remains available on a per-cluster basis using the local Kubernetes API. | Zero | Use the nomos CLI or the local Kubernetes API. |

Security

Identity, authentication, and authorization

Anthos can connect directly to Cloud Identity for application and user roles, to manage workloads using Anthos Connect, or for endpoint authentication using OIDC. In case of disconnection from Google Cloud, the connection to Cloud Identity is also severed, and those features are not available anymore. For workloads that require additional resiliency through a temporary disconnection, you can use Anthos Identity Service to integrate with an LDAP or OIDC provider (including ADFS) to configure end-user authentication.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Cloud Identity as identity provider, using the Connect gateway | You can access Anthos resources using Cloud Identity as the identity provider, and connecting through the Connect gateway. | The Connect gateway requires a connection to Google Cloud. You are not able to connect to your clusters during the disconnection. | Zero | Use Anthos Identity Service to federate with another identity provider. |
| Identity and authentication using a third-party identity provider | Supports OIDC and LDAP. You use the gcloud CLI to first log in. For OIDC providers, you can use the console to log in. You can then authenticate normally against the cluster API (for example, using kubectl). | As long as the identity provider remains accessible to both you and the cluster, you can still authenticate against the cluster API. You can't log in through the console. You can only update the OIDC or LDAP configuration of your clusters locally; you cannot use the console. | Unlimited | - |
| Authorization | Anthos supports role-based access control (RBAC). Roles can be attributed to users, groups, or service accounts. User identities and groups can be retrieved from the identity provider. | The RBAC system is local to the Kubernetes cluster and is not affected by disconnection from Google Cloud. However, if it relies on identities coming from Cloud Identity, then they are not available in case of disconnection. | Unlimited | - |

Secret and key management

Secret and key management is an important part of your security posture. The behavior of Anthos in case of disconnection from Google Cloud depends on which service you are using for those features.

Read the Managing secrets Anthos security blueprint for more information.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Secret and key management using Cloud Key Management Service and Secret Manager | You directly use Cloud Key Management Service for your cryptographic keys, and Secret Manager for your secrets. | Both Cloud Key Management Service and Secret Manager are not available. | Zero | Use local systems instead. |
| Secret and key management using HashiCorp Vault and Google Cloud services | You configure HashiCorp Vault to use Cloud Storage or Cloud Spanner to store secrets, and Cloud Key Management Service to manage keys. | If HashiCorp Vault runs on your Anthos cluster and is also impacted by the disconnection, then secret storage and key management are not available during the disconnection. | Zero | Use local systems instead. |
| Secret and key management using HashiCorp Vault and on-premises services | You configure HashiCorp Vault to use an on-premises storage backend for secrets, and an on-premises key management system (such as a hardware security module). | Disconnection from Google Cloud has no impact. | Unlimited | - |

Networking and network services

Load Balancing

To expose Kubernetes Services hosted in an Anthos cluster to users, you can use either a bundled load balancer (Anthos on bare metal with MetalLB, Anthos on-prem with Seesaw or MetalLB) or your own load balancer, external to Anthos. Both options keep working in case of a disconnection from Google Cloud.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| L4 bundled load balancer | Provides L4 load balancing entirely locally with no dependency on Google Cloud APIs or network. | No change | Unlimited | - |
| Manual or integrated load balancer | Supports F5 BIG-IP and others that are also hosted on-premises. | No change | Unlimited | - |

Anthos Service Mesh

You can use Anthos Service Mesh to manage, observe, and secure communications across your services running in an Anthos cluster. Not all Anthos Service Mesh features are supported on Anthos clusters on bare metal and Anthos clusters on VMware: see the list of supported features for more information.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Deploying or updating policies (routing, authorization, security, audit, etc.) | You can use the console, kubectl, asmcli, or istioctl to manage Anthos Service Mesh policies. | You can only use kubectl or istioctl to manage Anthos Service Mesh policies. | Unlimited | Use kubectl or istioctl. |
| Certificate authority (CA) | You can use either the in-cluster CA or the Mesh CA to manage the certificates used by Anthos Service Mesh. | There is no impact if you are using the in-cluster CA. If you are using the Mesh CA, then certificates expire after 24 hours. New service instances cannot retrieve certificates. | Unlimited for in-cluster CA. Degraded service during 24h, and no service after 24h for Mesh CA. | Use the in-cluster CA. |
| Cloud Monitoring for Anthos Service Mesh | You can use Cloud Monitoring to store, explore and exploit HTTP-related metrics coming from Anthos Service Mesh. | Metrics are not stored. | Zero | Use a compatible local monitoring solution such as Prometheus. |
| Anthos Service Mesh audit logging | Anthos Service Mesh relies on the local Kubernetes logging facilities. The behavior depends on how you configured logging for your Anthos cluster. | Depends on how you configured logging for your Anthos cluster. | - | - |
| Ingress gateway | You can define external IPs with the Istio Ingress Gateway. | No impact | Unlimited | - |
| Istio Container Network Interface (CNI) | You can configure Anthos Service Mesh to use the Istio CNI instead of iptables to manage the traffic. | No impact | Unlimited | - |
| Anthos Service Mesh end-user authentication for web applications | You can use the Anthos Service Mesh ingress gateway to integrate with your own identity provider (through OIDC) to authenticate and authorize end-users on web applications that are part of the mesh. | No impact | Unlimited | - |
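
The Mesh CA behavior in this table (degraded service during the first 24 hours, no service afterwards) can be expressed as a simple classification. The following is a minimal sketch that assumes the 24-hour certificate lifetime stated in the table. The state labels and function names are hypothetical and do not correspond to any Anthos Service Mesh API.

```python
from datetime import datetime, timedelta, timezone

# Certificate lifetime for the Mesh CA as stated in the table.
MESH_CA_CERT_LIFETIME = timedelta(hours=24)


def mesh_ca_state(disconnected_for):
    """Classify mesh health with the Mesh CA for a disconnection duration."""
    if disconnected_for <= timedelta(0):
        return "connected"
    if disconnected_for < MESH_CA_CERT_LIFETIME:
        # Existing certificates may still be valid, but new service
        # instances cannot retrieve certificates.
        return "degraded"
    return "no service"


def workload_cert_valid(issued_at, now, lifetime=MESH_CA_CERT_LIFETIME):
    """Return True while a certificate issued at issued_at has not expired."""
    return now < issued_at + lifetime


# Hypothetical example values.
print(mesh_ca_state(timedelta(hours=6)))
issued = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(workload_cert_valid(issued, issued + timedelta(hours=30)))
```

With the in-cluster CA, none of this applies, because certificate issuance does not depend on Google Cloud.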

Other network services

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| DNS | The Kubernetes DNS server runs inside the cluster. | The Kubernetes DNS service works normally as it runs inside the cluster itself. | Unlimited | - |
| Egress proxy | You can configure Anthos to use a proxy for egress connections. | If your proxy runs on-premises, Anthos is still able to use it during a temporary disconnection. However, if the proxy loses the connection to Google Cloud, then all the scenarios from this document still apply. | Unlimited | - |

Google Cloud Marketplace

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Deploying and managing applications and services from the Cloud Marketplace | The Cloud Marketplace is available in the console, and you can use it to discover, acquire, and deploy solutions. | You cannot use the Cloud Marketplace. Some solutions from the Cloud Marketplace might have their own connectivity requirements which are not documented here. | Zero | None |

Support

This section covers the scenarios that you might have to go through while interacting with Google Cloud support or your operating partner for a case related to your Anthos clusters.

| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
| --- | --- | --- | --- | --- |
| Sharing a cluster snapshot with the support team | You can create a cluster snapshot locally using the bmctl check cluster or gkectl diagnose snapshot commands. You share this snapshot through the normal support process. | You can still generate the snapshot as it is a local operation. If you lost access to Google Cloud and its support web interfaces, you can phone the support team provided you have subscribed to the Enhanced or Premium support plans. | Unlimited | - |
| Sharing relevant log data with the support team | You can collect logs locally from your cluster and share them through the normal support process. | You can still collect logs from your cluster. If you lost access to Google Cloud and its support web interfaces, you can phone the support team provided you have subscribed to the Enhanced or Premium support plans. | Unlimited | - |