Google Distributed Cloud (software only) is based on Kubernetes, and you can deploy it on-premises on either VMware or bare metal servers. Although Distributed Cloud runs on-premises, we design it to have a permanent connection to Google Cloud for a number of reasons, including monitoring and management. However, you might need to know what happens if, for any reason, you lose the connection to Google Cloud (for example, because of a technical problem). This document outlines the impact of a loss of connectivity for clusters in a Distributed Cloud software-only deployment (on bare metal or on VMware), and the workarounds you can use in that event.
This information is useful for architects who need to prepare for an unplanned or forced disconnection from Google Cloud and understand its consequences. However, you shouldn't plan to use a software-only Distributed Cloud deployment that's disconnected from Google Cloud as a normal operating mode. Remember that we design Distributed Cloud to take advantage of the scalability and availability of Google Cloud services. This document draws on the design and architecture of the various Google Cloud components that work with Distributed Cloud, and their expected behavior during a temporary disconnection. We can't guarantee that this document is exhaustive.
This document assumes that you are familiar with GKE. If that isn't the case, we recommend that you first read the GKE overview.
License validation and metering
If you have enabled the Anthos API
(anthos.googleapis.com)
in your Google Cloud project, the metering controller running in the cluster
generates and refreshes the license entitlement periodically. The tolerance for
disconnection is 12 hours. Additionally, the system requires the connection for
managing metering and billing.
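As an illustration, the following check (a minimal sketch; it assumes you have a local kubeconfig with sufficient permissions, and the exact namespace and resource names depend on your Distributed Cloud version) confirms that the metering components and the entitlement custom resource type are present in the cluster:

```bash
# List metering-related pods; filter broadly because the namespace and pod
# names depend on your Distributed Cloud version.
kubectl get pods --all-namespaces | grep -i metering

# Discover the entitlement custom resource type registered in the cluster
# before querying it; this document doesn't fix the exact resource name.
kubectl api-resources | grep -i entitlement
```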
This table lists the behavior of features related to licensing and metering in case of temporary disconnection from Google Cloud:
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| License validation | The metering controller generates and refreshes the license entitlement custom resource periodically, as long as anthos.googleapis.com is enabled in the Google Cloud project. | The components that consume the entitlement custom resource support a grace period: they continue to function as long as the entitlement custom resource is refreshed within the grace period. | Unlimited. After the grace period expires, components start to log errors. You can't upgrade your cluster anymore. | None |
| Metering and billing | The metering controller reports the vCPU capacity of the cluster to the Google Cloud Service Control API for billing purposes. | An in-cluster agent persists billing records in the cluster during disconnection and retrieves the records once the cluster re-connects to Google Cloud. | Unlimited. However, metering information is required for compliance as stated in the Service Specific Terms for "Premium Software". | None |
Cluster lifecycle
This section covers scenarios such as creating, updating, deleting, and resizing clusters, as well as monitoring the status of these activities.
For most scenarios, you can use CLI tools such as bmctl, gkectl, and
kubectl to perform operations during a temporary disconnection. You can also
monitor the status of these operations with these tools. Upon reconnection, the
Google Cloud console updates to display the results of operations performed during
the disconnected period.
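For example, the following commands (a minimal sketch; the kubeconfig path is illustrative) let you check cluster health directly against the Kubernetes API while the Google Cloud console can't display it:

```bash
# Point kubectl at the cluster's local kubeconfig; the path shown here is only an example.
export KUBECONFIG=/path/to/CLUSTER_NAME-kubeconfig

# Check node health without going through Google Cloud.
kubectl get nodes -o wide

# Check that system and application workloads are running.
kubectl get pods --all-namespaces
```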
| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Cluster creation | You use the bmctl or gkectl CLI tools to create clusters. This operation requires a connection to Google Cloud. | You can't create clusters. | Zero | None |
| Cluster upgrade | You use the bmctl or gkectl CLI tools to upgrade clusters. This operation requires a connection to Google Cloud. | You can't upgrade clusters. | Zero | None |
| Cluster deletion | You use the bmctl or gkectl CLI tools to delete clusters. This operation doesn't require a connection to Google Cloud. | You can delete clusters. | Unlimited | - |
| Viewing cluster status | You can see information about your clusters in the console, in the list of Google Kubernetes Engine clusters. | Cluster information isn't shown in the console. | Unlimited | Use kubectl to directly query your clusters and get the information you need. |
| Removing nodes from a cluster | You don't need a connection to Google Cloud to remove nodes from a cluster. | You can remove nodes from a cluster. | Unlimited | - |
| Adding nodes to a cluster | To work properly, the new node pulls container images from Container Registry. A preflight check validates that there is connectivity to Google Cloud. | The preflight checks that run when you add a new node validate that there is connectivity to Google Cloud. Therefore, you can't add a new node to a cluster while disconnected. | Zero | None |
Application lifecycle
For the most part, a temporary disconnection from Google Cloud doesn't affect the management of applications running in an on-premises cluster; only the connect gateway is affected. If you use Container Registry, Artifact Registry, Cloud Build, or Cloud Deploy to manage your container images or CI/CD pipelines in Google Cloud, those services become unavailable during a disconnection. Strategies to deal with disconnection for those products are outside the scope of this document.
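For example, if you normally reach the cluster through the connect gateway, the following sketch shows the switch to a direct, local connection during a disconnection (the membership name, project ID, kubeconfig path, and manifest are placeholders):

```bash
# Connected: credentials obtained this way route kubectl traffic through the
# connect gateway, which requires a connection to Google Cloud.
gcloud container fleet memberships get-credentials MEMBERSHIP_NAME \
    --project=PROJECT_ID

# Disconnected: fall back to the cluster's local kubeconfig, which talks to the
# Kubernetes API server directly and doesn't need Google Cloud connectivity.
kubectl --kubeconfig=/path/to/CLUSTER_NAME-kubeconfig apply -f deployment.yaml
```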
| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Application deployment | You deploy applications locally using kubectl, through CI/CD tooling, or using the connect gateway. | The connect gateway isn't available. All other deployment methods still work as long as they connect directly to the Kubernetes API. | Unlimited | If you use the connect gateway, switch to using kubectl locally. |
| Application removal | You remove applications locally using kubectl, through CI/CD tooling, or using the connect gateway. | The connect gateway isn't available. All other methods still work as long as they connect directly to the Kubernetes API. | Unlimited | If you use the connect gateway, switch to using kubectl locally. |
| Application scale-out | You scale out applications locally using kubectl, through CI/CD tooling, or using the connect gateway. | The connect gateway isn't available. All other methods still work as long as they connect directly to the Kubernetes API. | Unlimited | If you use the connect gateway, switch to using kubectl locally. |
Logging and monitoring
Auditability helps your organization meet its regulatory requirements and compliance policies. Distributed Cloud helps with auditability by offering application logging, Kubernetes logging, and audit logging. Many customers choose Cloud Logging and Cloud Monitoring to avoid managing a logging and monitoring infrastructure on-premises. Other customers prefer to centralize their logs in an on-premises system for aggregation. To support these customers, Distributed Cloud supports direct integration with services such as Prometheus. In this mode, a temporary disconnection from Google Cloud has no impact on logging or monitoring functionality.
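For example, you can check which logging and monitoring agents are running in a cluster before planning for a disconnection (a hedged sketch; agent names and namespaces vary by version and configuration):

```bash
# Logging and monitoring agents typically run as DaemonSets or Deployments.
# Listing them shows whether a Cloud Logging/Cloud Monitoring pipeline, a local
# pipeline such as Prometheus, or both are in use.
kubectl get daemonsets,deployments --all-namespaces | grep -Ei 'log|monitor|metrics|prometheus'
```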
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Application logging using Cloud Logging | The system writes logs to Cloud Logging. | The system buffers logs to the local disk. | 4.5h or 4GiB local buffer per node. When the buffer fills or the disconnection lasts 4.5 hours, then the system drops the oldest entries. | Use a local logging solution. |
| System/Kubernetes logging using Cloud Logging | The system writes logs to Cloud Logging. | The system buffers logs to the local disk. | 4.5h or 4GiB local buffer per node. When the buffer fills or the disconnection lasts 4.5 hours, then the system drops the oldest entries. | Use a local logging solution. |
| Audit logging using Cloud Audit Logs | The system writes logs to Cloud Logging. | The system buffers logs to the local disk. | 10GiB local buffer per control plane node. When the buffer fills, then the system drops the oldest entries. | Set up log forwarding to a local logging solution. |
| Application logging using other provider | You can use different third-party providers like Elastic, Splunk, Datadog, or Loki. | No impact | Unlimited | - |
| System/Kubernetes logging using other provider | You can use different third-party providers like Elastic, Splunk, or Datadog. | No impact | Unlimited | - |
| Application and Kubernetes metrics written to Cloud Monitoring | The system writes metrics to Cloud Monitoring. | The system buffers metrics to the local disk. | 24h or 6GiB local buffer per node for system metrics and 1GiB local buffer per node for application metrics. When the buffer fills or the disconnection lasts 24 hours, then the system drops the oldest entries. | Use a local monitoring solution. |
| Accessing and reading monitoring data from Kubernetes and application workloads | All metrics are available in the console and through the Cloud Monitoring API. | The system doesn't update metrics in Cloud Monitoring during the disconnection. | 24h or 6GiB local buffer per node for system metrics and 1GiB local buffer per node for application metrics. When the buffer fills or the disconnection lasts 24 hours, then the system drops the oldest entries. | Use a local monitoring solution. |
| Alerting rules and paging for metrics | Cloud Monitoring supports alerting. You can create alerts for any metric. The system can send alerts through different channels. | The system doesn't trigger alerts while disconnected. The system only triggers alerts from metrics data already sent to Cloud Monitoring. | Zero | Use a local monitoring and alerting solution. |
Config and policy management
Config Sync and Policy Controller let you manage configuration and policies at scale, across all of your clusters. You store configurations and policies in a Git repository, and the system automatically synchronizes them to your clusters.
Config Sync
Config Sync uses in-cluster agents to connect directly to a Git repository.
You can manage changes to the repository URL or the synchronization parameters
with the Google Cloud CLI or kubectl tools.
During temporary disconnection, synchronization remains unaffected if the
in-cluster agents can still reach the Git repository. However, if you change the
synchronization parameters with the gcloud CLI or the
console, the cluster doesn't apply them during the disconnection.
You can temporarily overwrite them locally using kubectl. Reconnection
overwrites any local changes.
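For example, during a disconnection you can inspect and temporarily adjust synchronization locally (a sketch that assumes Config Sync is installed with a RootSync object named root-sync in the config-management-system namespace, which is the common default):

```bash
# Check synchronization status locally; nomos talks to the cluster, not to Google Cloud.
nomos status

# Inspect or temporarily change synchronization parameters (for example, the Git
# branch or directory) directly in the cluster. Reconnection overwrites local changes.
kubectl -n config-management-system edit rootsync root-sync
```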
Policy Controller
Policy Controller enables the enforcement of fully programmable policies for your clusters. These policies act as "guardrails" and prevent any changes that violate security, operational, or compliance controls that you have defined.
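For example, policy inspection stays local to the cluster; the following sketch (assuming Policy Controller is installed and at least one constraint from the common constraint template library exists; the constraint name is illustrative) lists the constraints currently in force:

```bash
# Constraints are Kubernetes custom resources grouped under the "constraint"
# category, so you can list them without any connection to Google Cloud.
kubectl get constraints

# Review the violations recorded in the status of a specific constraint.
# Both the template (K8sRequiredLabels) and the constraint name are examples.
kubectl describe k8srequiredlabels ns-must-have-owner
```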
| Action | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Syncing configuration from a Git repository | In-cluster agents connect directly to the Git repository. You can change the repository URL or synchronization parameters with a Google Cloud API. | Configuration syncing remains unaffected. If you change the synchronization parameters with the gcloud CLI or in the console, the cluster doesn't apply them during the disconnection. You can temporarily overwrite them locally using kubectl. Reconnection overwrites any local changes. | Unlimited | Never use the Fleet API for Config Sync, and only configure it by using the Kubernetes API. |
| Enforcing policies on requests to the Kubernetes API | The in-cluster agent enforces constraints thanks to its integration with the Kubernetes API. You manage policies using the local Kubernetes API. You manage the system configuration of Policy Controller with a Google Cloud API. | Policy enforcement remains unaffected. You still manage policies using the local Kubernetes API. The system doesn't propagate changes to the Policy Controller system configuration using the Google Cloud API to the cluster, but you can temporarily overwrite them locally. Reconnection overwrites any local changes. | Unlimited | Never use the Fleet API for Policy Controller, and only configure it by using the Kubernetes API. |
| Installing, configuring, or upgrading Config Sync using the Google Cloud API | You use the Google Cloud API to manage the installation and upgrade of in-cluster agents. You also use this API (or the gcloud CLI, or the console) to manage the configuration of these agents. | In-cluster agents continue to operate normally. You can't install, upgrade, or configure in-cluster agents using the Google Cloud API. Any pending installations, upgrades, or configurations done using the API proceed upon reconnection. | Zero | Never use the Fleet API for Config Sync, and only configure it by using the Kubernetes API. |
| Viewing system or sync status in the console | You can view the health of the in-cluster agents and the synchronization status using a Google Cloud API or the console. | Status information in the Google Cloud API or console becomes stale. The API shows a connection error. All the information remains available on a per-cluster basis using the local Kubernetes API. | Zero | Use the nomos CLI or the local Kubernetes API. |
Security
This section outlines how security features, including identity, authentication, authorization, and secret management, are affected by a temporary disconnection from Google Cloud.
Identity, authentication, and authorization
Distributed Cloud can connect directly to Cloud Identity for application and user roles, to manage workloads using Connect, or for endpoint authentication using OIDC. A disconnection from Google Cloud severs the connection to Cloud Identity, making those features unavailable. For workloads that require additional resiliency during a temporary disconnection, you can use GKE Identity Service to integrate with an LDAP or OIDC provider (including ADFS) for end-user authentication.
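For example, with GKE Identity Service configured for an OIDC or LDAP provider that remains reachable on-premises, you can still obtain cluster credentials locally (a sketch; the cluster name and kubeconfig path are placeholders, and flags can vary by gcloud CLI version):

```bash
# Authenticate against the cluster through GKE Identity Service instead of
# Cloud Identity; this needs connectivity only to the cluster and your provider.
gcloud anthos auth login \
    --cluster=CLUSTER_NAME \
    --kubeconfig=/path/to/CLUSTER_NAME-kubeconfig

# Verify access with an ordinary Kubernetes API call.
kubectl --kubeconfig=/path/to/CLUSTER_NAME-kubeconfig get namespaces
```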
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Cloud Identity as identity provider, using the connect gateway | You can access Distributed Cloud resources using Cloud Identity as the identity provider, and connecting through the connect gateway. | The connect gateway requires a connection to Google Cloud. You aren't able to connect to your clusters during the disconnection. | Zero | Use GKE Identity Service to federate with another identity provider. |
| Identity and authentication using a third-party identity provider | Supports OIDC and LDAP. You use the gcloud CLI to first log in. For OIDC providers, you can use the console to log in. You can then authenticate normally against the cluster API (for example, using kubectl). | As long as the identity provider remains accessible to both you and the cluster, you can still authenticate against the cluster API. You can't log in through the console. You can only update the OIDC or LDAP configuration of your clusters locally; you can't use the console. | Unlimited | - |
| Authorization | Distributed Cloud supports role-based access control (RBAC). You can assign roles to users, groups, or service accounts. The system retrieves user identities and groups from the identity provider. | The RBAC system is local to the Kubernetes cluster, and disconnection from Google Cloud doesn't affect it. However, if it relies on identities that come from Cloud Identity, those identities aren't available during a disconnection. | Unlimited | - |
Secret and key management
Secret and key management is an important part of your security posture. The behavior of Distributed Cloud in case of disconnection from Google Cloud depends on which service you are using for those features.
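For example, if workloads normally read configuration values from Secret Manager at deployment time, one local fallback is to keep the required values as in-cluster Kubernetes Secrets during the disconnection (a minimal sketch with illustrative names; it is not a replacement for a full secret-management solution):

```bash
# Store the values in the cluster itself so workloads can consume them while
# Secret Manager is unreachable; the secret name and keys are examples only.
kubectl create secret generic db-credentials \
    --from-literal=username=app-user \
    --from-literal=password='replace-me'

# Reference the secret from your workloads as an environment variable or volume.
kubectl get secret db-credentials -o yaml
```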
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Secret and key management using Cloud Key Management Service and Secret Manager | You directly use Cloud Key Management Service for your cryptographic keys, and Secret Manager for your secrets. | Neither Cloud Key Management Service nor Secret Manager is available. | Zero | Use local systems instead. |
| Secret and key management using HashiCorp Vault and Google Cloud services | You configure HashiCorp Vault to use Cloud Storage or Spanner to store secrets, and Cloud Key Management Service to manage keys. | If HashiCorp Vault runs on your on-premises cluster and the disconnection also impacts it, then secret storage and key management aren't available during the disconnection. | Zero | Use local systems instead. |
| Secret and key management using HashiCorp Vault and on-premises services | You configure HashiCorp Vault to use an on-premises storage backend for secrets, and an on-premises key management system (such as a hardware security module). | Disconnection from Google Cloud has no impact. | Unlimited | - |
Networking and network services
This section covers the networking and network services for on-premises clusters, including how they are impacted by a temporary disconnection from Google Cloud. It provides information on load balancing, Cloud Service Mesh, and other network services.
Load balancing
To expose Kubernetes Services hosted in an on-premises cluster to users, you have the following options:
- Bare metal:
  - Use one of the provided bundled load balancers: MetalLB, or bundled load balancing with BGP.
  - Manually configure your clusters to use your own load balancer, external to Distributed Cloud.
- VMware:
  - Use the provided bundled load balancer, MetalLB.
  - Manually configure your clusters to use your own load balancer, external to Distributed Cloud.
These load balancing options remain operational even if disconnected from Google Cloud.
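For example, you can confirm locally that load-balanced Services keep their external addresses during a disconnection (a sketch that assumes kubectl access to the cluster):

```bash
# LoadBalancer Services are programmed by the bundled or external load balancer,
# not by Google Cloud, so their external IPs stay assigned and reachable.
kubectl get services --all-namespaces -o wide | grep LoadBalancer
```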
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| L4 bundled load-balancer | Provides L4 load balancing entirely locally with no dependency on Google Cloud APIs or network. | No change | Unlimited | - |
| Manual or integrated load balancer | Supports F5 BIG-IP and others that are also hosted on-premises. | No change | Unlimited | - |
Cloud Service Mesh
You can use Cloud Service Mesh to manage, observe, and secure communications across your services running in an on-premises cluster. Distributed Cloud doesn't support all Cloud Service Mesh features: see the list of supported features for more information.
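For example, you can keep managing and inspecting the mesh locally with istioctl and kubectl during a disconnection (a sketch; it assumes istioctl is installed and pointed at the cluster, and the policy manifest is a placeholder):

```bash
# Check that sidecar proxies are synchronized with the in-cluster control plane;
# this works without any connection to Google Cloud.
istioctl proxy-status

# Apply or update routing and authorization policies directly through the Kubernetes API.
kubectl apply -f virtual-service.yaml
```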
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Deploying or updating policies (routing, authorization, security, audit, and so on) | You can use the console, kubectl, asmcli, or istioctl to manage Cloud Service Mesh policies. | You can only use kubectl or istioctl to manage Cloud Service Mesh policies. | Unlimited | Use kubectl or istioctl. |
| Certificate authority (CA) | You can use either the in-cluster CA or the Cloud Service Mesh certificate authority to manage the certificates used by Cloud Service Mesh. | There is no impact if you are using the in-cluster CA. If you are using the Cloud Service Mesh certificate authority, certificates expire after 24 hours and new service instances can't retrieve certificates. | Unlimited for the in-cluster CA. Degraded service for 24 hours, and no service after 24 hours, for the Cloud Service Mesh certificate authority. | Use the in-cluster CA. |
| Cloud Monitoring for Cloud Service Mesh | You can use Cloud Monitoring to store, explore, and analyze HTTP-related metrics coming from Cloud Service Mesh. | Metrics aren't stored. | Zero | Use a compatible local monitoring solution such as Prometheus. |
| Cloud Service Mesh audit logging | Cloud Service Mesh relies on the local Kubernetes logging facilities. The behavior depends on how you configured logging for your on-premises cluster. | Depends on how you configured logging for your on-premises cluster. | - | - |
| Ingress gateway | You can define external IPs with the Istio Ingress Gateway. | No impact | Unlimited | - |
| Istio Container Network Interface (CNI) | You can configure Cloud Service Mesh to use the Istio CNI instead of iptables to manage the traffic. | No impact | Unlimited | - |
| Cloud Service Mesh end-user authentication for web applications | You can use the Cloud Service Mesh ingress gateway to integrate with your own identity provider (through OIDC) to authenticate and authorize end-users on web applications that are part of the mesh. | No impact | Unlimited | - |
Other network services
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| DNS | The Kubernetes DNS server runs inside the cluster. | The Kubernetes DNS service works normally because it runs inside the cluster itself (see the example after this table). | Unlimited | - |
| Egress proxy | You can configure your on-premises clusters to use a proxy for egress connections. | If your proxy runs on-premises, the cluster is still able to use it during a temporary disconnection. However, if the proxy loses the connection to Google Cloud, then all the scenarios from this document still apply. | Unlimited | - |
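For example, you can verify in-cluster DNS resolution during a disconnection with a short-lived test pod (a sketch; the image and the name being resolved are illustrative):

```bash
# Run a temporary pod and resolve an in-cluster Service name; this exercises the
# Kubernetes DNS server, which has no dependency on Google Cloud.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
    nslookup kubernetes.default.svc.cluster.local
```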
Google Cloud Marketplace
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Deploying and managing applications and services from the Cloud Marketplace | The Cloud Marketplace is available in the console, and you can use it to discover, acquire, and deploy solutions. | You can't use the Cloud Marketplace. Some solutions from the Cloud Marketplace might have their own connectivity requirements which aren't documented here. | Zero | None |
Support
This section covers scenarios that you might encounter when interacting with Google Cloud support, or with your operating partner, for a case related to your Distributed Cloud clusters.
| Feature | Connected behavior | Temporary disconnection behavior | Maximum disconnection tolerance | Loss of connectivity workaround |
|---|---|---|---|---|
| Sharing a cluster snapshot with the support team | You can create a cluster snapshot locally using the bmctl check cluster or gkectl diagnose snapshot commands (see the example after this table). You share this snapshot through the normal support process. | You can still generate the snapshot because it is a local operation. If you've lost access to Google Cloud and its support web interfaces, you can phone the support team, provided that you have subscribed to the Enhanced or Premium support plan. | Unlimited | - |
| Sharing relevant log data with the support team | You can collect logs locally from your cluster and share them through the normal support process. | You can still collect logs from your cluster. If you've lost access to Google Cloud and its support web interfaces, you can phone the support team, provided that you have subscribed to the Enhanced or Premium support plan. | Unlimited | - |
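For example, generating a snapshot remains a purely local operation on both platforms (a sketch; flags, cluster names, and kubeconfig paths are illustrative and depend on your installation type and version):

```bash
# Bare metal: create a diagnostic snapshot of the cluster.
bmctl check cluster --snapshot --cluster CLUSTER_NAME \
    --admin-kubeconfig /path/to/ADMIN_CLUSTER_NAME-kubeconfig

# VMware: create a diagnostic snapshot of the cluster.
gkectl diagnose snapshot --kubeconfig /path/to/ADMIN_CLUSTER_KUBECONFIG \
    --cluster-name CLUSTER_NAME
```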