This page describes how to shut down and power up a Google Distributed Cloud (GDC) air-gapped appliance, for example, to move the device to a new location.
You might use a GDC air-gapped appliance in transient operational locations, where you must shut down the device for transport in order to move it between locations. You might also need to restore the device after a power failure, because generators might power it in rugged environments.
Before you begin
Ensure that you stop all workloads before you proceed. Google cannot guarantee the state of workloads that are active during a shutdown.
Prerequisites
- You can run this runbook from a laptop or workstation connected to the Google Distributed Cloud (GDC) air-gapped appliance's network. Alternatively, you can connect a laptop or workstation to the switch by following Connect the device.
- Make sure that you have access to the kubeconfig for the root-admin cluster.
- Set the correct `KUBECONFIG` environment variable by running `export KUBECONFIG=PATH_TO_KUBECONFIG`.
- Ensure that you have the SSH key and certificate.
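The following is a minimal sketch for verifying these prerequisites from your workstation before you start. The kubeconfig and SSH key paths are placeholders, not values defined by this document:

```bash
# Sketch only: confirm kubectl access to the root-admin cluster and that the SSH key exists.
export KUBECONFIG=PATH_TO_KUBECONFIG   # placeholder: path to the root-admin cluster kubeconfig

# The node list should return without errors if the kubeconfig is correct.
kubectl get nodes -A -o wide

# Placeholder path: confirm the SSH private key you plan to use is present.
ls -l PATH_TO_SSH_KEY
```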
Shut down the blades
- Get information about the nodes:

  ```
  kubectl get nodes -A -o wide
  ```
- Pause BareMetalHost sync by running the following command for each node, one at a time. Replace `NODE_NAME` with the node names obtained in Step 1:

  ```
  kubectl annotate bmhost -n gpc-system NODE_NAME "baremetalhost.metal3.io/paused=true" --overwrite
  ```

  The output might look like this example:

  ```
  baremetalhost.metal3.io/**-**-bm01 annotated
  baremetalhost.metal3.io/**-**-bm02 annotated
  baremetalhost.metal3.io/**-**-bm03 annotated
  ```
- Cordon all nodes, one at a time:

  ```
  kubectl cordon NODE_NAME
  ```

  The output might look like this example:

  ```
  node/**-**-bm01 cordoned
  node/**-**-bm02 cordoned
  node/**-**-bm03 cordoned
  ```
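  If you prefer to apply the previous two steps to every node in one pass, the following is a minimal sketch. It assumes your `KUBECONFIG` points at the root-admin cluster and that the node names reported by `kubectl get nodes` match the BareMetalHost names in the `gpc-system` namespace, as the steps above imply:

  ```bash
  # Sketch only: pause BareMetalHost sync and cordon every node in one loop.
  # Assumes node names match the bmhost names in gpc-system, per the steps above.
  for NODE_NAME in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    kubectl annotate bmhost -n gpc-system "${NODE_NAME}" \
      "baremetalhost.metal3.io/paused=true" --overwrite
    kubectl cordon "${NODE_NAME}"
  done
  ```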
- To determine the etcd leader node and the follower nodes, run this step for each node, one at a time:

  - Find the target IPs for SSH by noting the values in the `INTERNAL-IP` column of the output from `kubectl get nodes -A -o wide`. Establish an SSH connection:

    ```
    ssh root@INTERNAL-IP
    ```
  - To determine whether the current node is the etcd leader or a follower, run the following command inside the SSH terminal:

    ```
    ETCDCTL_API=3 etcdctl \
      --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/server.crt \
      --key /etc/kubernetes/pki/etcd/server.key \
      --write-out=table endpoint status
    ```

    Pay attention to the `IS LEADER` field.

    The output might look like this example for the etcd leader node:

    ```
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |    ENDPOINT    |        ID        |   VERSION    | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | ************** | **************** | 3.4.30-gke.1 |  162 MB |   true    |   false    |   3641    |  12957958  |      12957958      |        |
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    ```

    The output might look like this example for the two etcd follower nodes:

    ```
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |    ENDPOINT    |        ID        |   VERSION    | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | ************** | **************** | 3.4.30-gke.1 |  163 MB |   false   |   false    |   3641    |  12957404  |      12957404      |        |
    +----------------+------------------+--------------+---------+-----------+------------+-----------+------------+--------------------+--------+
    ```

    Note down the etcd leader and etcd follower status of each node.
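  To avoid signing in to each node interactively, you can run the same check over SSH from your workstation. The following is a minimal sketch, assuming the root SSH key from the prerequisites works non-interactively for `root@INTERNAL-IP`; the certificate paths are the same ones used in the command above:

  ```bash
  # Sketch only: print the etcd leader status for every node.
  # Assumes non-interactive SSH access as root to each INTERNAL-IP.
  for IP in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
    echo "Node ${IP}:"
    ssh "root@${IP}" 'ETCDCTL_API=3 etcdctl \
      --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/server.crt \
      --key /etc/kubernetes/pki/etcd/server.key \
      --write-out=table endpoint status'
  done
  ```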
 
- Drain the two etcd follower nodes. Do not drain the etcd leader node.

  ```
  kubectl drain NODE_NAME --delete-emptydir-data --grace-period 900 --ignore-daemonsets --disable-eviction
  ```

  The output might look like this:

  ```
  node/**-**-bm01 already cordoned
  WARNING: ignoring DaemonSet-managed Pods: kube-system/anetd-krj2z, kube-system/etcd-defrag-xh469, kube-system/ipam-controller-manager-2f4dz, kube-system/istio-cni-node-cgqv4, kube-system/kube-proxy-5mwf2, kube-system/localpv-mn2jh, kube-system/metallb-speaker-6l7sv, mon-system/mon-node-exporter-backend-nd8mp, netapp-trident/netapp-trident-node-linux-rrlmd, obs-system/anthos-audit-logs-forwarder-tpfqv, obs-system/anthos-log-forwarder-npjh4, obs-system/kube-control-plane-metrics-proxy-wp8nh, obs-system/log-failure-detector-crbnv, obs-system/oplogs-forwarder-sqwvj, vm-system/macvtap-v9pgp, vm-system/virt-handler-86khx
  pod/grafana-0 deleted
  pod/capi-kubeadm-bootstrap-controller-manager-1.30.400-gke.136lvgtf deleted
  pod/grafana-proxy-server-86d8fc4758-mkc4f deleted
  .
  .
  .
  node/**-**-bm02 already cordoned
  WARNING: ignoring DaemonSet-managed Pods: kube-system/anetd-v75jz, kube-system/etcd-defrag-t5jnc, kube-system/ipam-controller-manager-5958m, kube-system/istio-cni-node-ggv4c, kube-system/kube-proxy-r6x46, kube-system/localpv-g56xc, kube-system/metallb-speaker-tmw72, mon-system/mon-node-exporter-backend-9rs7k, netapp-trident/netapp-trident-node-linux-9jmfp, obs-system/anthos-audit-logs-forwarder-bwns9, obs-system/anthos-log-forwarder-lbskj, obs-system/kube-control-plane-metrics-proxy-grthl, obs-system/log-failure-detector-dzh4v, obs-system/oplogs-forwarder-vdn7z, vm-system/macvtap-mjwtc, vm-system/virt-handler-dlqvv
  pod/vai-web-plugin-backend-5dfd6d6597-nxxgn
  pod/vai-web-plugin-frontend-6b5468968b-mrr7g
  pod/grafana-proxy-server-64b759fbf6-b8pl8
  pod/iam-bundledidp-backend-0
  .
  .
  .
  ```
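  If you noted the two follower node names in the previous step, a minimal loop like the following can drain them both. This is a sketch only; `FOLLOWER_1` and `FOLLOWER_2` are placeholders for the names you recorded, not values from this document:

  ```bash
  # Sketch only: drain both etcd follower nodes, never the leader.
  # FOLLOWER_1 and FOLLOWER_2 are placeholders for the follower node names you noted.
  for NODE_NAME in FOLLOWER_1 FOLLOWER_2; do
    kubectl drain "${NODE_NAME}" \
      --delete-emptydir-data \
      --grace-period 900 \
      --ignore-daemonsets \
      --disable-eviction
  done
  ```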
- Gracefully shut down the two etcd follower nodes. Follow the next step for each of these nodes, one at a time.
- Turn off `NODE_NAME` using iLO (a combined retrieval sketch follows these sub-steps):

  - Retrieve the username for iLO:

    ```
    kubectl get secret bmc-credentials-NODE_NAME -n gpc-system -o jsonpath="{.data.username}" | base64 --decode
    ```
  - Retrieve the password for iLO:

    ```
    kubectl get secret bmc-credentials-NODE_NAME -n gpc-system -o jsonpath="{.data.password}" | base64 --decode
    ```
  - Retrieve the `BMC-IP` address for `NODE_NAME` from the values in the `BMC-IP` column:

    ```
    kubectl get servers -A
    ```
  - Visit the `BMC-IP` address obtained in the previous step and sign in with the username and password you retrieved.
  - Hover over the first button in the top row. It should display `Power: ON`. Click it. In the drop-down menu that appears, click the first item, labelled `Momentary Press`. The button color changes from green to orange, meaning the node is shutting down. Wait for the button to change color to yellow, indicating the machine has powered off. This takes a few minutes.
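  If you prefer to collect the iLO sign-in values from your workstation in one pass, the following is a minimal sketch. The `NODE_NAME` value is a placeholder for the node you are shutting down; the secret name and namespace are the same ones used in the sub-steps above:

  ```bash
  # Sketch only: print the iLO username, password, and server list (with BMC-IP) for one node.
  NODE_NAME=NODE_NAME   # placeholder: replace with the node you are shutting down

  echo -n "iLO username: "
  kubectl get secret "bmc-credentials-${NODE_NAME}" -n gpc-system \
    -o jsonpath="{.data.username}" | base64 --decode; echo

  echo -n "iLO password: "
  kubectl get secret "bmc-credentials-${NODE_NAME}" -n gpc-system \
    -o jsonpath="{.data.password}" | base64 --decode; echo

  # Note the BMC-IP column for NODE_NAME in this output.
  kubectl get servers -A
  ```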
 
- After both etcd follower nodes have shut down, repeat Step 7 for the etcd leader node.
Remove Yubikeys for transport
If you need to transport the system after installation completes, remove the Yubikeys and transport them separately. Ensure that you tag the keys yourself.
Power up and connect
If power was lost unexpectedly, such as in a hard shutdown, the device automatically comes back up. In this case, start from Step 7 and skip Steps 1 to 6. After an unexpected power loss, you might experience data loss, even after restarting.
Plan of action
- Insert the Yubikeys in each node.
- Plug the GDC air-gapped appliance machine into power, and press the power button on each node in any order. 
- After the nodes are powered up, wait a few minutes for the control plane to become reachable. `kubectl` can connect to the control plane in under 30 minutes.
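  If you want to poll for this from your workstation rather than checking manually, the following is a minimal sketch, assuming `KUBECONFIG` still points at the root-admin cluster:

  ```bash
  # Sketch only: poll the API server until kubectl can reach the control plane.
  until kubectl get nodes > /dev/null 2>&1; do
    echo "Control plane not reachable yet; retrying in 60 seconds..."
    sleep 60
  done
  echo "Control plane is reachable."
  ```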
- Get the names of the nodes:

  ```
  kubectl get nodes -A
  ```
- Uncordon each node to enable scheduling:

  ```
  kubectl uncordon NODE_NAME
  ```
- Resume sync of the bare metal hosts for each node:

  ```
  kubectl annotate bmhost -n gpc-system NODE_NAME "baremetalhost.metal3.io/paused=false" --overwrite
  ```
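  To apply the previous two steps to every node in one pass, you can use a loop like the following minimal sketch. As in the shutdown procedure, it assumes the node names match the BareMetalHost names in `gpc-system`:

  ```bash
  # Sketch only: uncordon every node and resume BareMetalHost sync in one loop.
  for NODE_NAME in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    kubectl uncordon "${NODE_NAME}"
    kubectl annotate bmhost -n gpc-system "${NODE_NAME}" \
      "baremetalhost.metal3.io/paused=false" --overwrite
  done
  ```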
- Check the status of the nodes using `kubectl get nodes -A`.

  If all nodes are in the `Ready` state, wait for two hours for the reconciliation process to complete. The output might look like this:

  ```
  NAME         STATUS   ROLES           AGE     VERSION
  **-**-bm01   Ready    control-plane   4d13h   v1.30.6-gke.300
  **-**-bm02   Ready    control-plane   4d13h   v1.30.6-gke.300
  **-**-bm03   Ready    control-plane   4d13h   v1.30.6-gke.300
  ```

  In this case, no further action is needed.
- Otherwise, if one or more nodes are in the `NotReady` state, restart some services to get the cluster ready. The output might look like this:

  ```
  NAME         STATUS     ROLES           AGE     VERSION
  **-**-bm01   Ready      control-plane   4d13h   v1.30.6-gke.300
  **-**-bm02   Ready      control-plane   4d13h   v1.30.6-gke.300
  **-**-bm03   NotReady   control-plane   4d13h   v1.30.6-gke.300
  ```

  In this case, note down the name of the node that is not ready, and proceed to the next steps.
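  Whether the nodes are already `Ready` or only become `Ready` after the service restarts in the next steps, you can block on the readiness condition instead of re-running `kubectl get nodes`. The following is a minimal sketch; the 30-minute timeout is an arbitrary example, not a documented requirement:

  ```bash
  # Sketch only: block until every node reports the Ready condition, or time out.
  kubectl wait --for=condition=Ready node --all --timeout=30m
  ```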
 
- Establish an SSH connection to the `NotReady` node. The target IP addresses for SSH are the values in the `INTERNAL-IP` column of the output from `kubectl get nodes -A -o wide`:

  ```
  ssh root@INTERNAL-IP
  ```
- Restart the `containerd` and `kubelet` services on the `NotReady` node. Run the following commands on the node, not on the laptop or workstation connected to the Google Distributed Cloud (GDC) air-gapped appliance:

  ```
  systemctl stop containerd
  systemctl daemon-reload
  systemctl restart containerd
  systemctl stop kubelet
  systemctl start kubelet
  ```
- To verify the status of the `containerd` and `kubelet` services, run the following commands on the `NotReady` node:

  ```
  systemctl status kubelet
  systemctl status containerd
  ```

  The output might look like this:

  ```
  # systemctl status kubelet
  ● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─00-standalone_containerd.conf, 10-kubeadm.conf
     Active: active (running) since Thu 2025-03-27 07:58:27 UTC; 34s ago
  .
  .
  .
  # systemctl status containerd
  ● containerd.service - containerd container runtime
     Loaded: loaded (/etc/systemd/system/containerd.service; disabled; vendor preset: disabled)
     Active: active (running) since Thu 2025-03-27 07:58:17 UTC; 52s ago
  .
  .
  .
  ```

  If the `containerd` and `kubelet` services are running correctly after the restart, wait for two hours for the reconciliation to complete.
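  If you would rather run the restart and verification from your workstation in one pass, the following is a minimal sketch. It assumes the same non-interactive root SSH access used earlier; `NOT_READY_IP` is a placeholder for the `INTERNAL-IP` of the `NotReady` node:

  ```bash
  # Sketch only: restart and verify containerd and kubelet on the NotReady node over SSH.
  NOT_READY_IP=INTERNAL-IP   # placeholder: replace with the node's INTERNAL-IP

  ssh "root@${NOT_READY_IP}" '
    systemctl stop containerd
    systemctl daemon-reload
    systemctl restart containerd
    systemctl stop kubelet
    systemctl start kubelet
    systemctl --no-pager status containerd kubelet
  '
  ```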