This page describes how to shut down and reboot a single blade of the Google Distributed Cloud (GDC) air-gapped appliance, for example, to move the device or to maintain a blade.
Before you begin
Stop all workloads before proceeding. Google cannot guarantee the outcome if workloads are still active during a shutdown.
If you want to shut down all the blades, follow Shut down the device. When following these instructions, shut down only one blade and keep Google Distributed Cloud (GDC) air-gapped appliance running with two active blades.
Prerequisites
- You can execute this runbook on a laptop or workstation connected to the Google Distributed Cloud (GDC) air-gapped appliance's network. Alternatively, you can connect a laptop or workstation to the switch by following Connect the device.
- Make sure you have access to the kubeconfig for the root-admin cluster.
- Set the correct KUBECONFIG environment variable by running the following command:
  export KUBECONFIG=<path to kubeconfig>
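Before running any of the commands on this page, it can help to confirm that the KUBECONFIG variable is set and points to a readable file. The following is a minimal sketch, not part of the official runbook. The function name `check_kubeconfig` is hypothetical, and the sketch assumes KUBECONFIG holds a single path rather than a colon-separated list.

```shell
# Hypothetical helper (not part of the runbook): verify KUBECONFIG before
# running any kubectl command. Assumes KUBECONFIG holds a single file path.
check_kubeconfig() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set" >&2
    return 1
  fi
  if [ ! -r "$KUBECONFIG" ]; then
    echo "KUBECONFIG file is not readable: $KUBECONFIG" >&2
    return 1
  fi
  echo "Using kubeconfig: $KUBECONFIG"
}
```

Call `check_kubeconfig` once at the start of a session. A non-zero exit status means the environment variable still needs to be set.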
Shut down the blade
Get the node information by running the following command:
kubectl get nodes -A
Determine the NODE_NAME of the blade to shut down.
Pause BareMetalHost sync for the blade to be shut down by running the following command:
kubectl annotate bmhost -n gpc-system NODE_NAME "baremetalhost.metal3.io/paused=true" --overwrite
Sample output of this command is:
baremetalhost.metal3.io/**-**-bm** annotated
Cordon the target node:
kubectl cordon NODE_NAME
Sample output is:
node/**-**-bm** cordoned
Drain the target node:
kubectl drain NODE_NAME --delete-emptydir-data --grace-period 900 --ignore-daemonsets --disable-eviction
Sample output:
node/**-**-bm** already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/anetd-krj2z, kube-system/etcd-defrag-xh469, kube-system/ipam-controller-manager-2f4dz, kube-system/istio-cni-node-cgqv4, kube-system/kube-proxy-5mwf2, kube-system/localpv-mn2jh, kube-system/metallb-speaker-6l7sv, mon-system/mon-node-exporter-backend-nd8mp, netapp-trident/netapp-trident-node-linux-rrlmd, obs-system/anthos-audit-logs-forwarder-tpfqv, obs-system/anthos-log-forwarder-npjh4, obs-system/kube-control-plane-metrics-proxy-wp8nh, obs-system/log-failure-detector-crbnv, obs-system/oplogs-forwarder-sqwvj, vm-system/macvtap-v9pgp, vm-system/virt-handler-86khx
pod/grafana-0 deleted
pod/capi-kubeadm-bootstrap-controller-manager-1.30.400-gke.136lvgtf deleted
pod/grafana-0 deleted
pod/grafana-proxy-server-86d8fc4758-mkc4f deleted
. . .
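The pause, cordon, and drain commands above can be chained so that a failure in one step stops the sequence before the next step runs. The following is a minimal sketch under that assumption. The function name `prepare_blade_for_shutdown` is hypothetical. The kubectl commands and flags are the ones listed in this runbook.

```shell
# Hypothetical wrapper (not part of the runbook) around the pause, cordon,
# and drain steps. Stops at the first command that fails.
# Usage: prepare_blade_for_shutdown NODE_NAME
prepare_blade_for_shutdown() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: prepare_blade_for_shutdown NODE_NAME" >&2
    return 2
  fi
  # Pause BareMetalHost sync for the blade.
  kubectl annotate bmhost -n gpc-system "$node" \
    "baremetalhost.metal3.io/paused=true" --overwrite || return 1
  # Mark the node unschedulable.
  kubectl cordon "$node" || return 1
  # Evict the remaining pods, using the same flags as the runbook.
  kubectl drain "$node" --delete-emptydir-data --grace-period 900 \
    --ignore-daemonsets --disable-eviction || return 1
}
```

Running the steps one at a time, as described above, remains the safer choice when you want to inspect the output of each command before continuing.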
Gracefully shut down the target node:
Turn off NODE_NAME using iLO:
Retrieve the credentials to access the iLO:
Get the username:
kubectl get secret bmc-credentials-NODE_NAME -n gpc-system -o jsonpath="{.data.username}" | base64 --decode
Get the password:
kubectl get secret bmc-credentials-NODE_NAME -n gpc-system -o jsonpath="{.data.password}" | base64 --decode
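The two credential lookups above differ only in the field name, so they can be wrapped in one helper that captures the value in a shell variable instead of printing it to the terminal. This is a minimal sketch. The function name `get_bmc_credential` is hypothetical. The secret name pattern and namespace are taken from the commands above.

```shell
# Hypothetical helper (not part of the runbook): print one field of the
# BMC credentials secret for a node.
# Usage: get_bmc_credential NODE_NAME username|password
get_bmc_credential() {
  local node="${1:-}" field="${2:-}"
  if [ -z "$node" ]; then
    echo "usage: get_bmc_credential NODE_NAME username|password" >&2
    return 2
  fi
  case "$field" in
    username|password) ;;
    *)
      echo "usage: get_bmc_credential NODE_NAME username|password" >&2
      return 2
      ;;
  esac
  # Same lookup as the runbook commands, with the field name substituted.
  kubectl get secret "bmc-credentials-$node" -n gpc-system \
    -o jsonpath="{.data.$field}" | base64 --decode
}
```

For example, `ILO_PASS=$(get_bmc_credential NODE_NAME password)` stores the password in a variable, which keeps it out of the terminal scrollback.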
Retrieve the BMC-IP address for NODE_NAME from the values in the BMC-IP column:
kubectl get servers -A
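When the server list is long, the BMC-IP value for one node can be picked out of the table with awk. The sketch below is illustrative only. It assumes the output of `kubectl get servers -A` is a whitespace-separated table whose header row contains a NAME column and a BMC-IP column, and that no cell in a row is empty. The exact column layout is an assumption, and the function name `get_bmc_ip` is hypothetical.

```shell
# Hypothetical helper (not part of the runbook): print the BMC-IP value
# for one server. Assumes a header row with NAME and BMC-IP columns and
# no empty cells in the table.
# Usage: get_bmc_ip NODE_NAME
get_bmc_ip() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: get_bmc_ip NODE_NAME" >&2
    return 2
  fi
  kubectl get servers -A | awk -v node="$node" '
    NR == 1 {
      # Locate the NAME and BMC-IP columns from the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name_col = i
        if ($i == "BMC-IP") ip_col = i
      }
      next
    }
    # Print the BMC-IP cell of the row whose NAME matches the node.
    name_col && ip_col && $name_col == node { print $ip_col; found = 1 }
    # Exit non-zero when no matching row was found.
    END { exit(found ? 0 : 1) }
  '
}
```

If the function exits with a non-zero status, fall back to reading the BMC-IP column from the full `kubectl get servers -A` output by eye.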
Visit the BMC-IP address obtained in the previous step and sign in with the username and password obtained earlier.
Hover over the first button on the top row. It should display Power: ON. Click it.
A drop-down menu appears. Click the first item, labelled Momentary Press.
The button color changes from green to orange, meaning the node is shutting down. Wait for the button to change color to yellow, indicating the machine has powered off. This takes a few minutes.
Wait for 30 minutes for the reconciliation to complete.
Reboot the blade
This section describes how to boot a blade that was shut down earlier.
Prerequisites
- You can execute this runbook on a laptop or workstation connected to the Google Distributed Cloud (GDC) air-gapped appliance's network. Alternatively, you can connect a laptop or workstation to the switch by following Connect the device.
- Make sure you have access to the kubeconfig for the root-admin cluster.
- Set the correct KUBECONFIG environment variable by running the following command:
  export KUBECONFIG=<path to kubeconfig>
Plan of action
Press the power button on the blade. Once the blade is powered up, wait a few minutes for the control plane to connect. kubectl should be able to connect to the control plane in under 30 minutes.
Determine the name of the target node by running the following command:
kubectl get nodes -A
Uncordon the target node to enable scheduling:
kubectl uncordon NODE_NAME
Resume sync of BareMetalHost for the target node:
kubectl annotate bmhost -n gpc-system NODE_NAME "baremetalhost.metal3.io/paused=false" --overwrite
Wait for 30 minutes for the reconciliation to complete.
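The reboot steps above depend on kubectl being able to reach the control plane, which can take up to 30 minutes after power-on. The following sketch polls for connectivity and then runs the uncordon and resume-sync commands from this section. The function names `wait_for_control_plane` and `resume_blade` are hypothetical. The 1800-second default timeout mirrors the 30-minute figure above, and the 30-second polling interval is an arbitrary assumption.

```shell
# Hypothetical helper (not part of the runbook): poll until kubectl can
# reach the control plane, or give up after a timeout.
# Usage: wait_for_control_plane [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_control_plane() {
  local timeout="${1:-1800}" interval="${2:-30}" waited=0
  until kubectl get nodes >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "control plane not reachable after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Hypothetical wrapper around the uncordon and resume-sync steps.
# Usage: resume_blade NODE_NAME
resume_blade() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: resume_blade NODE_NAME" >&2
    return 2
  fi
  wait_for_control_plane || return 1
  # Allow scheduling on the node again.
  kubectl uncordon "$node" || return 1
  # Resume BareMetalHost sync for the node.
  kubectl annotate bmhost -n gpc-system "$node" \
    "baremetalhost.metal3.io/paused=false" --overwrite
}
```

After `resume_blade NODE_NAME` returns successfully, the 30-minute reconciliation wait described above still applies.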