This page covers troubleshooting the virtual machine (VM) for the Application Operator (AO) in Google Distributed Cloud (GDC) air-gapped appliance.
Recover a full VM boot disk
If a VM runs out of space on the boot disk, for example, when an application fills the boot disk partition with logs, critical capabilities on the VM fail to work. You might not be able to add a new SSH key through the VirtualMachineAccessRequest resource, or to establish an SSH connection to the VM using existing keys.
This page describes how to create a new VM and attach the disk you want to recover to it as an additional disk. These steps demonstrate the following:
- Establishing an SSH connection to the new VM.
- Freeing space by mounting the disk to recover and deleting unnecessary data.
- Deleting the new VM and reattaching the original disk to the original VM.
Before you begin
Before continuing, ensure that you request project-level VM access. Follow the steps to assign the Project VirtualMachine Admin (project-vm-admin) role.
To use gdcloud command-line interface (CLI) commands, ensure that you have downloaded, installed, and configured the gdcloud CLI.
All commands for GDC air-gapped appliance use the gdcloud or kubectl CLI, and require an operating system (OS) environment.
Get the kubeconfig file path
To run commands against the admin cluster, ensure you have the following resources:
- Locate the admin cluster name, or ask your Platform Administrator (PA) for the cluster name.
- Sign in and generate the kubeconfig file for the admin cluster if you don't have one.
- Use the path to the kubeconfig file to replace ADMIN_KUBECONFIG in these instructions.
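For example, a minimal sketch assuming the generated kubeconfig file is saved at ~/admin-cluster-kubeconfig (the path is an assumption; substitute your own):
export ADMIN_KUBECONFIG=~/admin-cluster-kubeconfig   # assumed path to the generated file
kubectl --kubeconfig $ADMIN_KUBECONFIG config current-context   # confirms the file is readable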
Recover a VM disk that is out of space
To recover a VM boot disk that has run out of space, complete the following steps:
Stop the existing VM by following Stop a VM.
Edit the existing VM:
kubectl --kubeconfig ADMIN_KUBECONFIG edit \
    virtualmachine.virtualmachine.gdc.goog -n PROJECT VM_NAME
Replace the existing VM disk name in the spec field with a new placeholder name:
...
spec:
  disks:
    - boot: true
      virtualMachineDiskRef:
        name: VM_DISK_PLACEHOLDER_NAME
Create a new VM with an operating system (OS) image different from the original VM. For example, if the original disk uses the ubuntu-2004 OS, create the new VM with rocky-8.
Attach the original disk as an additional disk to the new VM:
...
spec:
  disks:
    - boot: true
      autoDelete: true
      virtualMachineDiskRef:
        name: NEW_VM_DISK_NAME
    - virtualMachineDiskRef:
        name: ORIGINAL_VM_DISK_NAME
Replace the following:
- NEW_VM_DISK_NAME: the name you give to the new VM disk.
- ORIGINAL_VM_DISK_NAME: the name of the original VM disk.
After you've created the VM and it is running, establish an SSH connection to the VM by following Connect to a VM.
Create a directory and mount the original disk to a mount point, for example, /mnt/disks/new-disk.
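A minimal sketch of the mount commands, assuming the attached disk appears as /dev/sdb with a single partition (the device name varies; confirm it with lsblk):
lsblk                                      # identify the attached disk, for example /dev/sdb
sudo mkdir -p /mnt/disks/new-disk          # create the mount point
sudo mount /dev/sdb1 /mnt/disks/new-disk   # mount the first partition of the original disk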
Check the files and directories in the mount directory to find what is using the extra space:
cd /mnt/disks/MOUNT_DIR
du -hs -- * | sort -rh | head -10
Replace MOUNT_DIR with the name of the directory where you mounted the original disk.
The output is similar to the following:
18G   home
1.4G  usr
331M  var
56M   boot
5.8M  etc
36K   snap
24K   tmp
16K   lost+found
16K   dev
8.0K  run
Check each of the files and directories to verify the amount of space each one is using. This example checks the home directory, as it uses 18G of space:
cd home
du -hs -- * | sort -rh | head -10
The output is similar to the following:
17G   log_file
...
4.0K  readme.md
4.0K  main.go
In this example, log_file is a file to clear, as it consumes 17G of space and is not necessary.
Delete the files you don't need that consume extra space, or back up the files to the new VM boot disk:
Move the files you want to keep:
mv /mnt/disks/MOUNT_DIR/home/FILENAME /home/backup/
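If the backup directory doesn't exist yet, create it first (/home/backup follows the example above; adjust the path to your layout):
mkdir -p /home/backup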
Delete the files consuming extra space:
rm /mnt/disks/MOUNT_DIR/home/FILENAME
Replace FILENAME with the name of the file you want to move or delete.
Log out of the new VM and stop it by following Stop a VM.
Edit the new VM to remove the original disk from the spec field:
kubectl --kubeconfig ADMIN_KUBECONFIG \
    edit virtualmachine.virtualmachine.gdc.goog -n PROJECT NEW_VM_NAME
Remove the virtualMachineDiskRef entry that contains the original VM disk name:
spec:
  disks:
    - autoDelete: true
      boot: true
      virtualMachineDiskRef:
        name: NEW_VM_DISK_NAME
    - virtualMachineDiskRef:        # Remove this entry
        name: ORIGINAL_VM_DISK_NAME # Remove this disk name
Edit the original VM and replace the VM_DISK_PLACEHOLDER_NAME you set in step two with the original disk name:
...
spec:
  disks:
    - boot: true
      virtualMachineDiskRef:
        name: VM_DISK_PLACEHOLDER_NAME # Replace this placeholder with the original disk name
Start the original VM. If you've cleared enough space, the VM boots successfully.
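After you re-establish an SSH connection, you can confirm the freed space with the standard df utility (a suggested check, not part of the original procedure):
df -h /   # shows used and available space on the root filesystem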
If you don't need the new VM, delete the VM:
kubectl --kubeconfig ADMIN_KUBECONFIG \
    delete virtualmachine.virtualmachine.gdc.goog -n PROJECT NEW_VM_NAME
Provision a virtual machine
This section describes how to troubleshoot issues that might occur while provisioning a new virtual machine (VM) in Google Distributed Cloud (GDC) air-gapped appliance.
The Application Operator (AO) must run all commands against the default user cluster.
Unable to create disk
If a PersistentVolumeClaim (PVC) is in a Pending state, review the following alternatives to resolve the state:
The storage class does not support creating a PVC with the ReadWriteMany access mode:
- Update the spec.dataVolumeTemplate.spec.pvc.storageClassName value of the virtual machine with a storage class that supports the ReadWriteMany access mode and uses a Container Storage Interface (CSI) driver as its storage provisioner.
- If no other storage class on the cluster can provide the ReadWriteMany capability, update the spec.dataVolumeTemplate.spec.pvc.accessModes value to include the ReadWriteOnce access mode. A sketch of the relevant fields follows this list.
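A minimal sketch of the relevant fields in the virtual machine spec, assuming a storage class named standard-rwx (the same class name the test PVC later in this section uses):
spec:
  dataVolumeTemplate:
    spec:
      pvc:
        storageClassName: standard-rwx  # a CSI-backed class that supports ReadWriteMany
        accessModes:
          - ReadWriteMany               # or ReadWriteOnce if no class provides it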
The CSI driver is unable to provision a PersistentVolume:
Check for an error message:
kubectl describe pvc VM_NAME-boot-dv -n NAMESPACE_NAME
Replace the following variables:
- VM_NAME: the name of the virtual machine.
- NAMESPACE_NAME: the name of the namespace.
Configure the driver to resolve the error. To ensure that the PersistentVolume provisioning works, create a test PVC in a new spec with a different name than the one specified in the dataVolumeTemplate.spec.pvc:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: NAMESPACE_NAME
spec:
  storageClassName: standard-rwx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF
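To watch the test PVC until it reaches the Bound phase, you can use standard kubectl (a suggested verification step, not part of the original procedure):
kubectl get pvc test-pvc -n NAMESPACE_NAME -w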
After you verify that the PersistentVolume object provisions successfully, delete the test PVC:
kubectl delete pvc test-pvc -n NAMESPACE_NAME
Unable to create a virtual machine
If the virtual machine resource is applied but does not reach a Running state, follow these steps:
Review the virtual machine status:
kubectl get vm VM_NAME -n NAMESPACE_NAME
Check the corresponding Pod status of the virtual machine:
kubectl get pod -l kubevirt.io/vm=VM_NAME
The output shows a Pod status. The possible options are as follows:
The ContainerCreating state
If the Pod is in the ContainerCreating state, follow these steps:
Get additional details about the Pod's state:
kubectl describe pod -l kubevirt.io/vm=VM_NAME
If the volumes are unmounted, ensure that all the volumes specified in the spec.volumes field are successfully mounted. If a volume is a disk, check the disk status.
The spec.accessCredentials field specifies a value to mount an SSH public key. Ensure that the secret is created in the same namespace as the virtual machine; the check after this step shows one way to confirm it.
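A quick way to confirm that the referenced secret exists, where SECRET_NAME is a hypothetical placeholder for the secret named in the spec.accessCredentials field:
kubectl get secret SECRET_NAME -n NAMESPACE_NAME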
If there are not enough resources on the cluster to create the Pod, follow these steps:
- Remove other unwanted Pods to release compute resources so that the virtual machine Pod can be scheduled.
- Reduce the spec.domain.resources.requests.cpu and spec.domain.resources.requests.memory values of the virtual machine, as in the sketch after this list.
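A minimal sketch of reduced resource requests; the values are illustrative, not recommendations:
spec:
  domain:
    resources:
      requests:
        cpu: "1"      # illustrative value
        memory: 2Gi   # illustrative value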
The Error or CrashLoopBackoff state
To resolve Pods in Error or CrashLoopBackoff states, retrieve logs from the virtual machine compute Pod:
kubectl logs -l kubevirt.io/vm=VM_NAME -c compute
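If the container is crash looping, the logs of the current container might be empty. In that case, the standard --previous flag of kubectl logs retrieves the logs of the last terminated container:
kubectl logs -l kubevirt.io/vm=VM_NAME -c compute --previous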
The Running state and virtual machine failure
If the Pod is in the Running state but the virtual machine itself fails, follow these steps:
View the logs from the virtual machine log Pod:
kubectl logs -l kubevirt.io/vm=VM_NAME -c log
If the log shows errors during virtual machine startup, verify that the correct boot device is set for the virtual machine. Set the spec.domain.devices.disks.bootOrder value of the primary boot disk to 1. Use the following example as a reference:
...
spec:
  domain:
    devices:
      disks:
        - bootOrder: 1
          disk:
            bus: virtio
          name: VM_NAME-boot-dv
...
To troubleshoot configuration issues with the virtual machine image, create another virtual machine with a different image.