Troubleshooting

Learn about troubleshooting steps that you might find helpful if you run into problems using Migrate for Anthos.

I cannot install Migrate for Anthos

Application took too long to deploy

If you see the error "Application took too long to deploy" when deploying Migrate for Anthos from Google Cloud Marketplace, re-create your cluster manually.

After you have re-created your cluster, re-install Migrate for Anthos.

My application won't start

What's happening to my app? Where can I see logs?

See Monitoring migrated workloads for more information on logging.

My application pods / workloads have a status of "Unschedulable" for more than 10 minutes

In your Migrate for Anthos deployment setup, check that your:

  • Migrate for Compute Engine Manager IP is correct.
  • Migrate for Compute Engine Manager is reachable from the GKE cluster's subnet.
  • Migrate for Compute Engine API password is correct.

My pods do not change from "Pending" status

Pods that do not change state from Pending to Running might not be able to mount a volume.
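
As a minimal sketch for spotting such pods, the helper below filters the default table printed by kubectl get pods. The name pending_pods is hypothetical and not part of Migrate for Anthos, and the sketch assumes the usual NAME, READY, STATUS, RESTARTS, AGE column order:

```shell
# Hypothetical helper: print the names of pods whose STATUS is "Pending".
# Reads the default table output of `kubectl get pods` on stdin and assumes
# the column order NAME READY STATUS RESTARTS AGE (no --all-namespaces).
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}
```

For example, kubectl get pods | pending_pods lists the stuck pods, and kubectl describe pod POD_NAME then shows that pod's events, which is where a failed volume mount would normally be reported.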

Remove unavailable volumes

Migrate for Anthos is configured to halt migrations when a mounted volume is not accessible. To continue with a migration, unmount the inaccessible volumes on the source VM and restart your migration.

Unrecognized source UUID

During Compute Engine to GKE migrations, Migrate for Anthos may fail to recognize the UUID of the source VM's disk. You can add it manually:

  1. Load the logs for the pod using kubectl or Stackdriver.
  2. If you see the message [hcrunner] - Failed to find boot partition, continue with the following steps.
    1. Find the UUID for the boot disk printed in one of the messages, which will be a string of hexadecimal values. In the example below, the UUID is e823158e-f290-4f91-9c3d-6f33367ae0da.

      [util] - SHELL OUTPUT: {"name": "/dev/sdb1", "partflags": null, "parttype":
      "0x83", "uuid": "e823158e-f290-4f91-9c3d-6f33367ae0da",
      "fstype": "ext4"}
      
    2. Delete the existing workload using its YAML file.

      kubectl delete -f DEPLOYMENT_YAML
      
    3. Open the YAML file in a text editor.

    4. Find the section named env (create an env section if needed).

    5. Add the following:

            - name: "HC_BOOTDEVICE_UUID"
              value: "[UUID]"
      
  • If you see the message touch: cannot touch '/rootfs/etc/fstab': No such file or directory, check the following:

    • Your CSI driver workloads have a status of OK in the GKE console.
    • Your workload is in the same cluster as your Migrate for Anthos deployment.
  • If you see one of the following messages:

    • hcutil.Error: Failed mount -o rw None /rootfs (32) (Output:mount: /rootfs: special device None does not exist.)
    • [hcrunner] - [Errno 30] Read-only file system: '/rootfs/rootdir/etc/dhcp/dhclient-up-hooks

    Delete the workload's failing PersistentVolumeClaim and recreate it.
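
For the Failed to find boot partition case above, the UUID can also be pulled out of the pod logs with a small filter instead of reading them by eye. This is only a sketch: extract_uuids is a hypothetical name, and it assumes the log prints the value as a "uuid": "..." pair, as in the example message.

```shell
# Hypothetical helper: print every "uuid" value found in log text on stdin,
# one per line. It does not decide which partition is the boot disk.
extract_uuids() {
  grep -Eo '"uuid": *"[0-9a-fA-F-]+"' | sed -E 's/.*"([0-9a-fA-F-]+)"$/\1/'
}
```

A possible use is kubectl logs POD_NAME | extract_uuids. If several values are printed, check the log messages by hand to see which one belongs to the boot disk. After editing the YAML file, re-creating the workload with kubectl apply -f DEPLOYMENT_YAML is presumably the next step, although the steps above do not state it.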

A device listed in /etc/fstab fails to mount

By default, the system parses /etc/fstab and mounts all of the listed devices to the required mount points. If a device is not recognized or not mounted, the workload pod will not get to a ready state.

For example, consider a source VM in Amazon EC2 that has an ephemeral disk where persistence is not guaranteed. Such disks are not streamed to the target, so the container fails when it tries to mount them.

If this happens, you might see messages such as the following:

  • Unable to locate resolve [X] fstab entries: …
  • Error: Failed mount -o defaults /dev/mapper/mpathe-part /rootfs/mnt/ephemeral

You can work around this by either setting an environment variable that causes the system to skip mount failures and continue, or by editing /etc/fstab to remove the device mount command.

To set the variable, add an env property to your StatefulSet definition, as follows:

# Omitted code...
  spec:
    containers:
    - name: [APPLICATION_NAME]
      # The image for the Migrate for Anthos system container.
      image: anthos-migrate.gcr.io/v2k-run:v1.0.1
      # Environment variables
      env:
      - name: HC_INIT_SKIP_MOUNT_FAILURES
        value: "true"

The following table describes the variable:

Variable | Description | Allowed values
HC_INIT_SKIP_MOUNT_FAILURES | Set to "true" to skip mount failures if they occur. Note that this may result in missed mount points. | String: "true" or "false"; default is "false".
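
For the other workaround mentioned above, removing the device from /etc/fstab, one cautious approach is to comment the entry out rather than delete it. The sketch below assumes standard fstab fields (device first, mount point second). The function name is hypothetical, and the function is best tried on a copy of the file first.

```shell
# Hypothetical helper: comment out the fstab entry for one mount point.
# Usage: comment_out_fstab_entry FSTAB_FILE MOUNT_POINT
comment_out_fstab_entry() {
  fstab_file="$1"
  mount_point="$2"
  tmp_file="$(mktemp)"
  # Field 2 of an fstab entry is the mount point. Lines that are already
  # comments are left untouched.
  awk -v mp="$mount_point" '
    $0 !~ /^[ \t]*#/ && $2 == mp { print "# " $0; next }
    { print }
  ' "$fstab_file" > "$tmp_file" && cat "$tmp_file" > "$fstab_file"
  rm -f "$tmp_file"
}
```

For instance, comment_out_fstab_entry fstab.copy /mnt/ephemeral would disable only the entry mounted at /mnt/ephemeral and leave every other line as it was.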

My Migrate for Anthos deployment is unable to attach to storage

If you've deleted a cluster that has containers using Migrate for Anthos CSI for streaming PVCs, the storage claimed by the cluster will become unavailable for a new deployment using the same source VM.

This can happen when you have deleted a cluster without gracefully deleting all of its child resources. After setting up a new cluster to use the same Migrate for Compute Engine manager and VM as source, you might see an error such as the following, in which a Migrate for Anthos deployment could not attach to storage:

Attaching the published storage to 'projects/724375933346/zones/europe-west1-c/gke-stab4m4jju1-pool-yyy' has failed, as it is already attached to 'projects/724375933346/zones/europe-west1-d/gke-stab4m4jju1-pool-zzz'

You can resolve this issue by running a Kubernetes job that cleans up invalid connections between Migrate for Compute Engine and storage.

When you run the job, specify the VM whose connections should be cleaned up by using the following value, which identifies the VM to Migrate for Compute Engine:

  • VM_ID -- The VM ID, such as vm-15, or a VM path in the form a/b/vm-name.

To perform the cleanup, create a configuration such as the following with your own values, then apply the configuration to your Migrate for Anthos deployment.

kind: Job
apiVersion: batch/v1
metadata:
 name: vm-cleaner
 namespace: v2k-system # Namespace of the Migrate for Anthos system.
spec:
 template:
   metadata:
     annotations:
       sidecar.istio.io/inject: "false"
   spec:
     restartPolicy: OnFailure
     hostNetwork: true
     containers:
       - name: vm-cleaner
         image: anthos-migrate.gcr.io/vlsdisk-csi-driver:v1.0.1
         imagePullPolicy: Always
         command: [ "/cleaner" ]
         env:
           - name: "VM_ID" # Specify the VM ID, or a VM path in the form a/b/vm-name
             value: "vm-15"
         volumeMounts:
           - name: params
             mountPath: /params
     volumes:
       - name: params
         secret:
           secretName: csi-vlsdisk-env  # StorageClass secret (environment)
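
If the same cleanup has to be run for several VMs, the VM_ID value can be substituted from the command line instead of editing the file each time. The following is a rough sketch that assumes the manifest keeps the value: line directly below the name: "VM_ID" line, as in the configuration above. The function name and the manifest file name are hypothetical.

```shell
# Hypothetical helper: print a manifest with the VM_ID value replaced.
# Usage: set_cleaner_vm_id MANIFEST_FILE NEW_VM_ID
# Assumes NEW_VM_ID contains no "&" or backslash characters, which awk
# would treat specially.
set_cleaner_vm_id() {
  awk -v id="$2" '
    prev_is_vm_id && /value:/ { sub(/value:.*/, "value: \"" id "\"") }
    { prev_is_vm_id = ($0 ~ /name: "VM_ID"/); print }
  ' "$1"
}
```

With the configuration saved as, say, vm-cleaner.yaml, the output could be piped to the cluster with set_cleaner_vm_id vm-cleaner.yaml vm-42 | kubectl apply -f -.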

Debugging Kubernetes resources

More help is available at the following pages:

I would like personalized support

Paid support is available for customers migrating with Migrate for Anthos. Reach out so we can help.

Providing information to Google Cloud support

The Sysreport script collects information about your cluster's configuration so that Migrate for Anthos support can resolve problems faster.

You can access the script from Cloud Shell.

  1. Open Cloud Shell.

  2. Run the collect_sysreport.sh script:

/google/migrate/anthos/collect_sysreport.sh [NAMESPACE] [DEPLOYMENT_NAME] [--workloads]

Where:

  • [NAMESPACE] is the namespace where your Migrate for Anthos components were installed.
  • [DEPLOYMENT_NAME] is the name given when you created your Migrate for Anthos deployment in the GKE marketplace.
  • --workloads collects additional data from your migrated workloads. See below for more information.

The script creates anthos-migrate-logs.TIMESTAMP.tar.xz, which you provide to Google Cloud support.
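
Before sending the archive, it can be worth checking what it contains. The sketch below assumes that the script writes the archive to the directory it was run from, which the text above does not confirm. The helper name is hypothetical.

```shell
# Hypothetical helper: print the path of the most recently modified
# Sysreport archive in a directory (default: the current directory).
newest_sysreport() {
  ls -t "${1:-.}"/anthos-migrate-logs.*.tar.xz 2>/dev/null | head -n 1
}
```

Running tar -tJf "$(newest_sysreport)" then lists the files in the newest archive without extracting them.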

By default, the script collects:

  • Logs from the Migrate for Anthos CSI controller and CSI nodes.
  • Syslog from the Migrate for Anthos CSI node hosts.
  • The output of:
    • kubectl cluster-info
    • kubectl get nodes; kubectl describe node
    • kubectl version
    • kubectl top node

With the --workloads flag enabled, for every workload the script collects:

  • The workload's logs.
  • The output of:
    • ps aux
    • netstat -tlnp
    • iptables -t nat -L
    • fstab
    • kubectl get pod
    • kubectl describe pod
    • kubectl top pod --all-namespaces --containers
    • kubectl cluster-info dump
    • kubectl api-resources -o wide
    • kubectl get componentstatuses --all-namespaces
    • kubectl get endpoints --all-namespaces
    • kubectl get events --all-namespaces
    • kubectl describe limits --all-namespaces
    • kubectl get namespaces
    • kubectl describe pvc --all-namespaces
    • kubectl describe pv --all-namespaces
    • kubectl describe quota --all-namespaces
    • kubectl describe sa --all-namespaces
    • kubectl describe services --all-namespaces
    • kubectl get ingresses --all-namespaces
    • kubectl describe networkpolicies --all-namespaces
    • kubectl get podsecuritypolicies --all-namespaces
    • kubectl get clusterrolebindings --all-namespaces
    • kubectl describe storageclasses --all-namespaces
    • kubectl describe volumeattachments --all-namespaces