Troubleshooting SSH

This document describes common errors that you may run into when connecting to Linux virtual machine (VM) instances using SSH, ways to resolve errors, and methods for diagnosing failed SSH connections.

Common SSH errors

The following are examples of common errors you might encounter when you use SSH to connect to Compute Engine VMs.

Permission denied

The following error might occur when you connect to your VM:

USERNAME@VM_EXTERNAL_IP: Permission denied (publickey).

This error can occur for several reasons. The following are some of the most common causes of this error:

  • You used an SSH key stored in metadata to connect to a VM that has OS Login enabled. If OS Login is enabled on your project, your VM doesn't accept SSH keys that are stored in metadata.

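    To resolve this issue, try one of the following:

    • Connect to your VM using the Google Cloud Console or the gcloud command-line tool. When you use these tools to connect, Compute Engine manages key creation for you.
    • Add your SSH key to your OS Login profile.
    • Disable OS Login.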

  • You used an SSH key stored in an OS Login profile to connect to a VM that doesn't have OS Login enabled. If you disable OS Login, your VM doesn't accept SSH keys that were stored in your OS Login profile.

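    To resolve this issue, try one of the following:

    • Connect to your VM using the Google Cloud Console or the gcloud command-line tool. When you use these tools to connect, Compute Engine manages key creation for you.
    • Enable OS Login.
    • Add your SSH key to metadata.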

  • Your key expired and Compute Engine deleted your ~/.ssh/authorized_keys file. If you manually added SSH keys to your VM and then connected to your VM using the Google Cloud Console, Compute Engine created a new key pair for your connection. After the new key pair expired, Compute Engine deleted your ~/.ssh/authorized_keys file in the VM, which included your manually added SSH key.

    To resolve this issue, try one of the following:

  • You connected using a third-party tool and your SSH command is misconfigured. If you connect using the ssh command but don't specify a path to your private key or you specify an incorrect path to your private key, your VM refuses your connection.

    To resolve this issue, try one of the following:

    • Run the following command:
      ssh -i PATH_TO_PRIVATE_KEY USERNAME@EXTERNAL_IP
      

      Replace the following:
      • PATH_TO_PRIVATE_KEY: the path to your private SSH key file.
      • USERNAME: the username of the user connecting to the instance. If you manage your SSH keys in metadata, the username is what you specified when you created the SSH key. For OS Login accounts, the username is defined in your Google profile.
      • EXTERNAL_IP: the external IP address of your VM.
    • Connect to your VM using the Google Cloud Console or the gcloud command-line tool. When you use these tools to connect, Compute Engine manages key creation for you. For more information, see Connecting to VMs.
  • Your VM's guest environment is not running. If this is the first time that you are connecting to your VM and the guest environment is not running, then the VM might refuse your SSH connection request.

    To resolve this issue, do the following:

    1. Restart the VM.
    2. In the Cloud Console, inspect the system startup logs in the serial port output to determine if the guest environment is running. For more information, see Validating the guest environment.
    3. If the guest environment is not running, manually install the guest environment by cloning the VM's boot disk and using a startup script.
  • The sshd daemon isn't running or isn't configured properly. The sshd daemon enables SSH connections. If it's misconfigured or not running, you can't connect to your VM.

    To resolve this issue, try the following:

    • Review the user guide for your operating system to ensure that your sshd_config is set up correctly.
    • If you previously modified the folder permissions on your VM, change them back to the defaults:

      • 700 on the .ssh directory
      • 644 on the public key, for example id_rsa.pub
      • 600 on the private key, for example id_rsa

      Perform the following steps to modify the folder permissions:

      1. Connect to your VM as the root user using the serial console. If the root account doesn't have a password, set one by running the following command in a startup script:

        usermod -p "$(echo "PASSWORD" | openssl passwd -1 -stdin)" root

        Replace PASSWORD with a password of your choice.

      2. Once you're connected to your VM, modify the folder permissions:

        chmod 700 /home/USERNAME/.ssh;chmod 644 /home/USERNAME/.ssh/id_rsa.pub;chmod 600 /home/USERNAME/.ssh/id_rsa

        Replace USERNAME with the username for which you want to modify folder permissions.

      3. When you're done modifying permissions, disable the root account login:

        sudo passwd -l root
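The default permissions listed above can be verified before changing anything. The following is a minimal sketch, assuming a Linux VM with GNU stat (the `stat -c` form); `check_mode` and `check_ssh_perms` are hypothetical helper names, not part of any Compute Engine tool.

```shell
# Sketch: report files under an .ssh directory whose permissions differ from
# the defaults listed above. Assumes GNU stat; helper names are hypothetical.

check_mode() {
  # $1 = path, $2 = expected octal mode (for example 700)
  [ -e "$1" ] || return 0            # skip paths that don't exist
  actual=$(stat -c '%a' "$1")
  if [ "$actual" != "$2" ]; then
    echo "$1: mode $actual, expected $2"
    return 1
  fi
}

check_ssh_perms() {
  # $1 = path to the .ssh directory, for example /home/USERNAME/.ssh
  status=0
  check_mode "$1" 700 || status=1
  check_mode "$1/id_rsa.pub" 644 || status=1
  check_mode "$1/id_rsa" 600 || status=1
  return "$status"
}
```

For example, `check_ssh_perms /home/USERNAME/.ssh` prints one line for each path that needs a chmod and returns a nonzero status if any path differs.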

    Connection failed

    The following errors might occur when you connect to your VM from the Google Cloud Console or the gcloud tool:

    • The Cloud Console:

      Connection Failed
      
      We are unable to connect to the VM on port 22.
      
    • The gcloud tool:

      ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
      

    These errors can occur for several reasons. The following are some of the most common causes of the errors:

    • The VM is booting up and sshd is not running yet. You can't connect to a VM before it is running.

      To resolve this issue, wait until the VM has finished booting and try to connect again.

    • The firewall rule allowing SSH is missing or misconfigured. By default, Compute Engine VMs allow SSH access on port 22. If the default-allow-ssh rule is missing or misconfigured, you won't be able to connect to VMs.

      To resolve this issue, check your firewall rules and re-add or reconfigure default-allow-ssh.

    • sshd is running on a custom port. If you configured sshd to run on a port other than port 22, you won't be able to connect to your VM unless a firewall rule allows traffic on that port.

      To resolve this issue, create a custom firewall rule allowing tcp traffic on the port that your sshd is running on using the following command:

      gcloud compute firewall-rules create FIREWALL_NAME \
        --allow tcp:PORT_NUMBER
      

      For more information about creating custom firewall rules, see Creating firewall rules.

    • Your custom SSH firewall rule doesn't allow traffic from Google services. SSH connections from the Cloud Console are refused if custom firewall rules do not allow connections from Google's IP address range.

      To resolve this issue, do the following:

      1. Gather Google's IP address ranges by running the following command:

        dig +qr +short txt `dig +short TXT _spf.google.com | grep -oE 'include:\S*' | cut -d':' -f2 | xargs` | grep -oE 'ip[46]:\S*' | sort | uniq
        
      2. Update your custom firewall rule to allow traffic from Google IP addresses. For more information, see Updating firewall rules.

    • The SSH connection failed after you upgraded the VM's kernel. A VM might experience a kernel panic after a kernel update, causing the VM to become inaccessible.

      To resolve this issue, do the following:

      1. Mount the disk to another VM.
      2. Update the grub.cfg file to use the previous version of the kernel.
      3. Attach the disk to the unresponsive VM.
      4. Verify that the status of the VM is RUNNING by using the gcloud compute instances describe command.
      5. Reinstall the kernel.
      6. Restart the VM.

      Alternatively, if you created a snapshot of the boot disk before upgrading the VM, use the snapshot to create a VM.

    • The sshd daemon isn't running or isn't configured properly. The sshd daemon enables SSH connections. If it's misconfigured or not running, you can't connect to your VM.

      To resolve this issue, try the following:

      • Review the user guide for your operating system to ensure that your sshd_config is set up correctly.
      • If you previously modified the folder permissions on your VM, change them back to the defaults:

        • 700 on the .ssh directory
        • 644 on the public key, for example id_rsa.pub
        • 600 on the private key, for example id_rsa

        Perform the following steps to modify the folder permissions:

        1. Connect to your VM as the root user using the serial console. If the root account doesn't have a password, set one by running the following command in a startup script:

          usermod -p "$(echo "PASSWORD" | openssl passwd -1 -stdin)" root

          Replace PASSWORD with a password of your choice.

        2. Once you're connected to your VM, modify the folder permissions:

          chmod 700 /home/USERNAME/.ssh;chmod 644 /home/USERNAME/.ssh/id_rsa.pub;chmod 600 /home/USERNAME/.ssh/id_rsa

          Replace USERNAME with the username for which you want to modify folder permissions.

        3. When you're done modifying permissions, disable the root account login:

          sudo passwd -l root

    • The VM isn't booting and you can't connect using SSH or the serial console. If the VM is inaccessible, then your OS might be corrupted. If the boot disk doesn't boot, you can diagnose the issue. If you want to recover the corrupted VM and retrieve data, see Recovering a corrupted VM or a full boot disk.

    • The VM is booting in maintenance mode. When booting in maintenance mode, the VM doesn't accept SSH connections, but you can connect to the VM's serial console and log in as the root user.

      To resolve this issue, do the following:

      1. If you haven't set a root password for the VM, use a metadata startup script to run the following command during boot:

        echo "root:NEW_PASSWORD" | chpasswd

        Replace NEW_PASSWORD with a password of your choice.

      2. Restart the VM.

      3. Connect to the VM's serial console and log in as the root user.
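For the custom firewall rule cause described earlier in this list, the IP ranges gathered from the SPF records need to be turned into a value that a firewall rule accepts. The following is a rough sketch of that step. `spf_ranges` is a hypothetical helper, and it handles one address family at a time on the assumption that a rule's source ranges are kept all IPv4 or all IPv6.

```shell
# Sketch: turn SPF TXT record text into a comma-separated list of CIDR ranges.
# spf_ranges is a hypothetical helper name.

spf_ranges() {
  # $1 = 4 or 6 (address family); reads SPF TXT record text on stdin
  grep -oE "ip$1:[^[:space:]\"]+" | cut -d: -f2- | sort -u | paste -sd, -
}

# Possible usage, assuming dig is installed and FIREWALL_NAME is your rule:
#   includes=$(dig +short TXT _spf.google.com | grep -oE 'include:[^ ]+' | cut -d: -f2)
#   ranges=$(dig +short TXT $includes | spf_ranges 4)
#   gcloud compute firewall-rules update FIREWALL_NAME --source-ranges="$ranges"
```

Updating the source ranges replaces the existing list, so review the current ranges of the rule before you run the update.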

      Failed to connect to backend

      The following errors might occur when you connect to your VM from the Google Cloud Console or the gcloud tool:

      • The Cloud Console:

        -- Connection via Cloud Identity-Aware Proxy Failed
        
        -- Code: 4003
        
        -- Reason: failed to connect to backend
        
      • The gcloud tool:

        ERROR: (gcloud.compute.start-iap-tunnel) Error while connecting [4003: u'failed to connect to backend'].
        

      These errors occur when you try to use SSH to connect to a VM that doesn't have a public IP address and for which you haven't configured Identity-Aware Proxy on port 22.

      To resolve this issue, create a firewall rule on port 22 that allows ingress traffic from Identity-Aware Proxy.
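A firewall rule of this kind might look like the following sketch. The function names and the rule name are hypothetical, and 35.235.240.0/20 is assumed to be the source range that Identity-Aware Proxy uses for TCP forwarding; confirm the range in the current IAP documentation before relying on it.

```shell
# Sketch: allow SSH ingress from Identity-Aware Proxy and connect through it.
# Function names are hypothetical; the source range is an assumption to verify.

allow_iap_ssh() {
  # $1 = firewall rule name, $2 = VPC network name
  gcloud compute firewall-rules create "$1" \
    --network="$2" \
    --direction=INGRESS \
    --allow=tcp:22 \
    --source-ranges=35.235.240.0/20
}

ssh_via_iap() {
  # $1 = VM name; tunnels the SSH connection through IAP
  gcloud compute ssh "$1" --tunnel-through-iap
}
```

For example, `allow_iap_ssh allow-ssh-from-iap default` followed by `ssh_via_iap VM_NAME` creates the rule on the default network and then connects without requiring an external IP address.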

      Diagnosing failed SSH connections

      The following sections describe steps you can take to diagnose the cause of failed SSH connections and the steps you can take to fix your connections.

      Before you diagnose failed SSH connections, complete the following steps:

      Test connectivity

      You might not be able to SSH to a VM instance because of connectivity issues linked to firewalls, network connection, or the user account. Follow the steps in this section to identify any connectivity issues.

      Check your firewall rules

      Compute Engine provisions each project with a default set of firewall rules that permit SSH traffic. If you are unable to access your instance, use the gcloud compute command-line tool to check your list of firewalls and ensure that the default-allow-ssh rule is present.

      On your local workstation, run the following command:

      gcloud compute firewall-rules list
      

      If the firewall rule is missing, add it back:

      gcloud compute firewall-rules create default-allow-ssh \
          --allow tcp:22
      

      To view all data associated with the default-allow-ssh firewall rule in your project, use the gcloud compute firewall-rules describe command:

      gcloud compute firewall-rules describe default-allow-ssh \
          --project=PROJECT_ID

      Replace PROJECT_ID with the ID of your project.

      For more information about firewall rules, see Firewall rules in Google Cloud.
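Scanning the full list by eye is error-prone in projects with many rules. As a small sketch, the check for the default-allow-ssh rule can be scripted; `has_firewall_rule` is a hypothetical helper that only tests whether a rule with that exact name exists, not whether it is configured correctly.

```shell
# Sketch: test whether a firewall rule with a given name exists in the
# current project. has_firewall_rule is a hypothetical helper name.

has_firewall_rule() {
  # $1 = rule name; returns 0 if a rule with exactly that name is listed
  gcloud compute firewall-rules list --format='value(name)' | grep -qxF "$1"
}

# Possible usage:
#   has_firewall_rule default-allow-ssh || echo "default-allow-ssh is missing"
```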

      Test the network connection

      To determine whether the network connection is working, test the TCP handshake:

      1. Obtain the external natIP for your VM:

        gcloud compute instances describe VM_NAME \
            --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
        

        Replace VM_NAME with the name of the VM you can't connect to.

      2. Test the network connection to your VM from your workstation:

        Linux, Windows 2019, and macOS

        curl -vso /dev/null --connect-timeout 5 EXTERNAL_IP:PORT_NUMBER
        

        Replace the following:

        • EXTERNAL_IP: the external IP address you obtained in the previous step
        • PORT_NUMBER: the port number

        If the TCP handshake is successful, the output is similar to the following:

        Expire in 0 ms for 6 (transfer 0x558b3289ffb0)
        Expire in 5000 ms for 2 (transfer 0x558b3289ffb0)
        Trying 192.168.0.4...
        TCP_NODELAY set
        Expire in 200 ms for 4 (transfer 0x558b3289ffb0)
        Connected to 192.168.0.4 (192.168.0.4) port 443 (#0)
        > GET / HTTP/1.1
        > Host: 192.168.0.4:443
        > User-Agent: curl/7.64.0
        > Accept: */*
        >
        Empty reply from server
        Connection #0 to host 192.168.0.4 left intact
        

        The Connected to line indicates a successful TCP handshake.

        Windows 2012 and 2016

        PS C:\> New-Object System.Net.Sockets.TcpClient('EXTERNAL_IP',PORT_NUMBER)
        

        Replace the following:

        • EXTERNAL_IP: the external IP you obtained in the previous step
        • PORT_NUMBER: the port number

        If the TCP handshake is successful, the output is similar to the following:

        Available           : 0
        Client              : System.Net.Sockets.Socket
        Connected           : True
        ExclusiveAddressUse : False
        ReceiveBufferSize   : 131072
        SendBufferSize      : 131072
        ReceiveTimeout      : 0
        SendTimeout         : 0
        LingerState         : System.Net.Sockets.LingerOption
        NoDelay             : False
        

        The Connected: True line indicates a successful TCP handshake.

      If the TCP handshake completes successfully, a software firewall rule is not blocking the connection, the OS is correctly forwarding packets, and a server is listening on the destination port. If the TCP handshake completes successfully but the VM doesn't accept SSH connections, the issue might be that the sshd daemon is misconfigured or not running properly. Review the user guide for your operating system to ensure that your sshd_config is set up correctly.
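The curl test above can be wrapped so that it reports a plain result instead of verbose output. This is a sketch only: `check_tcp` is a hypothetical helper, and it relies on curl printing a line that contains "Connected to" when the handshake succeeds, as in the sample output shown earlier.

```shell
# Sketch: report whether a TCP handshake with HOST:PORT succeeds, based on
# curl's verbose output. check_tcp is a hypothetical helper name.

check_tcp() {
  # $1 = IP address, $2 = port number
  if curl -vso /dev/null --connect-timeout 5 "$1:$2" 2>&1 | grep -q 'Connected to '; then
    echo "TCP handshake with $1:$2 succeeded"
  else
    echo "TCP handshake with $1:$2 failed"
    return 1
  fi
}
```

For example, `check_tcp EXTERNAL_IP 22` tests the default SSH port. curl itself usually exits with an error because sshd does not speak HTTP, which is why the helper looks at the output rather than at curl's exit code.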

      To analyze the VPC network path configuration between two VMs and check whether the programmed configuration allows the traffic, run connectivity tests. For more information, see Check for misconfigured firewall rules in Google Cloud.

      Connect as a different user

      The issue that prevents you from logging in might be limited to your user account. For example, the permissions on the ~/.ssh/authorized_keys file on the instance might not be set correctly for the user.

      Try logging in as a different user with the gcloud tool by specifying ANOTHER_USERNAME with the SSH request. The gcloud tool updates the project's metadata to add the new user and allow SSH access.

      gcloud compute ssh ANOTHER_USERNAME@VM_NAME
      

      Replace the following:

      • ANOTHER_USERNAME is a username other than your own username
      • VM_NAME is the name of the VM you want to connect to

      Debug the issue in the serial console

      We recommend that you review the logs from the serial console for connection errors. You can access the serial console from your local workstation by using a browser.

      Enable read/write access to an instance's serial console, so you can log into the console and troubleshoot problems with the instance. This approach is useful when you cannot log in with SSH, or if the instance has no connection to the network. The serial console remains accessible in both of these situations.

      To learn how to enable interactive access and connect to an instance's serial console, read Interacting with the serial console.
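The serial console steps can also be driven from the gcloud tool. The following is a minimal sketch with hypothetical function names; it assumes that serial port access is controlled by the serial-port-enable metadata key and that log lines relevant to SSH contain the string "ssh".

```shell
# Sketch: enable interactive serial console access and scan the serial port
# log for SSH-related lines. Function names are hypothetical.

enable_serial_console() {
  # $1 = VM name
  gcloud compute instances add-metadata "$1" \
    --metadata serial-port-enable=TRUE
}

serial_ssh_lines() {
  # $1 = VM name; prints serial port output lines that mention ssh
  gcloud compute instances get-serial-port-output "$1" 2>/dev/null | grep -i 'ssh'
}

# To open an interactive session afterward:
#   gcloud compute connect-to-serial-port VM_NAME
```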

      Inspect the VM instance without shutting it down

      You might have an instance that you cannot connect to that continues to correctly serve production traffic. In this case, you might want to inspect the disk without interrupting the instance.

      To inspect and troubleshoot the disk:

      1. Back up your boot disk by creating a snapshot of the disk.
      2. Create a regular persistent disk from that snapshot.
      3. Create a temporary instance.
      4. Attach and mount the regular persistent disk to your new temporary instance.

      This procedure creates an isolated network that only allows SSH connections. This setup prevents any unintended consequences of the cloned instance interfering with your production services.

      1. Create a new VPC network to host your cloned instance:

        gcloud compute networks create debug-network
        

        The remaining steps assume that the network is named debug-network. If you use a different name, substitute it in the commands that follow.

      2. Add a firewall rule to allow SSH connections to the network:

        gcloud compute firewall-rules create debug-network-allow-ssh \
           --network debug-network \
           --allow tcp:22
        
      3. Create a snapshot of the boot disk.

        gcloud compute disks snapshot BOOT_DISK_NAME \
           --snapshot-names debug-disk-snapshot
        

        Replace BOOT_DISK_NAME with the name of the boot disk.

      4. Create a new disk with the snapshot you just created:

        gcloud compute disks create example-disk-debugging \
           --source-snapshot debug-disk-snapshot
        
      5. Create a new debugging instance without an external IP address:

        gcloud compute instances create debugger \
           --network debug-network \
           --no-address
        
      6. Attach the debugging disk to the instance:

        gcloud compute instances attach-disk debugger \
           --disk example-disk-debugging
        
      7. Follow the instructions to connect to an instance without an external IP address.

      8. After you have logged into the debugger instance, troubleshoot the instance. For example, you can look at the instance logs:

        sudo su -
        
        mkdir /mnt/VM_NAME
        
        mount /dev/disk/by-id/scsi-0Google_PersistentDisk_example-disk-debugging /mnt/VM_NAME
        
        cd /mnt/VM_NAME/var/log
        
        # Identify the issue preventing ssh from working
        ls
        

        Replace VM_NAME with the name of the VM you can't connect to.
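Rather than listing the log directory and opening files one by one, you can search the mounted logs for sshd messages. This sketch assumes that the image writes authentication logs to auth.log (common on Debian and Ubuntu) or secure (common on RHEL and CentOS); images that log only to the systemd journal need a different approach. `sshd_log_lines` is a hypothetical helper.

```shell
# Sketch: print sshd-related lines from the authentication logs of a mounted
# disk. The log file names are assumptions that depend on the distribution.

sshd_log_lines() {
  # $1 = mounted log directory, for example /mnt/VM_NAME/var/log
  for f in "$1/auth.log" "$1/secure"; do
    if [ -f "$f" ]; then
      grep -H 'sshd' "$f" || true    # no match is not an error
    fi
  done
}
```

For example, `sshd_log_lines /mnt/VM_NAME/var/log | tail -n 50` shows the most recent sshd messages with the name of the file they came from.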

      Use a startup script

      If none of the preceding helped, you can create a startup script to collect information right after the instance starts. Follow the instructions for running a startup script.

      A startup script runs only when the VM boots, so after you add the script to the VM's metadata, reset the instance by using gcloud compute instances reset.
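What a diagnostic startup script collects depends on the image. The following is one possible sketch, not a prescribed script: it assumes a Linux image where pgrep, ss, and df may be available, and it guards each check so that a missing tool doesn't stop the others. Startup script output typically appears in the serial port output, where you can read it without SSH access.

```shell
#!/bin/bash
# Hypothetical diagnostic startup script: prints basic facts about sshd,
# listening sockets, and disk usage shortly after boot.

collect_ssh_diagnostics() {
  echo "=== sshd process ==="
  if command -v pgrep >/dev/null 2>&1; then
    pgrep -a sshd || echo "no sshd process found"
  else
    echo "pgrep not available"
  fi

  echo "=== listening TCP sockets ==="
  if command -v ss >/dev/null 2>&1; then
    ss -tln || echo "ss failed"
  else
    echo "ss not available"
  fi

  echo "=== root filesystem usage ==="
  df -h / || echo "df failed"
}

collect_ssh_diagnostics
```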

      Alternatively, you can recreate your instance with a diagnostic startup script:

      1. Run gcloud compute instances delete with the --keep-disks flag.

        gcloud compute instances delete VM_NAME \
           --keep-disks boot
        

        Replace VM_NAME with the name of the VM you can't connect to.

      2. Add a new instance with the same disk and specify your startup script.

        gcloud compute instances create NEW_VM_NAME \
           --disk name=BOOT_DISK_NAME,boot=yes \
           --metadata startup-script-url=URL
        

        Replace the following:

        • NEW_VM_NAME is the name of the new VM you're creating
        • BOOT_DISK_NAME is the name of the boot disk from the VM you can't connect to
        • URL is the Cloud Storage URL to the script, in either gs://BUCKET/FILE or https://storage.googleapis.com/BUCKET/FILE format.

      Use your disk on a new instance

      If you still need to recover data from your persistent boot disk, you can detach the boot disk and then attach that disk as a secondary disk on a new instance.

      1. Delete the VM you can't connect to and keep its boot disk:

        gcloud compute instances delete VM_NAME \
           --keep-disks=boot 

        Replace VM_NAME with the name of the VM you can't connect to.

      2. Create a new VM with your old VM's boot disk:

        gcloud compute instances create NEW_VM_NAME \
           --disk name=BOOT_DISK_NAME,boot=yes,auto-delete=no 

        Replace the following:

        • NEW_VM_NAME is the name of the new VM you're creating
        • BOOT_DISK_NAME is the name of the boot disk from the VM you can't connect to
      3. Connect to your new VM using SSH:

        gcloud compute ssh NEW_VM_NAME
        

        Replace NEW_VM_NAME with the name of your new VM.

      Check whether or not the VM boot disk is full

      Your VM might become inaccessible if its boot disk is full. This scenario can be difficult to troubleshoot as it's not always obvious when the VM connectivity issue is due to a full boot disk. For more information about this scenario, see Troubleshooting a VM that is inaccessible due to a full boot disk.
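If you can still reach the VM through the serial console, or if you have mounted its disk on a debugging instance, a quick usage check can confirm or rule out a full disk. This is a minimal sketch; `fs_usage_percent` is a hypothetical helper, and it reads the capacity column of `df -P`.

```shell
# Sketch: print the percentage of a filesystem that is in use.
# fs_usage_percent is a hypothetical helper name.

fs_usage_percent() {
  # $1 = mount point to check (defaults to /)
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Possible usage on a debugging instance with the disk mounted:
#   [ "$(fs_usage_percent /mnt/VM_NAME)" -ge 95 ] && echo "boot disk is nearly full"
```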
