Invalid credentials found while running a job

Problem

A Spark or MapReduce job starts to fail because no valid Kerberos credentials are found, with error messages like the following:

21/07/14 07:30:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<host_name/IP>"; destination host is: "<fqdn>":8020;
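To check whether a failed job's saved log contains this specific error, you can search it for the two distinctive strings in the messages above. The sketch below is a minimal example; the function name has_tgt_error is hypothetical, and it assumes you have already saved the job's driver or container output to a file.

```shell
# Hypothetical helper: report whether a job log (read from stdin) contains
# the Kerberos "no valid credentials / no TGT" failure shown above.
has_tgt_error() {
  # -q: print nothing, only set the exit status.
  # -F: treat the patterns as literal strings; either pattern may match.
  if grep -qF -e 'No valid credentials provided' -e 'Failed to find any Kerberos tgt'; then
    echo "Kerberos TGT error found"
    return 0
  fi
  echo "Kerberos TGT error not found"
  return 1
}
```

For example, has_tgt_error < driver.log, where driver.log is a placeholder for your saved log file. A zero exit status means the log shows the credentials problem described in this article.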

Environment

  • Dataproc cluster
  • Kerberos enabled

Solution

  1. Connect to the primary or worker node through SSH, check whether the Kerberos tickets have expired, and regenerate them if needed, using the commands below:
    1. To check ticket validity, run the klist command:
      $ sudo klist
      • The output shows the expiry timestamp (Expires) of each cached ticket.
      • klist reads the credentials cache. If it reports that no credentials cache was found, or if the listed tickets are past their Expires time, there is no valid ticket and you need to obtain a new one.
      • On every node, the keytabs are typically located in /etc/security/keytab.
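    2. To list all principals, run the following command from any primary VM:
      $ sudo kadmin.local -q "list_principals"
      • Use the output to find the exact name of the node's dataproc principal, in the form dataproc/<host>@<realm>.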
    3. If klist shows that the tickets have expired, regenerate the ticket for the default dataproc principal. To get a new ticket, run kinit and either specify a keytab file that contains the principal's credentials or enter the principal's password when prompted. For example, using the Dataproc service keytab:
      $ sudo kinit -k -t /etc/security/keytab/dataproc.service.keytab dataproc/<host>@<realm>
      • kinit is used to obtain and cache Kerberos ticket-granting tickets.
    4. Confirm that the new credentials are cached by running the same command ($ sudo klist) again, and check the Valid starting, Expires, and Service principal fields.
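The check, regenerate and confirm steps above can be combined into a small script. This is a minimal sketch under stated assumptions, not an official Dataproc tool: it assumes the MIT Kerberos client tools (klist, kinit) are installed, that it runs as root so it checks and refreshes the same credentials cache that sudo klist inspects, and that the keytab path from step 3 is correct for your cluster. The names ensure_tgt, first_principal and KEYTAB are hypothetical. It relies on klist -s, which prints nothing and exits with status 1 when the cache cannot be read or its tickets have expired, and on klist -k, which lists the principals stored in a keytab.

```shell
# Hypothetical helper: obtain a new Kerberos TGT from a keytab when the
# current credentials cache is missing or expired (MIT Kerberos assumed).
# Run as root so it uses the same cache as `sudo klist` / `sudo kinit`.

# Keytab path from step 3; override by exporting KEYTAB beforehand.
KEYTAB="${KEYTAB:-/etc/security/keytab/dataproc.service.keytab}"

# Print the first principal stored in a keytab. Assumes MIT `klist -k`
# output: "Keytab name", "KVNO Principal" and a dashed rule are the first
# three lines; each following line is "<kvno> <principal>".
first_principal() {
  klist -k "$1" | awk 'NR > 3 { print $2; exit }'
}

# Usage: ensure_tgt <keytab> [principal]
ensure_tgt() {
  keytab="$1"
  principal="$2"
  # klist -s: silent; exit status 0 only if a readable, unexpired cache exists.
  if klist -s; then
    echo "valid ticket found; nothing to do"
    return 0
  fi
  # No principal given: fall back to the first one listed in the keytab.
  if [ -z "$principal" ]; then
    principal="$(first_principal "$keytab")"
  fi
  if [ -z "$principal" ]; then
    echo "no principal found in $keytab" >&2
    return 1
  fi
  echo "ticket missing or expired; running kinit for $principal"
  # Obtain and cache a new TGT from the keytab, then show it (step 4).
  kinit -k -t "$keytab" "$principal" && klist
}
```

For example, from a root shell on the node (sudo -i), source the saved file and run ensure_tgt "$KEYTAB" "dataproc/<host>@<realm>". If you omit the principal, the script uses the first principal listed in the keytab, which may not be the one you want if the keytab holds several; passing it explicitly is safer. Note that klist -s still succeeds for a ticket that is about to expire, so this sketch does not renew tickets early.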

Cause

This issue occurs when a node has no valid Kerberos ticket: either no ticket was obtained, or the ticket expired and was not regenerated.