Dataproc job fails with java.lang.NoSuchMethodError

Problem

There have been no code changes to our pipelines, but they started to fail with the following exception when run on ephemeral clusters:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Lorg/codehaus/jackson/JsonNode;)V

Environment

  1. Any Dataproc version.
  2. Search for the exception in Cloud Logging with the following filter:

    resource.type="cloud_dataproc_cluster"
    
    "java.lang.NoSuchMethodError"

Solution

  1. Compare the Dataproc image version of a cluster on which the jobs ran successfully with that of a cluster on which they fail. The version is shown on the Cluster details page under the Configuration tab, or it can be found with the following filter in Cloud Logging:

    resource.type="cloud_dataproc_cluster"
    
    protoPayload.request.cluster.clusterName="[cluster-name]"
    
    protoPayload.methodName="google.cloud.dataproc.v1.ClusterController.CreateCluster"

  2. If the versions differ, create a new cluster with the exact Dataproc image version on which the jobs ran successfully, then run the failed jobs again.
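
The two steps above can be sketched in Python. This is a minimal illustration, not an official tool: the helper names are hypothetical, the sample entries and their version strings are made up, and the field path `protoPayload.request.cluster.config.softwareConfig.imageVersion` is an assumption based on the Dataproc v1 Cluster resource (it may be absent if the create request did not specify a version). The `gcloud dataproc clusters create` command and its `--region` and `--image-version` flags are real.

```python
import shlex
from typing import List, Optional


def image_version_from_log_entry(entry: dict) -> Optional[str]:
    """Return the image version requested in a CreateCluster audit log entry.

    Assumes the entry was exported as JSON and that the request carried
    cluster.config.softwareConfig.imageVersion; returns None otherwise.
    """
    node = entry
    for key in ("protoPayload", "request", "cluster", "config",
                "softwareConfig", "imageVersion"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def pinned_create_command(cluster_name: str, region: str,
                          image_version: str) -> List[str]:
    """Build a gcloud command that pins the cluster to one image version."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--image-version={image_version}",
    ]


def make_entry(version: str) -> dict:
    """Build a made-up log entry with the assumed structure (for illustration)."""
    return {"protoPayload": {"request": {"cluster": {"config": {
        "softwareConfig": {"imageVersion": version}}}}}}


# Hypothetical sample data: these version strings are placeholders.
working_entry = make_entry("2.0.45-debian10")
failing_entry = make_entry("2.1.11-debian11")

working = image_version_from_log_entry(working_entry)
failing = image_version_from_log_entry(failing_entry)

if working and failing and working != failing:
    # Cluster name and region below are placeholders.
    cmd = pinned_create_command("my-ephemeral-cluster", "us-central1", working)
    print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only to print a copy-pasteable form.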

Cause

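Different Dataproc image versions bundle different versions of open source components, and these can conflict with the versions the application code was built against. Ephemeral clusters created without a pinned image version use the current default image, which changes over time, so component versions can change even when the pipeline code does not. In this case, the missing method is an Avro Schema.Field constructor that takes an org.codehaus.jackson.JsonNode, which suggests that the Avro version on the failing cluster differs from the one the application expects. Please refer to versioning for more details.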
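
To see which method the JVM could not find, the descriptor in the exception message can be decoded into readable types. The following is a small sketch of a JVM method descriptor decoder; the function names are hypothetical and it handles only the descriptor grammar (primitive codes, `L...;` class names, and `[` array prefixes), not generic signatures.

```python
from typing import List, Tuple

# Single-letter codes for primitive types in JVM descriptors.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def parse_type(desc: str, i: int) -> Tuple[str, int]:
    """Parse one type starting at desc[i]; return (type name, next index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<internal/class/Name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(desc: str) -> Tuple[List[str], str]:
    """Return (parameter types, return type) for a JVM method descriptor."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    i = 1
    params = []
    while desc[i] != ")":
        type_name, i = parse_type(desc, i)
        params.append(type_name)
    return_type, _ = parse_type(desc, i + 1)
    return params, return_type


# Descriptor copied from the exception in the Problem section.
DESCRIPTOR = ("(Ljava/lang/String;Lorg/apache/avro/Schema;"
              "Ljava/lang/String;Lorg/codehaus/jackson/JsonNode;)V")

params, return_type = decode_descriptor(DESCRIPTOR)
for p in params:
    print(p)
```

The last parameter decodes to `org.codehaus.jackson.JsonNode`, which is the Jackson 1.x package. A cluster whose Avro jar no longer exposes a constructor with that parameter type would produce this error, which is consistent with the version mismatch described above.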