Dataproc Client Libraries

This page shows how to get started with the Cloud Client Libraries for the Dataproc API. If your application runs in the Google App Engine standard environment, however, we recommend using the older Google APIs Client Libraries instead. To learn more about the client libraries for Cloud APIs, see Client Libraries Explained.

Dataproc Cloud Client Libraries may be in alpha or beta stage. See the library reference for details.

Installing the client library

C#

For more information, see Setting Up a C# Development Environment.

Also see Google.Cloud.Dataproc.V1 Installation.

Go

For more information, see Setting Up a Go Development Environment.

go get -u cloud.google.com/go/dataproc/apiv1

For more information, see Install the Cloud Client Libraries for Go.

Java

For more information, see Setting Up a Java Development Environment.

If you are using Maven, add this to your pom.xml file:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-dataproc</artifactId>
    <version>insert dataproc-library-version here</version>
</dependency>

If you are using Gradle, add this to your dependencies:

implementation group: 'com.google.cloud', name: 'google-cloud-dataproc', version: 'insert dataproc-library-version here'

Node.js

For more information, see Setting Up a Node.js Development Environment.

npm install --save @google-cloud/dataproc

PHP

For more information, see Using PHP on Google Cloud.

composer require google/cloud

Python

For more information, see Setting Up a Python Development Environment.

pip install --upgrade google-cloud-dataproc
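
After installing, it can be useful to confirm that the package is visible to the interpreter you plan to run. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and is not part of the Dataproc client library.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "google-cloud-dataproc") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    installed_version is a hypothetical helper written for this page.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-cloud-dataproc is not installed in this environment")
    else:
        print(f"google-cloud-dataproc {version} is installed")
```

If the helper reports that the package is missing, the pip command above was probably run against a different Python environment than the one executing the script.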

Ruby

For more information, see Setting Up a Ruby Development Environment.

gem install google-cloud-dataproc

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable. Complete the following steps to set up authentication. For other ways to authenticate, see the GCP authentication documentation.

Cloud Console

  1. In the Cloud Console, go to the Create service account key page.

    Go to the "Create service account key" page
  2. From the Service account list, select New service account.
  3. In the Service account name field, enter a name.
  4. From the Role list, select Project > Owner.

  5. Click Create. A JSON file that contains your key downloads to your computer.

Command line

You can run the following commands using the Cloud SDK on your local machine, or in Cloud Shell.

  1. Create the service account. Replace NAME with a name for the service account.

    gcloud iam service-accounts create NAME
  2. Grant permissions to the service account. Replace PROJECT_ID with your project ID.

    gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com" --role="roles/owner"
  3. Generate the key file. Replace FILE_NAME with a name for the key file.

    gcloud iam service-accounts keys create FILE_NAME.json --iam-account=NAME@PROJECT_ID.iam.gserviceaccount.com
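
The three commands above share the same NAME and PROJECT_ID placeholders, so a typo in one of them is easy to miss. The sketch below fills in the placeholders and prints the resulting commands without running them. The function name service_account_commands and the sample values are hypothetical; only the gcloud commands themselves come from the steps above.

```python
import shlex
from typing import List


def service_account_commands(name: str, project_id: str, file_name: str) -> List[List[str]]:
    """Build the three gcloud commands from the steps above as argument lists.

    This is a hypothetical helper: it only assembles the commands, it does not
    execute them.
    """
    email = f"{name}@{project_id}.iam.gserviceaccount.com"
    return [
        # Step 1: create the service account.
        ["gcloud", "iam", "service-accounts", "create", name],
        # Step 2: grant the role to the service account.
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=serviceAccount:{email}",
            "--role=roles/owner",
        ],
        # Step 3: generate the key file.
        [
            "gcloud", "iam", "service-accounts", "keys", "create",
            f"{file_name}.json",
            f"--iam-account={email}",
        ],
    ]


if __name__ == "__main__":
    # The values below are placeholders chosen for illustration.
    for command in service_account_commands("my-sa", "my-project", "my-key"):
        print(shlex.join(command))
```

Returning argument lists rather than a single string keeps each value as one argument, so the lists could later be passed to subprocess.run without extra quoting.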

Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the path of the JSON file that contains your service account key. This variable applies only to your current shell session, so if you open a new session, set the variable again.

Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

Example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/my-key.json"

Windows

With PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

Example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\my-key.json"

With command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
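
Because the variable is scoped to the current shell session, a quick check before running a sample can save a confusing authentication error. The following sketch, which uses only the Python standard library, verifies that GOOGLE_APPLICATION_CREDENTIALS is set and points to a JSON file that looks like a service account key. The function name check_credentials is hypothetical, and the list of expected fields is an assumption about what a downloaded key file typically contains; it is not an exhaustive validation.

```python
import json
import os
from typing import List, Mapping, Optional

# Fields this sketch assumes a service account key file contains.
EXPECTED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed.

    check_credentials is a hypothetical helper, not part of any Google library.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"no file found at {path}"]
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, ValueError) as exc:
        return [f"could not read {path} as JSON: {exc}"]
    if not isinstance(key, dict):
        return [f"{path} does not contain a JSON object"]
    # Report every expected field that is absent from the key file.
    return [f"key file is missing field: {name}" for name in EXPECTED_FIELDS if name not in key]


if __name__ == "__main__":
    problems = check_credentials()
    if problems:
        for problem in problems:
            print(problem)
    else:
        print("GOOGLE_APPLICATION_CREDENTIALS points to a plausible key file")
```

The check never prints the contents of the key file, only which checks failed.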

Using the client library

The following examples show how to use the client library to create a Dataproc cluster.

Go

Before trying this sample, follow the Go setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"google.golang.org/api/option"
	dataprocpb "google.golang.org/genproto/googleapis/cloud/dataproc/v1"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %v", err)
	}
	// Release the client's resources when the function returns.
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-1",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-1",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %v", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s\n", resp.ClusterName)
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Java API reference documentation.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-1")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-1")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Node.js API reference documentation.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-1',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-1',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();

Python

Before trying this sample, follow the Python setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Python API reference documentation.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
       using the Python client library.

       Args:
           project_id (string): Project to use for creating resources.
           region (string): Region where the resources should live.
           cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
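
The Python sample above hard-codes the regional endpoint string and the cluster dictionary inline. As a sketch of how those two pieces could be factored out and checked without calling the API, the helpers below build the same values. The names regional_endpoint and build_cluster, and the machine_type and num_workers parameters, are hypothetical and are not part of the Dataproc client library.

```python
from typing import Any, Dict


def regional_endpoint(region: str) -> str:
    """Return the regional endpoint string used by the sample above."""
    return f"{region}-dataproc.googleapis.com:443"


def build_cluster(
    project_id: str,
    cluster_name: str,
    machine_type: str = "n1-standard-1",
    num_workers: int = 2,
) -> Dict[str, Any]:
    """Return a cluster dictionary shaped like the one in the sample above.

    One master instance is used, as in the sample; the worker count and
    machine type are parameters added for illustration.
    """
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type},
        },
    }


if __name__ == "__main__":
    # Placeholder values chosen for illustration.
    print(regional_endpoint("us-central1"))
    print(build_cluster("your-project-id", "your-cluster"))
```

The returned dictionary could then be passed as the "cluster" entry of the request in create_cluster, and the endpoint string as the "api_endpoint" client option, in the same way the sample does.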

Additional resources