Dataproc client libraries

This page shows how to get started with the Cloud Client Libraries for the Dataproc API. However, we recommend using the older Google API Client Libraries if running on Google App Engine standard environment. Read more about the client libraries for Cloud APIs in Client Libraries Explained.

Dataproc Cloud Client Libraries may be in alpha or beta stage. See the library reference for details.

Installing the client library

C#

For more information, see Setting Up a C# Development Environment.

Also see Google.Cloud.Dataproc.V1 Installation

Go

For more information, see Setting Up a Go Development Environment.

go get -u cloud.google.com/go/dataproc/apiv1

For more information, see Install the Cloud Client Libraries for Go.

Java

For more information, see Setting Up a Java Development Environment.

If you are using Maven, add this to your pom.xml file:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-dataproc</artifactId>
    <version>insert dataproc-library-version here</version>
</dependency>

If you are using Gradle, add this to your dependencies:

compile group: 'com.google.cloud', name: 'google-cloud-dataproc', version: 'insert dataproc-library-version here'

Node.js

For more information, see Setting Up a Node.js Development Environment.

npm install --save @google-cloud/dataproc

PHP

For more information, see Using PHP on Google Cloud.

composer require google/cloud

Python

For more information, see Setting Up a Python Development Environment.

pip install --upgrade google-cloud-dataproc

Ruby

For more information, see Setting Up a Ruby Development Environment.

gem install google-cloud-dataproc

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable. Complete the following steps to set up authentication. For other ways to authenticate, see the GCP authentication documentation.

Cloud Console

Crea una cuenta de servicio:

  1. En Cloud Console, ve a la página Crear cuenta de servicio.

    Ir a Crear cuenta de servicio
  2. Selecciona un proyecto
  3. Ingresa un nombre en el campo Nombre de cuenta de servicio. Cloud Console completa el campo ID de cuenta de servicio con este nombre.

    En el campo Descripción de la cuenta de servicio, ingresa una descripción. Por ejemplo, Service account for quickstart.

  4. Haga clic en Crear.
  5. Haz clic en el campo Seleccionar una función.

    En Acceso rápido, haz clic en Básica y, luego, en Propietario.

  6. Haga clic en Continuar.
  7. Haz clic en Listo para terminar de crear la cuenta de servicio.

    No cierres la ventana del navegador. La usarás en la próxima tarea.

Para crear una clave de cuenta de servicio, haz lo siguiente:

  1. En Cloud Console, haz clic en la dirección de correo electrónico de la cuenta de servicio que creaste.
  2. Haz clic en Claves.
  3. Haz clic en Agregar clave y, luego, en Crear clave nueva.
  4. Haga clic en Crear. Se descargará un archivo de claves JSON a tu computadora.
  5. Haga clic en Cerrar.

Línea de comandos

Puedes ejecutar los siguientes comandos con el SDK de Cloud en tu máquina local o en Cloud Shell.

  1. Crea la cuenta de servicio. Reemplaza NAME por un nombre para la cuenta de servicio.

    gcloud iam service-accounts create NAME
  2. Otorga permisos a la cuenta de servicio. Reemplaza PROJECT_ID por el ID del proyecto.

    gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com" --role="roles/owner"
  3. Genera el archivo de claves. Reemplaza FILE_NAME por un nombre para el archivo de claves.

    gcloud iam service-accounts keys create FILE_NAME.json --iam-account=NAME@PROJECT_ID.iam.gserviceaccount.com

Configura la variable de entorno GOOGLE_APPLICATION_CREDENTIALS para proporcionar credenciales de autenticación al código de la aplicación. Esta variable solo se aplica a la sesión actual de shell. Por lo tanto, si abres una sesión nueva, deberás volver a configurar la variable.

Linux o macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Reemplaza KEY_PATH por la ruta de acceso del archivo JSON que contiene la clave de tu cuenta de servicio.

Por ejemplo:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Windows

Para PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Reemplaza KEY_PATH por la ruta de acceso del archivo JSON que contiene la clave de tu cuenta de servicio.

Por ejemplo:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

Para el símbolo del sistema:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Reemplaza KEY_PATH por la ruta de acceso del archivo JSON que contiene la clave de tu cuenta de servicio.

Using the client library

The following example shows how to use the client library.

Go

Before trying this sample, follow the Go setup instructions in the Dataproc quickstart using client libraries. For more information, see the Dataproc Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"google.golang.org/api/option"
	dataprocpb "google.golang.org/genproto/googleapis/cloud/dataproc/v1"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %v", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %v", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Dataproc quickstart using client libraries. For more information, see the Dataproc Java API reference documentation.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Dataproc quickstart using client libraries. For more information, see the Dataproc Node.js API reference documentation.

.
const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);

Python

Before trying this sample, follow the Python setup instructions in the Dataproc quickstart using client libraries. For more information, see the Dataproc Python API reference documentation.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
       using the Python client library.

       Args:
           project_id (string): Project to use for creating resources.
           region (string): Region where the resources should live.
           cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")

Additional resources