Dataproc Client Libraries

This page shows how to get started with the Cloud Client Libraries for the Dataproc API. If you are running on the Google App Engine standard environment, however, we recommend using the older Google APIs Client Libraries. Read more about the client libraries for Cloud APIs in Client Libraries Explained.

Dataproc Cloud Client Libraries may be in alpha or beta stage. See the library reference for details.

Installing the client library


C#

For more information, see Setting Up a C# Development Environment.

Also see Google.Cloud.Dataproc.V1 Installation.


Go

For more information, see Setting Up a Go Development Environment.

go get -u cloud.google.com/go/dataproc/v2/apiv1

For more information, see Install the Cloud Client Libraries for Go.


Java

For more information, see Setting Up a Java Development Environment.

If you are using Maven, add this to your pom.xml file:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-dataproc</artifactId>
      <version>insert dataproc-library-version here</version>
    </dependency>

If you are using Gradle, add this to your dependencies:

implementation group: 'com.google.cloud', name: 'google-cloud-dataproc', version: 'insert dataproc-library-version here'


Node.js

For more information, see Setting Up a Node.js Development Environment.

npm install --save @google-cloud/dataproc


PHP

For more information, see Using PHP on Google Cloud.

composer require google/cloud


Python

For more information, see Setting Up a Python Development Environment.

pip install --upgrade google-cloud-dataproc


Ruby

For more information, see Setting Up a Ruby Development Environment.

gem install google-cloud-dataproc
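
For the Python package, one quick way to confirm that the install succeeded is to ask the interpreter for the installed version. The following is a minimal sketch that uses only the standard library; the helper name installed_version is illustrative and is not part of any Google library.

```python
from importlib import metadata


def installed_version(distribution="google-cloud-dataproc"):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-cloud-dataproc is not installed in this environment")
    else:
        print(f"google-cloud-dataproc {version} is installed")
```

Run the script with the same interpreter (or virtual environment) that you used for pip install, because packages are installed per environment.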

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable, as described in the following steps. For other ways to authenticate, see the GCP authentication documentation.

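Cloud Console

Create a service account:

  1. In the Cloud Console, go to the Create service account page.
  2. Select a project.
  3. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.

    In the Service account description field, enter a description. For example, Service account for quickstart.

  4. Click Create.
  5. Click the Select a role field.

    Under Quick access, click Basic, then click Owner.

  6. Click Continue.
  7. Click Done to finish creating the service account.

    Do not close your browser window. You will use it in the next step.

Create a service account key:

  1. In the Cloud Console, click the email address for the service account that you created.
  2. Click Keys.
  3. Click Add key, then click Create new key.
  4. Click Create. A JSON key file is downloaded to your computer.
  5. Click Close.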


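Command line

You can run the following commands using the Cloud SDK on your local machine, or in Cloud Shell.

  1. Create the service account. Replace NAME with a name for the service account.

    gcloud iam service-accounts create NAME
  2. Grant permissions to the service account. Replace PROJECT_ID with your project ID.

    gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com" --role="roles/owner"
  3. Generate the key file. Replace FILE_NAME with a name for the key file.

    gcloud iam service-accounts keys create FILE_NAME.json --iam-account=NAME@PROJECT_ID.iam.gserviceaccount.com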

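Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the file path of the JSON file that contains your service account key. This variable applies only to your current shell session, so if you open a new session, set the variable again.

Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/my-key.json"

Windows

With PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

With command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]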
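
Before calling the API, it can help to confirm that the environment variable points at a readable service account key. The following is a minimal sketch using only the Python standard library. The helper name check_credentials_file is illustrative, and the check relies on the "type" and "client_email" fields that service account JSON key files contain.

```python
import json
import os


def check_credentials_file(env=os.environ):
    """Return the service account email from the key file, or raise ValueError."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"No key file found at {path}")

    # The key file is plain JSON, so it can be inspected without any client library.
    with open(path) as key_file:
        key = json.load(key_file)

    if key.get("type") != "service_account":
        raise ValueError("Key file is not a service account key")
    return key.get("client_email")


if __name__ == "__main__":
    try:
        print(f"Using credentials for: {check_credentials_file()}")
    except ValueError as err:
        print(f"Credential check failed: {err}")
```

This check only inspects the local file. It does not verify that the key is still valid or that the service account has the required role.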


Using the client library

The following example shows how to use the client library.


Go

Before trying this sample, follow the Go setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/v2/apiv1"
	"cloud.google.com/go/dataproc/v2/apiv1/dataprocpb"
	"google.golang.org/api/option"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client, pointed at the regional endpoint.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %v", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster. This starts a long-running operation.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %v", err)
	}

	// Block until the operation completes.
	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %v", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}


Java

Before trying this sample, follow the Java setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Java API reference documentation.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}


Node.js

Before trying this sample, follow the Node.js setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Node.js API reference documentation.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster and wait for the long-running operation to finish
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();


Python

Before trying this sample, follow the Python setup instructions in the Dataproc Quickstart Using Client Libraries. For more information, see the Dataproc Python API reference documentation.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster and block until the long-running operation finishes.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
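
Every sample above does the same two things before calling the API: it points the client at a regional endpoint, and it builds a request that holds the project, region, and cluster config. The following is a minimal sketch that isolates those two steps as plain functions, so they can be checked without contacting the service. The helper names regional_endpoint and build_create_cluster_request are hypothetical, and the sketch assumes the REGION-dataproc.googleapis.com:443 endpoint form.

```python
def regional_endpoint(region):
    """Build the regional Dataproc endpoint (assumed form: REGION-dataproc.googleapis.com:443)."""
    if not region:
        raise ValueError("region must be a non-empty string")
    return f"{region}-dataproc.googleapis.com:443"


def build_create_cluster_request(
    project_id, region, cluster_name, num_workers=2, machine_type="n1-standard-2"
):
    """Return a request dict shaped like the one passed to create_cluster above."""
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # One master node, as in the samples above.
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {
                "num_instances": num_workers,
                "machine_type_uri": machine_type,
            },
        },
    }
    return {"project_id": project_id, "region": region, "cluster": cluster}


if __name__ == "__main__":
    print(regional_endpoint("us-central1"))
    print(build_create_cluster_request("your-project-id", "us-central1", "your-cluster"))
```

In the Python sample, the endpoint string would be passed as the api_endpoint client option, and the request dict as the request argument of create_cluster.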

Additional resources