Create a cluster

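Dataproc prevents the creation of clusters with image versions earlier than 1.3.95, 1.4.77, 1.5.53, and 2.0.27, which are affected by Apache Log4j security vulnerabilities. Dataproc also prevents the creation of clusters with image versions 0.x, 1.0.x, 1.1.x, and 1.2.x. Where possible, create Dataproc clusters with the latest sub-minor image version.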

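Image version	log4j version	Customer guidance
2.0.29, 1.5.55, and 1.4.79, or later of each	log4j.2.17.1	Recommended
2.0.28, 1.5.54, and 1.4.78	log4j.2.17.0	Recommended
2.0.27, 1.5.53, and 1.4.77	log4j.2.16.0	Strongly recommended
2.0.26, 1.5.52, and 1.4.76, or earlier of each	Older version	Discontinue use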

For specific image and log4j update information, see the Dataproc release notes.
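The version rules above can be checked before you request a cluster. The following is a minimal sketch, not an official validator: the function name `can_create_with_image` is hypothetical, and the treatment of image tracks that the text does not mention (for example 2.1) is an assumption.

```python
# Minimum allowed sub-minor version for each image track named in the text above.
MIN_SUBMINOR = {(1, 3): 95, (1, 4): 77, (1, 5): 53, (2, 0): 27}
# Tracks on which cluster creation is blocked entirely (0.x is handled separately).
BLOCKED_TRACKS = {(1, 0), (1, 1), (1, 2)}


def can_create_with_image(version: str) -> bool:
    """Return True if the stated rules allow creating a cluster with this image version."""
    # Drop an OS suffix such as "-debian10" before parsing the numeric parts.
    numeric = version.split("-", 1)[0]
    major, minor, subminor = (int(part) for part in numeric.split("."))
    if major == 0 or (major, minor) in BLOCKED_TRACKS:
        return False
    minimum = MIN_SUBMINOR.get((major, minor))
    if minimum is None:
        # Assumption: tracks not named in the text (for example 2.1) are newer and not blocked.
        return True
    return subminor >= minimum
```

The check only encodes the thresholds listed in this section; the release notes remain the source of truth for newer images.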

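Create a Dataproc cluster

Requirements:

Name: The cluster name must start with a lowercase letter followed by up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen.
Cluster region: You must specify a Compute Engine region for the cluster, such as us-east1 or europe-west1, to isolate cluster resources, such as VM instances and cluster metadata stored in Cloud Storage, within the region.
- For more information about regional endpoints, see Regional endpoints.
- For information about selecting a region, see Available regions and zones. You can also run the gcloud compute regions list command to display a list of available regions.
Connectivity: Compute Engine virtual machine instances (VMs) in a Dataproc cluster, consisting of master and worker VMs, require full internal IP network cross-connectivity. The default VPC network provides this connectivity (see Dataproc cluster network configuration).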
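The naming requirement above can be expressed as a regular expression. This is a minimal sketch: `is_valid_cluster_name` is a hypothetical helper, and it assumes that "up to 51" counts the characters after the first letter, for a maximum total length of 52.

```python
import re

# One lowercase letter, then up to 51 more characters (lowercase letters,
# digits, hyphens), where the final character is not a hyphen.
CLUSTER_NAME_RE = re.compile(r"[a-z](?:[-a-z0-9]{0,50}[a-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule as read above."""
    return CLUSTER_NAME_RE.fullmatch(name) is not None
```

Validating the name locally gives a faster failure than waiting for the service to reject the request, but the service's own validation is authoritative.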

gcloud

To create a Dataproc cluster on the command line, run the gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION

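The command creates a cluster with default Dataproc service settings. The default service settings specify the master and worker virtual machine instances, disk sizes and types, network type, the region and zone where the cluster is deployed, and other cluster settings. For information about using command-line flags to customize cluster settings, see the gcloud dataproc clusters create command.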
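If you script cluster creation, the command above can be assembled as an argument list and passed to `subprocess.run`. The sketch below is illustrative only: `build_create_command` is a hypothetical helper, and the `--image-version` and `--num-workers` flags do not appear in the text above, so confirm them against the gcloud dataproc clusters create reference before relying on them.

```python
import shlex


def build_create_command(cluster_name, region, image_version=None, num_workers=None):
    """Build the argument list for `gcloud dataproc clusters create`."""
    cmd = ["gcloud", "dataproc", "clusters", "create", cluster_name, f"--region={region}"]
    # Optional customizations; omitted flags fall back to the service defaults.
    if image_version is not None:
        cmd.append(f"--image-version={image_version}")
    if num_workers is not None:
        cmd.append(f"--num-workers={num_workers}")
    return cmd


# Show the command line that would be executed, without running it.
print(shlex.join(build_create_command("example-cluster", "us-east1", num_workers=2)))
```

To run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine where the gcloud CLI is installed and authenticated.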

Create a cluster using a YAML file

Run the following gcloud command to export the configuration of an existing Dataproc cluster into a cluster.yaml file.
```
gcloud dataproc clusters export EXISTING_CLUSTER_NAME \
    --region=REGION \
    --destination=cluster.yaml
```

Create a new cluster by importing the YAML file configuration.

gcloud dataproc clusters import NEW_CLUSTER_NAME \
    --region=REGION \
    --source=cluster.yaml

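Note: During the export operation, cluster-specific fields (such as the cluster name), output-only fields, and automatically applied labels are filtered out. These fields are not allowed in a YAML file that is imported to create a cluster.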
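The export-then-import sequence above can be chained in a script so that the import runs only if the export succeeds. This is a sketch under stated assumptions: `build_clone_commands` and `clone_cluster` are hypothetical names, and the gcloud CLI is assumed to be installed and authenticated.

```python
import subprocess


def build_clone_commands(existing_cluster, new_cluster, region, yaml_path="cluster.yaml"):
    """Return the export and import commands shown above as argument lists."""
    export_cmd = [
        "gcloud", "dataproc", "clusters", "export", existing_cluster,
        f"--region={region}", f"--destination={yaml_path}",
    ]
    import_cmd = [
        "gcloud", "dataproc", "clusters", "import", new_cluster,
        f"--region={region}", f"--source={yaml_path}",
    ]
    return export_cmd, import_cmd


def clone_cluster(existing_cluster, new_cluster, region, yaml_path="cluster.yaml"):
    """Export an existing cluster's configuration, then create a new cluster from it."""
    export_cmd, import_cmd = build_clone_commands(existing_cluster, new_cluster, region, yaml_path)
    # check=True raises CalledProcessError, so a failed export stops before the import.
    subprocess.run(export_cmd, check=True)
    subprocess.run(import_cmd, check=True)
```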

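Note: If you click the [Equivalent REST] or [Equivalent command line] link at the bottom of the left panel of the Dataproc [Create a cluster] page in the Google Cloud console, the console generates an equivalent API REST request or gcloud tool command that you can use to create a cluster from your code or from the command line.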

REST

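This section shows how to create a cluster with required values and the default configuration (1 master, 2 workers).

Before using any of the request data, make the following replacements:

CLUSTER_NAME: The cluster name.
PROJECT: The Google Cloud project ID.
REGION: An available Compute Engine region where the cluster will be created.
ZONE: A zone within the selected region where the cluster will be created (optional).

HTTP method and URL: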

POST https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters

Request JSON body:

{
  "project_id":"PROJECT",
  "cluster_name":"CLUSTER_NAME",
  "config":{
    "master_config":{
      "num_instances":1,
      "machine_type_uri":"n1-standard-2",
      "image_uri":""
    },
    "softwareConfig": {
      "imageVersion": "",
      "properties": {},
      "optionalComponents": []
    },
    "worker_config":{
      "num_instances":2,
      "machine_type_uri":"n1-standard-2",
      "image_uri":""
    },
    "gce_cluster_config":{
      "zone_uri":"ZONE"
    }
  }
}
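The URL and body above can also be generated from parameters rather than edited by hand. The following sketch uses hypothetical helper names (`build_create_request`, `write_request_file`). It leaves out the fields that are empty in the sample body and omits the zone when none is given, on the assumption that omitted values fall back to service defaults.

```python
import json

API_ROOT = "https://dataproc.googleapis.com/v1"


def build_create_request(project, region, cluster_name, zone=None):
    """Return the (url, body) pair for the clusters create request."""
    url = f"{API_ROOT}/projects/{project}/regions/{region}/clusters"
    config = {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    }
    if zone is not None:
        # ZONE is optional, so only include it when the caller provides one.
        config["gce_cluster_config"] = {"zone_uri": zone}
    body = {"project_id": project, "cluster_name": cluster_name, "config": config}
    return url, body


def write_request_file(body, path="request.json"):
    """Save the body to a file that can be passed to curl or PowerShell."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=2)
```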

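To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: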

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" | Select-Object -Expand Content
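The same request can be sent from Python with only the standard library. This is a rough equivalent of the curl and PowerShell commands above, not an official sample: the helper names are hypothetical, and the access token is obtained from the gcloud CLI exactly as in those commands.

```python
import json
import subprocess
import urllib.request


def get_access_token():
    """Fetch an access token from the gcloud CLI, as the commands above do."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_http_request(project, region, body, token):
    """Build the POST request with the same URL and headers as the curl command."""
    url = f"https://dataproc.googleapis.com/v1/projects/{project}/regions/{region}/clusters"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call would be `send(build_http_request(project, region, body, get_access_token()))`, where `body` is the request body shown earlier loaded as a dictionary.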

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT/regions/REGION/operations/b5706e31......",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.dataproc.v1.ClusterOperationMetadata",
    "clusterName": "CLUSTER_NAME",
    "clusterUuid": "5fe882b2-...",
    "status": {
      "state": "PENDING",
      "innerState": "PENDING",
      "stateStartTime": "2019-11-21T00:37:56.220Z"
    },
    "operationType": "CREATE",
    "description": "Create cluster with 2 workers",
    "warnings": [
      "For PD-Standard without local SSDs, we strongly recommend provisioning 1TB ..."
    ]
  }
}
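The response describes a long-running operation, not the finished cluster, so a script usually records the operation name and its initial state. The sketch below reads only fields that appear in the sample response above; `summarize_operation` is a hypothetical helper name.

```python
import json


def summarize_operation(response_text):
    """Extract the operation name, cluster name, state, and warnings from the response."""
    response = json.loads(response_text)
    metadata = response.get("metadata", {})
    return {
        "operation": response["name"],
        "cluster": metadata.get("clusterName"),
        "state": metadata.get("status", {}).get("state"),
        # Warnings are advisory; the list may be absent from a response.
        "warnings": metadata.get("warnings", []),
    }
```

A state of PENDING means that the create request was accepted and provisioning has not finished yet.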

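Note: If you click the [Equivalent REST] or [Equivalent command line] link at the bottom of the left panel of the Dataproc [Create a cluster] page in the Google Cloud console, the console generates an equivalent API REST request or gcloud tool command that you can use to create a cluster from your code or from the command line.

Console

Open the Dataproc [Create a cluster] page in the Google Cloud console in your browser, then click [Create] in the [Cluster on Compute Engine] row on the [Create a Dataproc cluster on Compute Engine] page. The [Set up cluster] panel is selected, with fields filled in with default values. Select each panel, then confirm or change the default values to customize your cluster.

Click [Create] to create the cluster. The cluster name appears on the [Clusters] page, and its status is updated to [Running] after the cluster is provisioned. Click the cluster name to open the cluster details page, where you can examine the jobs, instances, and configuration settings for your cluster and connect to web interfaces running on it.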

Go

Install the client library.

Set up application default credentials.

Run the code.

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"cloud.google.com/go/dataproc/apiv1/dataprocpb"
	"google.golang.org/api/option"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %w", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %w", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}

Java

Install the client library.
Set up application default credentials.

Run the code.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Install the client library.
Set up application default credentials.

Run the code.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();

Python

Install the client library.

Set up application default credentials.

Run the code.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")