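Create a Dataproc cluster by using the gcloud CLI
This page shows you how to use the Google Cloud CLI `gcloud` command-line tool to create a Dataproc cluster, run an Apache Spark job in the cluster, and then modify the number of workers in the cluster.
Note: A convenient way to run the `gcloud` command-line tool is from Cloud Shell, which has the Google Cloud CLI pre-installed. Cloud Shell is free for Google Cloud customers. To use Cloud Shell, you need a Google Cloud project.
The same or similar tasks are covered in Quickstarts Using the API Explorer, Create a Dataproc cluster by using the Google Cloud console, and Create a Dataproc cluster by using client libraries.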
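Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataproc API.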
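Create a cluster
To create a cluster called `example-cluster`, run `gcloud dataproc clusters create example-cluster --region=REGION`, replacing REGION with the region where the cluster should run.
The command output confirms cluster creation: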
Waiting for cluster creation operation...done.
Created [... example-cluster]
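As a quick check after creation, the cluster's state can be read back with `clusters describe`. The sketch below wraps that call in a small shell function. The function name `cluster_state` is hypothetical (it is not part of gcloud), and the sketch assumes the `status.state` field of the cluster resource, which normally reads `RUNNING` once the cluster is ready.

```shell
# Hypothetical helper (not part of gcloud): print the state of a Dataproc cluster.
# Usage: cluster_state CLUSTER_NAME REGION
cluster_state() {
  # --format='value(status.state)' prints only the state field
  # instead of the full cluster description.
  gcloud dataproc clusters describe "$1" \
    --region="$2" \
    --format='value(status.state)'
}
```

For the cluster on this page, the call would be `cluster_state example-cluster REGION`.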
For information on selecting a region, see Available regions & zones.
To see a list of available regions, run the `gcloud compute regions list` command.
To learn about regional endpoints, see Regional endpoints.
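If passing `--region` on every command becomes tedious, the gcloud CLI can store a default in its `dataproc/region` configuration property. The following is a minimal sketch rather than a required step. The helper names `list_region_names` and `set_default_region` are hypothetical, and `us-central1` in the usage note is only a placeholder region.

```shell
# Hypothetical helper: print only the names of the available Compute Engine regions.
list_region_names() {
  gcloud compute regions list --format='value(name)'
}

# Hypothetical helper: store a default Dataproc region in the active gcloud
# configuration so that later dataproc commands can omit --region.
# Usage: set_default_region REGION
set_default_region() {
  gcloud config set dataproc/region "$1"
}
```

After `set_default_region us-central1`, running `gcloud config get-value dataproc/region` reads the stored value back.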
Submit a job
To submit a sample Spark job that calculates a rough value for pi, run the following command:
```
gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=REGION \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
```

This command specifies the following:

- You want to run a `spark` job on the `example-cluster` cluster in the specified region
- The `class` containing the main method for the job's pi-calculating application
- The location of the jar file containing your job's code
- Any parameters you want to pass to the job, in this case the number of tasks, which is `1000`

Note: Parameters passed to the job must follow a double dash (`--`). For more information, see the Google Cloud CLI documentation.

The job's running and final output is displayed in the terminal window:

```
Waiting for job output...
...
Pi is roughly 3.14118528
...
Job finished successfully.
```

Update a cluster

To change the number of workers in the cluster to five, run the following command:

```
gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 5
```

The command output displays your cluster's details. For example:

```
workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.
```

To decrease the number of worker nodes to the original value, use the same command:

```
gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 2
```

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

1. To delete your `example-cluster`, run the `clusters delete` command:

   ```
   gcloud dataproc clusters delete example-cluster \
       --region=REGION
   ```

2. To confirm and complete the cluster deletion, press y and then press Enter when prompted.

What's next

- Learn how to write and run a Spark Scala job.

Last updated: 2025-09-04 (UTC)
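The create, submit, and delete steps on this page can also be chained into one script. The sketch below is one possible arrangement, not an official workflow. The function name `run_quickstart` is hypothetical, and it relies on the global `--quiet` flag to skip the interactive delete confirmation.

```shell
# Hypothetical wrapper that chains the commands from this page.
# Usage: run_quickstart CLUSTER_NAME REGION
run_quickstart() {
  cluster="$1"
  region="$2"

  # Create the cluster; stop early if creation fails.
  gcloud dataproc clusters create "$cluster" --region="$region" || return 1

  # Submit the SparkPi sample; arguments after "--" are passed to the job itself.
  gcloud dataproc jobs submit spark --cluster "$cluster" \
    --region="$region" \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
  job_status=$?

  # Delete the cluster whether or not the job succeeded, so it stops
  # accruing charges. --quiet suppresses the confirmation prompt.
  gcloud dataproc clusters delete "$cluster" --region="$region" --quiet

  # Report the job's exit status to the caller.
  return "$job_status"
}
```

Calling `run_quickstart example-cluster REGION` runs all three steps. Deleting the cluster even when the job fails is a deliberate choice here, so that a failed run does not leave a billable cluster behind.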