Dataproc provides the ability to attach graphics processing units (GPUs) to the master and worker Compute Engine nodes in a Dataproc cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.
For more information about what you can do with GPUs and what types of GPU hardware are available, see GPUs on Compute Engine.
Before you begin
GPUs require special drivers and software. These items are not pre-installed on Dataproc clusters.
Read about GPU pricing on Compute Engine to understand the cost of using GPUs in your instances. There are no additional Dataproc charges for GPUs beyond the standard Compute Engine charges.
Read about restrictions for instances with GPUs to learn how these instances function differently from non-GPU instances.
Check the quotas page for your project to ensure that you have sufficient GPU quota (NVIDIA_T4_GPUS, NVIDIA_P100_GPUS, or NVIDIA_V100_GPUS) available in your project. If GPUs are not listed on the quotas page, or if you need additional GPU quota, request a quota increase.
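As a quick sketch, you can also inspect a region's accelerator quotas from the gcloud CLI instead of the console; the region name below is an example, so substitute your own:

```shell
# Print the quota entries for a region and filter for GPU (NVIDIA_*) metrics.
gcloud compute regions describe us-central1 \
    --format="json(quotas)" | grep -B1 -A2 NVIDIA
```

Each matching entry shows the quota metric together with its limit and current usage.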
Types of GPUs
Dataproc nodes support the following GPU types. You must specify the GPU type when you attach GPUs to your Dataproc cluster.
nvidia-tesla-l4 - NVIDIA® Tesla® L4
nvidia-tesla-a100 - NVIDIA® Tesla® A100
nvidia-tesla-p100 - NVIDIA® Tesla® P100
nvidia-tesla-v100 - NVIDIA® Tesla® V100
nvidia-tesla-p4 - NVIDIA® Tesla® P4
nvidia-tesla-t4 - NVIDIA® Tesla® T4
nvidia-tesla-p100-vws - NVIDIA® Tesla® P100 Virtual Workstations
nvidia-tesla-p4-vws - NVIDIA® Tesla® P4 Virtual Workstations
nvidia-tesla-t4-vws - NVIDIA® Tesla® T4 Virtual Workstations
Attach GPUs to clusters
REST API
Attach GPUs to the master node and to the primary and secondary worker nodes in a Dataproc cluster by filling in the acceleratorTypeUri and acceleratorCount fields of InstanceGroupConfig.AcceleratorConfig as part of the cluster.create API request.
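Equivalently, when creating the cluster with the gcloud CLI, the --master-accelerator, --worker-accelerator, and --secondary-worker-accelerator flags each take the GPU type (required) and the number of GPUs (optional, defaulting to 1):

```shell
gcloud dataproc clusters create cluster-name \
    --region=region \
    --master-accelerator type=nvidia-tesla-t4 \
    --worker-accelerator type=nvidia-tesla-t4,count=4 \
    --secondary-worker-accelerator type=nvidia-tesla-t4,count=4 \
    ... other flags
```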
Console
In the Google Cloud console, click CPU PLATFORM AND GPU→GPUs→ADD GPU in the master and worker node sections of the Configure nodes panel on the Create a cluster page to specify the number of GPUs and the GPU type for the nodes.
Install GPU drivers
GPU drivers are required to use any GPUs attached to Dataproc nodes.
To install GPU drivers, see the following instructions:
Spark Rapids (https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/spark-rapids)
GPU ML Libraries (https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/gpu)
Verify GPU driver install
After you have finished installing the GPU driver on your Dataproc nodes, you can verify that the driver is functioning properly. Open an SSH connection to the master node of your Dataproc cluster and run the following command:
nvidia-smi
If the driver is functioning properly, the output displays the driver version and GPU statistics.
Note: The driver may not work correctly after the VM restarts following a Linux unattended upgrade. Possible solutions: disable unattended upgrades, or exclude kernel updates by editing the unattended-upgrades service configuration.
Spark configuration
When you submit a job to Spark, you can use the spark.executorEnv runtime environment configuration property with the LD_PRELOAD environment variable to preload needed libraries.
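For example, the following submission preloads libnvblas.so for the executors and requests one GPU per task and per executor:

```shell
gcloud dataproc jobs submit spark --cluster=CLUSTER_NAME \
    --region=REGION \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    --properties=spark.executorEnv.LD_PRELOAD=libnvblas.so,spark.task.resource.gpu.amount=1,spark.executor.resource.gpu.amount=1,spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
```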
Example GPU job
You can test GPUs on Dataproc by running jobs that benefit from GPU acceleration, such as one of the Spark ML examples, or the following example, which you can run with spark-shell to perform a matrix computation:
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import java.util.Random

// Build a distributed block matrix whose blocks are random
// rowsPerBlock x rowsPerBlock dense matrices.
def makeRandomSquareBlockMatrix(rowsPerBlock: Int, nBlocks: Int): BlockMatrix = {
  val range = sc.parallelize(1 to nBlocks)
  val indices = range.cartesian(range)
  new BlockMatrix(
    indices.map(
      ij => (ij, Matrices.rand(rowsPerBlock, rowsPerBlock, new Random()))),
    rowsPerBlock, rowsPerBlock, 0, 0)
}

val N = 1024 * 4
val n = 2
val mat1 = makeRandomSquareBlockMatrix(N, n)
val mat2 = makeRandomSquareBlockMatrix(N, n)
// Multiply the matrices, then force evaluation by counting the result blocks.
val mat3 = mat1.multiply(mat2)
mat3.blocks.persist.count
println("Processing complete!")