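Use Dataproc Serverless Spark with managed notebooks
Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks will end and the ability to create managed notebooks instances will be removed. Existing instances will continue to function, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your managed notebooks instances to Vertex AI Workbench instances.
Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support.
This page shows you how to run a notebook file on serverless Spark
in a Vertex AI Workbench managed notebooks instance
by using Dataproc Serverless.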
Your managed notebooks instance
can submit a notebook file's code to run on
the Dataproc Serverless service. The service runs
the code on a managed compute infrastructure that automatically
scales resources as needed, so
you don't need to provision and manage your own cluster.
Dataproc Serverless charges apply only to the time when the workload is executing.
Requirements
To run a notebook file on Dataproc Serverless Spark,
make sure that the following requirements are met.
Your Dataproc Serverless session must run in the same region as your managed notebooks instance.
The Require OS Login (constraints/compute.requireOsLogin) constraint
must not be enabled for your project. See Manage OS Login in
an organization.
To run a notebook file on Dataproc Serverless,
you must provide a service account
that has specific permissions. You can grant these permissions
to the default service account or provide a custom service account.
See the Permissions section of this page.
Your Dataproc Serverless Spark session uses
a Virtual Private Cloud (VPC) network to execute workloads.
The VPC subnetwork must meet specific requirements.
See the requirements in Dataproc Serverless for Spark network configuration.
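The same-region requirement is easy to check before you create a session. The following is a minimal sketch, not part of any Google Cloud library: the helper names `region_of` and `same_region` are hypothetical, and the code assumes that a zone name is a region name followed by a single-letter suffix such as `-a`.

```python
import re

# Assumption: a zone looks like "<region>-<letter>", for example "us-central1-a".
_ZONE_SUFFIX = re.compile(r"-[a-z]$")


def region_of(location: str) -> str:
    """Return the region for a region or zone name."""
    return _ZONE_SUFFIX.sub("", location.strip().lower())


def same_region(session_location: str, instance_location: str) -> bool:
    """True when the session and the notebook instance are in the same region."""
    return region_of(session_location) == region_of(instance_location)


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    print(same_region("us-central1", "us-central1-a"))
    print(same_region("us-central1", "europe-west4"))
```

A check like this can run before any session is requested, so that a region mismatch is caught early.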
Permissions
To ensure that the service account has the necessary permissions to run a notebook file on Dataproc Serverless,
ask your administrator to grant the service account the
Dataproc Editor (roles/dataproc.editor)
IAM role on your project.
Important: You must grant this role to the service account, not to your user account. Failure to grant the role to the correct principal might result in permission errors.
This predefined role contains
the permissions required to run a notebook file on Dataproc Serverless. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to run a notebook file on Dataproc Serverless:
dataproc.agents.create
dataproc.agents.delete
dataproc.agents.get
dataproc.agents.update
dataproc.sessions.create
dataproc.sessions.get
dataproc.sessions.list
dataproc.sessions.terminate
dataproc.sessions.delete
dataproc.tasks.lease
dataproc.tasks.listInvalidatedLeases
dataproc.tasks.reportStatus
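If you already have a list of the permissions that a service account holds (for example, copied from a role definition), you can compare it with the required set. This is a minimal sketch: `missing_permissions` is a hypothetical helper, the input list is supplied by you, and the code does not query IAM. It writes `dataproc.sessions.create` in the plural, matching the other session permissions.

```python
# The twelve permissions listed above.
REQUIRED_PERMISSIONS = frozenset({
    "dataproc.agents.create",
    "dataproc.agents.delete",
    "dataproc.agents.get",
    "dataproc.agents.update",
    "dataproc.sessions.create",
    "dataproc.sessions.get",
    "dataproc.sessions.list",
    "dataproc.sessions.terminate",
    "dataproc.sessions.delete",
    "dataproc.tasks.lease",
    "dataproc.tasks.listInvalidatedLeases",
    "dataproc.tasks.reportStatus",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Hypothetical input: a role that lacks the two agent write permissions.
    granted = REQUIRED_PERMISSIONS - {"dataproc.agents.create", "dataproc.agents.update"}
    print(missing_permissions(granted))
```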
Your administrator might also be able to give the service account these permissions with custom roles or other predefined roles.
Before you begin
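An administrator who prefers the command line could grant the Dataproc Editor role with `gcloud projects add-iam-policy-binding`. The sketch below only builds the argument list. The project ID and service account email are placeholders, and `grant_dataproc_editor_command` is a hypothetical helper name.

```python
import shlex


def grant_dataproc_editor_command(project_id, service_account_email):
    """Build the gcloud arguments that bind roles/dataproc.editor to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        # The member is the service account, not a user account.
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/dataproc.editor",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own project and service account.
    cmd = grant_dataproc_editor_command(
        "my-project", "notebook-runner@my-project.iam.gserviceaccount.com"
    )
    print(shlex.join(cmd))
    # To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building the list separately from running it keeps the command easy to review before an administrator executes it.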
Sign in to your Google Cloud account. If you're new to
Google Cloud,
create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
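In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Verify that billing is enabled for your Google Cloud project.
Enable the Notebooks, Vertex AI, and Dataproc APIs.
If you haven't already, create a managed notebooks instance.
If you haven't already, configure a VPC network that meets the requirements listed in Dataproc Serverless for Spark network configuration.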
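This page relies on the Notebooks, Vertex AI, and Dataproc APIs being enabled in the project. As an alternative to the console, they can be enabled with `gcloud services enable`. The sketch below builds that command. The project ID is a placeholder, and `enable_apis_command` is a hypothetical helper name.

```python
import shlex

# Service names for the Notebooks, Vertex AI, and Dataproc APIs.
REQUIRED_SERVICES = (
    "notebooks.googleapis.com",
    "aiplatform.googleapis.com",
    "dataproc.googleapis.com",
)


def enable_apis_command(project_id):
    """Build the gcloud arguments that enable the three APIs in one call."""
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES, f"--project={project_id}"]


if __name__ == "__main__":
    # Placeholder project ID.
    print(shlex.join(enable_apis_command("my-project")))
```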
Open JupyterLab
In the Google Cloud console, go to the Managed notebooks page.
Next to your managed notebooks instance's name, click Open JupyterLab.
Start a Dataproc Serverless Spark session
To start a Dataproc Serverless Spark session,
complete the following steps.
In your managed notebooks instance's JupyterLab interface,
select the Launcher tab, and then select Serverless Spark.
If the Launcher tab is not open,
select File > New Launcher to open it.
The Create Serverless Spark session dialog appears.
In the Session name field, enter a name for your session.
In the Execution configuration section, enter the Service account that you want to use. If you don't enter a service account, your session will use the Compute Engine default service account.
In the Network configuration section, select the Network and Subnetwork of a network that meets the requirements listed in Dataproc Serverless for Spark network configuration.
Click Create.
A new notebook file opens. The Dataproc Serverless Spark session that you created is the kernel that runs your notebook file's code.
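The service account fallback described above can be expressed in a few lines. This is a minimal sketch with hypothetical helper names. It assumes the usual email format of the Compute Engine default service account, `PROJECT_NUMBER-compute@developer.gserviceaccount.com`, and uses a made-up project number.

```python
def compute_default_service_account(project_number):
    """Email of the Compute Engine default service account for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def resolve_service_account(entered, project_number):
    """Return the entered service account, or the Compute Engine default if the field is empty."""
    entered = (entered or "").strip()
    return entered or compute_default_service_account(project_number)


if __name__ == "__main__":
    # 123456789012 is a placeholder project number.
    print(resolve_service_account("", 123456789012))
    print(resolve_service_account("runner@my-project.iam.gserviceaccount.com", 123456789012))
```

Whichever account is used, it needs the permissions listed in the Permissions section.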
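Run your code on Dataproc Serverless Spark and other kernels
Add code to your new notebook file, and run the code.
To run code on a different kernel, change the kernel.
When you want to run the code on
your Dataproc Serverless Spark session again,
change the kernel back to
the Dataproc Serverless Spark kernel.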
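As an illustration of code you might run in the notebook, the following sketch computes a small aggregate with Spark and compares it with a plain Python result. It assumes that PySpark is available in the Spark kernel and that `SparkSession.builder.getOrCreate()` returns the session backing the notebook. The function names are hypothetical.

```python
def sum_of_squares(spark, n):
    """Sum id * id for id in [0, n) using the given SparkSession."""
    # spark.range(n) yields a DataFrame with a single column named "id".
    row = spark.range(n).selectExpr("sum(id * id) AS total").first()
    return row["total"]


def sum_of_squares_local(n):
    """Plain Python reference for the same computation."""
    return sum(i * i for i in range(n))


# In a notebook cell attached to the Dataproc Serverless Spark kernel
# (assumption: PySpark is importable there):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   assert sum_of_squares(spark, 10) == sum_of_squares_local(10)
```

After you change the kernel, the plain Python function still runs unchanged, while the Spark function needs a kernel that provides a Spark session.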
Terminate your Dataproc Serverless Spark session
You can terminate a Dataproc Serverless Spark session
in the JupyterLab interface or in the Google Cloud console.
The code in your notebook file is preserved.
JupyterLab
In JupyterLab, close the notebook file that was created when you
created your Dataproc Serverless Spark session.
In the dialog that appears, click Terminate session.
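Google Cloud console
In the Google Cloud console, go to the Dataproc sessions page.
Select the session that you want to terminate, and then click Terminate.
Delete your Dataproc Serverless Spark session
You can delete a Dataproc Serverless Spark session by using the Google Cloud console.
The code in your notebook file is preserved.
In the Google Cloud console, go to the Dataproc sessions page.
Select the session that you want to delete, and then click Delete.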
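The console actions on the Dataproc sessions page might also be scripted with the gcloud CLI. The sketch below is an assumption, not something this page documents: it presumes that your installed gcloud version provides a `dataproc sessions` command group (possibly only under the `beta` release track) that accepts a `--location` flag. Verify this with `gcloud dataproc sessions --help` before relying on it. The helper name and the example values are hypothetical.

```python
import shlex


def session_command(action, session_id, region, track="beta"):
    """Build gcloud arguments for a session action (assumed command shape, see the lead-in)."""
    if action not in ("terminate", "delete"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["gcloud"]
    if track:
        # Optional release track such as "beta"; pass track=None to omit it.
        cmd.append(track)
    cmd += ["dataproc", "sessions", action, session_id, f"--location={region}"]
    return cmd


if __name__ == "__main__":
    # Placeholder session ID and region.
    print(shlex.join(session_command("terminate", "my-session", "us-central1")))
    print(shlex.join(session_command("delete", "my-session", "us-central1", track=None)))
```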
Last updated 2025-09-04 UTC.