Use Dataproc Serverless Spark with managed notebooks
Note: Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks will end and the ability to create managed notebooks instances will be removed. Existing instances will continue to function, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, migrate your managed notebooks instances to Vertex AI Workbench instances.
This page shows you how to run a notebook file on serverless Spark
in a Vertex AI Workbench managed notebooks instance
by using Dataproc Serverless.
Your managed notebooks instance can submit a notebook file's code to run on the Dataproc Serverless service. The service runs
the code on a managed compute infrastructure that automatically
scales resources as needed, so
you don't need to provision and manage your own cluster.
Dataproc Serverless charges apply only to the time when the workload is executing.
Requirements
To run a notebook file on Dataproc Serverless Spark, your setup must meet the following requirements.
Your Dataproc Serverless session must run in the same region as your managed notebooks instance.
The Require OS Login (constraints/compute.requireOsLogin) constraint must not be enabled for your project. See Manage OS Login in an organization.
To run a notebook file on Dataproc Serverless, you must provide a service account that has specific permissions. You can grant these permissions
to the default service account or provide a custom service account.
See the Permissions section of this page.
Your Dataproc Serverless Spark session uses a Virtual Private Cloud (VPC) network to execute workloads.
The VPC subnetwork must meet specific requirements.
See the requirements in Dataproc Serverless for
Spark network configuration.
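The same-region requirement can be checked before you create a session. The following is a minimal sketch, assuming that both resources are identified by names of the form projects/PROJECT/locations/REGION/...; the helper names and the example resource names are hypothetical and not part of any Google Cloud library.

```python
def location_of(resource_name: str) -> str:
    """Return the segment that follows 'locations' in a resource name."""
    parts = resource_name.split("/")
    try:
        return parts[parts.index("locations") + 1]
    except (ValueError, IndexError):
        # No 'locations' segment, or nothing after it.
        raise ValueError(f"no location segment in {resource_name!r}") from None


def same_region(instance_name: str, session_name: str) -> bool:
    """True when the notebook instance and the session share a region."""
    return location_of(instance_name) == location_of(session_name)


# Hypothetical resource names, for illustration only.
instance = "projects/my-project/locations/us-central1/runtimes/my-notebook"
session = "projects/my-project/locations/us-central1/sessions/my-session"
print(same_region(instance, session))
```

A check like this only compares names; it does not confirm that either resource exists.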
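Permissions
To ensure that the service account has the necessary permissions to run a notebook file on Dataproc Serverless,
ask your administrator to grant the service account
the Dataproc Editor (roles/dataproc.editor) IAM role
on your project.
Important: Grant this role to the service account, not to your user account. Granting the role to the wrong principal might result in permission errors.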
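An administrator who works from the command line could grant the role with the gcloud CLI. The sketch below only assembles the command as an argument list and prints it; the project ID and service account email are placeholder values, and the helper name is hypothetical. It assumes the standard `gcloud projects add-iam-policy-binding` command with its `--member` and `--role` flags.

```python
import shlex


def grant_dataproc_editor_cmd(project_id: str, service_account_email: str) -> list[str]:
    """Build the gcloud argument list that grants roles/dataproc.editor."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        # The member is the service account, not a user account.
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/dataproc.editor",
    ]


# Placeholder values, for illustration only.
cmd = grant_dataproc_editor_cmd(
    "my-project",
    "notebook-runner@my-project.iam.gserviceaccount.com",
)
print(shlex.join(cmd))
```

To run the command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where the gcloud CLI is installed and authenticated.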
This predefined role contains
the permissions required to run a notebook file on Dataproc Serverless. The exact permissions that are required are listed in the Required permissions section:
Required permissions
The following permissions are required to run a notebook file on Dataproc Serverless:
dataproc.agents.create
dataproc.agents.delete
dataproc.agents.get
dataproc.agents.update
dataproc.sessions.create
dataproc.sessions.get
dataproc.sessions.list
dataproc.sessions.terminate
dataproc.sessions.delete
dataproc.tasks.lease
dataproc.tasks.listInvalidatedLeases
dataproc.tasks.reportStatus
Your administrator might also be able to give the service account
these permissions
with custom roles or
other predefined roles.
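When a custom role is used instead of Dataproc Editor, it helps to confirm that no required permission was left out. The following is a minimal sketch that compares a set of granted permissions against the required list; the function name is hypothetical, and how you obtain the granted set (for example, from the custom role's definition) is left to you.

```python
# Permissions required to run a notebook file on Dataproc Serverless.
# Note the plural "sessions" in dataproc.sessions.create, which matches
# the other session permissions.
REQUIRED_PERMISSIONS = frozenset({
    "dataproc.agents.create",
    "dataproc.agents.delete",
    "dataproc.agents.get",
    "dataproc.agents.update",
    "dataproc.sessions.create",
    "dataproc.sessions.get",
    "dataproc.sessions.list",
    "dataproc.sessions.terminate",
    "dataproc.sessions.delete",
    "dataproc.tasks.lease",
    "dataproc.tasks.listInvalidatedLeases",
    "dataproc.tasks.reportStatus",
})


def missing_permissions(granted) -> list[str]:
    """Return the required permissions absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Example: a custom role that omits the terminate permission.
custom_role = REQUIRED_PERMISSIONS - {"dataproc.sessions.terminate"}
print(missing_permissions(custom_role))
```

An empty result means the granted set covers every permission in the list.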
Before you begin
Sign in to your Google Cloud account. If you're new to
Google Cloud,
create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
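Verify that billing is enabled for your Google Cloud project.
Enable the Notebooks, Vertex AI, and Dataproc APIs.
If you haven't already, create a managed notebooks instance.
If you haven't already, configure a VPC network that meets the requirements listed in Dataproc Serverless for Spark network configuration.
Open JupyterLab
In the Google Cloud console, go to the Managed notebooks page.
Next to your managed notebooks instance's name, click Open JupyterLab.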
Start a Dataproc Serverless Spark session
To start a Dataproc Serverless Spark session, complete the following steps.
In your managed notebooks instance's JupyterLab interface, select the Launcher tab, and then select Serverless Spark.
If the Launcher tab is not open,
select File > New Launcher to open it.
The Create Serverless Spark session dialog appears.
In the Session name field, enter a name for your session.
In the Execution configuration section, enter
the Service account that you want to use. If you don't enter a service account, your session uses the Compute Engine default service account.
In the Network configuration section, select the Network and Subnetwork of a network that meets the requirements listed in Dataproc Serverless for Spark network configuration.
Click Create.
A new notebook file opens.
The Dataproc Serverless Spark session that you created is the kernel that runs your notebook file's code.
Run your code on Dataproc Serverless Spark and other kernels
Add code to your new notebook file, and run the code.
To run code on a different kernel,
change the kernel.
When you want to run the code on your Dataproc Serverless Spark session again, change the kernel back to the Dataproc Serverless Spark kernel.
Terminate your Dataproc Serverless Spark session
You can terminate a Dataproc Serverless Spark session
in the JupyterLab interface or in the Google Cloud console.
The code in your notebook file is preserved.
JupyterLab
In JupyterLab, close the notebook file that was created when you created your Dataproc Serverless Spark session.
In the dialog that appears, click Terminate session.
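Google Cloud console
In the Google Cloud console, go to the Dataproc sessions page.
Select the session that you want to terminate, and then click Terminate.
Delete your Dataproc Serverless Spark session
You can delete a Dataproc Serverless Spark session by using the Google Cloud console.
The code in your notebook file is preserved.
In the Google Cloud console, go to the Dataproc sessions page.
Select the session that you want to delete, and then click Delete.
What's next
Learn more about Dataproc Serverless.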
Last updated 2025-09-04 UTC.