[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-08-26(UTC)"],[[["\u003cp\u003eThe Jupyter component is a single-user, web-based notebook for interactive data analytics, accessible via port 8123 on the cluster's first master node, and it also supports the JupyterLab Web UI.\u003c/p\u003e\n"],["\u003cp\u003eTo enable multi-user notebook access, you can utilize a Dataproc-enabled Vertex AI Workbench instance or install the Dataproc JupyterLab plugin on a VM.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks can be configured using specific cluster properties, and by default, notebooks are saved in Cloud Storage, with the location being customizable at cluster creation.\u003c/p\u003e\n"],["\u003cp\u003eThe Jupyter component can be installed when creating a Dataproc cluster through the Google Cloud console, gcloud CLI, or REST API, but requires the Component Gateway to be enabled.\u003c/p\u003e\n"],["\u003cp\u003eJupyter notebooks support working directly with data files in Cloud Storage, and you can also attach GPUs to master and worker nodes to enhance machine learning tasks within Jupyter.\u003c/p\u003e\n"]]],[],null,["You can install additional components like Jupyter when you create a Dataproc\ncluster using the\n[Optional components](/dataproc/docs/concepts/components/overview#available_optional_components)\nfeature. This page describes the Jupyter component.\n\nThe [Jupyter](http://jupyter.org/) component\nis a Web-based **single-user** notebook for interactive data analytics and supports the\n[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/index.html)\nWeb UI. The Jupyter Web UI is available on port `8123` on the cluster's first master node.\n\n**Launch notebooks for multiple users.** You can create a Dataproc-enabled\n[Vertex AI Workbench instance](/vertex-ai/docs/workbench/instances/create-dataproc-enabled)\nor [install the Dataproc JupyterLab plugin](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions)\non a VM to to serve notebooks to multiple users.\n\n**Configure Jupyter.** Jupyter can be configured by providing `dataproc:jupyter`\n[cluster properties](/dataproc/docs/concepts/configuring-clusters/cluster-properties#service_properties).\nTo reduce the risk of remote code execution over unsecured notebook server\nAPIs, the default `dataproc:jupyter.listen.all.interfaces` cluster property\nsetting is `false`, which restricts connections to `localhost (127.0.0.1)` when\nthe [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways) is\nenabled (Component Gateway activation is required when installing the Jupyter component).\n\nThe Jupyter notebook provides a Python kernel to run [Spark](https://spark.apache.org/) code, and a\nPySpark kernel. By default, notebooks are [saved in Cloud Storage](https://github.com/src-d/jgscm)\nin the Dataproc staging bucket, which is specified by the user or\n[auto-created](/dataproc/docs/guides/create-cluster#auto-created_staging_bucket)\nwhen the cluster is created. 
Install Jupyter

Install the component when you create a Dataproc cluster. The Jupyter component requires activation of the Dataproc [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways).

**Note:** Only when using [image version 1.5](/dataproc/docs/concepts/versioning/dataproc-version-clusters#unsupported-dataproc-image-versions) does installation of the Jupyter component also require installation of the [Anaconda](/dataproc/docs/concepts/components/anaconda) component.

Console

1. Enable the component.
   - In the Google Cloud console, open the Dataproc [Create a cluster](https://console.cloud.google.com/dataproc/clustersAdd) page. The **Set up cluster** panel is selected.
   - In the **Components** section:
     - Under **Optional components**, select the **Jupyter** component.
     - Under **Component Gateway**, select **Enable component gateway** (see [Viewing and Accessing Component Gateway URLs](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls)).

gcloud CLI

To create a Dataproc cluster that includes the Jupyter component, use the [gcloud dataproc clusters create](/sdk/gcloud/reference/dataproc/clusters/create) `cluster-name` command with the `--optional-components` flag.

**Latest default image version example**

The following example installs the Jupyter component on a cluster that uses the latest default image version.

```
gcloud dataproc clusters create cluster-name \
    --optional-components=JUPYTER \
    --region=region \
    --enable-component-gateway \
    ... other flags
```

REST API

The Jupyter component can be installed through the Dataproc API using [`SoftwareConfig.Component`](/dataproc/docs/reference/rest/v1/ClusterConfig#Component) as part of a [`clusters.create`](/dataproc/docs/reference/rest/v1/projects.regions.clusters/create) request.

- Set the [EndpointConfig.enableHttpPortAccess](/dataproc/docs/reference/rest/v1/ClusterConfig#endpointconfig) property to `true` as part of the `clusters.create` request to enable connecting to the Jupyter notebook Web UI using the [Component Gateway](/dataproc/docs/concepts/accessing/dataproc-gateways).
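If you build the `clusters.create` request from Python rather than sending raw REST calls, the same fields can be expressed with the `google-cloud-dataproc` client library. The sketch below is illustrative rather than taken from this page: the project, region, and cluster name are placeholders, and it assumes the library's `ClusterControllerClient` and `Component` enum, which mirror the REST `SoftwareConfig` and `EndpointConfig` fields.

```
# Minimal sketch (assumed google-cloud-dataproc client usage): create a cluster with the
# Jupyter optional component and the Component Gateway enabled.
from google.cloud import dataproc_v1

project_id = "my-project"         # placeholder
region = "us-central1"            # placeholder
cluster_name = "jupyter-cluster"  # placeholder

# Dataproc cluster operations use a regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        # SoftwareConfig.Component: install the Jupyter optional component.
        "software_config": {"optional_components": [dataproc_v1.Component.JUPYTER]},
        # EndpointConfig.enableHttpPortAccess: required to reach the Jupyter UI
        # through the Component Gateway.
        "endpoint_config": {"enable_http_port_access": True},
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(f"Created cluster: {operation.result().cluster_name}")
```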
Open the Jupyter and JupyterLab UIs

Click the [Google Cloud console Component Gateway links](/dataproc/docs/concepts/accessing/dataproc-gateways#viewing_and_accessing_component_gateway_urls) to open the Jupyter notebook or JupyterLab UI running on the cluster master node in your local browser.

**Select "GCS" or "Local Disk" to create a new Jupyter Notebook in either location.**

Attach GPUs to master and worker nodes

You can [add GPUs](https://cloud.google.com/dataproc/docs/concepts/compute/gpus) to your cluster's master and worker nodes when using a Jupyter notebook to:

1. Preprocess data in Spark, then collect a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) onto the master and run [TensorFlow](https://www.tensorflow.org/) (a minimal sketch follows this list)
2. Use Spark to orchestrate TensorFlow runs in parallel
3. Run [Tensorflow-on-YARN](https://github.com/criteo/tf-yarn)
4. Use with other machine learning scenarios that use GPUs
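As an illustration of the first scenario only, the following sketch preprocesses data with Spark, collects the (small) result onto the master node, and trains a tiny Keras model there. The Parquet path, column names, and model shape are placeholder assumptions, and `spark` is the session provided by the PySpark kernel.

```
# Sketch of scenario 1 (placeholder paths, columns, and model): preprocess with Spark,
# collect onto the master, then train with TensorFlow, using the master's GPU if attached.
import tensorflow as tf

# Distributed preprocessing in Spark.
features = (
    spark.read.parquet("gs://bucket/path/training-data")  # placeholder path
    .select("f1", "f2", "label")                          # placeholder columns
    .dropna()
)

# Collect onto the master node; keep this small enough to fit in memory.
pdf = features.toPandas()

# Train a small Keras model on the collected data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(pdf[["f1", "f2"]].to_numpy(), pdf["label"].to_numpy(), epochs=5, batch_size=256)
```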