Can I specify memory and disk resources for Serverless for Apache Spark workloads?
Yes. When you submit a workload, you can specify premium driver and executor compute and disk tiers, as well as the amount of driver and executor compute and disk resources to allocate (see Resource allocation properties).
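As a minimal sketch, the following example submits a batch with resource allocation properties through the google-cloud-dataproc Python client. The project, region, bucket, job file, and the premium-tier property keys are illustrative assumptions; confirm the exact property names against the Resource allocation properties documentation.

```python
from google.cloud import dataproc_v1

# Use the regional endpoint for the region where the batch runs
# (the region, project, and bucket below are hypothetical).
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://example-bucket/jobs/etl.py",
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={
            # Standard Spark resource properties.
            "spark.driver.cores": "8",
            "spark.driver.memory": "16g",
            "spark.executor.cores": "8",
            "spark.executor.memory": "16g",
            # Premium tier properties: these key names are assumptions;
            # check the Resource allocation properties page for exact keys.
            "spark.dataproc.driver.compute.tier": "premium",
            "spark.dataproc.executor.compute.tier": "premium",
            "spark.dataproc.executor.disk.tier": "premium",
        }
    ),
)

operation = client.create_batch(
    parent="projects/example-project/locations/us-central1", batch=batch
)
print(operation.result().state)
```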
How can I specify the IP address range for my Serverless for Apache Spark VPC network?
Serverless for Apache Spark workloads run within your environment. Each Spark driver and Spark executor in a serverless Spark workload consumes one internal IP address in your Serverless for Apache Spark VPC network. /16 is a typical user-specified CIDR IP address range for a Serverless for Apache Spark VPC network. You can limit your network's IP address range based on the number of concurrent workloads you plan to run.
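As a rough, hypothetical sizing sketch (the counts below are assumptions, not recommendations), you can estimate peak internal IP address demand from your concurrency plan and compare it with common subnet sizes:

```python
# Each Spark driver and executor consumes one internal IP address, so peak
# demand is roughly: concurrent workloads * (1 driver + max executors).
concurrent_workloads = 50          # batches expected to run at the same time
max_executors_per_workload = 100   # e.g. spark.dynamicAllocation.maxExecutors
drivers_per_workload = 1

peak_addresses = concurrent_workloads * (drivers_per_workload + max_executors_per_workload)
print(f"Peak internal IP addresses needed: {peak_addresses}")  # 5050

# Total address counts for common CIDR sizes (a few addresses per subnet are
# reserved by Google Cloud, so usable capacity is slightly lower).
for prefix in (16, 20, 24):
    print(f"/{prefix} provides {2 ** (32 - prefix)} addresses")
```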
Does Serverless for Apache Spark support data residency?
Yes. You specify the region where your workload is processed. Locate your input and output datasets in the specified region.
How does Serverless for Apache Spark select a zone within the specified region to run the workload?
Serverless for Apache Spark selects the Compute Engine zone where it executes a workload based on capacity and availability. If a zone becomes unavailable after a workload starts, the workload fails, and you must resubmit the failed workload.
How do Serverless for Apache Spark workloads use compute resources?
Each workload executes on its own compute resources. Multiple batch submissions don't share or reuse compute resources.
Best Practices:
Optimize your workloads for medium-running jobs, not short-running jobs.
Persist data that is accessed by multiple workloads in Cloud Storage, as sketched below.
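For example, a minimal PySpark sketch of the second practice, assuming a hypothetical Cloud Storage bucket, paths, and column names: one workload persists a curated dataset, and later workloads read it instead of recomputing it.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-data-example").getOrCreate()

# Workload A: clean raw data once and persist it for downstream workloads
# (bucket and column names are hypothetical).
orders = spark.read.json("gs://example-bucket/raw/orders/")
cleaned = orders.dropDuplicates(["order_id"]).filter("status = 'COMPLETE'")
cleaned.write.mode("overwrite").parquet("gs://example-bucket/curated/orders/")

# Workload B (a separate batch submission): reuse the persisted dataset.
curated = spark.read.parquet("gs://example-bucket/curated/orders/")
print(curated.count())
```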
Where can I find information on Serverless for Apache Spark announcements, features, bug fixes, known issues, and deprecations?
See the Serverless for Apache Spark release notes.
What Spark logs are available for Serverless for Apache Spark?
Spark executor and driver logs are available in Cloud Logging during and after Spark workload execution. Also, Spark applications are visible in the Persistent History Server (PHS) web interface while the workload is running (select PHS > Incomplete Applications in the PHS UI).
If you set up a Dataproc PHS, it provides persistent access to Spark event logs saved in Cloud Storage, which offer insight into Spark app execution, such as DAG and executor events.
Can I set the number of executors for my Spark workload?
Yes. You can set the number of executors for a Spark workload using the spark.executor.instances property. However, the total number of cores that a workload can use matters more than the number of executors, because Spark runs one task per core. For example, a workload with four executors of two cores each runs 4 * 2 = 8 tasks at a time, and a workload with two executors of four cores each runs the same number of tasks, because the total core count is the same. You can use the spark.executor.cores property to set the number of cores per executor for your Serverless for Apache Spark workload.
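A worked version of the arithmetic above; the helper below is purely illustrative:

```python
def concurrent_task_slots(executor_instances: int, executor_cores: int) -> int:
    """Spark runs one task per core, so slots = executors * cores per executor."""
    return executor_instances * executor_cores

# Both configurations run the same number of tasks at a time.
print(concurrent_task_slots(executor_instances=4, executor_cores=2))  # 8
print(concurrent_task_slots(executor_instances=2, executor_cores=4))  # 8
```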
What Spark metrics does Serverless for Apache Spark use for autoscaling?
Serverless for Apache Spark looks at the maximum-needed and running Spark dynamic allocation metrics to determine whether to scale up or down. See Serverless for Apache Spark autoscaling.
Can I configure Serverless for Apache Spark autoscaling behavior using Spark properties?
Yes. Serverless for Apache Spark autoscaling is based on Spark dynamic allocation and is enabled by default. You can adjust the following Spark properties and Spark dynamic allocation properties (a configuration sketch follows the list):
spark.executor.instances
spark.dynamicAllocation.initialExecutors
spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors
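As an illustrative sketch, these properties can be supplied with the workload, for example in the runtime configuration of a batch submission; the values below are assumptions, not recommendations.

```python
# Autoscaling bounds expressed as Spark properties (illustrative values).
autoscaling_properties = {
    # Executors requested when the workload starts.
    "spark.dynamicAllocation.initialExecutors": "2",
    # Never scale below this many executors.
    "spark.dynamicAllocation.minExecutors": "2",
    # Cap scale-up to control cost and internal IP address consumption.
    "spark.dynamicAllocation.maxExecutors": "100",
}
print(autoscaling_properties)
```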
Why do I need to package my code in a JAR file to submit my Spark workload?
Spark is written in Scala, which means that both the driver and the worker processes run as JVM processes. In JVM languages, the JAR file is the primary way to package code. You pass the JAR file to Serverless for Apache Spark when you submit a workload.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[[["\u003cp\u003eDataproc Serverless for Spark supports batch workloads and interactive sessions, managing the underlying infrastructure for you, whereas Dataproc on Compute Engine requires manual cluster management and supports a wider range of open-source components.\u003c/p\u003e\n"],["\u003cp\u003eYou can leverage Dataproc Serverless for Spark to run batch and streaming jobs, train models, utilize interactive SQL notebooks, and orchestrate workloads with Cloud Composer.\u003c/p\u003e\n"],["\u003cp\u003eDataproc Serverless allows for the use of custom container images and provides options to specify memory, disk resources, and the number of executors for Spark workloads.\u003c/p\u003e\n"],["\u003cp\u003eWorkload execution can be concurrent or sequential, with the ability to manage the IP address range within your VPC network to accommodate the desired number of concurrent workloads, within resource quotas.\u003c/p\u003e\n"],["\u003cp\u003eDataproc Serverless workloads use dynamic allocation metrics to autoscale, with the ability to configure autoscaling behavior by adjusting specific Spark and Spark dynamic allocation properties.\u003c/p\u003e\n"]]],[],null,["# Serverless for Apache Spark FAQ\n\nThis page contains frequently asked Google Cloud Serverless for Apache Spark questions with answers.\n\n### When should I use Serverless for Apache Spark instead of Dataproc on Compute Engine?\n\n- Serverless for Apache Spark:\n\n - Supports Spark batch workloads and interactive sessions in PySpark kernel Jupyter notebooks.\n - Serverless for Apache Spark creates and manages your workload and interactive session infrastructure.\n- Dataproc on Compute Engine:\n\n - Supports the submission of different types Spark jobs, and jobs based on\n other open source components, such as Flink, Hadoop, Hive, Pig, Presto,\n and others.\n\n - Does not create and manage infrastructure. You create and\n manage your Dataproc clusters.\n\n### What can I do with Serverless for Apache Spark?\n\n- [Run batch jobs](/dataproc-serverless/docs/quickstarts/spark-batch).\n\n- [Use the Dataproc JupyterLab plugin for serverless batch and\n interactive notebook sessions](/dataproc-serverless/docs/quickstarts/jupyterlab-sessions).\n\n- Run streaming jobs using Spark streaming libraries. Note: Streaming\n is not a managed service, so you must manage checkpointing and restarts.\n\n- Train models using Spark MLlib.\n\n- Use interactive SQL notebooks for data exploration, graph, time series, and\n geospatial analytics.\n\n- Orchestrate Serverless for Apache Spark workloads with Cloud Composer, a\n managed Apache Airflow service.\n\n### How should I set up a workload execution plan?\n\nYou can run workloads concurrently or sequentially. Your execution plan\nimpacts your Google Cloud resource quota. You can run as many workloads\nin parallel as your [batch resource](/dataproc-serverless/quotas#default_batch_resources)\nquotas allow.\n\nCan I use a custom image with Serverless for Apache Spark?\n----------------------------------------------------------\n\n- Yes. You can use a custom container image instead of the default container image. 
See [Use custom containers with Serverless for Apache Spark](/dataproc-serverless/docs/guides/custom-containers).\n\nCan I specify memory and disk resources for Serverless for Apache Spark Spark workloads?\n----------------------------------------------------------------------------------------\n\nYes. You can specify premium executor and driver compute and\ndisk tiers and the amount of driver and executor compute and disk resources\nto allocate when you submit a workload (see\n[Resource allocation properties](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties)).\n\nHow can I specify the IP address range for my Serverless for Apache Spark VPC network?\n--------------------------------------------------------------------------------------\n\nServerless for Apache Spark workloads run within your environment.\nEach Spark driver and Spark executor in a Serverless Spark workload consumes one\ninternal IP address in your [Serverless for Apache Spark VPC network](/dataproc-serverless/docs/concepts/network).\n`/16` is a typical user-specified\n[CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) address range\nfor a [Serverless for Apache Spark VPC network](/dataproc-serverless/docs/concepts/network).\nYou can limit your network's IP address range based on the number of concurrent\nworkloads you plan to run.\n\nDoes Serverless for Apache Spark support data residency?\n--------------------------------------------------------\n\nYes. You specify the region where your workload is processed.\nLocate you input and output datasets in the specified region.\n\nHow does Serverless for Apache Spark select a zone within your specified region to run the workload?\n----------------------------------------------------------------------------------------------------\n\nServerless for Apache Spark selects the Compute Engine zone where it executes a workload\nbased on capacity and availability. If a zone becomes unavailable after\na workload starts, the workload fails, and you must resubmit the\nfailed workload.\n\nHow do Serverless for Apache Spark workloads use compute resources?\n-------------------------------------------------------------------\n\nEach workload executes on its own compute resources. 
Multiple batch\nsubmissions don't share or reuse compute resources.\n\n**Best Practices:**\n\n- Optimize your workload for medium-running jobs, not short-running jobs.\n\n- Persist data that is accessed by multiple workloads in Cloud Storage.\n\nWhere can I find information on Serverless for Apache Spark announcements, features, bug fixes, known issues, and deprecations?\n-------------------------------------------------------------------------------------------------------------------------------\n\nSee the [Serverless for Apache Spark release notes](/dataproc-serverless/docs/release-notes).\n\nDo concurrent workloads compete for resources?\n----------------------------------------------\n\nServerless for Apache Spark workloads only compete for resources\nif your resource quota is insufficient to run all concurrently running workloads.\nOtherwise, workloads are fully isolated from each other.\n\nHow is Serverless for Apache Spark quota allocated?\n---------------------------------------------------\n\nServerless for Apache Spark batches consume Google Cloud resources.\nSee [Dataproc Serverless quotas](/dataproc-serverless/quotas) for more\ninformation.\n\nDo I need to set up a Dataproc Persistent History Server?\n---------------------------------------------------------\n\nSetting up a [Persistent History Server (PHS)](/dataproc/docs/concepts/jobs/history-server)\nto use with Serverless for Apache Spark is optional.You can use the PHS\nto view Spark event and other logs in a specified Cloud Storage bucket up to and\nafter the standard\n[Serverless for Apache Spark staging and temp bucket](/dataproc-serverless/docs/concepts/buckets)\n90-day retention (TTL) period.\n| **Note:** The PHS must be located in the region where you run batch workloads.\n\nWhat Serverless for Apache Spark Spark logs are available?\n----------------------------------------------------------\n\nSpark executors and driver logs are available in Cloud Logging during and\nafter Spark workload execution. Also, Spark applications are visible in the\n[Persistent History Server (PHS)](/dataproc/docs/concepts/jobs/history-server)\nweb interface while the workload is running (select **PHS** \\\u003e **Incomplete Applications**\nin the PHS UI).\n\nIf you set up a Dataproc PHS, it provides persistent access to\nSpark event logs saved in Cloud Storage, which\nprovide insight into Spark app execution, such DAG and executor events.\n| **Note:** The PHS must be located in the region where you run batch workloads.\n\nCan I set the number of executors for my Spark workload?\n--------------------------------------------------------\n\nYes. You can set the number of executors for a Spark workload using the\n[`spark.executor.instances`](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties)\nproperty. However, the total number of cores that a workload can use is more important\nthan the number of executors because Spark runs 1 task per core. For example,\nif a workload has four executors with two cores each, it will run `4 * 2 = 8` tasks\nat the same time. And it will also run the same number of tasks for a workload that\nhas two executors with four cores each. Since the number of cores for each workload is the\nsame, they will run the same number of tasks. 
You can use the\n[`spark.executor.cores`](/dataproc-serverless/docs/concepts/properties#resource_allocation_properties)\nproperty to set the number cores per executor for your Serverless for Apache Spark workload.\n\nWhat Spark metrics does Serverless for Apache Spark use for autoscaling?\n------------------------------------------------------------------------\n\nServerless for Apache Spark looks at the `maximum-needed` and `running`\nSpark's dynamic allocation metrics to determine whether to scale up or down.\nSee [Serverless for Apache Spark autoscaling](/dataproc-serverless/docs/concepts/autoscaling).\n\nCan I configure Serverless for Apache Spark autoscaling behavior using Spark properties?\n----------------------------------------------------------------------------------------\n\nYes. Serverless for Apache Spark autoscaling is based on Spark dynamic allocation, and\nis enabled by default. You can adjust the following\n[Spark properties](/dataproc-serverless/docs/concepts/properties#supported_spark_properties)\nand [Spark dynamic allocation properties](/dataproc-serverless/docs/concepts/autoscaling#spark_dynamic_allocation_properties):\n\n- `spark.executor.instances`\n- `spark.dynamicAllocation.initialExecutors`\n- `spark.dynamicAllocation.minExecutors`\n- `spark.dynamicAllocation.maxExecutors`\n\nWhy do I need to package my code in a JAR file to submit my Spark workload?\n---------------------------------------------------------------------------\n\nSpark is written in Scala, which means that both the driver and the worker processes\noperate as JVM processes. In JVM languages, the JAR file is the primary way to\npackage code. You pass the JAR file to Serverless for Apache Spark when you\nsubmit a workload."]]