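Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.
Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.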
Training and tutorials
Run a Spark job on Google Kubernetes Engine
Submit Spark jobs to a running Google Kubernetes Engine cluster from the Dataproc Jobs API.
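To make the job submission above concrete, here is a minimal sketch of the request that a `jobs:submit` call to the Dataproc REST API carries. It only builds the URL and JSON body and sends nothing. The project, region, cluster name, and jar path are placeholder assumptions, and the helper `build_spark_job_request` is a hypothetical name introduced for illustration.

```python
import json

# Public REST endpoint for the Dataproc v1 API.
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_spark_job_request(project_id, region, cluster_name,
                            main_class, jar_uris, args=()):
    """Return (url, body) for a jobs:submit call. Nothing is sent."""
    url = (f"{DATAPROC_ENDPOINT}/projects/{project_id}"
           f"/regions/{region}/jobs:submit")
    body = {
        "job": {
            # The cluster that should run the job.
            "placement": {"clusterName": cluster_name},
            # Spark job definition: driver class, jars, and arguments.
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": list(jar_uris),
                "args": list(args),
            },
        }
    }
    return url, body


# Placeholder values; replace them with your own project and cluster.
url, body = build_spark_job_request(
    project_id="my-project",
    region="us-central1",
    cluster_name="my-gke-cluster",
    main_class="org.apache.spark.examples.SparkPi",
    jar_uris=["local:///usr/lib/spark/examples/jars/spark-examples.jar"],
    args=["1000"],
)
print(url)
print(json.dumps(body, indent=2))
```

In practice the request would be sent with an authenticated HTTP client or through a Dataproc client library; the sketch leaves authentication and transport out so that it stays runnable offline.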
Introduction to Cloud Dataproc: Hadoop and Spark on Google Cloud
This course combines lectures, demos, and hands-on labs in which you create a Dataproc cluster, submit a Spark job, and then shut down the cluster.
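Machine Learning with Spark on Dataproc
This course combines lectures, demos, and hands-on labs in which you implement logistic regression with a machine learning library for Apache Spark running on a Dataproc cluster, developing a model for data from a multivariable dataset.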
Use cases
Workflow scheduling solutions
Schedule workflows on Google Cloud.
Migrate HDFS data from on-premises to Google Cloud
How to move data from an on-premises Hadoop Distributed File System (HDFS) to Google Cloud.
Manage Java and Scala dependencies for Apache Spark
Recommended approaches for including dependencies when you submit a Spark job to a Dataproc cluster.
Code samples
Python API samples
Call Dataproc APIs from Python.
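As a rough illustration of what calling the Dataproc API from Python involves, the sketch below builds the REST requests for creating and deleting a cluster, which matches the "create a cluster, then turn it off when you don't need it" lifecycle described at the top of this page. It uses only the standard library and sends nothing. The helper names, project ID, region, cluster name, and machine type are assumptions chosen for illustration and are not taken from the linked samples.

```python
import json

# Public REST endpoint for the Dataproc v1 API.
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_create_cluster_request(project_id, region, cluster_name,
                                 num_workers=2,
                                 machine_type="n1-standard-4"):
    """Return (method, url, body) for a clusters.create call."""
    url = (f"{DATAPROC_ENDPOINT}/projects/{project_id}"
           f"/regions/{region}/clusters")
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # One master node plus a configurable number of workers.
            "masterConfig": {"numInstances": 1,
                             "machineTypeUri": machine_type},
            "workerConfig": {"numInstances": num_workers,
                             "machineTypeUri": machine_type},
        },
    }
    return "POST", url, body


def build_delete_cluster_request(project_id, region, cluster_name):
    """Return (method, url, body) for a clusters.delete call."""
    url = (f"{DATAPROC_ENDPOINT}/projects/{project_id}"
           f"/regions/{region}/clusters/{cluster_name}")
    return "DELETE", url, None


# Placeholder values; replace them with your own project settings.
create = build_create_cluster_request("my-project", "us-central1",
                                      "example-cluster")
delete = build_delete_cluster_request("my-project", "us-central1",
                                      "example-cluster")
print(create[0], create[1])
print(json.dumps(create[2], indent=2))
print(delete[0], delete[1])
```

The Python client library used in the linked samples wraps these same operations and handles authentication and long-running operation polling, so this sketch is mainly useful for seeing the shape of the underlying requests.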
Java API samples
Call Dataproc APIs from Java.
Node.js API samples
Call Dataproc APIs from Node.js.
Go API samples
Call Dataproc APIs from Go.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-09-08 UTC.