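Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.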
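Start your proof of concept with $300 in free credit
- Get access to Gemini 2.0 Flash Thinking
- Free monthly usage of popular products, including AI APIs and BigQuery
- No automatic charges, no commitment
Keep exploring with 20+ always-free products
Access 20+ free products for common use cases, including AI APIs, VMs, data warehouses, and more.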
Training and tutorials
Run a Spark job on Google Kubernetes Engine
Submit Spark jobs to a running Google Kubernetes Engine cluster from the Dataproc Jobs API.
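As a rough illustration of what a job submission through the Jobs API might look like, the sketch below only builds the URL and JSON body for a `jobs:submit` REST call and sends nothing. The project, region, cluster name, main class, and JAR URI are hypothetical placeholders, and authentication is left out entirely.

```python
import json

# Assumed service endpoint for the Dataproc REST API (v1).
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_submit_request(project_id, region, cluster_name, main_class, jar_uris):
    """Return (url, body) for a jobs.submit REST call; nothing is sent."""
    url = f"{DATAPROC_ENDPOINT}/projects/{project_id}/regions/{region}/jobs:submit"
    body = {
        "job": {
            # The cluster that should run the job.
            "placement": {"clusterName": cluster_name},
            # A Spark job described by its main class and the JARs it needs.
            "sparkJob": {"mainClass": main_class, "jarFileUris": list(jar_uris)},
        }
    }
    return url, body


# All values below are hypothetical placeholders.
url, body = build_submit_request(
    "my-project",
    "us-central1",
    "my-gke-cluster",
    "com.example.MySparkJob",
    ["gs://my-bucket/my-spark-job.jar"],
)
print(url)
print(json.dumps(body, indent=2))
```

In practice the request would be sent with an OAuth 2.0 access token, or the same fields would be passed through one of the client libraries instead of raw REST.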
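Introduction to Cloud Dataproc: Hadoop and Spark on Google Cloud
This course combines lectures, demos, and hands-on labs that show how to create a Dataproc cluster, submit a Spark job, and then shut down the cluster.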
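The create, submit, and shut-down sequence described here can be sketched as three REST calls in order. This is a minimal outline, assuming the v1 REST endpoint; it only lists the HTTP methods and URLs and does not show request bodies or authentication. The project, region, and cluster name are hypothetical.

```python
# Assumed service endpoint for the Dataproc REST API (v1).
BASE = "https://dataproc.googleapis.com/v1"


def cluster_lifecycle_calls(project_id, region, cluster_name):
    """List the REST calls, in order, for create -> submit job -> delete."""
    region_path = f"{BASE}/projects/{project_id}/regions/{region}"
    return [
        ("POST", f"{region_path}/clusters"),                   # create the cluster
        ("POST", f"{region_path}/jobs:submit"),                # submit a Spark job
        ("DELETE", f"{region_path}/clusters/{cluster_name}"),  # delete the cluster
    ]


# Hypothetical identifiers, used only to print the call sequence.
for method, url in cluster_lifecycle_calls("my-project", "us-central1", "demo-cluster"):
    print(method, url)
```

Deleting the cluster once the job finishes is what stops the cluster from accruing further charges.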
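Machine Learning with Spark on Dataproc
This course combines lectures, demos, and hands-on labs that show how to implement logistic regression with an Apache Spark machine learning library running on a Dataproc cluster, and how to develop a model for a multivariable dataset.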
Use cases
Workflow scheduling solutions
Schedule workflows on Google Cloud.
Migrate HDFS data from on-premises to Google Cloud
How to move data from an on-premises Hadoop Distributed File System (HDFS) to Google Cloud.
Manage Java and Scala dependencies for Apache Spark
Recommended approaches to including dependencies when you submit a Spark job to a Dataproc cluster.
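One commonly used Spark mechanism for pulling in dependencies at submit time is the `spark.jars.packages` property, which takes a comma-separated list of Maven coordinates. The sketch below shows how that property might be attached to a Spark job definition. It is an illustration under that assumption and not necessarily the approach the linked guide recommends; the helper name, main class, JAR URI, and coordinates are hypothetical.

```python
def spark_job_with_packages(main_class, jar_uri, maven_coordinates):
    """Build a Spark job definition that asks Spark to resolve Maven packages."""
    # spark.jars.packages expects "group:artifact:version" entries joined by commas.
    packages = ",".join(maven_coordinates)
    return {
        "mainClass": main_class,
        "jarFileUris": [jar_uri],
        "properties": {"spark.jars.packages": packages},
    }


# Hypothetical job and dependency coordinates.
spark_job = spark_job_with_packages(
    "com.example.MySparkJob",
    "gs://my-bucket/my-spark-job.jar",
    ["com.example:lib-a:1.0.0", "com.example:lib-b:2.3.1"],
)
print(spark_job["properties"]["spark.jars.packages"])
```

An alternative is to bundle dependencies into a single "uber" JAR, which avoids resolving packages when the job starts but requires a build step.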
Code samples
Python API samples
Call Dataproc APIs from Python.
Java API samples
Call Dataproc APIs from Java.
Node.js API samples
Call Dataproc APIs from Node.js.
Go API samples
Call Dataproc APIs from Go.
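Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-09-04 UTC.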