Apache Hadoop
Apache Hive
-
Stream a Kafka topic to Hive
Use a Dataproc cluster to stream a Kafka topic into Apache Hive tables in Cloud Storage, and then query the streamed data.
-
Use Apache Hive on Dataproc
Learn how to deploy Apache Hive workloads efficiently on Dataproc.
Apache Kafka
Apache Spark
-
Monte Carlo methods using Dataproc and Apache Spark
Run Monte Carlo simulations in Python and Scala with Dataproc and Apache Spark.
-
Use BigQuery and Spark ML for machine learning
Use Dataproc, BigQuery, and Apache Spark ML for machine learning.
-
Use the BigQuery connector with Apache Spark
Follow example code that uses the BigQuery connector for Apache Hadoop with Apache Spark.
-
Use the Cloud Storage connector with Apache Spark
Follow example code that uses the Cloud Storage connector for Apache Hadoop with Apache Spark.
-
Write and run Spark Scala jobs
Create and submit Spark Scala jobs with Dataproc.
Connectors
-
Use the BigQuery connector with Apache Spark
Follow example code that uses the BigQuery connector for Apache Hadoop with Apache Spark.
-
Use the Cloud Storage connector with Apache Spark
Follow example code that uses the Cloud Storage connector for Apache Hadoop with Apache Spark.
-
Write a MapReduce job with the BigQuery connector
Follow example code that shows how to write a MapReduce job with the BigQuery connector for Apache Hadoop.
Languages
-
Configure Dataproc Python environment
Configure Python to run PySpark jobs on your Dataproc cluster.
-
Use the Cloud Client Libraries for Python
Use the Cloud Client Libraries for Python to interact with Dataproc programmatically.
-
Write and run Spark Scala jobs
Create and submit Spark Scala jobs with Dataproc.
Notebooks
-
Dataproc Hub overview
Understand Dataproc Hub basics.
-
Configure a Dataproc Hub
Configure Dataproc Hub to open the JupyterLab UI on single-user Dataproc clusters.
-
Use a Dataproc Hub
Use a Dataproc Hub instance to open the JupyterLab UI on a single-user Dataproc cluster.
-
Install and run a Jupyter notebook
Install, run, and access a Jupyter notebook on a Dataproc cluster.
-
Run Vertex AI Workbench notebooks on Dataproc clusters
Run the notebook file of a Vertex AI Workbench managed instance on a Dataproc cluster.
-
Run a genomics analysis in a JupyterLab notebook on Dataproc
Run a single-cell genomics analysis using Dask, NVIDIA RAPIDS, and GPUs in a JupyterLab notebook hosted on a Dataproc cluster.