Pub/Sub Lite with Dataproc

Pub/Sub Lite is a zonal, real-time messaging service that decouples services that produce events from services that process them. Its throughput and storage capacity are not scaled automatically; you configure them manually.

The Pub/Sub Lite Spark Connector supports Pub/Sub Lite as an input source for Apache Spark Structured Streaming in both Spark's default micro-batch processing mode and its experimental continuous processing mode.
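As a rough illustration of how the two processing modes differ only in the trigger you choose, here is a minimal PySpark sketch. It assumes the connector is already on the Spark classpath, that it registers the source under the format name "pubsublite", and that it reads the subscription from a "pubsublite.subscription" option. Verify these against the connector's README. The helper names (subscription_path, trigger_kwargs, start_console_stream) are hypothetical.

```python
"""Sketch: read a Pub/Sub Lite subscription with Spark Structured Streaming.

Assumptions (verify against the connector's README): the source format name
is "pubsublite" and the subscription is passed via "pubsublite.subscription".
Helper names in this module are hypothetical.
"""


def subscription_path(project_number: int, location: str, subscription_id: str) -> str:
    """Build a fully qualified Pub/Sub Lite subscription name."""
    return (
        f"projects/{project_number}/locations/{location}"
        f"/subscriptions/{subscription_id}"
    )


def trigger_kwargs(mode: str, interval: str = "1 second") -> dict:
    """Map a processing mode to keyword arguments for DataStreamWriter.trigger().

    "micro-batch": Spark's default mode; `interval` is the batch interval.
    "continuous": Spark's experimental mode; `interval` is the checkpoint interval.
    """
    if mode == "micro-batch":
        return {"processingTime": interval}
    if mode == "continuous":
        return {"continuous": interval}
    raise ValueError(f"unknown processing mode: {mode!r}")


def start_console_stream(spark, subscription: str, mode: str = "micro-batch"):
    """Start a streaming query that prints messages to the console sink.

    `spark` is an existing pyspark.sql.SparkSession, so this module does not
    import PySpark itself. The console sink supports both processing modes.
    """
    df = (
        spark.readStream.format("pubsublite")  # assumed format name
        .option("pubsublite.subscription", subscription)  # assumed option key
        .load()
    )
    return (
        df.writeStream.format("console")
        .outputMode("append")  # continuous processing supports append mode only
        .trigger(**trigger_kwargs(mode))
        .start()
    )
```

In a PySpark job you would call something like `start_console_stream(spark, subscription_path(PROJECT_NUMBER, LOCATION, SUBSCRIPTION_ID), mode="continuous")` and then `awaitTermination()` on the returned query. Keep in mind that Spark's continuous mode only allows map-like operations (no aggregations), so micro-batch remains the safer default.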

Using Pub/Sub Lite with Dataproc

Java

The samples directory of the java-pubsublite-spark repository on GitHub contains a Java Spark example that uses Pub/Sub Lite with Dataproc. To run it, follow the directions that accompany the example.

  1. To get started, clone the java-pubsublite-spark GitHub repository and change to its samples directory:
    git clone https://github.com/googleapis/java-pubsublite-spark
    cd java-pubsublite-spark/samples

Python / Scala

The connector is published to the Maven Central repository. To use it, pass its Maven coordinates to the --packages option of the spark-submit command, or set them in the spark.jars.packages configuration property; Spark then downloads the connector and its dependencies when the job starts.
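A minimal sketch of both options follows, building the command lines as argument lists. It assumes the Maven coordinates are com.google.cloud:pubsublite-spark-sql-streaming (check the exact artifact name and pick a version on Maven Central). The version string is a placeholder you supply. The Dataproc variant uses the standard `gcloud dataproc jobs submit pyspark` command with its `--cluster`, `--region`, and `--properties` flags. The helper names are hypothetical.

```python
"""Sketch: build job-submission commands that fetch the connector from Maven Central.

Assumption: the connector's coordinates are
com.google.cloud:pubsublite-spark-sql-streaming:<version>; verify on Maven Central.
Helper names are hypothetical.
"""
from __future__ import annotations

CONNECTOR = "com.google.cloud:pubsublite-spark-sql-streaming"  # assumed coordinates


def maven_coordinate(version: str) -> str:
    """Return a groupId:artifactId:version string."""
    return f"{CONNECTOR}:{version}"


def spark_submit_args(script: str, version: str) -> list[str]:
    """spark-submit resolves --packages coordinates (with dependencies) at launch."""
    return ["spark-submit", "--packages", maven_coordinate(version), script]


def dataproc_pyspark_args(script: str, cluster: str, region: str, version: str) -> list[str]:
    """Submit the same script to Dataproc, passing the coordinates via spark.jars.packages."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script,
        f"--cluster={cluster}",
        f"--region={region}",
        f"--properties=spark.jars.packages={maven_coordinate(version)}",
    ]
```

Either list can be run with `subprocess.run(args, check=True)`. If you create the session yourself, `SparkSession.builder.config("spark.jars.packages", maven_coordinate(version))` is a third option. It only takes effect if it is set before the session's JVM starts.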

For more information

  • See the Pub/Sub Lite documentation.
  • Select a version of the Pub/Sub Lite Spark Connector in the Maven Central repository, then download its JAR from that version's page.