Pub/Sub Lite with Dataproc

Pub/Sub Lite is a real-time messaging service built for low cost; in exchange, it offers lower reliability than Pub/Sub. Pub/Sub Lite offers zonal and regional topics for message storage.

The Pub/Sub Lite Spark Connector supports Pub/Sub Lite as an input source to Apache Spark Structured Streaming in both the default micro-batch processing mode and the experimental continuous processing mode.
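As a rough illustration of using Pub/Sub Lite as a streaming input source, the following Python sketch builds a subscription path and opens a Structured Streaming read against it. The source format name "pubsublite" and the option key "pubsublite.subscription" are assumptions based on the connector's documentation and are not stated on this page, so verify them against the connector version you use. The helper names and the project, location, and subscription values are hypothetical.

```python
def subscription_path(project_number: str, location: str, subscription_id: str) -> str:
    """Build a full Pub/Sub Lite subscription path from its parts."""
    return (
        f"projects/{project_number}"
        f"/locations/{location}"
        f"/subscriptions/{subscription_id}"
    )


def read_pubsublite_stream(spark, subscription: str):
    """Return a streaming DataFrame that reads from a Pub/Sub Lite subscription.

    `spark` is an existing SparkSession whose classpath includes the
    Pub/Sub Lite Spark Connector. The format name and option key below are
    assumptions; check them against the connector's documentation.
    """
    return (
        spark.readStream.format("pubsublite")  # assumed source format name
        .option("pubsublite.subscription", subscription)  # assumed option key
        .load()
    )


# Hypothetical identifiers, for illustration only.
example_path = subscription_path("123456789", "us-central1-a", "my-subscription")
```

On a cluster where the connector is available, calling read_pubsublite_stream(spark, example_path) would return a streaming DataFrame that uses micro-batch processing unless a continuous trigger is set on the query.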

Using Pub/Sub Lite with Dataproc

Java

The samples directory in the java-pubsublite-spark repository on GitHub contains a Spark example in Java that uses Pub/Sub Lite with Dataproc. To run the example, follow the directions in the Spark example.

  1. To get started, clone the java-pubsublite-spark GitHub repository:
    git clone https://github.com/googleapis/java-pubsublite-spark
    cd java-pubsublite-spark/samples
    

Python / Scala

The connector is available from the Maven Central repository. Provide its Maven coordinates with the --packages option of the spark-submit command, or set them in the spark.jars.packages configuration property; Spark then downloads the connector when the job starts.
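The two ways of supplying the connector can be sketched as a small Python helper that assembles the spark-submit argument list. This is only an illustration: the helper name and the script name are hypothetical, the Maven group and artifact IDs shown are assumptions to confirm on Maven Central, and <VERSION> is a placeholder that is deliberately left unfilled.

```python
from typing import List

# Assumed Maven coordinates; confirm the group and artifact IDs on Maven
# Central and replace <VERSION> with the connector version you selected.
CONNECTOR_PACKAGE = "com.google.cloud:pubsublite-spark-sql-streaming:<VERSION>"


def spark_submit_args(script: str, package: str, use_conf: bool = False) -> List[str]:
    """Build a spark-submit command line that pulls in the connector.

    With use_conf=False the package is passed through --packages; with
    use_conf=True it is set through the spark.jars.packages property.
    Both forms take Maven coordinates in group:artifact:version form.
    """
    if use_conf:
        return ["spark-submit", "--conf", f"spark.jars.packages={package}", script]
    return ["spark-submit", "--packages", package, script]


# Hypothetical script name, for illustration only.
command = spark_submit_args("my_streaming_job.py", CONNECTOR_PACKAGE)
```

The resulting list can be passed to subprocess.run or joined into a shell command line. Either form has the same effect on dependency resolution, so the choice mostly depends on whether the rest of the job's settings are already expressed as configuration properties.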

For more information

  • See Using Pub/Sub Lite with Apache Spark, a quickstart that runs a Python script on a Dataproc cluster to read data from and write data to Pub/Sub Lite.
  • Select the version of the Pub/Sub Lite Spark Connector here, then download its JAR on the linked page.