Pub/Sub Lite is a real-time messaging service built for low cost; in exchange, it offers lower reliability than Pub/Sub. Pub/Sub Lite offers zonal and regional topics for storage.
The Pub/Sub Lite Spark Connector supports Pub/Sub Lite as an input source to Apache Spark Structured Streaming in the default micro-batch processing and experimental continuous processing modes.
Using Pub/Sub Lite with Dataproc
Java
The samples directory in the java-pubsublite-spark repository on GitHub contains a Spark example in Java that uses Pub/Sub Lite with Dataproc. To run the example, follow the directions provided with the Spark example.
- To get started, clone the java-pubsublite-spark GitHub repository and change to its samples directory:
    git clone https://github.com/googleapis/java-pubsublite-spark
    cd java-pubsublite-spark/samples
Python / Scala
The connector is available from the Maven Central repository. You can have Spark download and provide it by passing its Maven coordinates to the --packages option of the spark-submit command, or by setting the spark.jars.packages configuration property.
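The two ways of supplying the connector can be sketched in Python as follows. This is a minimal illustration, not the official procedure: the Maven coordinates (com.google.cloud:pubsublite-spark-sql-streaming), the "pubsublite" format name, and the "pubsublite.subscription" option are assumptions based on the connector's published documentation and should be checked against the version you select. The version string, script name, and helper function names are hypothetical placeholders.

```python
from typing import List

# Assumed Maven coordinates of the Pub/Sub Lite Spark Connector.
# Verify the group and artifact IDs on Maven Central before relying on them.
GROUP_ID = "com.google.cloud"
ARTIFACT_ID = "pubsublite-spark-sql-streaming"


def connector_coordinate(version: str) -> str:
    """Return the 'group:artifact:version' string that Spark expects."""
    return f"{GROUP_ID}:{ARTIFACT_ID}:{version}"


def submit_with_packages(script: str, version: str) -> List[str]:
    """Build a spark-submit command that uses the --packages option."""
    return ["spark-submit", "--packages", connector_coordinate(version), script]


def submit_with_conf(script: str, version: str) -> List[str]:
    """Build a spark-submit command that sets spark.jars.packages instead."""
    conf = f"spark.jars.packages={connector_coordinate(version)}"
    return ["spark-submit", "--conf", conf, script]


def read_pubsublite(spark, subscription_path: str):
    """Return a streaming DataFrame that reads from a Pub/Sub Lite subscription.

    `spark` is an existing SparkSession. The format name and option key are
    assumptions taken from the connector's documentation. The path is assumed
    to look like:
    projects/PROJECT_NUMBER/locations/LOCATION/subscriptions/SUBSCRIPTION_ID
    """
    return (
        spark.readStream.format("pubsublite")
        .option("pubsublite.subscription", subscription_path)
        .load()
    )


if __name__ == "__main__":
    # "1.0.0" and "read_lite.py" are placeholders, not recommendations.
    print(" ".join(submit_with_packages("read_lite.py", "1.0.0")))
    print(" ".join(submit_with_conf("read_lite.py", "1.0.0")))
```

Both commands resolve the same artifact. The --packages form is convenient for one-off submissions, while spark.jars.packages can also be set once in cluster or session configuration so that every job picks up the connector.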
For more information
- See Using Pub/Sub Lite with Apache Spark, a quickstart that runs a Python script on a Dataproc cluster to read and write data from and to Pub/Sub Lite.
- To obtain the connector JAR directly, select a version of the Pub/Sub Lite Spark Connector on Maven Central, then download its JAR from that version's page.