Read from Pub/Sub Lite from Spark (streaming)

Read messages from a Pub/Sub Lite subscription from a Spark cluster in the streaming mode.

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

Python

To authenticate to Pub/Sub Lite, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

# TODO(developer):
# project_number = 11223344556677
# location = "us-central1-a"
# subscription_id = "your-subscription-id"

spark = SparkSession.builder.appName("read-app").master("yarn").getOrCreate()

sdf = (
    spark.readStream.format("pubsublite")
    .option(
        "pubsublite.subscription",
        f"projects/{project_number}/locations/{location}/subscriptions/{subscription_id}",
    )
    .load()
)

sdf = sdf.withColumn("data", sdf.data.cast(StringType()))

query = (
    sdf.writeStream.format("console")
    .outputMode("append")
    .trigger(processingTime="1 second")
    .start()
)

# Wait 120 seconds (must be >= 60 seconds) to start receiving messages.
query.awaitTermination(120)
query.stop()

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.