Write to Pub/Sub Lite from Spark (streaming)
Write messages to a Pub/Sub Lite topic from a Spark cluster in streaming mode.

Explore further

For detailed documentation that includes this code sample, see the following:

- Write Pub/Sub Lite messages by using Apache Spark (/pubsub/lite/docs/write-messages-apache-spark)
Code sample

Python
To authenticate to Pub/Sub Lite, set up Application Default Credentials. For more information, see Set up authentication for a local development environment (/docs/authentication/set-up-adc-local-dev-environment).
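Before running the sample, you can confirm that Application Default Credentials are discoverable from your environment. The following is a minimal sketch, not part of the sample itself; it assumes the google-auth package is installed and that credentials have already been configured as described above.

    # Minimal check that Application Default Credentials can be found.
    # Assumes `gcloud auth application-default login` has been run, or that
    # GOOGLE_APPLICATION_CREDENTIALS points to a service account key file.
    import google.auth

    credentials, project_id = google.auth.default()
    print(f"Application Default Credentials found for project: {project_id}")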
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import array, create_map, col, lit, when
    from pyspark.sql.types import BinaryType, StringType
    import uuid

    # TODO(developer):
    # project_number = 11223344556677
    # location = "us-central1-a"
    # topic_id = "your-topic-id"

    spark = SparkSession.builder.appName("write-app").getOrCreate()

    # Create a RateStreamSource that generates consecutive numbers with timestamps:
    # |-- timestamp: timestamp (nullable = true)
    # |-- value: long (nullable = true)
    sdf = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

    # Transform the dataframe to match the required data fields and data types:
    # https://github.com/googleapis/java-pubsublite-spark#data-schema
    sdf = (
        sdf.withColumn("key", lit("example").cast(BinaryType()))
        .withColumn("data", col("value").cast(StringType()).cast(BinaryType()))
        .withColumnRenamed("timestamp", "event_timestamp")
        # Populate the attributes field. For example, an even value will
        # have {"key1", [b"even"]}.
        .withColumn(
            "attributes",
            create_map(
                lit("key1"),
                array(when(col("value") % 2 == 0, b"even").otherwise(b"odd")),
            ),
        )
        .drop("value")
    )

    # After the transformation, the schema of the dataframe should look like:
    # |-- key: binary (nullable = false)
    # |-- data: binary (nullable = true)
    # |-- event_timestamp: timestamp (nullable = true)
    # |-- attributes: map (nullable = false)
    # |    |-- key: string
    # |    |-- value: array (valueContainsNull = false)
    # |    |    |-- element: binary (containsNull = false)
    sdf.printSchema()

    query = (
        sdf.writeStream.format("pubsublite")
        .option(
            "pubsublite.topic",
            f"projects/{project_number}/locations/{location}/topics/{topic_id}",
        )
        # Required. Use a unique checkpoint location for each job.
        .option("checkpointLocation", "/tmp/app" + uuid.uuid4().hex)
        .outputMode("append")
        .trigger(processingTime="1 second")
        .start()
    )

    # Wait 60 seconds to terminate the query.
    query.awaitTermination(60)
    query.stop()

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser (/docs/samples?product=pubsublite).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.