Sort Cloud Storage

This example PySpark job sorts the lines of a text file stored in Cloud Storage.

Code sample

Python

Before trying this sample, follow the Python setup instructions in the Dataproc quickstart using client libraries. For more information, see the Dataproc Python API reference documentation.

To authenticate to Dataproc, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

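import pyspark

# Create the Spark context for this job.
sc = pyspark.SparkContext()
# Read the file from Cloud Storage as an RDD with one element per line.
rdd = sc.textFile("gs://path-to-your-GCS-file")
# collect() brings every line to the driver, so this suits small files only.
print(sorted(rdd.collect()))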
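
The sample above sorts on the driver after collecting every line, which may not scale to large files. As a minimal sketch of one alternative (not part of the original sample), the sort can be left to the cluster with RDD.sortBy and the result written back to Cloud Storage with saveAsTextFile. The helper name sort_lines, the main function, and the two command-line arguments (input URI and output directory URI) are assumptions made for illustration.

```python
import sys


def sort_lines(rdd):
    """Return a new RDD with the lines in ascending order, sorted on the executors."""
    return rdd.sortBy(lambda line: line)


def main(input_uri, output_uri):
    # Imported here so that sort_lines can be exercised without a Spark installation.
    import pyspark

    sc = pyspark.SparkContext()
    lines = sc.textFile(input_uri)
    # saveAsTextFile writes one part file per partition under output_uri
    # and fails if that directory already exists.
    sort_lines(lines).saveAsTextFile(output_uri)
    sc.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: <script> gs://<input-file> gs://<output-dir>
    main(sys.argv[1], sys.argv[2])
```

Keeping the sort in a small function that accepts any RDD makes it easy to reuse; the trade-off compared with the original sample is that the result lands in several part files rather than being printed in one piece.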

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.