提交 Pyspark 作业

将 Pyspark 作业提交到 Dataproc 集群。

代码示例

Python

在尝试此示例之前,请按照《Dataproc 快速入门:使用客户端库》中的 Python 设置说明进行操作。如需了解详情,请参阅 Dataproc Python API 参考文档

def submit_pyspark_job(dataproc, project, region, cluster_name, bucket_name, filename):
    """Submit the Pyspark job to the cluster (assumes `filename` was uploaded
    to `bucket_name."""
    job_details = {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            "main_python_file_uri": "gs://{}/{}".format(bucket_name, filename)
        },
    }

    result = dataproc.submit_job(
        request={"project_id": project, "region": region, "job": job_details}
    )
    job_id = result.reference.job_id
    print("Submitted job ID {}.".format(job_id))
    return job_id

后续步骤

如需搜索和过滤其他 Google Cloud 产品的代码示例,请参阅 Google Cloud 示例浏览器