Using YAML files with workflows

You can define a workflow template in a YAML file, then instantiate the template to run the workflow. You can also import and export a workflow template YAML file to create and update a Cloud Dataproc workflow template resource.

Instantiate a workflow using a YAML file

To run a workflow without first creating a workflow template resource, use the gcloud dataproc workflow-templates instantiate-from-file command.

  1. Define your workflow template in a YAML file. The file must include all required WorkflowTemplate fields except the id field, and it must omit the version field and all output-only fields. Here is an example of a single-job workflow:
    jobs:
    - hadoopJob:
        args:
        - teragen
        - '1000'
        - hdfs:///gen/
        mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
      stepId: teragen
    placement:
      managedCluster:
        clusterName: my-managed-cluster
        config:
          gceClusterConfig:
            zoneUri: us-central1-a
    
  2. Run the workflow:
    gcloud dataproc workflow-templates instantiate-from-file --file your-template.yaml
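Before running instantiate-from-file, it can help to confirm that the file does not contain fields the command rejects. The following Python sketch checks a template after it has been parsed into a dict (parsing is left out so the example needs only the standard library). It assumes the output-only fields are name, createTime, and updateTime; confirm that list against the WorkflowTemplate API reference. The function name disallowed_fields is hypothetical.

```python
# Top-level keys that must not appear in a file passed to
# `gcloud dataproc workflow-templates instantiate-from-file`.
# Assumption: name, createTime and updateTime are the template's
# output-only fields; check the WorkflowTemplate API reference.
DISALLOWED_KEYS = {"id", "version", "name", "createTime", "updateTime"}


def disallowed_fields(template: dict) -> list[str]:
    """Return the sorted top-level keys that must be removed before instantiation."""
    return sorted(DISALLOWED_KEYS.intersection(template))


# The single-job teragen workflow from step 1, as it would look once parsed from YAML.
teragen_template = {
    "jobs": [
        {
            "hadoopJob": {
                "args": ["teragen", "1000", "hdfs:///gen/"],
                "mainJarFileUri": "file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
            },
            "stepId": "teragen",
        }
    ],
    "placement": {
        "managedCluster": {
            "clusterName": "my-managed-cluster",
            "config": {"gceClusterConfig": {"zoneUri": "us-central1-a"}},
        }
    },
}

print(disallowed_fields(teragen_template))  # []
print(disallowed_fields({**teragen_template, "id": "my-template", "version": 2}))  # ['id', 'version']
```

The check covers only top-level keys; it does not verify that required fields are present or that the jobs and placement sections are well formed.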
    

Import and export a workflow template YAML file

You can import and export workflow template YAML files. The usual sequence is to export a template to a YAML file, edit the file, and then import the edited file to update the template.

  1. Export the workflow template to a YAML file. During export, the id and version fields and all output-only fields are filtered out, so they do not appear in the exported YAML file.
    gcloud dataproc workflow-templates export template-id-or-name \
        --destination template.yaml
    You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/projectId/regions/region/workflowTemplates/template_id") to the command.
  2. Edit the YAML file locally. Do not add back the id, version, or output-only fields that were filtered out during export; they are not allowed in an imported YAML file.
  3. Import the updated workflow template YAML file:
    gcloud dataproc workflow-templates import template-id-or-name \
        --source template.yaml
    You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/projectId/regions/region/workflowTemplates/template_id") to the command. If a template with the same name already exists, it is overwritten (updated) and its version number is incremented. If no template with that name exists, a new one is created.
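The export, edit, and import cycle can be sketched in Python as follows. The helpers build the fully qualified resource name format given in steps 1 and 3, recover the template id from either form, and assemble the two gcloud commands as argument lists without running them. The function names and the example project, region, and template id (my-project, us-central1, my-template) are hypothetical.

```python
import re
import shlex

# Matches the fully qualified form:
# projects/projectId/regions/region/workflowTemplates/template_id
TEMPLATE_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)"
    r"/workflowTemplates/(?P<template_id>[^/]+)$"
)


def template_resource_name(project: str, region: str, template_id: str) -> str:
    """Build the fully qualified workflow template resource name."""
    return f"projects/{project}/regions/{region}/workflowTemplates/{template_id}"


def template_id_from(ref: str) -> str:
    """Return the template id from either a bare id or a fully qualified name."""
    match = TEMPLATE_NAME_RE.match(ref)
    return match.group("template_id") if match else ref


def round_trip_commands(ref: str, path: str = "template.yaml") -> tuple[list[str], list[str]]:
    """Return argv lists for exporting to `path` and re-importing the edited `path`.

    The commands are only built here, not executed; the YAML file is meant to be
    edited between the two steps.
    """
    base = ["gcloud", "dataproc", "workflow-templates"]
    export_cmd = base + ["export", ref, "--destination", path]
    import_cmd = base + ["import", ref, "--source", path]
    return export_cmd, import_cmd


# Hypothetical project, region, and template id, for illustration only.
name = template_resource_name("my-project", "us-central1", "my-template")
export_cmd, import_cmd = round_trip_commands(name)
print(template_id_from(name))  # my-template
print(shlex.join(export_cmd))
print(shlex.join(import_cmd))
```

Returning argument lists rather than strings keeps the sketch safe to hand to subprocess.run if you choose to execute the commands; shlex.join is used here only to display them.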