You can define a workflow template in a YAML file, then instantiate the template
to run the workflow. You can also import and export a workflow template YAML file
to create and update a Dataproc workflow template resource.
Also see Using inline Dataproc workflows for other ways to run a workflow without creating a workflow template resource.
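Run a workflow using a YAML file
To run a workflow without first creating a workflow template resource, use the gcloud dataproc workflow-templates instantiate-from-file command.
Define your workflow template in a YAML file. The YAML file must include all required WorkflowTemplate fields except the id field, and it must also exclude the version field and all output-only fields.
In a workflow that has a teragen step and a terasort step, listing teragen in the prerequisiteStepIds list of the terasort step ensures that the terasort step begins only after the teragen step completes successfully.
Run the workflow by passing the YAML file and a region to the command with the --file and --region flags.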
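As a sketch of these two steps, the shell snippet below writes a teragen/terasort template to a local file and then instantiates it. The file name workflow.yaml, the cluster name my-managed-cluster, the zone us-central1-a, and the region us-central1 are placeholder values chosen for illustration, and the gcloud call is skipped if the CLI is not installed.

```shell
#!/usr/bin/env bash
# Write the workflow template to a local file (the file name is a placeholder).
cat > workflow.yaml <<'EOF'
jobs:
- hadoopJob:
    args:
    - teragen
    - '1000'
    - hdfs:///gen/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: teragen
- hadoopJob:
    args:
    - terasort
    - hdfs:///gen/
    - hdfs:///sort/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: terasort
  # terasort starts only after teragen has completed successfully.
  prerequisiteStepIds:
  - teragen
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: us-central1-a
EOF

REGION="us-central1"  # placeholder region

# Instantiate the workflow directly from the file; no template resource is created.
if command -v gcloud >/dev/null 2>&1; then
  gcloud dataproc workflow-templates instantiate-from-file \
    --file=workflow.yaml \
    --region="${REGION}"
else
  echo "gcloud CLI not found; skipping instantiation" >&2
fi
```

The template deliberately has no id or version field, as required for a file passed to instantiate-from-file.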
Instantiate a workflow using a YAML file with Dataproc Auto Zone Placement
Define your workflow template in a YAML file. This YAML file is the same as the
previous YAML file, except that the zoneUri field is set to the empty string ('')
to allow Dataproc Auto Zone Placement to select the zone for the cluster.
Run the workflow. When using Auto Zone Placement, you must pass a region to the gcloud command.
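A minimal sketch of the Auto Zone Placement variant follows. It differs from a fixed-zone template only in the zoneUri value. The file name workflow-auto-zone.yaml, the cluster name, and the region us-central1 are placeholders, and the gcloud call is skipped if the CLI is not installed.

```shell
#!/usr/bin/env bash
# Same teragen/terasort template, but zoneUri is the empty string so that
# Dataproc picks the zone within the region passed on the command line.
cat > workflow-auto-zone.yaml <<'EOF'
jobs:
- hadoopJob:
    args:
    - teragen
    - '1000'
    - hdfs:///gen/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: teragen
- hadoopJob:
    args:
    - terasort
    - hdfs:///gen/
    - hdfs:///sort/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: terasort
  prerequisiteStepIds:
  - teragen
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: ''
EOF

REGION="us-central1"  # placeholder; a region is required with Auto Zone Placement

if command -v gcloud >/dev/null 2>&1; then
  gcloud dataproc workflow-templates instantiate-from-file \
    --file=workflow-auto-zone.yaml \
    --region="${REGION}"
else
  echo "gcloud CLI not found; skipping instantiation" >&2
fi
```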
Import and export a workflow template YAML file
You can import and export workflow template YAML files. Typically, a workflow template
is first exported as a YAML file, then the YAML file is edited, and finally
the edited YAML file is imported to update the template.
Export the workflow template to a YAML file with the
gcloud dataproc workflow-templates export command. During the export operation,
the id and version fields, and all output-only fields,
are filtered from the output and do not appear in the
exported YAML file.
You can pass either the
WorkflowTemplate id or the fully qualified template resource name
("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command. If you omit the --destination flag, the output is directed to stdout.
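The export step might look like the following sketch. The template id my-template, the region us-central1, and the output file template.yaml are placeholders. The second call shows the equivalent form that omits --destination and redirects stdout. Both calls are skipped if the gcloud CLI is not installed.

```shell
#!/usr/bin/env bash
TEMPLATE_ID="my-template"      # placeholder: template id or full resource name
REGION="us-central1"           # placeholder region
TEMPLATE_YAML="template.yaml"  # placeholder output file

if command -v gcloud >/dev/null 2>&1; then
  # Write the exported template to a file with --destination.
  gcloud dataproc workflow-templates export "${TEMPLATE_ID}" \
    --destination="${TEMPLATE_YAML}" \
    --region="${REGION}"

  # Equivalent: omit --destination and redirect stdout to the file.
  gcloud dataproc workflow-templates export "${TEMPLATE_ID}" \
    --region="${REGION}" > "${TEMPLATE_YAML}"
else
  echo "gcloud CLI not found; skipping export" >&2
fi
```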
Edit the YAML file locally. Note that the id, version, and output-only fields, which were filtered from the YAML file when the template was exported, are disallowed in the imported YAML file.
Import the updated workflow template YAML file with the gcloud dataproc workflow-templates import command.
You can pass either the
WorkflowTemplate id or the fully qualified template resource name
("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command. The template resource with the same template name will be overwritten (updated) and its version number will be incremented. If a template with the same template name does not exist, it will be created.
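The edit-then-import steps can be sketched as below. The helper has_no_disallowed_fields is hypothetical, not part of gcloud. It is a rough heuristic that only looks for top-level id and version keys and does not detect output-only fields. The template id, region, and file name are placeholders, and the import is skipped if the file is missing or the gcloud CLI is not installed.

```shell
#!/usr/bin/env bash
TEMPLATE_ID="my-template"      # placeholder: template id or full resource name
REGION="us-central1"           # placeholder region
TEMPLATE_YAML="template.yaml"  # placeholder: the locally edited export

# Hypothetical helper: succeeds when the file has no top-level id or version key.
has_no_disallowed_fields() {
  ! grep -Eq '^(id|version):' "$1"
}

if [ ! -f "${TEMPLATE_YAML}" ]; then
  echo "${TEMPLATE_YAML} not found; export the template first" >&2
elif ! has_no_disallowed_fields "${TEMPLATE_YAML}"; then
  echo "Remove the top-level id and version fields before importing" >&2
elif command -v gcloud >/dev/null 2>&1; then
  # Overwrites the template with the same name, or creates it if absent.
  gcloud dataproc workflow-templates import "${TEMPLATE_ID}" \
    --source="${TEMPLATE_YAML}" \
    --region="${REGION}"
else
  echo "gcloud CLI not found; skipping import" >&2
fi
```

Checking the file before the import gives a clearer local error than waiting for the import to be rejected, but the service-side validation remains the authority on which fields are allowed.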
Last updated 2025-09-04 UTC.