Workflows are ideal for complex job flows. You can create job dependencies so that a job begins only after its dependencies have completed successfully.
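As a minimal sketch, a dependency can be expressed with prerequisite_step_ids in the template's job list, shown here in the dict form accepted by the google-cloud-dataproc Python client. The step IDs and Cloud Storage paths are hypothetical placeholders:

```python
# Jobs section of a workflow template. Step IDs and gs:// paths are
# hypothetical placeholders.
jobs = [
    {
        "step_id": "prep",
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/prep.py"},
    },
    {
        "step_id": "report",
        # The dependency: "report" starts only after "prep" succeeds.
        "prerequisite_step_ids": ["prep"],
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/report.py"},
    },
]
```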
When you create a workflow template, Dataproc does not create a cluster or submit jobs to a cluster.
Dataproc creates or selects a cluster and runs the workflow jobs on it when the workflow template is instantiated.
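The create-then-instantiate split looks roughly like the following sketch with the google-cloud-dataproc Python client; the project ID, region, template ID, machine types, and file paths are all hypothetical:

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical values

# Workflow template calls go through the regional Dataproc endpoint.
client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Creating the template only stores the configuration; no cluster is
# created and no job is submitted at this point.
template = client.create_workflow_template(
    request={
        "parent": f"projects/{project_id}/regions/{region}",
        "template": {
            "id": "my-workflow",
            "placement": {
                "managed_cluster": {
                    "cluster_name": "ephemeral-cluster",
                    "config": {
                        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
                        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
                    },
                }
            },
            "jobs": [
                {
                    "step_id": "prep",
                    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/prep.py"},
                }
            ],
        },
    }
)

# Instantiation is the step that creates (or selects) a cluster and
# runs the workflow jobs on it.
operation = client.instantiate_workflow_template(request={"name": template.name})
operation.result()  # blocks until the workflow finishes
```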
Kinds of Workflow Templates
Managed cluster
A workflow template can specify a managed cluster. The workflow creates an "ephemeral" cluster to run the workflow jobs, and then deletes the cluster when the workflow is finished.
Cluster selector
A workflow template can specify an existing cluster on which to run workflow jobs by listing one or more user labels previously attached to the cluster. The workflow runs on a cluster that matches all of the labels. If multiple clusters match all of the labels, Dataproc selects the cluster with the most available YARN memory to run all of the workflow jobs. At the end of the workflow, Dataproc does not delete the selected cluster. See Use cluster selectors with workflows for more information.
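A cluster selector replaces the managed_cluster placement shown earlier; as a sketch, with hypothetical label keys and values:

```python
# Placement fragment for a cluster selector (dict form for the Python
# client). The labels are hypothetical; the workflow runs on an
# existing cluster that carries ALL of the listed labels, and the
# cluster is not deleted when the workflow finishes.
placement = {
    "cluster_selector": {
        "cluster_labels": {"env": "prod", "team": "analytics"},
    }
}
```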
Parameterized
If you will run a workflow template multiple times with different values, use parameters so that you do not have to edit the workflow template for each run.
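A rough sketch of the two halves of parameterization, assuming a hypothetical parameter name, field path, and input URI:

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical values

client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# In the template itself, a parameter declares which template fields
# it substitutes into (this list goes in the template's "parameters"
# field; the name and field path here are hypothetical).
parameters = [
    {"name": "INPUT_URI", "fields": ["jobs['prep'].pysparkJob.args[0]"]},
]

# Each run then passes its own values at instantiation time, so the
# stored template never needs to be edited.
operation = client.instantiate_workflow_template(
    request={
        "name": f"projects/{project_id}/regions/{region}/workflowTemplates/my-workflow",
        "parameters": {"INPUT_URI": "gs://my-bucket/input/2025-08-27"},
    }
)
operation.result()
```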
Workflow Template use cases

Transactional fire-and-forget API interaction model. Workflow Templates replace the steps involved in a typical flow, including:
creating the cluster
submitting jobs
polling
deleting the cluster
Workflow Templates use a single token to track progress from cluster creation to deletion, and they automate error handling and recovery. They also simplify the integration of Dataproc with other tools, such as Cloud Run functions and Cloud Composer.
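In the Python client, that single token surfaces as the long-running operation returned by instantiation; a minimal sketch, again with hypothetical project, region, and template ID:

```python
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical values

client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# One call replaces the create-cluster / submit / poll / delete flow.
operation = client.instantiate_workflow_template(
    request={"name": f"projects/{project_id}/regions/{region}/workflowTemplates/my-workflow"}
)

# The returned long-running operation acts as the tracking token: its
# name can be stored and polled from any process, and result()
# surfaces any workflow error in one place.
print(operation.operation.name)
operation.result()
```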
Support for ephemeral and long-lived clusters. A common challenge in running Apache Hadoop is tuning and right-sizing clusters.
Ephemeral (managed) clusters are easier to configure because they run a single workload. Cluster selectors can be used with longer-lived clusters to repeatedly execute the same workload without incurring the amortized cost of creating and deleting clusters.
Granular IAM security. Creating Dataproc clusters and submitting jobs require all-or-nothing IAM permissions.
Workflow Templates use a per-template workflowTemplates.instantiate permission, and do not depend on cluster or job permissions.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-08-27(UTC)"],[[["\u003cp\u003eWorkflow Templates offer a reusable configuration for defining a series of jobs in a Directed Acyclic Graph (DAG), streamlining the management and execution of workflows.\u003c/p\u003e\n"],["\u003cp\u003eInstantiating a Workflow Template initiates a Workflow, which either creates an ephemeral cluster, runs the jobs, and then deletes the cluster, or utilizes a pre-existing cluster selected via labels.\u003c/p\u003e\n"],["\u003cp\u003eWorkflows are ideal for complex job sequences, allowing you to set job dependencies so that one job will only execute once the previous one has been completed successfully.\u003c/p\u003e\n"],["\u003cp\u003eWorkflow Templates can be parameterized to execute with varying values without the need to edit the template for each run, enhancing flexibility.\u003c/p\u003e\n"],["\u003cp\u003eWorkflow Templates simplify task automation and the integration of Dataproc with external tools by replacing manual cluster management steps with a single-token tracking process.\u003c/p\u003e\n"]]],[],null,["The Dataproc [WorkflowTemplates API](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates) provides a\nflexible and easy-to-use mechanism for managing and executing workflows. A\nWorkflow Template is a reusable workflow configuration. It defines a graph of\njobs with information on where to run those jobs.\n\n**Key Points:**\n\n- [Instantiating a Workflow Template](/dataproc/docs/concepts/workflows/using-workflows#running_a_workflow) launches a Workflow. A Workflow is an operation that runs a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of jobs on a cluster.\n - If the workflow uses a [managed cluster](#managed_cluster), it creates the cluster, runs the jobs, and then deletes the cluster when the jobs are finished.\n - If the workflow uses a [cluster selector](#cluster_selector), it runs jobs on a selected existing cluster.\n- Workflows are ideal for complex job flows. You can create job dependencies so that a job starts only after its dependencies complete successfully.\n- When you [create a workflow template](/dataproc/docs/concepts/workflows/using-workflows#creating_a_template) Dataproc does not create a cluster or submit jobs to a cluster. Dataproc creates or selects a cluster and runs workflow jobs on the cluster when a workflow template is **instantiated**.\n\nKinds of Workflow Templates\n\nManaged cluster\n\nA workflow template can specify a managed cluster. The workflow will create an\n\"ephemeral\" cluster to run workflow jobs, and then delete the cluster when the\nworkflow is finished.\n\nCluster selector\n\nA workflow template can specify an existing cluster on which to run workflow\njobs by specifying one or more [user labels](/dataproc/docs/concepts/labels)\npreviously attached to the cluster. The workflow will run on a\ncluster that matches all of the labels. If multiple clusters match\nall labels, Dataproc selects the cluster with the most\nYARN available memory to run all workflow jobs. At the end of workflow,\nDataproc does not delete the selected cluster. 
See\n[Use cluster selectors with workflows](/dataproc/docs/concepts/workflows/cluster-selectors)\nfor more information.\n| A workflow can select a specific cluster by matching the `goog-dataproc-cluster-name` label (see [Using Automatically Applied Labels](/dataproc/docs/concepts/workflows/cluster-selectors#using_automatically_applied_labels)).\n\nParameterized\n\nIf you will run a workflow template multiple times with different values, use\nparameters to avoid editing the workflow template for each run:\n\n1. define parameters in the template, then\n\n2. pass different values for the parameters for each run.\n\nSee\n[Parameterization of Workflow Templates](/dataproc/docs/concepts/workflows/workflow-parameters)\nfor more information.\n\nInline\n\nWorkflows can be instantiated inline using the `gcloud` command with\n[workflow template YAML files](/dataproc/docs/concepts/workflows/using-yamls#instantiate_a_workflow_using_a_yaml_file) or by calling the Dataproc\n[InstantiateInline](/dataproc/docs/reference/rest/v1/projects.regions.workflowTemplates/instantiateInline)\nAPI (see [Using inline Dataproc workflows](/dataproc/docs/concepts/workflows/inline-workflows)).\nInline workflows do not create or modify workflow template resources.\n| Inline workflows can be useful for rapid prototyping or automation.\n\nWorkflow Template use cases\n\n- **Automation of repetitive tasks.** Workflows encapsulate frequently used\n cluster configurations and jobs.\n\n- **Transactional fire-and-forget API interaction model.** Workflow Templates\n replace the steps involved in a typical flow, which include:\n\n 1. creating the cluster\n 2. submitting jobs\n 3. polling\n 4. deleting the cluster\n\n Workflow Templates use a single token to track progress from cluster creation\n to deletion, and automate error handling and recovery. They also simplify the\n integration of Dataproc with other tools, such as Cloud Run functions\n and Cloud Composer.\n- **Support for ephemeral and long-lived clusters.** A common complexity\n associated with running Apache Hadoop is tuning and right-sizing clusters.\n Ephemeral (managed) clusters are easier to configure since they run a\n single workload. Cluster selectors can be used with\n longer-lived clusters to repeatedly execute the same workload\n without incurring the amortized cost of creating and deleting clusters.\n\n- **Granular IAM security.** Creating Dataproc clusters and\n submitting jobs require all-or-nothing IAM permissions.\n Workflow Templates use a per-template\n [workflowTemplates.instantiate](/dataproc/docs/concepts/iam/iam#workflow_templates_methods_required_permissions)\n permission, and do not depend on cluster or job permissions."]]