Workflow scheduling solutions

This section describes Google Cloud options you can use to schedule workflows.

Dataproc Workflow Templates

Dataproc workflow templates provide a flexible, easy-to-use mechanism for managing and executing workflows. A workflow template is a reusable workflow configuration that defines a graph of jobs, along with information on where to run those jobs.
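
To illustrate the "graph of jobs" idea, the following Python sketch describes a hypothetical three-step template as a plain dictionary and derives one valid run order from the step dependencies. The field names follow the Dataproc WorkflowTemplate resource as exposed by the Python client library; the template ID, cluster name, bucket, and file names are placeholders, and the ordering helper only approximates how prerequisites constrain execution (it is not Dataproc's scheduler). It assumes Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical template. All IDs, names, and URIs are placeholders.
template = {
    "id": "example-workflow",
    # "Where to run": a managed cluster created for this workflow.
    "placement": {
        "managed_cluster": {
            "cluster_name": "example-cluster",
            "config": {},  # cluster settings omitted in this sketch
        }
    },
    # "Graph of jobs": each step may list the steps it depends on.
    "jobs": [
        {
            "step_id": "ingest",
            "pyspark_job": {"main_python_file_uri": "gs://example-bucket/ingest.py"},
        },
        {
            "step_id": "transform",
            "prerequisite_step_ids": ["ingest"],
            "pyspark_job": {"main_python_file_uri": "gs://example-bucket/transform.py"},
        },
        {
            "step_id": "report",
            "prerequisite_step_ids": ["transform"],
            "pyspark_job": {"main_python_file_uri": "gs://example-bucket/report.py"},
        },
    ],
}


def execution_order(workflow_template):
    """Return one run order in which every step follows its prerequisites."""
    graph = {
        job["step_id"]: set(job.get("prerequisite_step_ids", []))
        for job in workflow_template["jobs"]
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(template))
```

Steps without prerequisites can start immediately, and steps that do not depend on each other may run concurrently.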

Cloud Scheduler

Cloud Scheduler is a fully managed, enterprise-grade cron job scheduler. It lets you schedule virtually any job, including batch jobs, big data jobs, and cloud infrastructure operations. It provides simple time-based scheduling, such as daily or hourly runs, without requiring you to write code.

Advantages:

  • Enables time-based instantiation of workflow templates based on familiar cron expressions

  • No code to write
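
Although Cloud Scheduler itself needs no program logic, it can help to see the shape of the job configuration it stores. The Python sketch below only builds a JSON body of the kind the Cloud Scheduler REST API accepts for a job whose HTTP target calls the Dataproc workflowTemplates instantiate endpoint. The project, region, template ID, and service account are hypothetical, and the empty JSON request body is an assumption about what the instantiate call needs; treat this as a sketch, not a definitive configuration.

```python
import base64
import json


def scheduler_job(project, region, template_id, schedule, service_account,
                  time_zone="Etc/UTC"):
    """Build a Cloud Scheduler job body that instantiates a workflow template."""
    uri = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project}/regions/{region}/"
        f"workflowTemplates/{template_id}:instantiate"
    )
    return {
        "schedule": schedule,  # standard cron expression
        "timeZone": time_zone,
        "httpTarget": {
            "uri": uri,
            "httpMethod": "POST",
            # Bytes fields are base64-encoded in JSON; this encodes "{}".
            # Assumption: an empty JSON request body is sufficient.
            "body": base64.b64encode(b"{}").decode("ascii"),
            "oauthToken": {
                "serviceAccountEmail": service_account,
                "scope": "https://www.googleapis.com/auth/cloud-platform",
            },
        },
    }


if __name__ == "__main__":
    job = scheduler_job(
        project="example-project",        # placeholder
        region="us-central1",             # placeholder
        template_id="example-workflow",   # placeholder
        schedule="0 9 * * *",             # every day at 09:00
        service_account="scheduler@example-project.iam.gserviceaccount.com",
    )
    print(json.dumps(job, indent=2))
```

The same fields can be entered in the Google Cloud console, which is how the job is typically created without writing code.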

Tutorial: Workflow using Cloud Scheduler

Cloud Functions

Cloud Functions is a lightweight compute solution you can use to create single-purpose, stand-alone functions that respond to cloud events without the need to manage a server or runtime environment. You can use Cloud Functions to launch workflows in response to Pub/Sub events or file changes in Cloud Storage. You can also combine Cloud Functions with Cloud Scheduler for workflows whose parameters must be calculated from the scheduled run time.

Advantages:

  • Enables workflow instantiation in response to data events, such as new files in Cloud Storage or Pub/Sub events

  • Requires minimal coding with the Dataproc client libraries for Go, Node.js, or Python

  • Dynamically generate workflows and workflow parameters
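
The event-driven pattern described above can be sketched in Python as follows. This is a minimal example, assuming a Cloud Storage-triggered background function, the google-cloud-dataproc client library, and a workflow template that declares a parameter named INPUT_URI; the parameter name, project, region, and template ID are all hypothetical. The client library is imported inside the handler so the request-building helper can be exercised without it.

```python
def build_request(event, project, region, template_id):
    """Turn a Cloud Storage event into an instantiate request.

    The workflow parameter is generated dynamically from the event data.
    INPUT_URI is a hypothetical parameter name the template must declare.
    """
    name = f"projects/{project}/regions/{region}/workflowTemplates/{template_id}"
    input_uri = f"gs://{event['bucket']}/{event['name']}"
    return {"name": name, "parameters": {"INPUT_URI": input_uri}}


def on_new_file(event, context):
    """Entry point, triggered when an object is finalized in a bucket."""
    # Imported here so build_request stays usable without the library.
    from google.cloud import dataproc_v1

    project = "example-project"       # placeholder
    region = "us-central1"            # placeholder
    template_id = "example-workflow"  # placeholder

    # Dataproc workflow templates are addressed through a regional endpoint.
    client = dataproc_v1.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    request = build_request(event, project, region, template_id)

    # Start the workflow and return without waiting for it to finish,
    # so the function does not run for the duration of the workflow.
    client.instantiate_workflow_template(request=request)
    print(f"Started workflow for {request['parameters']['INPUT_URI']}")
```

Keeping the request construction in a separate pure function makes the dynamic parameter logic easy to check locally before deploying.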

Tutorial: Workflow using Cloud Functions

Cloud Composer

Cloud Composer is a managed Apache Airflow service you can use to create, schedule, monitor, and manage workflows.

Advantages:

  • Supports time- and event-based scheduling

  • Simplifies calls to Dataproc through Airflow operators

  • Dynamically generate workflows and workflow parameters

  • Build data flows that span multiple Google Cloud products
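
As a rough illustration of calling Dataproc through an operator, the Python sketch below defines a daily Airflow DAG that instantiates an existing workflow template. It assumes an Airflow 2.4 or later environment with the Google provider package installed (older versions use schedule_interval instead of schedule); the DAG ID, project, region, and template ID are placeholders. The Airflow imports sit inside build_dag so the helper can be checked without Airflow; in a real DAG file you would call build_dag() at module level so the scheduler can discover the DAG.

```python
from datetime import datetime


def operator_kwargs(project_id, region, template_id):
    """Arguments for the operator that instantiates a workflow template."""
    return {
        "task_id": "run_workflow_template",
        "project_id": project_id,
        "region": region,
        "template_id": template_id,
    }


def build_dag():
    """Build a DAG that runs the workflow template once per day."""
    # Imported here so operator_kwargs stays usable without Airflow.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocInstantiateWorkflowTemplateOperator,
    )

    with DAG(
        dag_id="dataproc_workflow_daily",  # placeholder
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                 # time-based scheduling
        catchup=False,
    ) as dag:
        DataprocInstantiateWorkflowTemplateOperator(
            **operator_kwargs(
                project_id="example-project",    # placeholder
                region="us-central1",            # placeholder
                template_id="example-workflow",  # placeholder
            )
        )
    return dag


# In a DAG file deployed to the environment, uncomment:
# dag = build_dag()
```

Additional tasks for other Google Cloud products can be added to the same DAG and chained to this one to build a multi-product data flow.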

Tutorial: Workflow using Cloud Composer