Run cron jobs reliably on Compute Engine with Cloud Scheduler
Many systems have regularly scheduled jobs, but getting those jobs to run reliably in a distributed environment can be surprisingly hard.
Imagine trying to run the standard UNIX cron job scheduling service across a fleet of virtual machines. Individual machines come and go due to autoscaling and network partitioning, so a critical task might never run because the instance it was scheduled on became unavailable. Alternatively, a task that was meant to run only once might be duplicated by many servers as your autoscaler brings them online.
Using Cloud Scheduler for scheduling and Google Cloud Pub/Sub for messaging, you can build a distributed, fault-tolerant scheduler for your virtual machines. In this design pattern, you schedule your jobs in Cloud Scheduler, which uses Cloud Pub/Sub to relay each scheduled event to a utility running on each Compute Engine instance. When that utility receives a message, it runs the script corresponding to the message's Cloud Pub/Sub topic. The scripts run locally on the instance just as if cron had run them. In fact, you can reuse existing cron scripts with this design pattern.
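To make the instance-side utility concrete, here is a minimal sketch of its dispatch step, assuming a simple lookup table from topic name to the local script it should run. The names `TOPIC_TO_COMMAND` and `handle_message`, and the `test_topic` entry, are hypothetical and not taken from the sample implementation; the Pub/Sub receive loop itself is left out so the dispatch logic stands on its own.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical mapping from a Pub/Sub topic name to the local command it triggers.
# In a real deployment each entry would point at one of your existing cron scripts;
# here a tiny inline Python command stands in for the script.
TOPIC_TO_COMMAND: Dict[str, List[str]] = {
    "test_topic": [sys.executable, "-c", "print('ran test_topic job')"],
}


def handle_message(
    topic: str, commands: Dict[str, List[str]] = TOPIC_TO_COMMAND
) -> Optional[int]:
    """Run the command registered for `topic`, as cron would, and return its exit code.

    Returns None when no command is registered for the topic, so an unknown
    topic is ignored instead of crashing the utility.
    """
    command = commands.get(topic)
    if command is None:
        return None
    # check=False: a failing job is reported through its exit code rather than
    # raising, so one bad script does not stop the utility from handling the next message.
    result = subprocess.run(command, check=False)
    return result.returncode


if __name__ == "__main__":
    print(handle_message("test_topic"))
    print(handle_message("unknown_topic"))
```

In a real utility, `handle_message` would be called from a Pub/Sub subscriber callback (for example, one registered with the `google-cloud-pubsub` client's `SubscriberClient.subscribe`). When to acknowledge the message is a design choice: acknowledging after the script finishes means a run interrupted by a crash gets redelivered, at the cost of occasionally running a job twice.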
Using Cloud Pub/Sub for distributed messaging means that you can schedule a task to run on only one of many servers, or on several servers concurrently. The topic and subscriber model (shown in the diagram below) lets you control which instances receive and perform a given task: every subscription attached to a topic receives its own copy of each message, while instances that pull from the same subscription share its messages between them.
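The difference between "run on one instance" and "run on every instance" comes down to how subscriptions are laid out. The sketch below is an in-memory model of that fan-out behavior, not the Pub/Sub client library: `FakePubSub` and its methods are hypothetical. It assumes round-robin selection among subscribers that share a subscription, which is a simplification, since real Pub/Sub decides which subscriber gets a message on its own. Real Pub/Sub also delivers at least once, so a job can occasionally run twice; this model ignores that.

```python
from collections import defaultdict
from itertools import cycle
from typing import DefaultDict, Dict, Iterator, List


class FakePubSub:
    """In-memory model of Pub/Sub fan-out semantics (illustrative only).

    Each subscription on a topic gets its own copy of every message; the
    instances pulling from one subscription share that copy, so exactly one
    of them receives it.
    """

    def __init__(self) -> None:
        # topic -> subscription names attached to it
        self.subscriptions: DefaultDict[str, List[str]] = defaultdict(list)
        # subscription -> instances pulling from it
        self.subscribers: DefaultDict[str, List[str]] = defaultdict(list)
        # subscription -> round-robin iterator over its instances (a simplification)
        self._next: Dict[str, Iterator[str]] = {}

    def subscribe(self, topic: str, subscription: str, instance: str) -> None:
        if subscription not in self.subscriptions[topic]:
            self.subscriptions[topic].append(subscription)
        self.subscribers[subscription].append(instance)
        self._next[subscription] = cycle(list(self.subscribers[subscription]))

    def publish(self, topic: str) -> List[str]:
        """Return the instances that would run the task for one published message."""
        # One delivery per subscription, to one of that subscription's instances.
        return [next(self._next[sub]) for sub in self.subscriptions[topic]]


if __name__ == "__main__":
    bus = FakePubSub()
    for vm in ["vm-1", "vm-2", "vm-3"]:
        # One shared subscription: the job runs on only one instance.
        bus.subscribe("backup", "backup-shared", vm)
        # One subscription per instance: the job runs on every instance.
        bus.subscribe("log-rotate", f"log-rotate-{vm}", vm)

    print(bus.publish("backup"))      # ['vm-1']
    print(bus.publish("log-rotate"))  # ['vm-1', 'vm-2', 'vm-3']
```

With this layout, a job like a nightly backup that must not be duplicated uses one shared subscription, while housekeeping that every machine needs, such as log rotation, gives each instance its own subscription to the same topic.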
For a detailed explanation of this design pattern, check out our Reliable Task Scheduling for Google Compute Engine article, which includes a sample implementation on GitHub. Feel free to make pull requests or open issues directly on the open source sample.