How do you guarantee exactly-once execution for scheduled jobs?

Before executing a job, the worker inserts a row into an executions table with a unique constraint on (job_id, scheduled_at). If two workers race for the same firing, the second insert violates the constraint and that worker aborts. Only the winner proceeds. The unique constraint acts as a distributed mutex without requiring a separate lock service.
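The claim step can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database so it runs anywhere; the `executions` column names, the `worker_id` column, and the helper names `create_schema` and `try_claim` are assumptions made for the example, not part of any particular system.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical executions table; the UNIQUE constraint is the mutex.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS executions (
            job_id       TEXT NOT NULL,
            scheduled_at TEXT NOT NULL,
            worker_id    TEXT NOT NULL,
            UNIQUE (job_id, scheduled_at)
        )
        """
    )


def try_claim(conn: sqlite3.Connection, job_id: str,
              scheduled_at: str, worker_id: str) -> bool:
    """Return True if this worker won the claim, False if the firing is taken."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO executions (job_id, scheduled_at, worker_id) "
                "VALUES (?, ?, ?)",
                (job_id, scheduled_at, worker_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Another worker already inserted a row for this (job_id, scheduled_at).
        return False
```

A worker would call `try_claim` first and run the job only when it returns True; a False result means the worker should abort that firing.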

How should a cron service tolerate clock skew between scheduler nodes?

Round the scheduled fire time down to the nearest scheduling tick (e.g., 1 minute) and use that canonical timestamp in the unique constraint rather than the wall-clock time observed by any individual node. Add a short look-ahead window, typically 1–2 ticks, so a node whose clock runs slightly ahead can claim a firing early. Because every node derives the same canonical timestamp, a slower-clocked node that later tries to claim the same firing hits the unique constraint instead of creating a duplicate.
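Rounding down to a tick might look like the sketch below. The function name `canonical_fire_time` and the choice to align ticks to the Unix epoch in UTC are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def canonical_fire_time(observed: datetime,
                        tick: timedelta = timedelta(minutes=1)) -> datetime:
    """Round a timezone-aware timestamp down to the nearest tick boundary."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    elapsed = observed.astimezone(timezone.utc) - epoch
    whole_ticks = elapsed // tick  # timedelta // timedelta yields an int
    return epoch + whole_ticks * tick
```

Two nodes that observe 12:00:00.250 and 12:00:02.900 both derive 12:00:00 and therefore compete for the same unique key. Skew that straddles a tick boundary still produces different values, which is why the key is best derived from the schedule's nominal firing time rather than from a raw clock reading.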

How does a missed-job recovery window work?

On startup (or periodically), the scheduler scans for firings whose scheduled_at falls within a configurable recovery window in the past (say, the last 15 minutes) and that have no corresponding execution row. It enqueues those missed firings immediately. The window should be longer than the longest expected scheduler downtime, so that no firing slips past it unrecovered, yet short enough that a recovered firing is still worth running; a wider window catches more outages at the cost of re-executing stale work.
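The scan reduces to a filter plus a set difference. In this sketch, `expected` (the firing times the schedule should have produced) and `executed` (the scheduled_at values that already have execution rows) are hypothetical inputs that a real scheduler would load from its schedule parser and database.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


def missed_firings(expected: Iterable[datetime], executed: Set[datetime],
                   now: datetime,
                   window: timedelta = timedelta(minutes=15)) -> List[datetime]:
    """Return firings inside [now - window, now] with no execution row, oldest first."""
    cutoff = now - window
    return sorted(t for t in expected if cutoff <= t <= now and t not in executed)
```

Firings older than the cutoff are deliberately dropped, which is the trade-off the window size controls.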

How does SKIP LOCKED enable safe concurrent job dispatching?

Workers issue SELECT ... FOR UPDATE SKIP LOCKED against the pending-jobs queue table. Each worker atomically locks and claims a row that no other session holds. Rows already locked by competing workers are skipped rather than blocking, so N workers can dequeue N distinct jobs in parallel without contention or deadlocks, giving near-linear throughput scaling.
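One common way to write the dequeue in PostgreSQL is sketched below. The `pending_jobs` table, its `id`, `status`, and `run_after` columns, and the `claim_next_job` helper are assumptions made for the example; `conn` is expected to be a DB-API connection to PostgreSQL (for instance one opened with psycopg), which this snippet does not create.

```python
# Hypothetical queue table: pending_jobs(id, status, run_after).
CLAIM_SQL = """
UPDATE pending_jobs
SET status = 'RUNNING'
WHERE id = (
    SELECT id
    FROM pending_jobs
    WHERE status = 'PENDING' AND run_after <= now()
    ORDER BY run_after
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id
"""


def claim_next_job(conn):
    """Claim one pending job and return its id, or None if none is available."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
    conn.commit()  # releases the row lock; the status change marks ownership
    return row[0] if row else None
```

This variant marks the row as RUNNING and commits right away, so a crashed worker leaves a RUNNING row that needs a separate timeout sweep. Holding the transaction open for the duration of the job is an alternative that avoids the sweep at the cost of long-lived transactions.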

Cron Service Low-Level Design: Distributed Scheduling, Exactly-Once Execution, and Missed Job Recovery

Requirements and Constraints

A distributed cron service triggers jobs on time-based schedules (standard cron expressions) across a fleet of nodes with exactly-once execution semantics. Functional requirements: register named cron jobs with a cron expression and a handler reference, fire each job at the correct time regardless of which node is leader, guarantee that each scheduled firing executes exactly once even during node failures or restarts, and recover missed firings that were skipped due to downtime.

Key constraints: the system must handle thousands of registered cron jobs, clock skew across nodes must not cause duplicate or missed firings, and the recovery window for missed jobs must be configurable per job type. Execution must be decoupled from scheduling — the cron service enqueues work, a job scheduler executes it.

Core Data Model

Cron Jobs Table

cron_job_id (UUID) — primary key
name (varchar, unique) — human-readable identifier
schedule (varchar) — standard cron expression, e.g., 0 */6 * * *
timezone (varchar) — IANA timezone for schedule evaluation
job_type (varchar) — handler registered in the job scheduler
payload_template (jsonb) — static parameters for the handler
max_missed_firings (int) — how many past-due firings to recover on startup
enabled (boolean)
last_scheduled_at (timestamptz) — the last firing time that was successfully enqueued

Firing Log Table

firing_id (UUID) — primary key
cron_job_id (UUID)
scheduled_at (timestamptz) — the nominal firing time per the cron schedule
enqueued_at (timestamptz) — actual time the job was enqueued
job_id (UUID) — FK to the job scheduler's jobs table
status (enum) — ENQUEUED, SUCCEEDED, FAILED

A unique index on (cron_job_id, scheduled_at) is the exactly-once enforcement mechanism — inserting a duplicate firing for the same nominal time fails with a unique constraint violation.

Key Algorithms and Logic

Leader-Based Scheduling

The cron service uses leader election (via distributed lock or Raft-based service) so that only one node evaluates schedules at any time. The leader runs a tick loop every 1 second:

For each enabled cron job, compute the next firing time after last_scheduled_at using the cron expression parser.
If next_firing_time <= NOW() + clock_skew_buffer (e.g., 5 seconds), attempt to insert a row into the firing log with scheduled_at = next_firing_time.
On successful insert (no unique conflict), enqueue a job into the job scheduler and update last_scheduled_at.
On unique conflict (another node already fired this interval), skip silently.
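The four steps above can be sketched as a single tick function. Several pieces here are stand-ins chosen to keep the example self-contained: `next_fire` replaces a real cron expression parser, the `firing_log` set replaces the database table and its unique index, and `enqueue` is a hypothetical callback into the job scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Set, Tuple


@dataclass
class CronJob:
    cron_job_id: str
    last_scheduled_at: datetime
    next_fire: Callable[[datetime], datetime]  # stand-in for a cron parser


def tick(jobs: Iterable[CronJob], now: datetime,
         firing_log: Set[Tuple[str, datetime]],
         enqueue: Callable[..., None],
         clock_skew_buffer: timedelta = timedelta(seconds=5)) -> None:
    """Run one pass of the leader's tick loop over all enabled jobs."""
    for job in jobs:
        due = job.next_fire(job.last_scheduled_at)
        if due > now + clock_skew_buffer:
            continue  # not due yet
        key = (job.cron_job_id, due)
        if key in firing_log:
            continue  # unique conflict: another node already took this firing
        firing_log.add(key)
        # run_after is the nominal time, so early enqueueing never runs early.
        enqueue(job.cron_job_id, run_after=due)
        job.last_scheduled_at = due
```

In a real deployment the membership check and the add would be one INSERT against the unique index, and last_scheduled_at would be persisted rather than held in memory.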

Clock Skew Handling

The leader enqueues jobs up to clock_skew_buffer seconds early to account for clock drift across nodes. The job scheduler's run_after field is set to the exact scheduled_at time, so the actual execution does not start early even if the job was enqueued early. This separates the concerns of scheduling (when to enqueue) from execution timing (when to run).

Missed Job Recovery

On leader election or service restart, the recovery process:

For each enabled cron job, compute all firing times between last_scheduled_at and NOW() using the cron expression.
Limit recovery to the most recent max_missed_firings intervals (older missed firings are skipped to avoid overwhelming the job scheduler after a long outage).
Attempt to insert each missed firing into the firing log; conflicts are ignored (already processed by another node before the outage).
Successfully inserted firings are enqueued into the job scheduler with run_after = NOW() (immediate execution, since the scheduled time has passed).
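The first two recovery steps, enumerating missed firings and keeping only the most recent ones, might be sketched like this. As before, `next_fire` is a stand-in for a cron expression parser and `firings_to_recover` is a hypothetical helper name.

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Callable, List


def firings_to_recover(last_scheduled_at: datetime, now: datetime,
                       next_fire: Callable[[datetime], datetime],
                       max_missed_firings: int) -> List[datetime]:
    """Return the most recent missed firing times, oldest first."""
    # A bounded deque keeps only the newest max_missed_firings entries,
    # so a long outage does not build an unbounded list.
    recent = deque(maxlen=max_missed_firings)
    t = next_fire(last_scheduled_at)
    while t <= now:
        recent.append(t)
        t = next_fire(t)
    return list(recent)
```

Each returned time would then go through the same insert-then-enqueue path as a normal firing, with run_after set to the current time.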

Exactly-Once Guarantee

The unique index on (cron_job_id, scheduled_at) in the firing log ensures that even if two nodes simultaneously attempt to schedule the same firing (e.g., during a brief split-brain), only one insert succeeds. The insert and job enqueue are wrapped in a single database transaction, which assumes the job scheduler's jobs table lives in the same database as the firing log. A partial failure (insert succeeds, enqueue fails) then rolls back the insert, and the next tick retries.
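The transactional pairing of insert and enqueue can be illustrated with SQLite, assuming both tables share one database. The trimmed-down `firing_log` and `jobs` schemas and the helpers `create_tables`, `enqueue_job`, and `fire_once` are assumptions made for this sketch.

```python
import sqlite3
import uuid


def create_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE firing_log (
            firing_id    TEXT PRIMARY KEY,
            cron_job_id  TEXT NOT NULL,
            scheduled_at TEXT NOT NULL,
            job_id       TEXT NOT NULL,
            status       TEXT NOT NULL,
            UNIQUE (cron_job_id, scheduled_at)
        )
        """
    )
    conn.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, run_after TEXT NOT NULL)"
    )


def enqueue_job(conn: sqlite3.Connection, job_id: str, run_after: str) -> None:
    # Hypothetical stand-in for the job scheduler's enqueue call.
    conn.execute("INSERT INTO jobs (job_id, run_after) VALUES (?, ?)",
                 (job_id, run_after))


def fire_once(conn: sqlite3.Connection, cron_job_id: str,
              scheduled_at: str, enqueue=enqueue_job) -> bool:
    """Insert the firing row and enqueue the job atomically.

    Returns False on a duplicate firing. Any other failure propagates
    after the transaction has been rolled back.
    """
    job_id = str(uuid.uuid4())
    try:
        with conn:  # one transaction: both writes commit, or neither does
            conn.execute(
                "INSERT INTO firing_log "
                "(firing_id, cron_job_id, scheduled_at, job_id, status) "
                "VALUES (?, ?, ?, ?, 'ENQUEUED')",
                (str(uuid.uuid4()), cron_job_id, scheduled_at, job_id),
            )
            enqueue(conn, job_id, scheduled_at)
        return True
    except sqlite3.IntegrityError:
        return False
```

If the enqueue step raises, the firing row disappears with the rollback, so a later tick sees no row for that nominal time and can try again.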

API Design

POST /cron-jobs — register a new cron job; body: { name, schedule, timezone, job_type, payload_template, max_missed_firings }.
PUT /cron-jobs/{name} — update schedule or payload template; changes take effect at the next tick.
DELETE /cron-jobs/{name} — disable and remove a cron job.
GET /cron-jobs/{name}/firings?from=&to= — paginated firing history with status of each execution.
POST /cron-jobs/{name}/trigger — manually trigger an immediate firing outside the normal schedule (does not insert a firing log row).

Scalability Considerations

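Thousands of cron jobs: a naive tick loop iterates over all enabled jobs every second. To avoid that, limit each tick to jobs whose next firing falls within the next 60 seconds. An index on last_scheduled_at alone cannot answer that query, so one option is to store each job's computed next firing time alongside last_scheduled_at and index that value.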
High-frequency schedules: for jobs scheduled more frequently than once per minute, verify that execution time is reliably less than the schedule interval, or use a dedicated worker pool with its own rate limit.
Multi-region: run the cron service leader in one region; other regions act as standby. Use a global distributed lock (e.g., across a 3-region Redis Sentinel cluster) for leader election. Failover completes within the election timeout.
Firing log size: archive firing log rows older than 90 days to cold storage; retain recent rows for recovery and audit.
Observability: alert on firing lag (difference between scheduled_at and enqueued_at exceeding 30 seconds), on missed firings exceeding the recovery limit, and on cron jobs with consistently failing executions.