What Is a Task Scheduling System?
A task scheduling system triggers jobs at specified times or on recurring schedules. Examples: Unix cron, Apache Airflow, AWS EventBridge Scheduler, Celery Beat. Core challenges: exactly-once execution (no job fires twice), reliability under scheduler crashes, distributed clock synchronization, and handling long-running vs short tasks differently.
System Requirements
Functional
- Schedule jobs: one-time (fire at T) and recurring (cron expression)
- Execute jobs reliably: at-least-once, with idempotency for exactly-once
- Monitor: job history, success/failure, execution duration
- Priority queues: urgent jobs run before lower-priority ones
- Dependencies: Job B starts only after Job A completes (DAG)
Non-Functional
- 100M scheduled jobs, 10K executions/second
- Job trigger latency: <1 second from scheduled time
- 99.99% reliability (no missed jobs)
Core Data Model
jobs: id, name, schedule(cron/timestamp), handler, payload,
status(active/paused/deleted), max_retries, timeout_seconds
executions: id, job_id, status(pending/running/success/failed),
started_at, completed_at, worker_id, attempt_number
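A minimal runnable sketch of this data model (SQLite dialect here for illustration; the column names follow the model above, but the constraint and default choices are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    schedule        TEXT NOT NULL,      -- cron expression or one-time timestamp
    handler         TEXT NOT NULL,      -- name of the handler to invoke
    payload         TEXT,               -- JSON blob passed to the handler
    status          TEXT NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active', 'paused', 'deleted')),
    max_retries     INTEGER NOT NULL DEFAULT 3,
    timeout_seconds INTEGER NOT NULL DEFAULT 300,
    next_run_time   INTEGER             -- unix timestamp; indexed for polling
);
-- partial index so the scheduler's "due jobs" poll stays cheap
CREATE INDEX idx_jobs_next_run ON jobs (next_run_time) WHERE status = 'active';

CREATE TABLE executions (
    id             INTEGER PRIMARY KEY,
    job_id         INTEGER NOT NULL REFERENCES jobs(id),
    status         TEXT NOT NULL DEFAULT 'pending'
                   CHECK (status IN ('pending', 'running', 'success', 'failed')),
    started_at     INTEGER,
    completed_at   INTEGER,
    worker_id      TEXT,
    attempt_number INTEGER NOT NULL DEFAULT 1
);
""")
```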
Scheduling Architecture
Scheduler Service ──polls DB──► jobs due within next 30s
        │
        ▼
Acquire distributed lock (per job)
        │
        ▼
Create execution record + publish to Job Queue (Kafka/SQS)
        │
        ▼
Worker Pool ──consumes──► executes handler ──► updates execution status
Polling vs Event-Driven Trigger
Polling: the scheduler queries for jobs due in the next N seconds and runs every N/2 seconds. Simple, and works up to millions of jobs. At 100M jobs, partition the jobs table by next_run_time bucket; each scheduler instance owns a range.
Event-driven (delay queue): store jobs in a Redis sorted set (ZADD with score = next_run unix timestamp). Poll with ZRANGEBYSCORE scheduled_jobs 0 {now}. Inserts and fetches are O(log N), so this gives lower latency and handles high volumes efficiently. (A hashed time wheel, as used in Kafka and Netty timers, is an alternative with O(1) inserts.)
# Redis delay-queue approach
ZADD scheduled_jobs {next_run_unix} job_id

# Scheduler loop (runs every second):
due = redis.zrangebyscore('scheduled_jobs', 0, time.time())
for job_id in due:
    # ZREM returns 0 if another scheduler instance already claimed this job
    if not redis.zrem('scheduled_jobs', job_id):
        continue
    enqueue(job_id)
    if is_recurring(job_id):
        redis.zadd('scheduled_jobs', {job_id: next_run_time(job_id)})
Exactly-Once Execution
The scheduler may crash after enqueuing a job but before recording it as enqueued — causing double execution on restart. Solution: use a distributed lock per job (Redis SET NX EX) before enqueuing. Only the scheduler holding the lock enqueues the job. Lock TTL = job execution timeout + buffer. If the worker crashes mid-execution: the execution record stays in “running” state. A watchdog process detects executions running longer than timeout_seconds and marks them as failed, releasing the lock for retry.
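A runnable sketch of the per-job lock described above. The `FakeRedis` stand-in exists only so the sketch runs without a server; with real redis-py, `set(nx=True, ex=ttl)` maps to SET NX EX, and the check-then-delete in the release should be a single Lua script for atomicity. Function names here are illustrative, not a fixed API:

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for Redis so this sketch is self-contained."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= now:
            del self._data[key]       # expire lazily, like Redis TTLs
            entry = None
        if nx and entry is not None:
            return None               # SET NX fails: key already held
        self._data[key] = (value, (now + ex) if ex else None)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or (entry[1] is not None and entry[1] <= time.monotonic()):
            return None
        return entry[0]

    def delete(self, key):
        self._data.pop(key, None)

def acquire_job_lock(r, job_id, timeout_seconds, buffer_seconds=30):
    """SET lock:{job_id} {token} NX EX {ttl} -- only one scheduler wins."""
    token = str(uuid.uuid4())
    ttl = timeout_seconds + buffer_seconds   # TTL = execution timeout + buffer
    return token if r.set(f"lock:{job_id}", token, nx=True, ex=ttl) else None

def release_job_lock(r, job_id, token):
    """Release only if we still hold the lock (compare token, then delete)."""
    if r.get(f"lock:{job_id}") == token:
        r.delete(f"lock:{job_id}")
```

The unique token matters: without it, a scheduler whose lock already expired could delete a lock now held by another instance.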
Cron Expression Parsing
* * * * * every minute
0 9 * * MON every Monday at 9am
0 */4 * * * every 4 hours
Parse cron expressions to compute next_run_time. Use a library (croniter in Python, cron-parser in Node). Store next_run_time in the DB; after execution, compute and update the next occurrence.
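Production code should use a library as noted above; purely to show the computation, here is a toy matcher for a subset of cron syntax (`*`, `*/n`, and plain numbers only; real parsers also handle ranges, lists, and names like MON):

```python
from datetime import datetime, timedelta

def _matches(field, value):
    """Match one cron field against a value (toy subset)."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)

def next_run_time(cron_expr, after):
    """Next datetime strictly after `after` matching a 5-field cron
    expression: minute hour day-of-month month day-of-week (0 = Sunday)."""
    minute, hour, dom, month, dow = cron_expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):          # scan at most ~one year
        if (_matches(minute, t.minute) and _matches(hour, t.hour)
                and _matches(dom, t.day) and _matches(month, t.month)
                and _matches(dow, (t.weekday() + 1) % 7)):  # cron: Sunday = 0
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no occurrence within a year: {cron_expr}")
```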
DAG Dependencies (Airflow-style)
For pipeline orchestration: model tasks as a DAG. Store edges in a task_dependencies table. Before executing a task, check all upstream tasks in the current run have status=success. A dependency resolver runs after each task completion and enqueues newly unblocked tasks.
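A sketch of that resolver over in-memory structures (in a real system the edges and statuses live in the task_dependencies and executions tables; these function names are illustrative):

```python
def unblocked_tasks(dependencies, statuses):
    """Pending tasks whose upstream tasks have all succeeded.
    dependencies: {task: set of upstream tasks}; statuses: {task: status}."""
    return [
        task for task, upstream in dependencies.items()
        if statuses.get(task) == "pending"
        and all(statuses.get(dep) == "success" for dep in upstream)
    ]

def on_task_complete(task, dependencies, statuses, enqueue):
    """Run after each task finishes: mark success, enqueue newly unblocked tasks."""
    statuses[task] = "success"
    for t in unblocked_tasks(dependencies, statuses):
        statuses[t] = "queued"   # avoid double-enqueue on the next completion
        enqueue(t)
```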
Retry and Backoff
On task failure: increment attempt_number and compute next_retry_at = now + base_delay * 2^attempt, plus random jitter (exponential backoff; the jitter prevents retry storms when many tasks fail at once). Stop after max_retries attempts: mark the task permanently failed and send an alert. Idempotent handlers make retries safe.
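A sketch of the backoff computation using full jitter; the `base_delay` and `max_delay` defaults are assumptions, not values from the text:

```python
import random

def retry_delay(attempt, base_delay=1.0, max_delay=300.0):
    """Seconds to wait before retry number `attempt` (1-based):
    exponential backoff capped at max_delay, with full jitter."""
    capped = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(0, capped)  # full jitter: uniform over [0, capped]
```

Full jitter (drawing uniformly from the whole window rather than adding a small offset) spreads retries most evenly across time.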
Interview Tips
- Redis sorted set (score = next_run_unix) is the canonical delay-queue structure; a hashed time wheel is the higher-volume alternative.
- Distributed lock per job prevents double-execution at the scheduler level.
- Separate scheduler (which decides when) from worker (which executes) for independent scaling.
- DAG dependency model is Airflow’s core — mention it for pipeline use cases.