Overview
A content scheduler lets authors queue posts, stories, social updates, or any content entity for automatic publishing or unpublishing at a future time. The system must handle timezone-correct scheduling, survive worker restarts without double-publishing, support recurring patterns (every Monday, first of month), and catch up on missed schedules when workers come back online after downtime. This LLD covers the data model, scheduler worker mechanics, timezone handling, recurrence rule processing, missed-schedule catch-up, and the key design decisions that keep it reliable in production.
Core Data Model
-- One row per scheduled action on a content entity
CREATE TABLE ScheduledContent (
    schedule_id        BIGSERIAL PRIMARY KEY,
    content_id         UUID NOT NULL,           -- the ID of the content entity
    entity_type        VARCHAR(64) NOT NULL,    -- 'post', 'story', 'product', 'banner'
    entity_id          BIGINT NOT NULL,         -- FK into the relevant table
    action             VARCHAR(32) NOT NULL,    -- 'publish', 'unpublish', 'archive'
    scheduled_at_utc   TIMESTAMPTZ NOT NULL,    -- when to execute (always UTC)
    timezone           VARCHAR(64) NOT NULL,    -- user's timezone at time of scheduling
    local_time_display VARCHAR(32),             -- e.g. "2025-06-01 09:00" for display
    status             VARCHAR(16) NOT NULL DEFAULT 'pending',
    -- pending -> processing -> done | failed | skipped
    recurrence_rule    TEXT,                    -- RRULE string, NULL for one-off
    next_schedule_id   BIGINT REFERENCES ScheduledContent(schedule_id),
    retry_count        SMALLINT NOT NULL DEFAULT 0,
    max_retries        SMALLINT NOT NULL DEFAULT 3,
    last_error         TEXT,
    created_by         BIGINT NOT NULL,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    executed_at        TIMESTAMPTZ
);

CREATE INDEX idx_sc_pending ON ScheduledContent(scheduled_at_utc)
    WHERE status = 'pending';
CREATE INDEX idx_sc_entity ON ScheduledContent(entity_type, entity_id, status);
-- Execution audit log — every run of a scheduled action
CREATE TABLE ScheduledContentLog (
    log_id             BIGSERIAL PRIMARY KEY,
    schedule_id        BIGINT NOT NULL REFERENCES ScheduledContent(schedule_id),
    entity_type        VARCHAR(64) NOT NULL,
    entity_id          BIGINT NOT NULL,
    action             VARCHAR(32) NOT NULL,
    status             VARCHAR(16) NOT NULL,    -- 'running', 'success', 'failure', 'skipped'
    execution_started  TIMESTAMPTZ NOT NULL,
    execution_finished TIMESTAMPTZ,
    worker_id          VARCHAR(64),             -- hostname:pid of executing worker
    error_message      TEXT,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
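The status column walks a small state machine: pending when created, processing once a worker claims the row, then done, failed, or skipped, with a retriable failure looping back to pending. A minimal sketch of the allowed transitions as an application-level guard; the dict and helper names are illustrative, not part of the schema:

```python
# Allowed ScheduledContent.status transitions, mirroring the schema comment:
#   'pending' -> 'processing'  when a worker claims the row
#   'pending' -> 'skipped'     by the catch-up maintenance job
#   'processing' -> 'done' | 'failed' | 'pending' (retry with back-off)
TRANSITIONS = {
    "pending": {"processing", "skipped"},
    "processing": {"done", "failed", "pending"},
    "done": set(),
    "failed": set(),
    "skipped": set(),
}

def can_transition(old: str, new: str) -> bool:
    """Return True if a status change is legal under the state machine."""
    return new in TRANSITIONS.get(old, set())

print(can_transition("pending", "processing"))  # True
print(can_transition("done", "pending"))        # False
```

Terminal states (done, failed, skipped) have no outgoing edges, so a bug that tries to revive a finished schedule is caught before it reaches the database.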
Scheduler Worker
Workers poll the ScheduledContent table for due rows. SKIP LOCKED is critical — it lets multiple workers run in parallel without competing for the same rows.
import os
import time
import socket
import psycopg2
from datetime import datetime, timezone, timedelta

WORKER_ID = f"{socket.gethostname()}:{os.getpid()}"
POLL_INTERVAL = 5                      # seconds between polls
BATCH_SIZE = 20                        # rows per poll cycle
CATCH_UP_WINDOW = timedelta(hours=24)  # how far back to catch up missed schedules

def scheduler_worker_loop(db_conn):
    print(f"Scheduler worker started: {WORKER_ID}")
    while True:
        try:
            process_due_schedules(db_conn)
        except Exception as e:
            print(f"Worker error: {e}")
            db_conn.rollback()
        time.sleep(POLL_INTERVAL)
def process_due_schedules(db_conn):
    now = datetime.now(timezone.utc)
    catchup_cutoff = now - CATCH_UP_WINDOW
    with db_conn.cursor() as cur:
        # SKIP LOCKED: skip rows being processed by other workers
        cur.execute("""
            SELECT schedule_id, entity_type, entity_id, action,
                   scheduled_at_utc, recurrence_rule, retry_count, max_retries
            FROM ScheduledContent
            WHERE status = 'pending'
              AND scheduled_at_utc <= %s  -- due now
              AND scheduled_at_utc >= %s  -- don't catch up schedules older than window
            ORDER BY scheduled_at_utc ASC
            LIMIT %s
            FOR UPDATE SKIP LOCKED
        """, (now, catchup_cutoff, BATCH_SIZE))
        rows = cur.fetchall()
        if not rows:
            db_conn.rollback()  # close the transaction; nothing was claimed
            return
        # Mark all fetched rows as 'processing' atomically
        schedule_ids = [r[0] for r in rows]
        cur.execute("""
            UPDATE ScheduledContent
            SET status = 'processing'
            WHERE schedule_id = ANY(%s)
        """, (schedule_ids,))
    db_conn.commit()
    for row in rows:
        execute_schedule(db_conn, *row)
def execute_schedule(db_conn, schedule_id, entity_type, entity_id, action,
                     scheduled_at_utc, recurrence_rule, retry_count, max_retries):
    started_at = datetime.now(timezone.utc)
    log_id = None
    with db_conn.cursor() as cur:
        cur.execute("""
            INSERT INTO ScheduledContentLog
                (schedule_id, entity_type, entity_id, action, status,
                 execution_started, worker_id)
            VALUES (%s, %s, %s, %s, 'running', %s, %s)
            RETURNING log_id
        """, (schedule_id, entity_type, entity_id, action, started_at, WORKER_ID))
        log_id = cur.fetchone()[0]
    db_conn.commit()
    try:
        perform_action(entity_type, entity_id, action)
        with db_conn.cursor() as cur:
            cur.execute("""
                UPDATE ScheduledContent
                SET status='done', executed_at=NOW()
                WHERE schedule_id=%s
            """, (schedule_id,))
            cur.execute("""
                UPDATE ScheduledContentLog
                SET status='success', execution_finished=NOW()
                WHERE log_id=%s
            """, (log_id,))
        db_conn.commit()
        # Schedule next recurrence if applicable
        if recurrence_rule:
            create_next_recurrence(db_conn, schedule_id, recurrence_rule, scheduled_at_utc)
    except Exception as e:
        handle_failure(db_conn, schedule_id, log_id, str(e), retry_count, max_retries)
def perform_action(entity_type, entity_id, action):
    """
    Idempotent content action. Safe to call multiple times — publish on
    an already-published entity is a no-op at the application layer.
    """
    if action == 'publish':
        content_service.publish(entity_type, entity_id)
    elif action == 'unpublish':
        content_service.unpublish(entity_type, entity_id)
    elif action == 'archive':
        content_service.archive(entity_type, entity_id)
    else:
        raise ValueError(f"Unknown action: {action}")
def handle_failure(db_conn, schedule_id, log_id, error_msg, retry_count, max_retries):
    new_retry_count = retry_count + 1
    if new_retry_count <= max_retries:
        # Exponential back-off: retry after 2^retry_count minutes
        retry_delay = timedelta(minutes=2 ** retry_count)
        retry_at = datetime.now(timezone.utc) + retry_delay
        new_status = 'pending'
    else:
        retry_at = None
        new_status = 'failed'
    with db_conn.cursor() as cur:
        cur.execute("""
            UPDATE ScheduledContent
            SET status=%s,
                retry_count=%s,
                last_error=%s,
                scheduled_at_utc=COALESCE(%s, scheduled_at_utc)
            WHERE schedule_id=%s
        """, (new_status, new_retry_count, error_msg, retry_at, schedule_id))
        cur.execute("""
            UPDATE ScheduledContentLog
            SET status='failure', execution_finished=NOW(), error_message=%s
            WHERE log_id=%s
        """, (error_msg, log_id))
    db_conn.commit()
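The back-off arithmetic in handle_failure is easy to sanity-check in isolation. This sketch mirrors the 2^retry_count formula for the default max_retries = 3:

```python
from datetime import timedelta

def retry_delay(retry_count: int) -> timedelta:
    # Mirrors handle_failure: the delay uses the retry count *before*
    # it is incremented, so the first failure (retry_count=0) waits 1 minute.
    return timedelta(minutes=2 ** retry_count)

# Delays for the three retries allowed by max_retries=3, in minutes.
delays = [retry_delay(n).total_seconds() / 60 for n in range(3)]
print(delays)  # [1.0, 2.0, 4.0]
```

After the third failed retry, new_retry_count exceeds max_retries and the row is marked failed with last_error preserved for inspection.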
Timezone Handling
All scheduled_at_utc values are stored in UTC. Timezone conversion happens at schedule creation time — never during execution.
import pytz
from datetime import datetime

def create_scheduled_action(db_conn, entity_type, entity_id, action,
                            local_datetime_str, user_timezone,
                            recurrence_rule=None, created_by=None):
    """
    Accept a naive local datetime string from the UI (e.g. '2025-06-01 09:00')
    and convert to UTC for storage. Display string preserved for UI accuracy.
    """
    tz = pytz.timezone(user_timezone)
    # Parse the naive datetime and localize to the user's timezone
    naive_dt = datetime.strptime(local_datetime_str, '%Y-%m-%d %H:%M')
    local_dt = tz.localize(naive_dt, is_dst=None)  # is_dst=None raises on ambiguous DST times
    utc_dt = local_dt.astimezone(pytz.utc)
    with db_conn.cursor() as cur:
        cur.execute("""
            INSERT INTO ScheduledContent
                (content_id, entity_type, entity_id, action,
                 scheduled_at_utc, timezone, local_time_display,
                 recurrence_rule, status, created_by)
            VALUES (gen_random_uuid(), %s, %s, %s, %s, %s, %s, %s, 'pending', %s)
            RETURNING schedule_id
        """, (entity_type, entity_id, action, utc_dt, user_timezone,
              local_datetime_str, recurrence_rule, created_by))
    db_conn.commit()

# Note on DST: is_dst=None causes pytz to raise pytz.exceptions.AmbiguousTimeError
# for clocks-fall-back ambiguous times (and NonExistentTimeError for times skipped
# by spring-forward). Surface this error to the user and ask them to select the
# pre- or post-DST interpretation explicitly.
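The DST note above can be exercised directly. A small demo, assuming pytz is installed, using the 2025 US fall-back date in America/New_York:

```python
import pytz
from datetime import datetime

tz = pytz.timezone("America/New_York")

# 2025-11-02 01:30 occurs twice: clocks fall back from 02:00 EDT to 01:00 EST.
ambiguous = datetime(2025, 11, 2, 1, 30)

raised = False
try:
    tz.localize(ambiguous, is_dst=None)  # ambiguous wall-clock time: raises
except pytz.exceptions.AmbiguousTimeError:
    raised = True

# After surfacing the error, the user picks an interpretation explicitly:
edt = tz.localize(ambiguous, is_dst=True)   # pre-transition reading, UTC-4
est = tz.localize(ambiguous, is_dst=False)  # post-transition reading, UTC-5

print(raised, edt.utcoffset(), est.utcoffset())
```

The two interpretations differ by a full hour of UTC offset, which is exactly the silent bug that is_dst=None is there to prevent.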
Recurring Rules (RRULE)
Recurring schedules use the RFC 5545 RRULE format, processed with the rrule module from the Python dateutil library. After each successful execution, the worker creates the next occurrence.
from dateutil.rrule import rrulestr
import pytz

def create_next_recurrence(db_conn, parent_schedule_id, rrule_string, last_scheduled_utc):
    """
    Compute the next occurrence after last_scheduled_utc using the RRULE.
    Insert a new ScheduledContent row for the next firing.

    Example RRULE strings:
        FREQ=DAILY                       -- every day
        FREQ=WEEKLY;BYDAY=MO,WE,FR       -- Mon/Wed/Fri
        FREQ=MONTHLY;BYDAY=1MO           -- first Monday of month
        FREQ=WEEKLY;INTERVAL=2;BYDAY=TU  -- every other Tuesday
    """
    with db_conn.cursor() as cur:
        cur.execute("""
            SELECT entity_type, entity_id, action, timezone,
                   local_time_display, recurrence_rule, created_by
            FROM ScheduledContent WHERE schedule_id=%s
        """, (parent_schedule_id,))
        parent = cur.fetchone()
    entity_type, entity_id, action, user_tz, local_time_display, recurrence_rule, created_by = parent

    # Evaluate the rule in UTC to find the next occurrence after the last firing.
    # Caveat: a fixed UTC interval shifts local wall-clock time across DST
    # transitions; if the local hour must be preserved, evaluate the rule in the
    # user's timezone and convert each occurrence to UTC instead.
    rule = rrulestr(rrule_string, dtstart=last_scheduled_utc)
    next_utc_dt = rule.after(last_scheduled_utc)
    if next_utc_dt is None:
        return  # RRULE has no more occurrences (e.g. COUNT or UNTIL exhausted)

    # Convert to the user's timezone for the display string
    tz = pytz.timezone(user_tz)
    next_local = next_utc_dt.astimezone(tz)
    next_local_display = next_local.strftime('%Y-%m-%d %H:%M')

    with db_conn.cursor() as cur:
        cur.execute("""
            INSERT INTO ScheduledContent
                (content_id, entity_type, entity_id, action,
                 scheduled_at_utc, timezone, local_time_display,
                 recurrence_rule, status, created_by)
            VALUES (gen_random_uuid(), %s, %s, %s, %s, %s, %s, %s, 'pending', %s)
            RETURNING schedule_id
        """, (entity_type, entity_id, action, next_utc_dt, user_tz,
              next_local_display, recurrence_rule, created_by))
        new_schedule_id = cur.fetchone()[0]
        # Link parent to child for audit trail
        cur.execute("""
            UPDATE ScheduledContent SET next_schedule_id=%s WHERE schedule_id=%s
        """, (new_schedule_id, parent_schedule_id))
    db_conn.commit()
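A quick check of the RRULE evaluation the worker relies on; the rule string and anchor date here are illustrative (a Mon/Wed/Fri weekly rule evaluated in UTC):

```python
from datetime import datetime, timezone
from dateutil.rrule import rrulestr

# 2025-06-02 13:00 UTC is a Monday; the rule fires Mon/Wed/Fri each week.
last_fired = datetime(2025, 6, 2, 13, 0, tzinfo=timezone.utc)
rule = rrulestr("FREQ=WEEKLY;BYDAY=MO,WE,FR", dtstart=last_fired)

# after() returns the first occurrence strictly later than its argument,
# which is exactly what create_next_recurrence needs.
next_fire = rule.after(last_fired)
print(next_fire)  # 2025-06-04 13:00:00+00:00 (the following Wednesday)
```

Because after() is strict, the occurrence that just executed is never re-inserted, so a recurring schedule advances by exactly one row per successful run.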
Catch-Up for Missed Schedules
Workers use a catch-up window (CATCH_UP_WINDOW = 24h) to process schedules that fired while the system was down. Schedules older than the window are marked skipped to avoid publishing stale content.
-- Mark very old pending schedules as skipped (run as a maintenance job)
UPDATE ScheduledContent
SET status = 'skipped',
    last_error = 'Schedule missed and outside catch-up window'
WHERE status = 'pending'
  AND scheduled_at_utc < NOW() - INTERVAL '24 hours';
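In production this statement typically runs from a small periodic job. A sketch of such a wrapper, assuming the same psycopg2-style connection used by the workers; the function name and row-count metric are illustrative:

```python
# SQL matching the maintenance statement above: skip pending schedules
# that fell outside the 24-hour catch-up window.
SKIP_SQL = """
    UPDATE ScheduledContent
    SET status = 'skipped',
        last_error = 'Schedule missed and outside catch-up window'
    WHERE status = 'pending'
      AND scheduled_at_utc < NOW() - INTERVAL '24 hours'
"""

def skip_stale_schedules(db_conn) -> int:
    """Run the skip statement once; return rows affected for metrics/alerting."""
    with db_conn.cursor() as cur:
        cur.execute(SKIP_SQL)
        skipped = cur.rowcount  # psycopg2 exposes affected-row count here
    db_conn.commit()
    return skipped
```

Emitting the skipped count as a metric is useful: a sudden spike usually means workers were down longer than the catch-up window.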
Key Design Decisions
- UTC storage prevents DST bugs: Storing scheduled_at_utc in UTC means a job scheduled for "9 AM New York" on the day clocks spring forward still fires at the correct wall-clock time. The pytz.localize(is_dst=None) call catches ambiguous fall-back times at creation and forces the user to resolve the ambiguity explicitly, rather than silently choosing the wrong UTC offset.
- RRULE standard for recurrence: RFC 5545 RRULE syntax (used by iCalendar, Google Calendar, and Outlook) covers every recurrence pattern needed — daily, weekly on specific days, nth weekday of month, with COUNT/UNTIL termination — without inventing a custom DSL. The dateutil.rrule library provides battle-tested evaluation. Storing the raw RRULE string means the pattern is portable and auditable.
- SKIP LOCKED for parallel workers: Multiple scheduler worker processes can run simultaneously without coordination overhead. SELECT ... FOR UPDATE SKIP LOCKED makes each worker atomically claim its own set of rows and skip rows already claimed by another worker. This is more efficient and simpler than a Redis-based distributed lock per schedule row.
- Idempotent publish action: The perform_action function is designed so that calling publish on an already-published entity is a safe no-op. This means if a worker crashes after executing the action but before writing status='done', and another worker retries the row, the content is not double-published or corrupted. Idempotency is the cheapest form of exactly-once semantics.
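The idempotency decision can be sketched at the application layer. The Entity class and publish function below are illustrative stand-ins for content_service internals, not the real API:

```python
class Entity:
    """Hypothetical content entity with a publication status."""
    def __init__(self, status: str = "draft"):
        self.status = status

def publish(entity: Entity) -> bool:
    """Idempotent publish: returns True only when state actually changed."""
    if entity.status == "published":
        # Safe no-op: a retried worker run after a crash changes nothing.
        return False
    entity.status = "published"
    return True

e = Entity()
print(publish(e))  # True  (draft -> published)
print(publish(e))  # False (second call is a no-op)
```

The check-then-set here is the whole trick: a worker that re-executes a claimed row after a crash hits the no-op branch, so retries cost nothing and never corrupt state.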
FAQ
Q: How does SKIP LOCKED enable concurrent scheduler workers?
A: Multiple workers query for due rows with SELECT ... FOR UPDATE SKIP LOCKED; each worker atomically claims unlocked rows and skips rows already locked by another worker, preventing double-processing.
Q: How are recurring schedules represented?
A: RRULE strings (RFC 5545) are stored per scheduled item; dateutil.rrule parses them to compute the next occurrence after each successful execution.
Q: How is DST ambiguity handled during scheduling?
A: pytz's localize() with is_dst=None raises AmbiguousTimeError for ambiguous wall-clock times (e.g. 1:30 AM during fall-back); the error is surfaced so the user explicitly picks the pre- or post-DST interpretation.
Q: What is the catch-up window?
A: If workers were down, they process schedules up to 24 hours overdue; older missed schedules are marked skipped by a maintenance job to avoid publishing stale content.