Report Builder Low-Level Design: Dynamic Query Generation, Parameterized Reports, and Scheduled Delivery

Report Builder Low-Level Design

A report builder lets non-engineers define, save, schedule, and share data reports without writing raw SQL. The key design challenges are safety (preventing SQL injection from user-defined queries), performance (running reports against production data without degrading it), and reliability (scheduled delivery that handles failures gracefully).

Report Definition Schema

  • report_id — UUID
  • name, description
  • query_template — parameterized SQL string with named placeholders
  • parameters_schema — JSON Schema defining parameter names, types, and defaults
  • output_formatcsv, html, pdf
  • owner_id, is_public, shared_with_team_ids[]
  • schedule — cron expression (e.g., 0 8 * * 1 for Monday 8 AM)
  • delivery_channels — array of {type: email|slack, destination}

Visual Query Builder

The UI exposes a point-and-click interface: select a table from a permitted list, drag in columns, add filter conditions, choose grouping and ordering. Internally, this constructs a parameterized SQL template. Advanced users can edit the SQL directly in a sandboxed editor. The UI never allows arbitrary SQL to be submitted without passing validation.

Parameterized SQL

User-supplied filter values are never string-interpolated into the query. All parameters use named placeholders resolved by the database driver:

SELECT order_id, total, status
FROM orders
WHERE created_at > :start_date
  AND status = :status
  AND customer_id = :customer_id
ORDER BY created_at DESC
LIMIT :limit

The query template is stored as-is. Parameter values are bound at execution time by the DB driver, which prevents SQL injection regardless of what values the user provides.

Query Validation

Before storing or running a query template, validate it server-side:

  • Parse the SQL with a SQL parser (e.g., pg_query for PostgreSQL) — reject malformed SQL
  • Allow only SELECT statements — reject INSERT, UPDATE, DELETE, DROP, CTEs that write
  • Check that all referenced tables and columns are in the permitted whitelist for the user's role
  • Reject subqueries that reference restricted tables

Execution

Run all report queries against a read replica or a dedicated analytics database — never the primary write database. Apply a hard query timeout (30 seconds default, configurable per report up to 5 minutes for scheduled reports). If the query exceeds the timeout, the execution is killed and the run is marked TIMED_OUT. Alert the report owner on repeated timeouts.

Result Caching

Cache query results in Redis keyed by SHA-256({report_id}:{serialized_parameters}). TTL is set per report based on data freshness requirements:

  • Reference/config data: 1 hour
  • Business metrics: 5–15 minutes
  • Real-time operational data: no cache

Cached results serve repeated loads of the same report with the same parameters instantly, reducing replica load significantly for popular reports.

Report Rendering

  • CSV: stream rows directly from query result to response; no full dataset in memory
  • HTML table: paginate at 500 rows per page for browser rendering; include total row count
  • PDF: render the HTML table in headless Chrome (Puppeteer), capture the page as PDF; suitable for formatted delivery via email

Scheduled Delivery

A scheduler scans for reports whose cron expression matches the current time, enqueues a run job per report. The worker:

  1. Executes the query with default parameters
  2. Renders the output in the configured format
  3. Delivers to each configured channel (email attachment, Slack file upload)
  4. Logs the run with timestamp, duration, row count, delivery status

On delivery failure, retry with exponential backoff up to 3 attempts. After 3 failures, mark the scheduled run FAILED and notify the report owner.

Access Control and Versioning

  • Access control: report owner sets visibility — private (owner only), team (named teams), or org-public. Column whitelists are enforced at query validation time per role.
  • Versioning: editing a report creates a new version; previous versions are preserved. Each run in the execution history is linked to the version that produced it, so historical runs remain reproducible.

Large Result Handling and Execution History

If a query returns more than 100K rows, force the output format to CSV and disable HTML and PDF rendering — a 100K-row HTML table is unusable. Log each execution with: run_id, report_version, parameters, started_at, duration_ms, row_count, delivered_to[]. This history is visible to the report owner for auditing and debugging delivery issues.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture

See also: Atlassian Interview Guide

Scroll to Top