Question 1

How do you safely generate dynamic SQL queries from user-defined report parameters without introducing SQL injection?

Accepted Answer

Never concatenate raw user input into SQL strings. Build queries programmatically with a query builder library (e.g., SQLAlchemy Core, jOOQ, Knex) that passes every value as a bind parameter. Bind parameters cover values but not identifiers, so users may choose table and column names only from a whitelist that the server validates against an allowed schema registry; never accept arbitrary column names as strings. Aggregate functions, GROUP BY targets, and ORDER BY columns and directions must also come from the whitelist. The server then constructs the final SQL from these safe building blocks, with every user-supplied value passed as a bind parameter.
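A minimal sketch of the whitelist-plus-bind-parameters approach, using only the standard library (sqlite3) instead of a real query builder. SCHEMA_REGISTRY, ReportRequest, build_report_query, and run_report are hypothetical names for illustration. Identifiers are accepted only if they appear in the registry, and every filter value is emitted as a `?` placeholder and bound at execution time.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical schema registry: the only tables and columns users may reference.
SCHEMA_REGISTRY = {
    "orders": {"id", "region", "status", "amount", "created_at"},
}
ALLOWED_AGGREGATES = {"SUM", "COUNT", "AVG", "MIN", "MAX"}
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


@dataclass
class ReportRequest:
    table: str
    group_by: list[str]
    aggregate: tuple[str, str]  # (function, column)
    filters: list[tuple[str, str, object]] = field(default_factory=list)  # (column, operator, value)
    order_by: list[tuple[str, str]] = field(default_factory=list)  # (column, direction)


def _column(table: str, name: str) -> str:
    # Identifiers cannot be bound as parameters, so they must come from the registry.
    if name not in SCHEMA_REGISTRY[table]:
        raise ValueError(f"column not allowed: {name!r}")
    return f'"{name}"'


def build_report_query(req: ReportRequest) -> tuple[str, list]:
    if req.table not in SCHEMA_REGISTRY:
        raise ValueError(f"table not allowed: {req.table!r}")
    func, agg_col = req.aggregate
    if func not in ALLOWED_AGGREGATES:
        raise ValueError(f"aggregate not allowed: {func!r}")
    group_cols = [_column(req.table, c) for c in req.group_by]
    select_list = group_cols + [f"{func}({_column(req.table, agg_col)}) AS metric"]
    sql = f'SELECT {", ".join(select_list)} FROM "{req.table}"'
    params: list = []
    if req.filters:
        clauses = []
        for col, op, value in req.filters:
            if op not in ALLOWED_OPERATORS:
                raise ValueError(f"operator not allowed: {op!r}")
            clauses.append(f"{_column(req.table, col)} {op} ?")
            params.append(value)  # user values are only ever bound, never spliced in
        sql += " WHERE " + " AND ".join(clauses)
    if group_cols:
        sql += " GROUP BY " + ", ".join(group_cols)
    if req.order_by:
        terms = []
        for col, direction in req.order_by:
            if direction.upper() not in ALLOWED_DIRECTIONS:
                raise ValueError(f"sort direction not allowed: {direction!r}")
            terms.append(f"{_column(req.table, col)} {direction.upper()}")
        sql += " ORDER BY " + ", ".join(terms)
    return sql, params


def run_report(conn: sqlite3.Connection, req: ReportRequest) -> list:
    sql, params = build_report_query(req)
    return conn.execute(sql, params).fetchall()
```

A filter value such as `paid' OR '1'='1` is bound as an ordinary string, so it simply matches no rows, while a group-by column such as `region; DROP TABLE orders` is rejected before any SQL is built.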

Question 2

How would you design the data model for parameterized, reusable report templates?

Accepted Answer

Store report templates as rows in a report_templates table with the columns id, name, owner_id, base_query_json (a structured representation of the SELECT / FROM / WHERE / GROUP BY clauses), parameter_schema (a JSON Schema defining the accepted parameters and their types), and created_at. Parameters appear as placeholders in the query definition (e.g., {date_from}, {user_id}). At run time, validate the caller's supplied values against parameter_schema, bind them to the placeholders as query parameters rather than interpolating them into SQL text, and execute. Each report run is stored in a report_runs table that references the template ID and records the resolved parameters, status, and output location.
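The run-time resolution step can be sketched as follows. This is an illustration under stated assumptions: base_query_json stores WHERE conditions as [column, operator, value] triples, a value written exactly as "{name}" is a placeholder, and validate_parameters checks only a small subset of JSON Schema (properties, required, and the basic type keywords); a production system would use a full JSON Schema validator. ReportTemplate, validate_parameters, and resolve_filters are hypothetical names, and created_at is omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Any

# A WHERE value written exactly as "{name}" is treated as a parameter placeholder.
PLACEHOLDER = re.compile(r"^\{(\w+)\}$")

# The subset of JSON Schema "type" keywords this sketch understands.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


@dataclass
class ReportTemplate:
    id: int
    name: str
    owner_id: int
    base_query_json: dict
    parameter_schema: dict


def validate_parameters(schema: dict, supplied: dict[str, Any]) -> None:
    props = schema.get("properties", {})
    missing = [p for p in schema.get("required", []) if p not in supplied]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    unknown = sorted(set(supplied) - set(props))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    for name, value in supplied.items():
        expected = props[name].get("type")
        check = _TYPE_CHECKS.get(expected)
        if check is None or not check(value):
            raise ValueError(f"parameter {name!r} must be of type {expected!r}")


def resolve_filters(template: ReportTemplate, supplied: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """Return (column, operator, value) triples whose values are to be bound, not interpolated."""
    validate_parameters(template.parameter_schema, supplied)
    resolved = []
    for column, op, raw in template.base_query_json.get("where", []):
        match = PLACEHOLDER.match(raw) if isinstance(raw, str) else None
        if match is None:
            resolved.append((column, op, raw))  # literal value stored in the template
        elif match.group(1) in supplied:
            resolved.append((column, op, supplied[match.group(1)]))
        else:
            raise ValueError(f"no value supplied for placeholder {raw!r}")
    return resolved
```

The resolved triples, together with the exact parameters that were supplied, are what a report_runs row would record, so any run can be reproduced later.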

Question 3

What caching strategy would you apply to expensive report queries that many users run with the same parameters?

Accepted Answer

Cache at the result level, using a key derived from a hash of the template_id and the canonically sorted parameters. Store results in Redis or S3 with a TTL matched to the data freshness requirement (e.g., 5 minutes for operational reports, 1 hour for daily summaries). On a cache hit, return the cached result immediately. On a miss, check whether an in-flight computation for the same key already exists (using a distributed lock or a pending-jobs table) to prevent a thundering herd (cache stampede): additional requests wait for the in-flight result instead of launching duplicate queries. If the system supports change-data-capture, invalidate entries proactively when the underlying data changes.
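A single-process sketch of the key derivation and the "wait for the in-flight result" behavior. It assumes parameters are JSON-serializable. The dictionary and threading.Event here only stand in for Redis and a distributed lock; cache_key and SingleFlightCache are hypothetical names, not part of any library.

```python
import hashlib
import json
import threading
import time
from typing import Any, Callable, Optional


def cache_key(template_id: int, params: dict[str, Any]) -> str:
    # sort_keys and fixed separators make logically equal parameter sets hash identically.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(f"{template_id}:{canonical}".encode("utf-8")).hexdigest()
    return f"report:{template_id}:{digest}"


class _Flight:
    """One in-progress computation that concurrent callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[Exception] = None


class SingleFlightCache:
    """In-process stand-in for a shared result cache plus a distributed lock."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._results: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._inflight: dict[str, _Flight] = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        with self._lock:
            entry = self._results.get(key)
            if entry is not None and entry[0] > self._clock():
                return entry[1]  # cache hit
            flight = self._inflight.get(key)
            is_leader = flight is None
            if is_leader:
                flight = _Flight()
                self._inflight[key] = flight
        if not is_leader:
            flight.done.wait()  # another caller is already running this query
            if flight.error is not None:
                raise flight.error
            return flight.value
        try:
            value = compute()
        except Exception as exc:
            flight.error = exc
            with self._lock:
                del self._inflight[key]
            flight.done.set()
            raise
        with self._lock:
            self._results[key] = (self._clock() + self._ttl, value)
            del self._inflight[key]
        flight.value = value
        flight.done.set()
        return value

    def invalidate(self, key: str) -> None:
        # Hook for change-data-capture driven invalidation.
        with self._lock:
            self._results.pop(key, None)
```

Only the first caller for a key runs the query; concurrent callers block on the same Event and receive the same result (or the same exception), and later callers hit the cache until the TTL expires.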

Question 4

How do you implement scheduled report delivery reliably, ensuring reports are delivered exactly once even if a worker crashes?

Accepted Answer

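Use a durable scheduler (e.g., pg_cron, Quartz, or Temporal) to insert scheduled_report_runs rows at the configured cadence. A worker claims a run by atomically changing its status from PENDING to CLAIMED and recording its worker ID and a claim expiry timestamp (UPDATE ... WHERE status='PENDING' RETURNING id; in PostgreSQL, selecting the candidate row with FOR UPDATE SKIP LOCKED keeps concurrent workers from contending for the same row). If the worker crashes, a watchdog process resets CLAIMED rows whose expiry has passed back to PENDING. After successfully generating and delivering the report, the worker sets the status to DELIVERED, but only if it still holds the claim (WHERE claimed_by = its worker ID), so a worker whose lease expired cannot overwrite a newer claim. This lease-and-retry scheme guarantees at-least-once processing, not exactly-once: a worker that crashes after sending but before marking the run DELIVERED causes a resend. To make delivery effectively exactly-once, pass the run ID to the delivery step as an idempotency key and deduplicate on it. Delivery receipts (e.g., email bounce/open events via webhook) are stored against the run record for auditability.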
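A minimal sketch of the claim, watchdog, and fenced-completion steps, with SQLite standing in for PostgreSQL so the example is self-contained. The claim is a compare-and-set (UPDATE ... WHERE id=? AND status='PENDING' plus a rowcount check) rather than RETURNING or SKIP LOCKED. The table layout and function names are assumptions for illustration, and the `now` argument exists only to make time deterministic in tests.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduled_report_runs (
    id INTEGER PRIMARY KEY,
    template_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING',
    claimed_by TEXT,
    claim_expires_at REAL
)
"""


def claim_next_run(conn: sqlite3.Connection, worker_id: str, lease_seconds: float,
                   now: Optional[float] = None) -> Optional[int]:
    now = time.time() if now is None else now
    while True:
        row = conn.execute(
            "SELECT id FROM scheduled_report_runs WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Compare-and-set: succeeds only if no other worker claimed the row first.
        cur = conn.execute(
            "UPDATE scheduled_report_runs SET status = 'CLAIMED', claimed_by = ?, claim_expires_at = ? "
            "WHERE id = ? AND status = 'PENDING'",
            (worker_id, now + lease_seconds, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Lost the race to another worker; try the next pending row.


def reset_expired_claims(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Watchdog: return crashed workers' runs to PENDING once their lease has expired."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE scheduled_report_runs SET status = 'PENDING', claimed_by = NULL, claim_expires_at = NULL "
        "WHERE status = 'CLAIMED' AND claim_expires_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


def mark_delivered(conn: sqlite3.Connection, run_id: int, worker_id: str) -> bool:
    # Fencing: only the worker that still holds the claim may complete the run.
    cur = conn.execute(
        "UPDATE scheduled_report_runs SET status = 'DELIVERED' "
        "WHERE id = ? AND status = 'CLAIMED' AND claimed_by = ?",
        (run_id, worker_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

The fencing check keeps the run record consistent, but it cannot un-send an email that a stale worker already dispatched; that is why the delivery step itself should deduplicate on the run ID.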

Report Builder Low-Level Design: Dynamic Query Generation, Parameterized Reports, and Scheduled Delivery

Report Definition Schema

Visual Query Builder

Parameterized SQL

Query Validation

Execution

Result Caching

Report Rendering

Scheduled Delivery

Access Control and Versioning

Large Result Handling and Execution History