Low-Level Design: Survey and Form Builder — Dynamic Forms, Response Collection, and Analytics

Core Entities

Survey: survey_id, owner_id, title, description, status (DRAFT, ACTIVE, CLOSED), settings (allow_multiple_responses, anonymous, show_progress_bar), created_at, closes_at.

Question: question_id, survey_id, order_index, type (MULTIPLE_CHOICE, CHECKBOX, SHORT_TEXT, LONG_TEXT, RATING, SCALE, DATE, FILE_UPLOAD, NET_PROMOTER_SCORE), text, is_required, options[] (for choice questions), validation_rules.

ConditionalLogic: logic_id, question_id, condition (e.g., if question X has answer Y, skip to question Z).

Response: response_id, survey_id, respondent_id (NULL for anonymous), started_at, submitted_at, ip_address.

Answer: answer_id, response_id, question_id, value (JSON; shape varies by question type).
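A minimal sketch of these entities as Python dataclasses. This covers a subset (Survey, Question, Answer); defaults and the settings dictionary shape are assumptions where the entity list leaves them open.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Optional

class SurveyStatus(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    CLOSED = "CLOSED"

@dataclass
class Survey:
    survey_id: str
    owner_id: str
    title: str
    description: str = ""
    status: SurveyStatus = SurveyStatus.DRAFT
    # Settings keys follow the entity list; defaults are assumed.
    settings: dict = field(default_factory=lambda: {
        "allow_multiple_responses": False,
        "anonymous": False,
        "show_progress_bar": True,
    })
    closes_at: Optional[datetime] = None

@dataclass
class Question:
    question_id: str
    survey_id: str
    order_index: int
    type: str  # e.g. MULTIPLE_CHOICE, RATING, SCALE, FILE_UPLOAD
    text: str
    is_required: bool = False
    config: dict = field(default_factory=dict)  # type-specific JSON config

@dataclass
class Answer:
    answer_id: str
    response_id: str
    question_id: str
    value: Any  # JSON value; shape varies by question type
```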

Dynamic Question Schema

Questions have heterogeneous data: a multiple-choice question has options[], while a scale question has min/max/labels. Store question configuration as a JSON blob in a questions.config column. For MULTIPLE_CHOICE: {"options": ["Option A", "Option B", "Option C"], "allow_other": true}. For RATING: {"max_stars": 5}. For SCALE: {"min": 1, "max": 10, "min_label": "Not at all", "max_label": "Extremely"}. Answer values are also JSON. For MULTIPLE_CHOICE: {"selected": ["Option A"]}. For SHORT_TEXT: {"text": "user response here"}. For FILE_UPLOAD: {"file_url": "s3://bucket/file.pdf", "file_name": "resume.pdf"}. This flexible schema accommodates all question types without a column per type. Index on (survey_id, question_id) for fast answer retrieval.
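A sketch of per-type answer validation against the JSON config, using the config and answer shapes above. The dispatch-by-type structure and the RATING answer key ("stars") are assumptions for illustration.

```python
def validate_answer(q_type: str, config: dict, value: dict) -> bool:
    """Return True if the answer value is valid for this question's config."""
    if q_type == "MULTIPLE_CHOICE":
        allowed = set(config["options"])
        selected = value.get("selected", [])
        # allow_other permits free-text values outside the options list
        return bool(selected) and (
            config.get("allow_other", False) or all(s in allowed for s in selected)
        )
    if q_type == "RATING":
        stars = value.get("stars")  # answer key is an assumed convention
        return isinstance(stars, int) and 1 <= stars <= config["max_stars"]
    if q_type == "SCALE":
        v = value.get("value")
        return isinstance(v, int) and config["min"] <= v <= config["max"]
    if q_type in ("SHORT_TEXT", "LONG_TEXT"):
        return isinstance(value.get("text"), str)
    return True  # unknown types pass through here; real code would reject them

mc_config = {"options": ["Option A", "Option B"], "allow_other": False}
assert validate_answer("MULTIPLE_CHOICE", mc_config, {"selected": ["Option A"]})
assert not validate_answer("RATING", {"max_stars": 5}, {"stars": 7})
```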

Conditional Logic

Conditional logic (skip logic, branching) shows or hides questions based on previous answers. Data model: ConditionalLogic(source_question_id, operator (EQUALS, CONTAINS, GREATER_THAN), value, action (SKIP_TO, SHOW, HIDE), target_question_id). Example: if Q3 ("Do you have a car?") == "No", skip to Q7 (skipping the car-related questions). Client-side evaluation: the form renders the next question based on the current answers and the conditional logic rules. All logic rules for a survey are loaded upfront (typically a small set, under 50 rules per survey). Server-side validation: on submission, re-evaluate the conditional logic to determine which questions were shown to this respondent. Only validate and require answers for questions that were shown (others are silently skipped).
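The server-side re-evaluation can be sketched as a walk over the ordered question list, applying SKIP_TO rules. This is a simplified illustration covering only the EQUALS/SKIP_TO combination from the data model above.

```python
def visible_questions(questions, rules, answers):
    """Compute which questions a respondent saw.

    questions: ordered list of question ids
    rules: list of ConditionalLogic dicts (EQUALS/SKIP_TO only here)
    answers: {question_id: submitted value}
    """
    visible, i = [], 0
    while i < len(questions):
        qid = questions[i]
        visible.append(qid)
        skip_to = None
        for r in rules:
            if (r["source_question_id"] == qid
                    and r["action"] == "SKIP_TO"
                    and r["operator"] == "EQUALS"
                    and answers.get(qid) == r["value"]):
                skip_to = r["target_question_id"]
        # Jump forward if a skip rule fired, otherwise advance by one.
        i = questions.index(skip_to) if skip_to is not None else i + 1
    return visible

qs = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"]
rules = [{"source_question_id": "Q3", "operator": "EQUALS",
          "value": "No", "action": "SKIP_TO", "target_question_id": "Q7"}]
assert visible_questions(qs, rules, {"Q3": "No"}) == ["Q1", "Q2", "Q3", "Q7"]
```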

Response Collection and Validation

Response flow: (1) Respondent opens the survey link. (2) The survey and its questions are fetched from the API (or served as a static SPA). (3) Respondent fills in answers. Auto-save: for long surveys, auto-save partial responses every 30 seconds (saved to the Response record with submitted_at=NULL), which allows resuming after a page reload. (4) On submission: validate that all required questions are answered and that answer formats are valid (e.g., email questions check email format, date questions check date format). (5) Set submitted_at=NOW() on the Response record. Duplicate prevention: for surveys with allow_multiple_responses=false, check whether the respondent (by user_id or IP) has already submitted, and return an error if so. Anonymous surveys: no respondent_id; use a cookie or localStorage token to detect duplicate submissions.
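Submission-time validation, combined with the conditional-logic rule that only shown questions are required, can be sketched as below. The function signature and dictionary shapes are assumptions for illustration.

```python
def validate_submission(shown_question_ids, questions, answers):
    """Validate a submission against the questions that were shown.

    shown_question_ids: ids visible to this respondent (per skip logic)
    questions: {question_id: {"is_required": bool}}
    answers: {question_id: value}; mutated to drop hidden-question answers
    Returns a list of error strings (empty means valid).
    """
    errors = []
    for qid in shown_question_ids:
        if questions[qid]["is_required"] and qid not in answers:
            errors.append(f"{qid} is required")
    # Discard answers for questions the respondent never saw,
    # so hidden questions cannot be poisoned with injected values.
    for qid in set(answers) - set(shown_question_ids):
        del answers[qid]
    return errors

questions = {"Q1": {"is_required": True},
             "Q2": {"is_required": False},
             "Q4": {"is_required": True}}
answers = {"Q2": {"text": "hi"}, "Q4": {"text": "injected"}}
errors = validate_submission(["Q1", "Q2"], questions, answers)
assert errors == ["Q1 is required"]
assert "Q4" not in answers  # hidden question's answer was discarded
```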

Analytics and Reporting

Question-level analytics are computed from the answers table. For MULTIPLE_CHOICE: count each option and compute percentages. For RATING: average score and distribution. For SHORT_TEXT/LONG_TEXT: word frequency analysis and keyword extraction (NLP). For NPS: score = % promoters (9-10) minus % detractors (0-6). Survey-level metrics: response rate = submitted / invited (for email surveys); completion rate = submitted / started; average time to complete. Drop-off analysis: which question had the most abandonment? Aggregate queries on large surveys (100K responses): run on a read replica or pre-aggregate into a summary table updated hourly. Export: CSV export of all responses (each row = one response, columns = question answers). Real-time results: update analytics within 1 minute of each new response submission.
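The per-question aggregates can be sketched directly from the definitions above; NPS uses the standard promoter/detractor bands stated in the text.

```python
from collections import Counter

def nps(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / n

def option_percentages(selections):
    """selections: flat list of chosen options across all answers."""
    counts = Counter(selections)
    total = sum(counts.values())
    return {opt: 100.0 * c / total for opt, c in counts.items()}

# 2 promoters (10, 9), 1 passive (8), 1 detractor (3) out of 4 responses
assert nps([10, 9, 8, 3]) == 25.0
```

In production these would run against a read replica or the hourly summary table rather than the raw answers table, as described above.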

Sharing and Access Control

Survey sharing: public link (anyone with the link can respond), email list (send to specific emails, track individual completion), embed (iframe embed on a website), QR code. Access control for responses: the survey owner can view all responses. Shared collaborators (via a survey_collaborators table) can view but not edit. Response anonymity: anonymous surveys do not store respondent_id, so the owner cannot identify respondents. Partial anonymity: store respondent_id internally (for deduplication) but display only aggregate data to the owner. Survey close conditions: a closes_at datetime (auto-close) or a max_responses limit (close when N responses are received). Webhook: on each new response, fire a webhook to the survey owner's endpoint (Zapier, Slack notification, CRM integration).
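Outbound webhooks are commonly signed so the receiving endpoint can verify the sender. A sketch of that pattern using an HMAC-SHA256 signature header; the header name and payload shape are conventions assumed here, not mandated by the design above.

```python
import hashlib
import hmac
import json

def build_webhook(secret: str, survey_id: str, response_id: str) -> dict:
    """Build a signed webhook payload for a new-response event."""
    body = json.dumps({"event": "response.submitted",
                       "survey_id": survey_id,
                       "response_id": response_id},
                      sort_keys=True)
    sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "headers": {"X-Signature": sig}}

def verify_webhook(secret: str, body: str, signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

hook = build_webhook("shared-secret", "sv1", "r1")
assert verify_webhook("shared-secret", hook["body"], hook["headers"]["X-Signature"])
```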

Frequently Asked Questions

How do you model a dynamic form schema that supports multiple question types?

Use a polymorphic schema with a base Question type and type-specific config stored as JSON. Questions table: question_id, form_id, order_index, type (TEXT, MULTIPLE_CHOICE, CHECKBOX, RATING, DATE, FILE_UPLOAD), required, title, description, config (JSON). Config examples. MULTIPLE_CHOICE: {"options": ["Yes", "No", "Maybe"], "randomize": true}. RATING: {"min": 1, "max": 5, "labels": {"1": "Poor", "5": "Excellent"}}. FILE_UPLOAD: {"max_size_mb": 10, "allowed_types": ["pdf", "png"]}. This avoids a wide table with nullable columns per type. On read, deserialize config into a typed object per question type. Rendering logic is type-specific; validation logic uses the config to determine rules. Adding new question types requires only new config shapes, not schema changes.

How do you store survey responses efficiently?

Two design options. EAV (Entity-Attribute-Value): an answers table with (response_id, question_id, value TEXT). Flexible but hard to query (a pivot is needed for analysis). JSON column: a responses table with (response_id, form_id, submitted_at, answers JSONB), where answers looks like {"q1": "Yes", "q2": ["A", "B"], "q3": 4}. JSONB allows indexing individual keys in PostgreSQL. Best approach for analytics: store raw JSONB for flexibility, plus a separate fact table for high-cardinality dimensions (form_id, submitted_at, user_id) for fast aggregation queries. For reports, materialize aggregates (option choice counts, average ratings) in a summary table updated by a background job after each response batch.

How do you implement conditional logic (show/hide questions based on answers)?

Store conditional rules on each question: if Q3 is shown only when Q1 == "Yes", store on Q3: {"show_if": {"question_id": "Q1", "operator": "equals", "value": "Yes"}}. Supported operators: equals, not_equals, contains, greater_than. Multiple conditions with AND/OR: {"show_if": {"logic": "AND", "conditions": [...]}}. At render time, evaluate rules client-side in real time as the user answers. On submission, validate server-side: for each question with a show_if rule, evaluate whether it should have been visible given the submitted answers. If a question was hidden, its submitted answer should be null (discard any submitted value for hidden questions to prevent data poisoning). This keeps logic in the question metadata, not hardcoded.

How do you prevent duplicate form submissions?

On form load, generate a submission token (UUID) and embed it in a hidden field or store it in the session. On submission, check whether the token was already used (SELECT from submissions WHERE token = X). If found, return the existing response (idempotent). If not, insert and mark the token as used atomically (unique constraint on token). This handles double-clicks and page resubmission. For authenticated users, enforce one response per user per form with a unique constraint on (form_id, user_id). For anonymous surveys, use a cookie to store the submission token and clear it on success. If the requirement is strictly one response per IP, store (form_id, ip_hash) with a unique constraint, but IP-based deduplication is unreliable (NAT, shared networks). Token-based deduplication is more reliable.

How do you design real-time survey analytics that update as responses come in?

On each response submission, publish an event to a message queue (Kafka/SQS). An analytics worker consumes events and updates aggregates: for each answered question, increment option counts (MULTIPLE_CHOICE), maintain sum and count for averages (RATING), and store text for a word cloud (TEXT). Write aggregates to Redis hashes: HINCRBY survey:{form_id}:q:{q_id}:option:{opt} 1 for choice counts. The results dashboard polls Redis every 5 seconds (or uses WebSocket push if real-time is critical). For large forms (100K+ responses), run a batch aggregation job every 5 minutes to recompute from the raw responses table; this is more accurate than incremental updates, which can drift. Display live response count, completion rate (partial vs. full submissions), and drop-off rate per question.
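The token-based idempotent submission flow can be sketched with an in-memory store standing in for a submissions table with a unique constraint on token (the class and method names here are illustrative).

```python
class SubmissionStore:
    """Idempotent submission store keyed by one-time token.

    A dict stands in for a database table with a unique constraint
    on the token column; the lookup-then-insert would be a single
    atomic insert in real code.
    """

    def __init__(self):
        self._by_token = {}

    def submit(self, token: str, answers: dict) -> dict:
        # Token already used: return the existing response instead of
        # inserting a duplicate (handles double-clicks and resubmits).
        if token in self._by_token:
            return self._by_token[token]
        response = {"response_id": f"resp-{len(self._by_token) + 1}",
                    "answers": answers}
        self._by_token[token] = response
        return response

store = SubmissionStore()
first = store.submit("tok-1", {"q1": "Yes"})
again = store.submit("tok-1", {"q1": "Yes"})
assert first["response_id"] == again["response_id"]
```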

Asked at: Atlassian, Snap, DoorDash, Shopify.