Survey and Forms System Low-Level Design: Dynamic Form Builder, Response Storage, and Analytics

Form Schema

A form is defined as a document with ordered questions, each with a type and optional configuration:

forms {
  form_id     UUID PK
  owner_id    UUID FK
  title       VARCHAR(255)
  description TEXT
  questions   JSONB   -- ordered array of question objects
  settings    JSONB
  status      ENUM(DRAFT, ACTIVE, CLOSED)
  created_at  TIMESTAMP
}

Question types: text (short/long), single_choice, multi_choice, rating (1-N scale), likert (agree/disagree matrix), matrix, file_upload, date. Each question object:

{
  "question_id": "q1",
  "type": "single_choice",
  "text": "How did you hear about us?",
  "required": true,
  "options": ["Search", "Social media", "Friend"],
  "conditional_logic": {"show_if": {"question_id": "q0", "operator": "eq", "value": "Yes"}},
  "validation_rules": {"min_length": 10}
}

Settings include: one_response_per_user, anonymous, close_after_date, max_responses.

Conditional Logic

Questions can be shown or hidden based on previous answers. The condition evaluates client-side for instant feedback — no round trip needed. Supported operators: eq, neq, contains, gt, lt. Example: show question 5 only if question 3 equals “Yes”. The form renderer evaluates all conditions after each answer change and updates question visibility accordingly. Skipped (hidden) questions are excluded from the submitted response payload.

Response Schema

Each form submission produces a response record:

responses {
  response_id          UUID PK
  form_id              UUID FK
  respondent_id        UUID FK nullable   -- null if anonymous
  anonymous_token      VARCHAR(64) nullable
  answers              JSONB   -- {"q1": "Search", "q3": ["A","B"]}
  started_at           TIMESTAMP
  submitted_at         TIMESTAMP nullable
  completion_time_secs INT nullable
  partial              BOOLEAN DEFAULT true
}

Answers are stored as a flat JSON map from question_id to value. File upload answers store the S3 object key.

Partial Save and Resume

Progress is saved automatically as the respondent fills out the form. Every 30 seconds (or on tab blur), the client sends the current answers to:

PUT /responses/{response_id}/partial

The partial flag remains true until final submission. A resumption token (the response_id) is stored in the browser's localStorage and a session cookie. Returning to the form URL loads the partial response and restores the respondent's position in the form.

Duplicate Prevention

For authenticated forms (one_response_per_user = true), a unique constraint on (form_id, respondent_id) prevents double submission. For anonymous forms, duplicate prevention uses a combination of browser fingerprint (User-Agent + screen resolution hash) and IP address stored in anonymous_token. This is a soft signal — not cryptographically enforced — but deters casual duplicates. The form owner can configure strictness.

Result Aggregation

Aggregations are computed per question type:

Single / multi choice: frequency distribution — count of each option selected; percentage of total responses
Rating: mean, median, standard deviation, histogram of values
Text: word frequency (tokenize, stop-word filter, count); optional sentiment analysis via NLP service
Likert / matrix: mean score per row/statement

Aggregates are computed incrementally: on each new submission, update running totals stored in a form_aggregates cache table. This avoids full table scans for large response sets. The cache is invalidated and rebuilt if responses are deleted.

Real-Time Response Count

A Redis counter tracks total submissions per form:

INCR form:responses:{form_id}

The live counter is displayed in the form owner's dashboard without querying the database. On reaching max_responses, the form status is atomically set to CLOSED and subsequent submissions are rejected with a 410 Gone response.

Export

CSV export produces one row per response, one column per question. Column headers use question text (truncated). Multi-choice answers are serialized as semicolon-delimited values. Export supports filters: date range, completion status (submitted only vs. including partials). For cross-tabulation reports, a pivot export groups responses by one question and shows distribution of another — useful for “segment by role, show rating distribution.”

File Upload Questions

When a file upload question is encountered, the client requests a pre-signed S3 URL from the server:

POST /forms/{form_id}/upload-url
→ { "upload_url": "https://s3.../...", "key": "forms/{form_id}/responses/{response_id}/files/q5/{filename}" }

The client uploads directly to S3 using the pre-signed URL. The S3 key is stored in the response's answers JSON. File size limits and allowed MIME types are validated server-side when the pre-signed URL is generated.

Quota Management

Before accepting a new submission, the system checks whether max_responses has been reached and whether close_after_date has passed. Both checks use the Redis counter and a simple timestamp comparison, avoiding database reads on the hot submission path. If either condition is met, the submission is rejected with a human-readable error message shown to the respondent.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How do you design a flexible data model for a dynamic survey form builder that supports multiple question types?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use three core tables: surveys (id, owner_id, title, status, created_at), questions (id, survey_id, position, type [text, single_choice, multi_choice, rating, matrix, file_upload], label, is_required, config_json), and question_options (id, question_id, position, label, value). config_json captures type-specific settings (e.g., min/max for rating, row/column definitions for matrix). Store conditional logic (show question B only if question A answer equals X) in a question_conditions table referencing source and target question IDs with a condition expression. This avoids encoding branching logic into the question row itself, keeping the model extensible.”
}
},
{
“@type”: “Question”,
“name”: “How do you store survey responses efficiently when questions and their types vary per survey?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use a survey_responses table (id, survey_id, respondent_id or session_token, submitted_at, metadata_json) and a response_answers table (id, response_id, question_id, answer_text nullable, answer_option_ids integer[] nullable, answer_json nullable). Store free-text answers in answer_text, single/multi-choice selections as an array of option IDs in answer_option_ids, and complex answers (matrix, ranking) as structured JSON in answer_json. Index on (survey_id, question_id) for analytics queries. This EAV-like approach trades some query complexity for schema flexibility. For high-volume surveys, partition response_answers by survey_id or submission date.”
}
},
{
“@type”: “Question”,
“name”: “How would you prevent duplicate submissions and handle partial saves for long surveys?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Issue each respondent a session token (UUID) when they open the survey. Use this token as an idempotency key: INSERT INTO survey_responses … ON CONFLICT (survey_id, session_token) DO NOTHING. For partial saves, allow respondents to save progress without submitting by storing a draft_response record keyed to the session token with status=DRAFT. On final submit, transition status to SUBMITTED atomically. Validate all required questions are answered before accepting SUBMITTED status. On the client, auto-save draft answers every 30 seconds via a PATCH endpoint so progress survives page refreshes. Expire draft responses after a configurable TTL (e.g., 7 days).”
}
},
{
“@type”: “Question”,
“name”: “How do you aggregate survey response analytics efficiently for surveys with millions of submissions?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Pre-aggregate counts incrementally rather than scanning all response_answers on every dashboard load. Maintain a question_stats table (survey_id, question_id, option_id nullable, count, sum nullable, updated_at) updated by a background worker that processes new submissions in micro-batches. For choice questions, increment count per option_id. For rating/numeric questions, maintain count and sum to compute mean on read; store a histogram bucket array for percentile computation. For open-text questions, run async NLP jobs (sentiment, keyword extraction) and store results in a separate text_analytics table. Serve dashboard queries from the pre-aggregated stats table, falling back to direct query only for ad-hoc filters or small surveys where pre-aggregation is not yet worthwhile.”
}
}
]
}