Question 1

How do you handle partial failures in a bulk operation?

Accepted Answer

Process every record independently and track success or failure per row. Never abort the entire job because one row fails: in a one-million-row import, the other 999,999 records should still be processed. Use a per-row transaction, so each record is its own DB transaction and a validation or constraint error on row 500 doesn't roll back rows 1-499. Store the error message for each row in the result. Return a result CSV with the original columns plus a status (success/error/skipped) and an error message, so users can fix just the failed rows and re-import them.
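A minimal sketch of per-row processing, assuming SQLite (in memory) stands in for the production database. The `users` table, the `validate` rule, and the `import_rows` function are hypothetical. Each row gets its own transaction through `with conn:`, which commits on success and rolls back only that row on error. The function returns the result CSV described above.

```python
import csv
import io
import sqlite3


def validate(row: dict) -> None:
    """Hypothetical validation rule: the email must contain '@'."""
    if "@" not in row.get("email", ""):
        raise ValueError("invalid email")


def import_rows(conn: sqlite3.Connection, rows: list[dict]) -> str:
    """Process each row in its own transaction and return a result CSV as text."""
    fieldnames = (list(rows[0].keys()) if rows else []) + ["status", "error"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        try:
            validate(row)
            with conn:  # one transaction per row: commit on success, roll back on error
                conn.execute(
                    "INSERT INTO users (name, email) VALUES (?, ?)",
                    (row["name"], row["email"]),
                )
            writer.writerow({**row, "status": "success", "error": ""})
        except (ValueError, sqlite3.Error) as exc:
            # Record the failure and keep going; earlier rows stay committed.
            writer.writerow({**row, "status": "error", "error": str(exc)})
    return out.getvalue()
```

Because the failure is caught inside the loop, a bad row only adds an `error` line to the result file; rows that already succeeded are not rolled back.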

Question 2

How do you make bulk imports idempotent?

Accepted Answer

Require an external_id field in each row (a client-generated unique identifier for the record). Before creating a record, check whether one with that external_id already exists for this user. If it does, skip it (return status=skipped) rather than creating a duplicate. Back this check with a unique constraint on (user_id, external_id) so that two concurrent retries cannot both pass the check and insert the same record. This makes retries safe: if the import job fails halfway through, re-submitting the same file leaves the data in the same final state, because already-created records are skipped. Store the external_id on the created record for future deduplication.
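A minimal sketch of external_id deduplication, assuming SQLite 3.24+ as a stand-in database (PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO NOTHING` form). The `records` table and the `create_if_absent` and `import_file` helpers are hypothetical. The unique constraint makes the check and the insert a single atomic step, and `rowcount` tells us whether a row was actually created.

```python
import sqlite3

# Hypothetical schema: deduplication is scoped per user via a composite unique key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    external_id TEXT    NOT NULL,
    payload     TEXT,
    UNIQUE (user_id, external_id)
)
"""


def create_if_absent(conn: sqlite3.Connection, user_id: int,
                     external_id: str, payload: str) -> str:
    """Insert the record unless (user_id, external_id) exists; return the row status."""
    with conn:
        cur = conn.execute(
            "INSERT INTO records (user_id, external_id, payload) VALUES (?, ?, ?) "
            "ON CONFLICT (user_id, external_id) DO NOTHING",
            (user_id, external_id, payload),
        )
    # rowcount is 1 when a row was inserted and 0 when the conflict was ignored.
    return "success" if cur.rowcount == 1 else "skipped"


def import_file(conn: sqlite3.Connection, user_id: int, rows: list[dict]) -> list[str]:
    """Re-running this on the same rows is safe: duplicates come back as 'skipped'."""
    return [create_if_absent(conn, user_id, r["external_id"], r["payload"]) for r in rows]
```

Running `import_file` twice with the same rows yields `success` on the first pass and `skipped` on the second, while the table contents stay the same.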

Question 3

How do you handle a bulk import of 1 million records without timing out?

Accepted Answer

Never process the file synchronously inside the HTTP request. Have the client upload the file directly to S3 via a presigned URL, then create a BulkJob record with status=QUEUED and return the job_id immediately. A background worker picks up the job from a queue (SQS or a Redis-backed queue) and processes it asynchronously, which may take several minutes. The client polls GET /bulk-jobs/{job_id} for status and progress. When the job completes, the worker writes a result file to S3 and notifies the user via email or webhook.
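A minimal in-process sketch of the submit/poll/worker flow. All names are hypothetical: a dict stands in for the BulkJob table, `queue.Queue` stands in for SQS or Redis, and `load_rows`/`process_row` are caller-supplied placeholders for reading the S3 file and handling one row. The point it shows is that `submit_job` returns a job_id without touching any rows, while the worker updates progress that `get_job_status` can report.

```python
import queue
import threading
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class BulkJob:
    job_id: str
    file_key: str            # hypothetical S3 key the client uploaded to
    status: str = "QUEUED"   # QUEUED -> RUNNING -> COMPLETED
    processed: int = 0
    total: int = 0


JOBS: dict[str, BulkJob] = {}                  # stand-in for the BulkJob table
JOB_QUEUE: "queue.Queue[str]" = queue.Queue()  # stand-in for SQS / a Redis queue
LOCK = threading.Lock()


def submit_job(file_key: str) -> str:
    """API handler: persist and enqueue the job, return its id immediately."""
    job = BulkJob(job_id=str(uuid.uuid4()), file_key=file_key)
    with LOCK:
        JOBS[job.job_id] = job
    JOB_QUEUE.put(job.job_id)
    return job.job_id


def get_job_status(job_id: str) -> dict:
    """Roughly what GET /bulk-jobs/{job_id} could return to a polling client."""
    with LOCK:
        job = JOBS[job_id]
        return {"status": job.status, "processed": job.processed, "total": job.total}


def worker(load_rows: Callable[[str], list], process_row: Callable[[object], None]) -> None:
    """Background loop: take job ids off the queue and process their rows."""
    while True:
        job_id = JOB_QUEUE.get()
        with LOCK:
            job = JOBS[job_id]
            job.status = "RUNNING"
        rows = load_rows(job.file_key)  # placeholder: read the uploaded file
        with LOCK:
            job.total = len(rows)
        for row in rows:
            process_row(row)            # per-row success/failure tracking goes here
            with LOCK:
                job.processed += 1
        with LOCK:
            job.status = "COMPLETED"    # next: write the result file, notify the user
        JOB_QUEUE.task_done()
```

In a real deployment the worker runs as a separate process consuming from the queue; here it would be started with something like `threading.Thread(target=worker, args=(load_rows, process_row), daemon=True).start()`.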

Question 4

What is a good batch size for bulk database inserts?

Accepted Answer

1,000-10,000 rows per batch is typical. Small batches (around 100 rows) mean many round trips to the database. Very large batches (around 100K rows) can cause lock contention, exceed statement-size or bind-parameter limits (PostgreSQL allows at most 65,535 bind parameters in a single statement), and make partial-failure recovery harder, because one bad row fails the whole batch. For PostgreSQL, 1,000-row INSERT batches inside explicit transactions typically capture most of the throughput gain over row-by-row inserts. For updates, use UPDATE ... WHERE id IN (:ids) with batches of about 1,000 IDs. Monitor database CPU and lock-wait metrics to tune the batch size for your specific workload.
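A minimal sketch of batched inserts, assuming SQLite as a stand-in database and a hypothetical `items` table. `chunked` streams the input in fixed-size batches, and `bulk_insert` commits once per batch rather than once per row. The `max_rows_per_statement` helper is illustrative arithmetic for the PostgreSQL bind-parameter ceiling when a driver builds one multi-row `VALUES` statement per batch.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most `size` rows without loading everything."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[tuple],
                batch_size: int = 1_000) -> int:
    """Insert rows in batches, one transaction per batch; return the batch count."""
    batches = 0
    for batch in chunked(rows, batch_size):
        with conn:  # commit once per batch instead of once per row
            conn.executemany("INSERT INTO items (sku, qty) VALUES (?, ?)", batch)
        batches += 1
    return batches


def max_rows_per_statement(columns: int, param_limit: int = 65_535) -> int:
    """Most rows one multi-row INSERT ... VALUES can carry when every value is a
    bind parameter (65,535 is PostgreSQL's per-statement parameter limit)."""
    return param_limit // columns
```

For example, a 10-column table caps out at 6,553 rows per statement, so the 100K-row batches mentioned above could not be sent as one parameterized statement.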

Question 5

How do you notify users when a bulk job completes?

Accepted Answer

Send an email with the job summary (X succeeded, Y failed) and a link to download the result file. Also support webhooks: if the user has registered a webhook URL, POST the job-completion event to it. For in-product UX, show a notification in the UI, either pushed in real time over a WebSocket or picked up by polling the job status endpoint. Generate a presigned S3 URL valid for 24 hours for the result CSV download, so the link in the email keeps working and users can re-download the file as often as needed during that window.
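A minimal sketch of the completion payloads, using only the standard library. The `JobResult` type, the `bulk_job.completed` event name, and the JSON field names are hypothetical; `result_url` is assumed to be the 24-hour presigned S3 URL generated elsewhere. `post_webhook` uses `urllib.request`, which raises `HTTPError` for non-2xx responses, so a production sender would wrap it in retries.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class JobResult:
    job_id: str
    succeeded: int
    failed: int
    result_url: str  # assumed: presigned S3 URL valid for 24 hours


def email_summary(r: JobResult) -> str:
    """Plain-text body for the completion email."""
    return (f"Your bulk import {r.job_id} finished: "
            f"{r.succeeded} succeeded, {r.failed} failed.\n"
            f"Download the results: {r.result_url}")


def completion_event(r: JobResult) -> dict:
    """Hypothetical webhook payload for a finished job."""
    return {
        "event": "bulk_job.completed",
        "job_id": r.job_id,
        "succeeded": r.succeeded,
        "failed": r.failed,
        "result_url": r.result_url,
    }


def post_webhook(url: str, event: dict, timeout: float = 10.0) -> int:
    """POST the event as JSON to the user's registered webhook URL."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building the email body and the webhook payload from the same `JobResult` keeps both channels reporting identical counts.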

Bulk Operations System Low-Level Design

What is a Bulk Operations System?

Requirements

Data Model

Processing Pipeline

Idempotency via External ID

Transaction Strategy: Per-Row vs Batch

Key Design Decisions