Question 1

What is the difference between thin provisioning and thick provisioning?

Accepted Answer

Thick provisioning allocates all physical storage for a volume at creation time. A 1 TB thick-provisioned volume consumes 1 TB of physical storage even if only 10 GB of data has been written. This gives predictable write latency (no allocation step on first write) and avoids capacity surprises, but wastes storage on sparsely used volumes. Thin provisioning allocates physical blocks only when data is first written to them. A 1 TB thin-provisioned volume might initially consume only a few megabytes of physical storage. The trade-off is that the first write to an unallocated block pays an allocation cost, and capacity planning is harder: if many thin-provisioned volumes grow unexpectedly, the shared physical pool can run out of space even though each volume is still below its logical size (the overcommitment risk).
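To make the allocate-on-first-write behavior concrete, here is a minimal in-memory sketch. The names ThinPool, ThinVolume, and BLOCK_SIZE are hypothetical and chosen only for illustration; a real system would persist the block map and handle concurrency, which this sketch ignores.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class ThinPool:
    """Shared physical pool backing several thin volumes (hypothetical model)."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate(self):
        # Hand out the next free physical block, or fail once the pool is full.
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("thin pool out of physical space")
        block = self.used_blocks
        self.used_blocks += 1
        return block


class ThinVolume:
    """Volume whose logical blocks get a physical block only on first write."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.block_map = {}  # logical block address -> physical block
        self.data = {}       # physical block -> contents

    def write(self, lba, payload):
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("LBA outside the logical volume")
        if lba not in self.block_map:
            # First write to this block: the allocation cost is paid here.
            self.block_map[lba] = self.pool.allocate()
        self.data[self.block_map[lba]] = payload

    def read(self, lba):
        # Unallocated blocks read back as zeros and consume no physical space.
        if lba not in self.block_map:
            return bytes(BLOCK_SIZE)
        return self.data[self.block_map[lba]]
```

Two 100-block volumes can be created on a 2-block pool without error; the overcommitment only surfaces when a third distinct block is written and allocate() raises.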

Question 2

How does copy-on-write snapshot work and what is its performance overhead?

Accepted Answer

Copy-on-write (CoW) snapshots preserve the original data when a write hits a snapshotted volume. Before a block is overwritten for the first time after the snapshot, the storage system reads the original block, copies it to the snapshot store, updates the snapshot's block map to point to the saved copy, and only then writes the new data to the original location. The first write to each block after a snapshot therefore costs one extra read and one extra write (the write amplification of CoW). Later writes to the same block incur no copy cost because the original has already been saved. Multiple snapshots compound the cost: depending on the implementation, a write may trigger a copy for every snapshot that has not yet saved the block, or reading an old snapshot may require walking the block maps of the snapshots taken after it to find the preserved copy. This is why deep snapshot chains degrade both write performance and restore time.
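The write path described above can be sketched in a few lines. This is an illustrative model, not any particular product's implementation: Volume, Snapshot, and the copy_reads and copy_writes counters are hypothetical names, and it assumes each snapshot keeps its own independent copy of a block.

```python
class Snapshot:
    """Point-in-time view holding only the blocks changed since it was taken."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # lba -> block contents as of snapshot time

    def read(self, lba):
        # Modified blocks come from the snapshot store; untouched blocks
        # are still identical on the live volume, so read them from there.
        if lba in self.saved:
            return self.saved[lba]
        return self.volume.blocks[lba]


class Volume:
    def __init__(self, num_blocks):
        self.blocks = [b"\x00"] * num_blocks  # live data
        self.snapshots = []
        self.copy_reads = 0   # extra reads caused by CoW
        self.copy_writes = 0  # extra writes caused by CoW

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, lba, payload):
        original = None
        for snap in self.snapshots:
            if lba not in snap.saved:
                if original is None:
                    original = self.blocks[lba]  # read the old contents once
                    self.copy_reads += 1
                snap.saved[lba] = original       # one copy per snapshot
                self.copy_writes += 1
        self.blocks[lba] = payload               # finally, the actual write
```

With one snapshot, the first overwrite of a block costs one extra read and one extra write, and a second overwrite of the same block costs nothing extra. With several snapshots in this model, the extra writes grow with the number of snapshots that still need the block.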

Question 3

How does iSCSI work and how does it compare to NVMe-oF?

Accepted Answer

iSCSI (Internet Small Computer System Interface) encapsulates SCSI block storage commands in TCP/IP packets. The compute node (initiator) connects to the storage node (target) over a standard TCP connection, discovers LUNs (logical unit numbers), and sends SCSI read/write commands as if talking to a local disk. iSCSI works over standard Ethernet hardware, making it cost-effective but limited by TCP overhead and latency. NVMe-oF (NVMe over Fabrics) extends NVMe command queues over a network fabric: RDMA (RoCE, InfiniBand, or iWARP), Fibre Channel, or TCP. Over RDMA, NVMe-oF can approach local NVMe latency (sub-100 microsecond) because RDMA bypasses the OS kernel network stack. The trade-off is that the RDMA transports require RDMA-capable NICs, whereas NVMe over TCP runs on commodity Ethernet like iSCSI, at the cost of higher latency than RDMA.
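As an illustration of what "encapsulates SCSI commands" means, the sketch below builds the 10-byte SCSI READ(10) command descriptor block (CDB) that an initiator would place inside an iSCSI command PDU. The function names are hypothetical, and the surrounding iSCSI header, login, and TCP handling are deliberately left out.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code

# Layout: opcode, flags, 32-bit LBA, group number, 16-bit length, control
_CDB_FORMAT = ">BBIBHB"  # big-endian, 10 bytes total


def build_read10_cdb(lba, num_blocks):
    """Return a READ(10) CDB asking for num_blocks blocks starting at lba."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("READ(10) carries a 32-bit LBA")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("READ(10) carries a 16-bit transfer length")
    return struct.pack(_CDB_FORMAT, READ_10, 0, lba, 0, num_blocks, 0)


def parse_read10_cdb(cdb):
    """Target-side decode: return (lba, num_blocks) from a READ(10) CDB."""
    opcode, _flags, lba, _group, num_blocks, _control = struct.unpack(
        _CDB_FORMAT, cdb
    )
    if opcode != READ_10:
        raise ValueError("not a READ(10) command")
    return lba, num_blocks
```

For example, build_read10_cdb(0x1000, 8) yields the bytes 28 00 00 00 10 00 00 00 08 00 in hexadecimal. The target decodes the same bytes regardless of whether they arrived over iSCSI or a local SCSI bus, which is why the initiator sees the remote LUN as an ordinary disk.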

Question 4

When should you use synchronous versus asynchronous replication for block storage?

Accepted Answer

Synchronous replication waits for every replica to acknowledge the write before returning success to the caller. This gives an RPO (recovery point objective) of zero, meaning no acknowledged write is lost on failover, but it adds at least the round-trip time to the farthest replica to every write. Synchronous replication is appropriate for databases and financial workloads where data loss is unacceptable and replicas are in the same data center or metro area (low RTT). Asynchronous replication acknowledges the write as soon as the primary has stored it and then replicates in the background. Writes are not blocked and latency is lower, but there is an RPO gap: any data written after the last replicated point is lost if the primary fails. Async is appropriate for disaster recovery replicas in geographically distant regions where synchronous latency would be prohibitive.
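The difference in what survives a primary failure can be shown with a small in-memory model. Primary, Replica, and drain are hypothetical names for this sketch; network transport, ordering guarantees, and failure detection are out of scope.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.blocks = {}

    def apply(self, lba, payload):
        self.blocks[lba] = payload


class Primary:
    def __init__(self, replica, synchronous):
        self.blocks = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # acknowledged writes not yet on the replica

    def write(self, lba, payload):
        self.blocks[lba] = payload
        if self.synchronous:
            # Sync: the replica applies the write before the caller gets an ack.
            self.replica.apply(lba, payload)
        else:
            # Async: ack now, ship later. Everything queued here is the RPO gap.
            self.pending.append((lba, payload))
        return "ack"

    def drain(self, max_writes=None):
        """Background replication step: ship up to max_writes queued writes."""
        shipped = 0
        while self.pending and (max_writes is None or shipped < max_writes):
            self.replica.apply(*self.pending.popleft())
            shipped += 1
        return shipped
```

In synchronous mode the pending queue stays empty, so the replica always holds every acknowledged write. In asynchronous mode, whatever is still in pending when the primary fails is exactly the data that is lost.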

Question 5

How does thin provisioning differ from thick provisioning?

Accepted Answer

Thick provisioning pre-allocates all physical blocks at volume creation time, guaranteeing space but wasting it if the volume is not fully used; thin provisioning allocates physical blocks only on first write, overcommitting physical storage against logical capacity.

Question 6

How does copy-on-write snapshot work?

Accepted Answer

After a snapshot is taken, the first write to each block of the original volume copies the original block to the snapshot store before overwriting it; snapshot reads return the saved copy for modified blocks and fall through to the live volume for unmodified ones; only the first modification of each block incurs the copy overhead.

Question 7

What is iSCSI and how does it differ from NVMe-oF?

Accepted Answer

iSCSI encapsulates SCSI commands in TCP/IP packets, using standard network infrastructure; NVMe-oF (NVMe over Fabrics) carries the NVMe protocol over RDMA, Fibre Channel, or TCP, offering significantly lower latency and higher IOPS suitable for latency-sensitive workloads.

Question 8

How does synchronous replication affect write latency?

Accepted Answer

Synchronous replication requires acknowledgment from both the primary and the replica before the write is confirmed to the client, so each write takes longer by at least the round-trip time to the replica; asynchronous replication confirms as soon as the primary write completes and accepts replica lag.
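A rough back-of-the-envelope model of the latency cost follows. It assumes the primary persists the write and then forwards it to all replicas in parallel, waiting for the slowest one; the function names and the sample timings are hypothetical, and systems that overlap the primary write with replication will see somewhat lower numbers.

```python
def sync_write_latency_ms(primary_ms, replica_rtts_ms, replica_write_ms=0.0):
    """Estimated client-visible latency of a synchronously replicated write.

    Assumed model: primary write first, then parallel fan-out to replicas,
    with the acknowledgment gated on the slowest replica.
    """
    slowest = max(
        (rtt + replica_write_ms for rtt in replica_rtts_ms), default=0.0
    )
    return primary_ms + slowest


def async_write_latency_ms(primary_ms):
    """Async replication acknowledges after the primary write alone."""
    return primary_ms
```

With a 0.5 ms primary write, a 0.5 ms replica write, and replicas at 0.2 ms and 2.0 ms round-trip time, the synchronous estimate is 0.5 + (2.0 + 0.5) = 3.0 ms, against 0.5 ms for asynchronous. The distant replica alone determines the penalty, which is why synchronous replication is usually kept within a data center or metro area.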

Block Storage System Low-Level Design: Volume Management, iSCSI Protocol, Snapshots, and Thin Provisioning

Volume Abstraction

Thin Provisioning

Copy-on-Write Snapshots

iSCSI Protocol

Replication

RAID Levels

SQL DDL

Python: Core Operations

Design Considerations Summary