Low Level Design: Data Pipeline Design
A data pipeline moves data from source systems (production databases, event streams, APIs) through transformation stages to analytical destinations (data […]).
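The extract-transform-load flow above can be sketched in a few lines; this is a minimal sketch in which in-memory lists stand in for the source system and the warehouse:

```python
# Minimal ETL sketch: extract raw rows, transform them into typed
# records, load them into a destination. In a real pipeline each stage
# would talk to a database, event stream, or warehouse.
def extract():
    # Stand-in for reading from a source system.
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "25"}]

def transform(rows):
    # Cast types and normalize fields before loading.
    return [{"user": r["user"], "amount": int(r["amount"])} for r in rows]

def load(rows, destination):
    destination.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # typed rows ready for analytics
```

Real pipelines add the parts this sketch omits: incremental extraction, retries, schema validation, and orchestration of stage ordering.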
A file system organizes data on storage media into files and directories, providing naming, access control, and efficient storage allocation.
Distributed tracing tracks a single request as it propagates through multiple microservices, capturing timing, errors, and context at each service hop.
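The key mechanism is a shared trace identifier that every span inherits as the request crosses service boundaries. A hypothetical sketch (the service names are illustrative):

```python
import time
import uuid

# Each service records a span carrying the request's trace_id plus a
# parent pointer, so a collector can stitch spans into one timeline.
def new_span(trace_id, service, parent=None):
    return {
        "trace_id": trace_id,          # shared by every span in the request
        "span_id": uuid.uuid4().hex[:8],
        "service": service,
        "parent": parent,              # span that caused this one
        "start": time.time(),
    }

trace_id = uuid.uuid4().hex
root = new_span(trace_id, "api-gateway")
child = new_span(trace_id, "orders-service", parent=root["span_id"])
```

In practice the trace_id and parent span id travel between services in request headers (e.g. the W3C `traceparent` header).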
A search engine indexes documents and answers queries of the form “find documents containing these terms” in milliseconds, even across corpora of billions of documents.
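The core data structure is the inverted index: a map from each term to the set of documents containing it. A sketch over three toy documents:

```python
from collections import defaultdict

# Build an inverted index: term -> set of document ids.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(*terms):
    # AND query: intersect the posting sets of all query terms.
    sets = [index[t] for t in terms]
    return set.intersection(*sets) if sets else set()

print(search("quick", "dog"))  # {3}
```

Query time depends on posting-list sizes, not corpus size, which is why lookups stay fast even at large scale; production engines add tokenization, ranking, and compressed posting lists on top.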
MapReduce is a programming model for processing large datasets in parallel across a cluster of commodity machines. Introduced by Google in 2004, it hides parallelization, fault tolerance, and data distribution behind two user-supplied functions: map and reduce.
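A single-machine sketch of the model on the classic word-count example (a real MapReduce runtime distributes the map tasks, the shuffle, and the reduce tasks across machines):

```python
from collections import defaultdict

# map emits (key, value) pairs; shuffle groups values by key;
# reduce aggregates each group independently.
def map_fn(line):
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    return (word, sum(counts))

lines = ["a b a", "b c"]
groups = defaultdict(list)
for line in lines:                      # map + shuffle
    for word, count in map_fn(line):
        groups[word].append(count)
result = dict(reduce_fn(w, c) for w, c in groups.items())
print(result)  # {'a': 2, 'b': 2, 'c': 1}
```

Because reduce runs per key with no shared state, each group can be processed on a different machine without coordination.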
A connection pool maintains a set of pre-established connections to a resource (database, HTTP service, message broker) that can be reused across requests, avoiding the latency and overhead of opening a new connection each time.
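A minimal pool sketch, assuming a hypothetical `make_connection` factory for whatever resource is being pooled:

```python
import queue

class ConnectionPool:
    """Bounded pool: connections are created up front and reused."""

    def __init__(self, make_connection, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(make_connection())

    def acquire(self, timeout=None):
        # Blocks until a connection is free, capping total connections.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse instead of closing it.
        self._pool.put(conn)

pool = ConnectionPool(lambda: object(), size=2)
conn = pool.acquire()
pool.release(conn)
```

The blocking `acquire` doubles as backpressure: when all connections are busy, callers wait rather than overwhelming the downstream resource. Production pools also validate connections before reuse and replace ones that have gone stale.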
A distributed cache stores frequently accessed data in memory across a cluster of nodes, reducing latency and database load. Redis and Memcached are the most widely deployed implementations.
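A sketch of the common cache-aside pattern, with a plain dict standing in for the distributed cache and a stub `slow_query` standing in for the database:

```python
import time

cache = {}
TTL = 60  # seconds before an entry is considered stale

def slow_query(key):
    return f"row-for-{key}"  # stand-in for a database read

def get(key):
    entry = cache.get(key)
    if entry and time.time() - entry[1] < TTL:
        return entry[0]                  # cache hit
    value = slow_query(key)              # miss: fall through to the database
    cache[key] = (value, time.time())    # populate for subsequent readers
    return value

get("user:1")  # first call misses and fills the cache
get("user:1")  # second call is served from memory
```

The TTL bounds staleness; a real deployment would also need an eviction policy and explicit invalidation on writes.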
Database indexes are data structures that allow the database engine to find rows matching a query condition without scanning the entire table.
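A sketch of what an index buys, using a sorted key list and binary search in place of a B-tree (the table and column names are illustrative):

```python
import bisect

# A "table" of rows, plus an index: sorted (key, row position) pairs.
rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(1000)]
email_index = sorted((r["email"], pos) for pos, r in enumerate(rows))
keys = [k for k, _ in email_index]

def lookup(email):
    # Binary search the index instead of scanning all 1000 rows.
    i = bisect.bisect_left(keys, email)
    if i < len(keys) and keys[i] == email:
        return rows[email_index[i][1]]
    return None

print(lookup("user42@example.com")["id"])  # 42
```

The trade-off is the same as in a real engine: lookups drop from O(n) to O(log n), but every write must also update the index.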
Consensus algorithms allow a cluster of nodes to agree on a single value even when some nodes fail or messages are lost, delayed, or reordered.
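One building block that algorithms like Raft and Paxos share is the majority quorum: a decision counts only once more than half the cluster has acknowledged it, so any two deciding quorums overlap in at least one node. A tiny sketch:

```python
def committed(acks, cluster_size):
    # Strict majority: floor(n/2) + 1 or more acknowledgements.
    return acks > cluster_size // 2

committed(3, 5)  # 3 of 5 nodes is a majority, so the value is decided
committed(2, 5)  # 2 of 5 is not, so the cluster keeps waiting
```

The overlap guarantee is what prevents two disjoint groups of nodes from committing conflicting values during a network partition.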
OAuth 2.0 is an authorization framework that allows applications to obtain limited access to user accounts on third-party services without exposing the user's credentials.
WebSockets provide full-duplex, persistent communication between a browser and server over a single TCP connection. Unlike HTTP request-response, either side can send data at any time.
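The connection is upgraded from HTTP via an opening handshake defined by RFC 6455: the server derives a `Sec-WebSocket-Accept` value from the client's key to prove it speaks the protocol. A sketch of that derivation (the sample key is the one used in the RFC itself):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the handshake.
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key):
    # Server response: base64(SHA-1(client key + GUID)).
    digest = hashlib.sha1((client_key + GUID).encode()).digest()
    return base64.b64encode(digest).decode()

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the handshake, both sides exchange framed messages on the same TCP connection instead of issuing new HTTP requests.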
Sharding (horizontal partitioning) splits a large dataset across multiple database nodes to scale beyond what a single machine can handle.
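The routing layer is the heart of a sharded system: given a key, pick the node that owns it. A sketch of hash-based routing across four hypothetical shards:

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key):
    # A stable hash (not Python's randomized hash()) so the same key
    # always routes to the same shard across processes and restarts.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

shard_for("user:42")  # deterministic: always the same shard for this key
```

The modulo scheme is the simplest choice; its weakness is that changing the shard count remaps almost every key, which is why production systems usually prefer consistent hashing or explicit range ownership.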
An operating system scheduler decides which process or thread runs on each CPU core at any given moment. The scheduler must balance throughput, latency, and fairness across competing workloads.
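A sketch of the simplest preemptive policy, round-robin with a fixed time slice; the task names and durations are illustrative:

```python
from collections import deque

def round_robin(tasks, quantum):
    """Run each task for at most `quantum` units, then rotate it to
    the back of the queue until all tasks finish."""
    ready = deque(tasks)          # (name, remaining_time) pairs
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))          # task occupies the CPU
        if remaining - run > 0:
            ready.append((name, remaining - run))  # preempted, requeued
    return timeline

print(round_robin([("A", 5), ("B", 3)], quantum=2))
# [('A', 2), ('B', 2), ('A', 2), ('B', 1), ('A', 1)]
```

The quantum is the central knob: a small slice improves responsiveness at the cost of more context switches, which is the same trade-off real schedulers tune.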
A memory allocator manages heap memory — fulfilling malloc/new requests and returning freed memory for reuse. The C standard library exposes this through malloc and free; alternative allocators such as jemalloc and tcmalloc improve performance under multithreaded workloads.
Database replication copies data from a primary database to one or more replica databases to achieve high availability, read scalability, and disaster recovery.