Read-heavy systems serve many more reads than writes — often at ratios of 100:1 or higher. Optimizing for reads requires layered caching, read replicas, denormalization, and CQRS (Command Query Responsibility Segregation) patterns. The goal is to serve reads from memory or local cache while keeping writes consistent, without over-engineering for write throughput that isn’t needed.
Layered Caching
Build a cache hierarchy: L1 in-process memory cache (sub-microsecond, bounded by process memory), L2 shared Redis cache (sub-millisecond, shared across service instances), L3 CDN edge cache (for public content, serves from edge PoP near the user). Cache hits at L1 avoid network calls entirely. L2 hits avoid database queries. L3 hits avoid origin server load. A 95% overall cache hit rate with a 5% miss rate hitting the database reduces database load by 20x. Profile which objects are most frequently requested to prioritize what to cache.
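The L1/L2 lookup order can be sketched as a read-through wrapper. This is a minimal in-memory sketch: the L2 store is a plain dict standing in for a shared Redis instance, and the `loader` callback stands in for the database query on a full miss.

```python
import time

class LayeredCache:
    """Read-through cache: L1 in-process dict, L2 shared store.
    L2 here is a plain dict standing in for Redis; in production it
    would be a Redis client with the same get/set shape."""

    def __init__(self, l2_store, l1_ttl=60):
        self.l1 = {}            # key -> (value, expires_at)
        self.l2 = l2_store      # shared across service instances
        self.l1_ttl = l1_ttl

    def get(self, key, loader):
        entry = self.l1.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                 # L1 hit: no network call
        value = self.l2.get(key)            # L2 hit: no database query
        if value is None:
            value = loader(key)             # full miss: hit the database
            self.l2[key] = value
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)
        return value
```

On the second request for the same key the loader is never invoked: the value is served from L1 without touching L2 or the database.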
Read Replicas
Distribute read queries across multiple database replicas. The primary handles writes; read replicas replicate from the primary asynchronously. Route read-only queries (SELECT) to replicas; write queries (INSERT, UPDATE, DELETE) to the primary. Replica lag is typically milliseconds — acceptable for most read use cases. Not acceptable for: reading immediately after a write (read-your-own-write consistency), financial balances, inventory counts. For these, read from the primary or use synchronous replication.
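The routing rule above can be expressed as a small dispatcher. In this sketch the "connections" are just string labels (real code would hold connection pools), and `require_fresh` is an illustrative flag for the read-your-own-write cases that must go to the primary.

```python
import itertools

class QueryRouter:
    """Route read-only statements to replicas (round-robin), everything
    else to the primary. Connections are stand-in labels here."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql, require_fresh=False):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not require_fresh:
            return next(self._replicas)   # tolerate replica lag
        return self.primary               # writes, and reads that need
                                          # read-your-own-write consistency
```

A balance check or a read immediately after a write would pass `require_fresh=True` and land on the primary despite being a SELECT.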
Denormalization
Normalized databases join multiple tables to assemble a response. Joins are expensive at scale. Denormalization pre-computes and materializes joined data: a user_profile_view table stores a user's data plus their account tier, preference settings, and follower count in one row. Reads hit one table instead of joining five. Writes must update both the normalized source and the denormalized view — an eventual consistency tradeoff. For read-heavy systems where latency matters more than write complexity, denormalization is often the right choice.
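The write-path tradeoff can be shown with the follower-count example. A hypothetical sketch with dicts standing in for tables: the normalized source of truth and the denormalized `user_profile_view` row are both updated on write, so reads hit one row.

```python
def update_follower_count(db, user_id, delta):
    """Write path for a denormalized follower count. Update the
    normalized source first, then the denormalized view row; in a real
    system the view update may lag (eventual consistency)."""
    db["followers"][user_id] = db["followers"].get(user_id, 0) + delta
    profile = db["user_profile_view"].setdefault(
        user_id, {"follower_count": 0})
    profile["follower_count"] = db["followers"][user_id]
```

The read path is then a single-row fetch from `user_profile_view` instead of a join across follower, account, and preference tables.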
CQRS Pattern
CQRS (Command Query Responsibility Segregation) uses separate data models for reads and writes. Write model: normalized, ACID, optimized for consistency. Read model: denormalized, eventually consistent, optimized for query patterns. Write a new order → update the orders table (write model) and publish an event → the read model consumer updates the order_summary table (optimized for order listing queries). This allows the read model to use a different database technology (Elasticsearch for search, Redis for leaderboards, Cassandra for time-series) that matches the query pattern.
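The order example can be sketched end to end. This is an in-memory stand-in: the write model is a list, the "event bus" is a direct method call, and the read model is a per-user summary dict; a real system would use a database and a message broker between the two.

```python
class OrderService:
    """Minimal CQRS sketch: normalized write model plus a denormalized
    read model kept up to date by an event consumer."""

    def __init__(self):
        self.orders = []          # write model: one row per order
        self.order_summary = {}   # read model: per-user count and total

    def place_order(self, user_id, amount):
        order = {"user_id": user_id, "amount": amount}
        self.orders.append(order)          # command side: ACID write
        self._on_order_placed(order)       # publish event (direct call here)

    def _on_order_placed(self, event):
        # Read-model consumer: updates the summary optimized for listings.
        s = self.order_summary.setdefault(
            event["user_id"], {"count": 0, "total": 0})
        s["count"] += 1
        s["total"] += event["amount"]
```

Because the consumer is decoupled behind an event, the read model could just as well live in Elasticsearch or Redis without touching the write path.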
Query Result Caching
Cache the results of expensive database queries. Key the cache entry on the query parameters: cache_key = "top_posts:" + user_id + ":" + category. Set TTL based on acceptable staleness: 60 seconds for a news feed, 300 seconds for a user profile, 3600 seconds for a product catalog. On cache miss, execute the query and populate the cache. Use the cache-aside pattern (application manages the cache) rather than write-through (database writes trigger cache updates) to avoid coupling the write path to the cache.
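The cache-aside flow can be captured in one helper. A minimal sketch: the cache is a dict of key to (result, expires_at) standing in for Redis, and `run_query` is whatever executes the underlying SQL.

```python
import time

def cached_query(cache, key, ttl, run_query):
    """Cache-aside: check the cache, run the query on miss, populate
    with a TTL chosen from the acceptable staleness for this key."""
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                       # hit: skip the database
    result = run_query()                      # miss: execute the query
    cache[key] = (result, time.monotonic() + ttl)
    return result
```

Usage mirrors the key scheme above: `cached_query(cache, "top_posts:" + user_id + ":" + category, 60, fetch_top_posts)`.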
Materialized Views
A materialized view is a pre-computed query result stored as a physical table. Unlike a regular view (a named query that executes on access), a materialized view stores data and requires periodic refresh. Use for expensive aggregation queries that run frequently: total order count per user, daily revenue by product category, top 100 most-viewed articles. Refresh strategies: full refresh (recompute the entire result), incremental refresh (apply only changes since last refresh). Incremental refresh is faster but more complex to implement.
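The incremental strategy can be illustrated with the "total order count per user" view. A hypothetical in-memory sketch: `orders` is an append-only list standing in for the source table, and a high-water mark in `state` tracks which rows the last refresh already applied.

```python
def incremental_refresh(view, orders, state):
    """Incremental refresh: apply only rows appended since the last
    refresh, instead of recomputing the whole aggregate."""
    start = state.get("last_row", 0)
    for row in orders[start:]:              # only the new rows
        uid = row["user_id"]
        view[uid] = view.get(uid, 0) + 1
    state["last_row"] = len(orders)         # advance the high-water mark
    return view
```

A full refresh would instead recompute the counts over the entire `orders` table each time; the incremental version does work proportional to the changes, at the cost of tracking the mark (and handling updates/deletes, omitted here).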
Connection Pool Tuning for Read-Heavy Load
Read-heavy workloads issue many short queries concurrently. By Little's Law, the number of in-flight queries — and therefore the pool size needed — is arrival rate × average query duration. For 1,000 reads per second at 5 ms average duration: pool size = 1000 × 0.005 = 5; a small pool is sufficient if queries are fast. Over-provisioning the pool wastes memory and database connection slots. Monitor pool wait time (time requests spend waiting for an available connection) and pool utilization. Increase pool size only when wait time is high, not preemptively.
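The sizing arithmetic is small enough to encode directly. The `headroom` multiplier is an assumption added here (not from the text) to absorb bursts; at its default of 1.0 the function reproduces the worked example.

```python
import math

def pool_size(reads_per_sec, avg_query_seconds, headroom=1.0):
    """Little's Law: in-flight queries = arrival rate x duration.
    headroom is an assumed burst multiplier; round up to whole
    connections."""
    return math.ceil(reads_per_sec * avg_query_seconds * headroom)
```

For example, `pool_size(1000, 0.005)` yields 5 connections, and doubling the headroom to 2.0 yields 10; either way the pool stays far below the 1,000 requests per second it serves.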