System Design Interview: Serverless Architecture and Function-as-a-Service

What Is Serverless?

Serverless (Function-as-a-Service / FaaS) lets you deploy individual functions without managing servers. AWS Lambda, Google Cloud Functions, and Azure Functions execute your code in response to events and automatically scale from zero to thousands of concurrent executions. You pay only for the milliseconds your code runs — no idle server costs. The tradeoff: cold starts, strict resource limits, and limited execution duration.

How Lambda Execution Works Internally

When a Lambda function is invoked for the first time (or when scaling to handle more concurrent requests), AWS creates a new execution environment: a lightweight Firecracker microVM with your runtime (Node.js, Python, Java, etc.) pre-installed. Your initialization code runs (imports, database connections, static data loading) — this is the cold start phase, taking 100ms-3s depending on runtime and code size. Once initialized, the execution environment is kept warm for 5-15 minutes. Subsequent invocations reuse the same environment, skipping initialization — warm invocations take only your function logic time.

Cold start latency by runtime: Python/Node.js: 100-300ms; Java/C# (JVM/.NET): 500ms-3s. Mitigation strategies: (1) Provisioned Concurrency: pre-initialize N execution environments so they are always warm — eliminates cold starts at additional cost. (2) Lambda SnapStart (Java): take a checkpoint of the initialized JVM state and restore from the snapshot on cold start instead of re-initializing (~100ms vs ~3s). (3) Keep functions small: smaller deployment packages download and initialize faster. (4) Warm-up pings: invoke functions on a schedule to keep them warm — a hack that adds cost and complexity.

Event-Driven Serverless Patterns

API Gateway + Lambda (Synchronous)

The most common pattern. API Gateway routes HTTP requests to Lambda functions. Each request gets its own Lambda invocation (or reuses a warm environment). Scales automatically to millions of requests/day. Timeout: API Gateway has a 29-second maximum. Lambda max timeout: 15 minutes. For synchronous API patterns, keep functions under 3 seconds to avoid user-facing latency.

Event-Driven Processing (Asynchronous)

Trigger Lambda from S3 events (file upload → process image), SNS notifications, SQS queues, DynamoDB streams, or Kinesis streams. The event source triggers Lambda automatically; Lambda processes events and can write to databases, call APIs, or trigger downstream events. SQS → Lambda: Lambda polls the queue and invokes your function with batches of messages (default batch size 10, configurable up to 10,000 for standard queues when a batching window is set). Successfully processed messages are deleted; messages that repeatedly fail go to a Dead Letter Queue (DLQ) after the maximum receive count. Lambda scales its pollers automatically based on queue depth.
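Under this batching model, a handler can report per-message failures so that only the failed messages become visible again for retry (this requires enabling ReportBatchItemFailures on the event source mapping). A minimal sketch; process_message is a hypothetical stand-in for your business logic:

```python
import json

def process_message(body: dict) -> None:
    # Hypothetical stand-in for real business logic; raises on bad input.
    if "order_id" not in body:
        raise ValueError("missing order_id")

def handler(event, context):
    """SQS batch handler reporting partial batch failures.

    Returning batchItemFailures tells Lambda to delete the successfully
    processed messages and retry only the failed ones.
    """
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch failure reporting, one bad message forces the whole batch back onto the queue, reprocessing messages that already succeeded.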

Fan-Out with SNS

One event triggers multiple Lambda functions in parallel. A payment event is published to an SNS topic; three Lambda functions subscribe: one sends a receipt email, one updates fraud detection signals, one archives to a data warehouse. SNS delivers to all subscribers simultaneously.
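Each subscribed function receives the same SNS envelope, with the published payload as a JSON string under Records[].Sns.Message. A sketch of one subscriber (the receipt-email function, with send_receipt as a hypothetical stand-in for the real email call):

```python
import json

def send_receipt(payment: dict) -> str:
    # Hypothetical stand-in for the real email-sending call.
    return f"receipt sent for payment {payment['payment_id']}"

def handler(event, context):
    results = []
    for record in event["Records"]:
        # SNS delivers the published message as a JSON string inside
        # the Sns.Message field of each record.
        payment = json.loads(record["Sns"]["Message"])
        results.append(send_receipt(payment))
    return results
```

The fraud-detection and archival subscribers would parse the identical envelope; each function stays independent and can fail or retry without affecting the others.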

Limitations and When Not to Use Serverless

  • Execution timeout (15 minutes max): cannot run long batch jobs
  • Memory (10GB max; CPU allocation scales with memory): cannot run memory-intensive ML inference
  • Ephemeral storage (512MB-10GB /tmp): no persistent local state between invocations
  • Concurrency (1,000 default per account, increasable via quota request): account-wide limit shared by all functions
  • Cold start (100ms-3s): latency spike for infrequent traffic
  • Database connections (each invocation is stateless): connection pool exhaustion without RDS Proxy

Do not use serverless for: long-running batch jobs (use Fargate or Spark), stateful applications requiring persistent in-memory state (use ECS/EKS), CPU-intensive workloads running continuously (EC2 is cheaper), or latency-sensitive paths that require consistent sub-10ms responses and cannot tolerate cold-start spikes (use pre-warmed containers).

Good serverless use cases: event-driven webhooks, image/video processing triggered by uploads, scheduled jobs (nightly reports, cleanup), unpredictable/bursty traffic patterns, backend APIs with low to moderate traffic.

Database Connections: The Scaling Problem

Lambda scales to thousands of concurrent executions. If each function opens a database connection, you exhaust the database connection limit (PostgreSQL default: 100, max practical: ~5,000). Solution: RDS Proxy — a connection pool that sits between Lambda and RDS. Lambda connects to the proxy (lightweight TLS connection); the proxy maintains a pool of actual database connections. The proxy multiplexes thousands of Lambda connections onto a smaller connection pool. Reduces database connection overhead and enables Lambda to work with relational databases at scale.
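On the function side, the standard complement to RDS Proxy is caching one connection per execution environment at module level, so warm invocations reuse it instead of reconnecting. A sketch of the pattern, with connect_to_proxy as a hypothetical stand-in for a real driver call (e.g. psycopg2.connect against the proxy endpoint):

```python
import os

_connection = None  # lives for the life of the execution environment

def connect_to_proxy():
    # Hypothetical stand-in for a real driver call, e.g.
    # psycopg2.connect(host=os.environ["PROXY_ENDPOINT"], ...)
    return {"endpoint": os.environ.get("PROXY_ENDPOINT", "proxy.local"), "open": True}

def get_connection():
    """Lazily open one connection per execution environment.

    Cold start: connects once. Warm invocations: reuse the cached
    connection, so the proxy sees one client per environment rather
    than one per invocation.
    """
    global _connection
    if _connection is None:
        _connection = connect_to_proxy()
    return _connection

def handler(event, context):
    conn = get_connection()
    return {"connected_to": conn["endpoint"]}
```

Lazy initialization (rather than connecting at import time) keeps the connection cost out of cold starts for invocations that never touch the database.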

Observability in Serverless

Distributed tracing is harder in serverless because each function invocation is ephemeral. Key approaches:

  • AWS X-Ray: Lambda automatically sends trace segments to X-Ray when X-Ray tracing is enabled. The SDK instruments downstream calls (HTTP, DynamoDB, SQS). X-Ray assembles trace graphs across multiple Lambda invocations.
  • Structured logging to CloudWatch: log JSON with correlation IDs (request_id, user_id, trace_id). CloudWatch Logs Insights queries across function invocations.
  • Custom metrics: push business metrics (orders processed, payment failures) to CloudWatch custom metrics or a metrics aggregator (Datadog, Grafana).
  • Cold start tracking: log initialization time separately from handler time. Alert if cold start rate exceeds threshold (indicates traffic pattern change or function size regression).
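The cold-start-tracking bullet above can be implemented with a module-level flag: the module body runs only during initialization, so the first invocation in each execution environment can be labeled cold. A minimal sketch:

```python
import json
import time

_init_started = time.monotonic()
_is_cold = True  # module body runs once per execution environment
_init_duration = None

def handler(event, context):
    global _is_cold, _init_duration
    cold = _is_cold
    if cold:
        # Only the first invocation in this environment pays init cost.
        _init_duration = time.monotonic() - _init_started
        _is_cold = False
    log_line = {"cold_start": cold, "init_seconds": _init_duration}
    print(json.dumps(log_line))  # structured log for CloudWatch Logs Insights
    return log_line
```

Aggregating the cold_start field in CloudWatch Logs Insights gives the cold start rate to alert on.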

Lambda Architecture (Function Design)


# Python Lambda handler pattern
import json
import boto3
import logging

# Module-level initialization: runs once per cold start
logger = logging.getLogger()
logger.setLevel(logging.INFO)
db_client = boto3.client('dynamodb')  # reused across warm invocations

def handler(event, context):
    # context.aws_request_id is the unique invocation ID
    logger.info(json.dumps({
        'request_id': context.aws_request_id,
        'event_type': event.get('type'),
    }))

    try:
        result = process(event)  # application business logic, defined elsewhere
        return {'statusCode': 200, 'body': json.dumps(result)}
    except ValueError as e:
        return {'statusCode': 400, 'body': str(e)}
    except Exception as e:
        logger.error(f'Unhandled error: {e}', exc_info=True)
        raise  # Re-raise for Lambda retry / DLQ routing

Cost Comparison: Lambda vs ECS vs EC2

Lambda: $0.20 per million requests + $0.0000166667 per GB-second. A 512MB function running 100ms uses 0.05 GB-seconds, costing about $0.00000083 in compute per invocation — extremely cheap for low-to-moderate traffic. At 100M invocations/month that is roughly $83 in compute plus $20 in request fees, ~$105 total. ECS Fargate: a 0.25 vCPU + 512MB task running 24/7 costs ~$12/month and serves many concurrent requests from one always-on task. EC2 t3.micro: ~$8/month — cheapest for sustained load. Break-even: Lambda becomes more expensive than a single Fargate task at roughly 10-15M invocations/month (depending on duration and memory). For predictable high-traffic APIs, containers are more cost-effective. For spiky/unpredictable traffic, Lambda avoids paying for idle capacity.
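The break-even figure falls out of simple arithmetic using the prices quoted in this section (512MB memory, 100ms invocations, vs a ~$12/month Fargate task):

```python
GB_SECOND_PRICE = 0.0000166667    # Lambda compute price per GB-second
REQUEST_PRICE = 0.20 / 1_000_000  # Lambda price per request
FARGATE_MONTHLY = 12.0            # 0.25 vCPU + 512MB task, running 24/7

def lambda_cost(invocations: int, memory_gb: float = 0.5, seconds: float = 0.1) -> float:
    """Monthly Lambda cost: compute (GB-seconds) plus request fees."""
    compute = invocations * memory_gb * seconds * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute + requests

def break_even_invocations() -> int:
    """Invocations/month at which Lambda cost equals one Fargate task."""
    per_invocation = 0.5 * 0.1 * GB_SECOND_PRICE + REQUEST_PRICE
    return round(FARGATE_MONTHLY / per_invocation)
```

With these numbers the break-even lands near 11-12M invocations/month; longer durations or more memory shift it lower (actual prices vary by region).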

Key Interview Points

  • Cold starts: 100ms-3s; mitigate with Provisioned Concurrency or SnapStart for Java
  • Use RDS Proxy to prevent connection pool exhaustion from Lambda scaling
  • Async patterns (SQS/SNS triggers) are more reliable than sync Lambda APIs for long processing
  • Lambda is cost-effective for bursty traffic; containers are better for sustained high-throughput
  • Trace with X-Ray + structured JSON logging + correlation IDs across invocations
  • 15-minute timeout hard limit — not suitable for long batch jobs

Frequently Asked Questions

What causes Lambda cold starts and how do you minimize them?

A Lambda cold start occurs when AWS must create a new execution environment before running your function — either because no warm environment exists (first invocation after idle period) or because demand exceeds existing warm environments (traffic spike). The cold start phases: (1) Provisioning the Firecracker microVM (AWS-controlled, ~10ms), (2) Downloading and initializing the runtime (Python, Node.js, Java — 50-300ms), (3) Running your initialization code (imports, module loading, connection setup — varies by code). Java and .NET cold starts are longest (500ms-3s) due to JVM/CLR startup. Minimization strategies: Provisioned Concurrency (AWS pre-warms N environments — eliminates cold starts for predictable traffic, adds cost); Lambda SnapStart for Java (checkpoints post-initialization JVM state, restores from snapshot in ~100ms instead of full JVM init); keep packages small (no unused dependencies — use Lambda Layers to share common code); move heavy initialization to module level (outside the handler) so it runs once per warm environment; avoid putting large-latency initialization (DNS resolution, connection establishment) in the critical path if it can be deferred.

How does serverless handle database connections at scale?

The core problem: Lambda scales to thousands of concurrent executions. If each Lambda execution opens its own database connection, you rapidly exhaust PostgreSQL or MySQL connection limits (typically 100-5000). A Lambda function that holds a connection open during execution prevents connection reuse. AWS RDS Proxy solves this by acting as a connection multiplexer. Lambda connects to the Proxy (lightweight authentication, fast connection), and the Proxy maintains a pool of long-lived database connections. The Proxy uses connection pinning (the same database connection is used for all queries in a transaction) and connection multiplexing (multiple Lambda connections share one database connection when between transactions). This allows 10,000+ Lambda executions to share 100-500 database connections. RDS Proxy adds 1-2ms overhead but enables serverless applications to use relational databases without exhausting connections. For DynamoDB (NoSQL), this problem does not exist — DynamoDB is an HTTP-based API with no persistent connections, scaling seamlessly with Lambda concurrency.

When should you choose serverless over containers or VMs?

Choose serverless (Lambda, Cloud Functions) when: traffic is highly variable or unpredictable — you pay only for execution time, so bursty workloads do not incur idle costs; the application is event-driven (process uploads, handle webhooks, run cron jobs) and each invocation is independent; cold starts are acceptable (background processing, batch jobs, or traffic is warm enough to avoid them); execution duration is under 15 minutes per invocation. Choose containers (ECS, Kubernetes) or VMs when: the workload runs continuously (sustained high-traffic API — idle container cost is lower than per-invocation Lambda at >10M invocations/month); cold start latency is unacceptable (sub-10ms warm path required for latency-sensitive APIs); you need more than 10GB memory, custom kernel modules, or persistent local disk; the application maintains in-process state between requests (WebSocket servers, stateful ML inference with loaded model in RAM); you need fine-grained control over the compute environment. The break-even: Lambda becomes more expensive than a single always-on Fargate container at approximately 10-20 million 100ms invocations per month.

