Serverless computing abstracts away server management, letting you run code without provisioning or managing infrastructure. AWS Lambda, Google Cloud Functions, and Azure Functions automatically scale from zero to thousands of concurrent executions. This guide covers serverless architecture patterns, cold start optimization, limitations, and when serverless is (and is not) the right choice — essential for system design interviews and cloud architecture decisions.
How AWS Lambda Works
Lambda executes your code in response to events. You provide a function (handler) and configure a trigger (API Gateway HTTP request, S3 object upload, SQS message, DynamoDB stream, scheduled event). When triggered: (1) Lambda provisions a micro-VM (Firecracker) with your runtime (Node.js, Python, Java, Go). (2) Your function code is loaded and initialized. (3) The handler is invoked with the event payload. (4) The function runs until it returns (up to the 15-minute limit); the execution environment is then frozen (kept warm for future invocations). (5) If no invocation occurs for roughly 5-15 minutes, the environment is destroyed.

Pricing: pay per invocation ($0.20 per million) plus duration (billed per ms of compute time, proportional to allocated memory). Zero cost when idle — no invocations, no charge. This is the core value proposition: you do not pay for idle servers.

Concurrency: Lambda scales automatically. Each concurrent request gets its own execution environment. Default limit: 1,000 concurrent executions per region (can be increased). Provisioned concurrency: pre-warm N environments to eliminate cold starts for latency-sensitive functions.
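The handler contract above can be sketched as follows. This is a minimal illustration, not AWS-specific guidance: the function name and the API Gateway proxy-style event/response shapes are assumptions for this example.

```python
import json

def handler(event, context):
    """Entry point that Lambda invokes with the trigger's event payload.

    For an API Gateway proxy integration, `event` carries the HTTP request
    and the returned dict becomes the HTTP response.
    """
    # Query parameters may be absent entirely, so default defensively.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation with a sample API Gateway-style event:
print(handler({"queryStringParameters": {"name": "lambda"}}, None))
```

Invoking the handler locally with a hand-built event, as in the last line, is also a common unit-testing pattern: the handler is just a function, so no AWS infrastructure is needed to test its logic.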
Cold Starts and Optimization
A cold start occurs when Lambda must create a new execution environment: provision the micro-VM, download and extract the deployment package, initialize the runtime, and run your initialization code. Cold start latency: 100-500ms for Python/Node.js, 500ms-2s for Java (JVM startup), 1-10s for large deployment packages or VPC-attached functions. Optimization strategies: (1) Keep deployment packages small — include only necessary dependencies. Use Lambda layers for shared libraries. (2) Minimize initialization code — move expensive initialization (database connections, SDK clients) outside the handler but inside the execution environment (they persist between warm invocations). (3) Choose lightweight runtimes — Python and Node.js cold-start faster than Java. For Java, use GraalVM native image (ahead-of-time compilation eliminates JVM startup). (4) Provisioned concurrency — pre-warm N environments. Costs more (you pay for idle provisioned environments) but eliminates cold starts for the configured number of concurrent executions. (5) Avoid VPC when possible — VPC-attached Lambdas used to require per-environment ENI creation, adding seconds to cold starts. AWS improved this significantly in 2019 with shared Hyperplane ENIs, but non-VPC Lambdas still start faster. (6) SnapStart (Java) — Lambda snapshots the initialized JVM and restores it on cold start, reducing Java cold starts to under 200ms.
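Strategy (2) — initializing at module level so expensive objects survive between warm invocations — can be demonstrated with a small sketch. The counter and timestamp below are stand-ins for real resources like database connections or SDK clients:

```python
import time

# Module-level ("init") code runs once per execution environment, during
# the cold start. Anything created here is reused by every subsequent
# warm invocation of that same environment.
START = time.time()      # stand-in for an expensive client or connection
invocation_count = 0

def handler(event, context):
    global invocation_count
    invocation_count += 1  # climbs across warm invocations, resets on cold start
    return {
        "env_age_seconds": round(time.time() - START, 3),
        "invocations_in_this_environment": invocation_count,
    }
```

Calling the handler twice in the same process (simulating two warm invocations) shows the counter reach 2 while `START` stays fixed — the same mechanism that lets a database connection opened at module level be reused instead of re-established on every request.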
Serverless Architecture Patterns
Common patterns: (1) API backend — API Gateway + Lambda + DynamoDB. API Gateway routes HTTP requests to Lambda functions. Lambda processes the request and reads/writes DynamoDB. Auto-scales from zero to thousands of concurrent requests. Ideal for: APIs with variable traffic, startups that do not want to manage infrastructure. (2) Event processing — S3 upload triggers Lambda, which processes the file (resize image, parse CSV, transcode video). SQS/SNS messages trigger Lambda for async task processing. DynamoDB Streams trigger Lambda for change data capture. (3) Scheduled tasks — CloudWatch Events triggers Lambda on a cron schedule. Replace cron jobs on EC2 instances. (4) Workflow orchestration — AWS Step Functions coordinates multiple Lambda functions into a workflow with conditional logic, parallel execution, error handling, and retries. Example: an order processing workflow that runs payment, inventory, and shipping Lambda functions in sequence with rollback on failure. (5) Streaming — Lambda processes Kinesis or Kafka records in micro-batches. Each shard is processed by one Lambda instance. Scales with the number of shards.
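Pattern (2) can be sketched as an S3-triggered handler. The record layout follows the standard S3 event notification format; the actual processing step (resize, parse, transcode) is left as a placeholder:

```python
import urllib.parse

def handler(event, context):
    """Triggered by S3 ObjectCreated notifications; handles each record.

    A single event may batch multiple records. Real work (image resize,
    CSV parse, transcode) would replace the append below.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        processed.append(f"{bucket}/{key}")
    return {"processed": processed}

# A trimmed-down sample event in the S3 notification shape:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "photos/cat+1.jpg"}}}
    ]
}
print(handler(sample_event, None))
```

Note the URL-decoding step: keys containing spaces or special characters arrive encoded in the notification, a common source of "object not found" bugs in this pattern.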
Serverless Limitations
Lambda is not suitable for every workload: (1) Execution time limit — 15 minutes maximum. Long-running processes (video transcoding, ML training, large data processing) must be broken into smaller chunks or run on containers/EC2. (2) Cold starts — unpredictable latency spikes. For latency-sensitive services (P99 under 100ms), cold starts are problematic. Provisioned concurrency helps but adds cost. (3) Statelessness — functions are stateless between invocations. State must be stored externally (DynamoDB, S3, ElastiCache). This adds latency for state-heavy operations. (4) Limited compute resources — up to 10 GB memory and 6 vCPUs. CPU-intensive workloads (video encoding, scientific computing) are better on dedicated instances. (5) Vendor lock-in — Lambda code depends on AWS-specific APIs, event formats, and deployment tooling. Migrating to another provider requires significant rewriting. (6) Debugging difficulty — distributed tracing across Lambda invocations is harder than debugging a monolithic application. CloudWatch logs are fragmented across execution environments. (7) Cost at scale — Lambda is cheap at low volume but expensive at sustained high throughput. At 100M invocations per month with 500ms average duration, Lambda costs significantly more than equivalent EC2 instances.
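The cost-at-scale point in (7) can be made concrete with back-of-envelope arithmetic. The per-request and per-GB-second prices below are illustrative of published x86 rates and vary by region; check current pricing before relying on them:

```python
# Illustrative Lambda pricing (verify against current published rates):
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD

invocations = 100_000_000  # per month
duration_s = 0.5           # average execution time
memory_gb = 1.0            # allocated memory

request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
compute_cost = invocations * duration_s * memory_gb * PRICE_PER_GB_SECOND
total = request_cost + compute_cost

# The same work viewed as sustained load: total compute-seconds spread
# over a ~2.6M-second month gives the average number of busy 1 GB
# environments a fixed fleet would need to cover.
avg_concurrency = invocations * duration_s / (30 * 24 * 3600)

print(f"requests ${request_cost:,.0f} + compute ${compute_cost:,.0f} "
      f"= ${total:,.0f}/month (~{avg_concurrency:.0f} avg concurrent GB)")
```

Under these assumptions the bill lands around $850/month, while the equivalent sustained load averages only about 19 concurrent 1 GB workers — a handful of always-on instances, which typically cost a fraction of that. This is the crossover the limitation describes: per-millisecond pricing is a premium you pay for elasticity, and it stops paying off once utilization is high and steady.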
When to Use Serverless vs Containers vs VMs
Decision framework: (1) Variable, unpredictable traffic with periods of zero traffic — serverless. You pay nothing during idle periods. Scales automatically. Best for: startups, internal tools, webhook handlers, event processors, scheduled jobs. (2) Steady, high-throughput traffic — containers (ECS, EKS, Kubernetes). More cost-effective than Lambda at sustained load. Full control over runtime, networking, and resource allocation. Best for: core application services, APIs with predictable traffic. (3) Specialized hardware or OS requirements — VMs (EC2). GPU instances for ML, custom kernel configurations, legacy applications. (4) Quick prototyping and MVP — serverless. Deploy a working API in hours without infrastructure decisions. Migrate to containers later if needed. Hybrid approach: use Lambda for event-driven and low-traffic workloads (image processing, webhooks, scheduled jobs) and containers for steady-state services (API backends, databases). This combines the cost efficiency of serverless for bursty work with the predictability of containers for core services. In system design interviews: mention serverless as an option for specific components (image processing, notification sending) rather than the entire architecture. This demonstrates nuanced thinking about when serverless fits.