Question 1

What is Infrastructure as Code and why is it important?

Accepted Answer

Infrastructure as Code (IaC) manages cloud infrastructure through configuration files, usually declarative, instead of manual console operations. Benefits:
(1) Version control -- infrastructure changes are tracked in git with full history, authorship, and the ability to revert.
(2) Reproducibility -- identical environments are created from the same code, which greatly reduces configuration drift between staging and production.
(3) Code review -- infrastructure changes go through pull requests and peer review before they are applied.
(4) Automation -- CI/CD pipelines validate and apply changes automatically.
(5) Documentation -- the code is the documentation of what infrastructure exists.

Without IaC, teams accumulate snowflake servers: manually configured, undocumented, and hard to reproduce. When a critical server fails at 3 AM, rebuilding from memory is slow and unreliable. With IaC, an environment can be recreated from code, often in minutes rather than hours or days.

Question 2

What is the difference between Terraform and Pulumi?

Accepted Answer

Terraform uses HCL (HashiCorp Configuration Language), a domain-specific declarative language. Pulumi uses general-purpose programming languages (TypeScript, Python, Go, Java, C#).

Terraform advantages: the largest ecosystem (more providers, modules, and community examples), a deliberately limited language that discourages over-engineering, and wider industry adoption (easier hiring).

Pulumi advantages: a language your team already knows (no new DSL to learn), full programming constructs (loops, conditionals, functions, and classes for reusable infrastructure components), unit testing with standard frameworks (Jest, pytest), and IDE support (autocomplete, type checking).

Both compare the desired state with the current state, show the resulting plan (terraform plan, pulumi preview), and apply only the differences. Both support multi-cloud. Choose Terraform for teams that value simplicity and ecosystem breadth. Choose Pulumi for teams that want to leverage existing programming language expertise and need complex infrastructure abstractions.

For comparison, CloudFormation is AWS-only: it is tightly integrated with AWS and usually supports new AWS services early, although coverage on launch day is not guaranteed. It is verbose and limited to JSON/YAML.
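The "desired-state plan, then incremental apply" idea that both tools share can be illustrated with a small Python sketch. This is not how Terraform or Pulumi are implemented; the resource names, the dict-based resource model, and the plan/apply functions are all hypothetical and exist only to show the shape of the technique.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical model: resource name -> attribute dict.
Resources = Dict[str, dict]


@dataclass(frozen=True)
class Change:
    action: str  # "create", "update", or "delete"
    name: str    # illustrative resource name


def plan(desired: Resources, actual: Resources) -> List[Change]:
    """Compare desired and actual resources and list the changes needed."""
    changes: List[Change] = []
    for name in sorted(desired.keys() - actual.keys()):
        changes.append(Change("create", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            changes.append(Change("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        changes.append(Change("delete", name))
    return changes


def apply(changes: List[Change], desired: Resources, actual: Resources) -> Resources:
    """Apply only the planned changes and return the new actual state."""
    result = dict(actual)
    for change in changes:
        if change.action == "delete":
            del result[change.name]
        else:
            # Create and update both end with the desired attributes in place.
            result[change.name] = dict(desired[change.name])
    return result
```

Running plan a second time after apply returns an empty list, which is the property that makes repeated applies safe: unchanged resources are never touched.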

Question 3

What is GitOps and how does ArgoCD implement it?

Accepted Answer

GitOps makes git the single source of truth for infrastructure and application configuration. An agent continuously reconciles the actual state of the system with the desired state declared in git. ArgoCD is one of the most widely used GitOps controllers for Kubernetes. It watches a git repository containing Kubernetes manifests (plain YAML, Helm charts, or Kustomize). When a change is merged to the tracked branch (typically main), ArgoCD detects the difference between the git state and the cluster state and, when automated sync is enabled, applies the changes to bring the cluster back in sync. Key benefits:
(1) Pull-based deployment -- ArgoCD pulls from git rather than CI pushing to the cluster. Only ArgoCD needs cluster credentials, not the CI system.
(2) Drift detection -- if someone manually changes a Kubernetes resource (kubectl edit), ArgoCD detects the drift and, if self-healing is enabled, reverts the resource to match git.
(3) Audit trail -- every change is a git commit with author and timestamp.
(4) Rollback -- revert a git commit and ArgoCD applies the previous state.

Workflow: developer commits a change, the PR is reviewed and merged to main, ArgoCD syncs the change to the cluster, and a health check verifies success.
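The reconciliation described above can be sketched as a single pass of a control loop. The sketch below is a simplified Python model, not ArgoCD's code or API: the dict-based manifests, the find_drift and reconcile functions, and the self_heal flag are illustrative assumptions, and pruning of resources removed from git is folded into self-healing for brevity.

```python
import copy
from typing import Dict, List

# Hypothetical model: resource name -> manifest fields.
Manifests = Dict[str, dict]


def find_drift(desired: Manifests, live: Manifests) -> List[str]:
    """Return names of resources whose live state differs from git."""
    names = set(desired) | set(live)
    return sorted(n for n in names if desired.get(n) != live.get(n))


def reconcile(desired: Manifests, live: Manifests, self_heal: bool = True) -> List[str]:
    """Run one reconciliation pass and return the drifted resource names.

    With self_heal=False the drift is only reported; with self_heal=True
    the live state is mutated in place to match the desired state.
    """
    drifted = find_drift(desired, live)
    if self_heal:
        for name in drifted:
            if name in desired:
                live[name] = copy.deepcopy(desired[name])
            else:
                # Simplification: a resource absent from git is pruned.
                del live[name]
    return drifted
```

A rollback in this model is just another pass: reverting the commit changes the desired manifests, and the next reconcile call moves the live state back to match them.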

Question 4

How do you handle Terraform state management safely?

Accepted Answer

Terraform state (terraform.tfstate) maps your configuration to real infrastructure IDs. It is the most critical file in your Terraform workflow. Safe state management:
(1) Remote backend -- store state in S3 (with versioning enabled) and use DynamoDB for state locking. Never store state in git, because it may contain secrets such as database passwords. Terraform Cloud (now HCP Terraform) and Terraform Enterprise also provide managed state storage.
(2) State locking -- DynamoDB (or an equivalent lock store) prevents two engineers from running terraform apply at the same time. Without locking, concurrent applies can corrupt state or create duplicate resources.
(3) State encryption -- enable server-side encryption on the S3 bucket and restrict access to it. State files contain resource attributes that may include sensitive values.
(4) Workspaces or separate state files per environment -- staging and production should have separate state files. A mistake in staging should never affect the production state.
(5) State backup -- S3 versioning provides automatic backups. If state is corrupted, restore a previous version.
(6) Import existing resources -- if infrastructure was created manually, use terraform import to bring it under Terraform management without recreating it.
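To make the locking point concrete, here is a minimal Python sketch of the put-if-absent pattern that a lock store provides. It uses an in-memory table as a stand-in; it does not call DynamoDB or Terraform, and LockTable, LockError, locked_state, and the state keys are hypothetical names chosen for illustration.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class LockError(RuntimeError):
    """Raised when another run already holds the state lock."""


class LockTable:
    """In-memory stand-in for a lock store with conditional writes."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def acquire(self, state_key: str, owner: str) -> None:
        with self._mutex:
            # Put-if-absent: succeed only when nobody holds the lock.
            if state_key in self._owners:
                holder = self._owners[state_key]
                raise LockError(f"{state_key} is locked by {holder}")
            self._owners[state_key] = owner

    def release(self, state_key: str, owner: str) -> None:
        with self._mutex:
            # Only the current holder may release the lock.
            if self._owners.get(state_key) == owner:
                del self._owners[state_key]


@contextmanager
def locked_state(table: LockTable, state_key: str, owner: str) -> Iterator[None]:
    """Hold the lock for one state file while a plan or apply runs."""
    table.acquire(state_key, owner)
    try:
        yield
    finally:
        # Release even if the run fails, so the lock is not left stuck.
        table.release(state_key, owner)
```

Because the lock is keyed by state file, a run against a staging state does not block a run against production, which matches the advice to keep one state file per environment.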

System Design: Infrastructure as Code — Terraform, Pulumi, CloudFormation, GitOps, Drift Detection, State Management
