Platform engineering is the practice of building an Internal Developer Platform (IDP) that provides self-service infrastructure capabilities to application teams. Instead of every team managing its own CI/CD pipelines, Kubernetes configurations, observability stacks, and secrets management, the platform team builds paved roads (golden paths) that embed best practices by default, reducing cognitive load on product engineers.
What Is an Internal Developer Platform
An IDP is the sum of tooling, workflows, and standards that product teams use to build and operate software. Core capabilities: CI/CD pipelines (build, test, deploy), environment management (Kubernetes namespaces and resource quotas), secrets injection (for example, through a Vault integration), a service catalog (what services exist and who owns them), observability (pre-configured dashboards and alerts), cost visibility (per-team cloud spend), and a developer portal (such as Backstage) that ties all capabilities together in a single UI.
Golden Paths
A golden path is an opinionated, pre-built template for a common engineering task. For example, “create a new service using the Go golden path” generates a repository with a Dockerfile, Helm chart, GitHub Actions CI/CD pipeline, pre-configured Prometheus metrics, structured logging, health check endpoints, and a basic Grafana dashboard. The golden path embeds the platform team’s security, observability, and deployment best practices. Teams that follow the golden path get these practices by default; teams that deviate from it take on the work of building and supporting their own setup, and in many organizations they give up platform support for the parts they replace.
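As a rough illustration of what a golden path template produces, the sketch below returns the set of files a scaffolder might generate for a new service. The file paths, the function name render_scaffold, and the placeholder contents are all assumptions made for this example; a real golden path would render full templates, typically through a scaffolding tool.

```python
# Hypothetical file set for a "Go golden path". The paths are illustrative,
# chosen to mirror the artifacts listed in the text above.
GOLDEN_PATH_FILES = [
    "Dockerfile",
    "chart/Chart.yaml",
    "chart/values.yaml",
    ".github/workflows/ci.yaml",
    "internal/metrics/metrics.go",
    "internal/health/health.go",
    "dashboards/service.json",
]


def render_scaffold(service_name: str, owner: str) -> dict[str, str]:
    """Return a mapping of repo-relative path -> placeholder content."""
    # Accept only letters, digits, and hyphens in the service name.
    if not service_name or not service_name.replace("-", "").isalnum():
        raise ValueError(f"invalid service name: {service_name!r}")
    placeholder = f"generated for {service_name} (owner: {owner})"
    return {path: placeholder for path in GOLDEN_PATH_FILES}


if __name__ == "__main__":
    for path in render_scaffold("payments-api", "team-payments"):
        print(path)
```

Keeping the file list in one place means a change to the template, such as adding a new dashboard, reaches every service created afterwards.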
Self-Service and the Team Topologies Model
Team Topologies defines four team types: stream-aligned, enabling, complicated-subsystem, and platform. Product teams are “stream-aligned teams,” and the platform team is a “platform team” (a distinct type from an “enabling team,” which coaches other teams rather than running services for them). The platform team’s goal is to reduce cognitive load on stream-aligned teams by providing capabilities as a product. Platform features are shipped as self-service APIs: creating a new environment, requesting a database, configuring an alert. Product teams should not need to file tickets with the platform team for routine infrastructure operations.
Service Catalog and Backstage
A service catalog tracks all services in the organization: owner, tier (critical, standard, experimental), dependencies, SLOs, documentation, on-call rotation, deployment history, and alerts. Backstage (a CNCF project, originally built at Spotify) is the dominant open-source developer portal for building service catalogs. Each service registers via a catalog-info.yaml file in its repository. For each service, the portal surfaces ownership, API documentation, CI/CD pipelines, infrastructure cost, and feature flags.
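The shape of a catalog entry can be sketched as a Python dictionary that mirrors a Backstage Component descriptor. The apiVersion, kind, metadata.name, and spec fields (type, lifecycle, owner) follow the Backstage descriptor format; the example.com/tier label, the helper names, and the sample values are assumptions for illustration. The output is printed as JSON, which YAML parsers generally accept, so it could serve as the content of a catalog-info.yaml.

```python
import json

# Fields the Backstage descriptor format expects on a Component's spec.
REQUIRED_SPEC_FIELDS = ("type", "lifecycle", "owner")


def component_entity(name, owner, lifecycle="production", tier="standard"):
    """Build a Component entity as a plain dict."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {
            "name": name,
            # Hypothetical label for the service tier; not a built-in field.
            "labels": {"example.com/tier": tier},
        },
        "spec": {"type": "service", "lifecycle": lifecycle, "owner": owner},
    }


def missing_fields(entity):
    """Return dotted names of required fields that are absent or empty."""
    missing = [key for key in ("apiVersion", "kind") if not entity.get(key)]
    if not entity.get("metadata", {}).get("name"):
        missing.append("metadata.name")
    spec = entity.get("spec", {})
    missing.extend(f"spec.{f}" for f in REQUIRED_SPEC_FIELDS if not spec.get(f))
    return missing


if __name__ == "__main__":
    entity = component_entity("payments-api", "team-payments")
    print(json.dumps(entity, indent=2))
```

A check like missing_fields could run in CI so that a service with no declared owner fails the build instead of appearing in the catalog as an orphan.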
Environment Management
The platform provides standardized environment tiers: development (personal dev environments, ephemeral per-branch), staging (shared pre-production that mirrors production configuration), and production. With ephemeral environments, each pull request gets its own environment, spun up from the Helm chart with a preview URL and torn down when the PR is merged or closed. This enables per-PR QA and design review without conflicts over a shared staging environment. Environment creation is self-service via the developer portal or GitHub Actions workflows.
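One small piece of per-PR environments is deriving a stable namespace name and preview URL from the service name and pull request number. The sketch below assumes a naming scheme of service-pr-number and a base domain of preview.example.com, both hypothetical. The 63-character cap reflects the DNS label limit that Kubernetes namespace names must respect.

```python
import re

MAX_LABEL_LEN = 63  # DNS label length limit for Kubernetes namespace names


def preview_namespace(service: str, pr_number: int) -> str:
    """Derive a DNS-safe namespace name such as 'payments-api-pr-42'."""
    if pr_number <= 0:
        raise ValueError("pr_number must be positive")
    suffix = f"-pr-{pr_number}"
    # Lowercase, then collapse anything outside [a-z0-9-] into a hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", service.lower()).strip("-")
    if not slug:
        raise ValueError(f"service name has no usable characters: {service!r}")
    # Truncate the service part so the suffix always fits within the limit.
    slug = slug[: MAX_LABEL_LEN - len(suffix)].rstrip("-")
    return slug + suffix


def preview_url(service: str, pr_number: int,
                base_domain: str = "preview.example.com") -> str:
    """Build the preview URL for a PR environment (hypothetical scheme)."""
    return f"https://{preview_namespace(service, pr_number)}.{base_domain}"


if __name__ == "__main__":
    print(preview_url("payments-api", 42))
```

Because the name is a pure function of the service and PR number, the workflow that tears the environment down can recompute it without storing any state.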
Measuring Platform Success
The DORA metrics are the foundation for measuring platform engineering success: Deployment Frequency (how often teams deploy to production), Lead Time for Changes (time from commit to production), Change Failure Rate (percentage of deployments causing incidents), and Time to Restore Service (how long it takes to recover from an incident, often reported as MTTR). Beyond DORA, an internal NPS from product team surveys measures the developer experience of the platform. Comparing time-to-production for a new service on the golden path against manual setup measures the efficiency gain. The platform adoption rate is the percentage of services using the golden path.
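The four DORA metrics can be computed from deployment records, as in the following sketch. The Deployment record, its field names, and the choice of medians are assumptions for this example. Time to restore is approximated here as the time from the failing deployment to restoration, which a real incident tracker would measure more precisely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # set once the incident is resolved


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Summarize the four DORA metrics over a reporting window."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restores) if restores else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    sample = [
        Deployment(t0, t0 + timedelta(hours=1)),
        Deployment(t0, t0 + timedelta(hours=3), True, t0 + timedelta(hours=4)),
        Deployment(t0, t0 + timedelta(hours=2)),
    ]
    print(dora_summary(sample, window_days=3))
```

Medians are used instead of means so that a single unusually slow deployment does not dominate the reported lead time.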
Infrastructure as Code
All platform infrastructure is defined in code (Terraform, Pulumi, Crossplane). Changes go through the same CI/CD review process as application code: pull request, automated plan preview (terraform plan), approval, apply. Infrastructure drift (changes made outside of IaC) is detected and flagged. The platform team uses the same GitOps workflows it provides to product teams, a practice often called eating your own dog food. This ensures the platform is reproducible, auditable, and recoverable from disasters.
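One way to flag drift is a scheduled job that runs terraform plan with the -detailed-exitcode flag, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below wraps that convention; the function and enum names are assumptions. Note that a non-empty plan only suggests drift when the checked-out configuration has already been applied, since unapplied code changes also produce a diff.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    NO_CHANGES = 0       # live infrastructure matches the configuration
    ERROR = 1            # the plan itself failed
    CHANGES_PRESENT = 2  # the plan is non-empty: possible drift


def classify_plan_exit(code: int) -> PlanResult:
    """Map a `terraform plan -detailed-exitcode` exit code to a result."""
    try:
        return PlanResult(code)
    except ValueError:
        # Treat any unexpected exit code as an error.
        return PlanResult.ERROR


def check_drift(workdir: str) -> PlanResult:
    """Run terraform plan in workdir; requires terraform on PATH."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode)
```

A scheduler could call check_drift on the main branch of each infrastructure repository and open an alert or ticket whenever the result is CHANGES_PRESENT.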