Package Registry Low-Level Design: Artifact Storage, Version Resolution, and Dependency Graph

Package Registry: What It Solves

A package registry is a versioned artifact store for software libraries. Developers publish packages; consumers download specific versions. The system must provide strong immutability guarantees (a published version never changes), fast global distribution, accurate dependency resolution, and supply chain security. npm, PyPI, Maven Central, and crates.io are real-world examples that collectively serve billions of downloads per day.

Package Schema

CREATE TABLE packages (
  name             TEXT NOT NULL,
  version          TEXT NOT NULL,     -- semver: 1.2.3
  description      TEXT,
  license          TEXT,
  maintainers      JSONB,             -- [{name, email}]
  dependencies     JSONB,             -- {"lodash": "^4.17.21"}
  dev_dependencies JSONB,
  checksum_sha256  CHAR(64) NOT NULL, -- hex-encoded SHA-256 of the artifact tarball
  download_url     TEXT,
  published_at     TIMESTAMP NOT NULL,
  published_by     BIGINT,            -- user_id
  deprecated       BOOLEAN NOT NULL DEFAULT FALSE,
  PRIMARY KEY (name, version)
);

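The (name, version) pair is the primary key: one package name can have many versions, but each pair is unique. The key guarantees uniqueness only. Immutability after publish comes from the publish API refusing to overwrite an existing row and from write-once artifact storage.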
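The following minimal sketch shows how the composite key makes a duplicate publish fail. It uses SQLite as a stand-in for PostgreSQL, so JSONB becomes TEXT and only a few columns are kept. The `publish` function and its HTTP-style status codes are hypothetical. A real registry would also check ownership and validate the manifest.

```python
import json
import sqlite3

# SQLite stand-in for the PostgreSQL schema: JSONB -> TEXT, subset of columns.
SCHEMA = """
CREATE TABLE packages (
    name            TEXT NOT NULL,
    version         TEXT NOT NULL,
    dependencies    TEXT NOT NULL DEFAULT '{}',
    checksum_sha256 CHAR(64) NOT NULL,
    PRIMARY KEY (name, version)
)
"""


def publish(conn: sqlite3.Connection, name: str, version: str,
            dependencies: dict, checksum: str) -> int:
    """Hypothetical publish handler returning an HTTP-style status code."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO packages (name, version, dependencies, checksum_sha256) "
                "VALUES (?, ?, ?, ?)",
                (name, version, json.dumps(dependencies), checksum),
            )
        return 201  # Created
    except sqlite3.IntegrityError:
        # (name, version) already exists: a published version is never replaced.
        return 409  # Conflict


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(publish(conn, "left-pad", "1.3.0", {}, "a" * 64))  # 201
print(publish(conn, "left-pad", "1.3.0", {}, "b" * 64))  # 409
```

The database rejects the second insert directly, with no prior existence check. This avoids a race between two concurrent publishes of the same version.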

Artifact Storage

Artifacts (npm .tgz tarballs, Python .whl wheels, Java .jar archives) are stored in S3 under a deterministic object key; the npm tarball case is shown:

s3://registry-artifacts/{name}/{version}/{name}-{version}.tgz

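Objects are write-once: once a key is written, it is never overwritten, so S3 object versioning is left disabled for package artifacts. The bucket policy denies deletes, and the publish path writes with a conditional PUT (If-None-Match: *), which S3 rejects when the key already exists. (S3 Object Lock is an alternative way to get write-once-read-many retention, but it requires versioning to be enabled.) This enforces immutability at the storage layer, not just at the API layer. Downloads are served through a CDN or via S3 presigned URLs with a short expiry (15 minutes).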
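Here is a sketch of the write-once contract, with an in-memory dictionary standing in for the bucket. `artifact_key` follows the key layout above; its `ext` parameter is an assumption for .whl and .jar files. `put_if_absent` only mimics a conditional PUT with `If-None-Match: *`, one way to get write-once semantics on S3. It does not call any real S3 client.

```python
class PreconditionFailed(Exception):
    """Raised when a write targets a key that already exists."""


def artifact_key(name: str, version: str, ext: str = "tgz") -> str:
    # Same layout as the text; `ext` is an assumed parameter for .whl/.jar files.
    return f"{name}/{version}/{name}-{version}.{ext}"


class WriteOnceStore:
    """In-memory stand-in for a bucket where every PUT sends If-None-Match: *."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, body: bytes) -> None:
        if key in self._objects:
            raise PreconditionFailed(key)  # existing objects are never overwritten
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = WriteOnceStore()
key = artifact_key("left-pad", "1.3.0")
store.put_if_absent(key, b"original tarball")
try:
    store.put_if_absent(key, b"tampered tarball")
except PreconditionFailed:
    print("overwrite rejected")
print(key, store.get(key))
```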

Checksum Verification

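On upload, the registry computes the SHA-256 of the artifact and stores it in the packages table. On download, the client verifies that the artifact's SHA-256 matches the recorded checksum before installation. Provided the checksum itself comes from trusted registry metadata, this detects both accidental corruption and intentional tampering in transit or at the CDN layer. It cannot catch a compromised registry that rewrites both the artifact and its checksum; that is the job of package signing.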
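Below is a minimal client-side check using only the standard library's `hashlib`. The artifact is hashed in chunks, so large tarballs do not need to fit in memory. `verify_artifact` and the chunk size are illustrative choices, not taken from any particular package manager.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a binary stream, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream: BinaryIO, recorded_sha256: str) -> None:
    """Raise before installation if the bytes differ from the registry's record."""
    actual = sha256_of_stream(stream)
    if actual != recorded_sha256.lower():
        raise ValueError(f"checksum mismatch: registry has {recorded_sha256}, got {actual}")


tarball = b"example tarball bytes"
recorded = hashlib.sha256(tarball).hexdigest()  # stored at publish time
verify_artifact(io.BytesIO(tarball), recorded)   # passes silently
try:
    verify_artifact(io.BytesIO(tarball + b"!"), recorded)
except ValueError as err:
    print(err)
```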

Semantic Versioning and Range Resolution

Semantic versions follow MAJOR.MINOR.PATCH. Range specifiers in dependency declarations:

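  • ^1.2.3 (caret): allows changes that do not modify the leftmost non-zero component, i.e. >=1.2.3 <2.0.0. For 0.x versions it is narrower: ^0.2.3 means >=0.2.3 <0.3.0.
  • ~1.2.3 (tilde): allows patch-level changes only, i.e. >=1.2.3 <1.3.0.
  • >=1.0.0 <2.0.0: explicit range; all space-separated comparators must hold.
  • 1.2.3: exact version pinning.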

Resolution: given a range, query all published versions of the package, filter to those satisfying the range, and return the highest satisfying version. Versions are compared by SemVer precedence rather than as strings, so 1.10.0 is newer than 1.9.0. The lock file records the exact resolved version so subsequent installs are deterministic regardless of new publishes.
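The following is a simplified sketch of range matching and highest-version selection. It assumes plain MAJOR.MINOR.PATCH versions and handles only caret, tilde, exact pins and space-separated comparators. Prerelease tags, x-ranges (1.x) and `||` unions are deliberately left out. Versions are compared as integer tuples, which is why `1.10.4` correctly beats `1.3.0` below.

```python
from __future__ import annotations

import operator

Version = tuple[int, int, int]

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH'; prerelease/build suffixes are not supported."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def parse_range(spec: str) -> list[tuple[str, Version]]:
    """Translate a range into comparators that must all hold."""
    comparators: list[tuple[str, Version]] = []
    for token in spec.split():
        if token.startswith("^"):
            low = parse_version(token[1:])
            major, minor, patch = low
            # Caret: the leftmost non-zero component must not change.
            if major > 0:
                high = (major + 1, 0, 0)
            elif minor > 0:
                high = (0, minor + 1, 0)
            else:
                high = (0, 0, patch + 1)
            comparators += [(">=", low), ("<", high)]
        elif token.startswith("~"):
            low = parse_version(token[1:])
            # Tilde: patch-level changes only.
            comparators += [(">=", low), ("<", (low[0], low[1] + 1, 0))]
        else:
            for op in (">=", "<=", ">", "<", "="):  # two-character operators first
                if token.startswith(op):
                    comparators.append((op, parse_version(token[len(op):])))
                    break
            else:
                comparators.append(("=", parse_version(token)))  # bare version: exact pin
    return comparators


def satisfies(version: Version, spec: str) -> bool:
    return all(OPS[op](version, bound) for op, bound in parse_range(spec))


def max_satisfying(published: list[str], spec: str) -> str | None:
    """Highest published version in the range, by numeric precedence."""
    matching = [v for v in published if satisfies(parse_version(v), spec)]
    return max(matching, key=parse_version, default=None)


published = ["1.2.2", "1.2.3", "1.2.9", "1.3.0", "1.10.4", "2.0.0"]
print(max_satisfying(published, "^1.2.3"))  # 1.10.4
print(max_satisfying(published, "~1.2.3"))  # 1.2.9
print(max_satisfying(published, "1.2.3"))   # 1.2.3
print(max_satisfying(published, "^3.0.0"))  # None
```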

Dependency Resolution Algorithm

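Dependency resolution is a constraint satisfaction problem. Naive recursive expansion works for trees, but real dependency graphs contain diamond dependencies: A depends on B and C, and both B and C depend on D, but with different version ranges. A resolver that installs one version per package must find a single version of D that satisfies both ranges, or backtrack and choose different versions of B or C.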

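resolve(requirements, assignment = {}):
  # requirements: list of (package, version_range) still to satisfy
  # assignment: package -> chosen version (one version per package)
  if requirements is empty: return assignment
  (package, range), rest = first(requirements), remaining(requirements)
  if package in assignment:
    if assignment[package] satisfies range: return resolve(rest, assignment)
    return FAIL                        # conflict: the caller backtracks
  for candidate in versions_satisfying(package, range), highest first:
    deps = manifest(package, candidate).dependencies
    result = resolve(rest + deps, assignment + {package: candidate})
    if result is not FAIL: return result
  return FAIL                          # no candidate works: backtrack further

# top level: if resolve([(root, root_range)]) is FAIL, raise ConflictError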
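Here is a runnable sketch of a backtracking resolver in the spirit of the pseudocode. It makes several simplifying assumptions. Ranges are half-open version tuples instead of semver strings, and each package gets exactly one version. The registry is a hypothetical in-memory `INDEX` containing a diamond: A needs B and C, which both need D. There is no memoization, so the worst case is exponential.

```python
from __future__ import annotations

Version = tuple[int, int, int]
Range = tuple[Version, Version]  # half-open [low, high): a simplified semver range

# Hypothetical registry index: package -> version -> {dependency: range}.
INDEX: dict[str, dict[Version, dict[str, Range]]] = {
    "A": {(1, 0, 0): {"B": ((1, 0, 0), (2, 0, 0)), "C": ((1, 0, 0), (2, 0, 0))}},
    "B": {(1, 0, 0): {"D": ((1, 0, 0), (2, 0, 0))}},
    "C": {
        (1, 1, 0): {"D": ((1, 5, 0), (2, 0, 0))},  # needs a D that does not exist
        (1, 0, 0): {"D": ((1, 0, 0), (1, 3, 0))},
    },
    "D": {(1, 2, 0): {}, (1, 4, 0): {}},
}


class ConflictError(Exception):
    pass


def in_range(version: Version, rng: Range) -> bool:
    low, high = rng
    return low <= version < high


def resolve(requirements: list[tuple[str, Range]],
            assignment: dict[str, Version]) -> dict[str, Version] | None:
    """Depth-first search with backtracking; one version per package."""
    if not requirements:
        return assignment  # every constraint is satisfied
    (package, rng), rest = requirements[0], requirements[1:]
    if package in assignment:
        # Pinned earlier in the search: continue only if the pin fits this range too.
        return resolve(rest, assignment) if in_range(assignment[package], rng) else None
    candidates = sorted((v for v in INDEX.get(package, {}) if in_range(v, rng)), reverse=True)
    for candidate in candidates:  # highest version first
        deps = list(INDEX[package][candidate].items())
        result = resolve(rest + deps, {**assignment, package: candidate})
        if result is not None:
            return result
    return None  # no candidate works; the caller tries its next option


def solve(root: str, rng: Range) -> dict[str, Version]:
    result = resolve([(root, rng)], {})
    if result is None:
        raise ConflictError(f"no consistent set of versions for {root}")
    return result


print(solve("A", ((1, 0, 0), (2, 0, 0))))
# {'A': (1, 0, 0), 'B': (1, 0, 0), 'C': (1, 0, 0), 'D': (1, 2, 0)}
```

The search tries C 1.1.0 first and finds no D in [1.5.0, 2.0.0). It then backtracks to C 1.0.0 and settles on D 1.2.0, the only version that fits both B's and C's ranges.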

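npm uses a flat node_modules layout with version hoisting: the highest compatible version is shared at the top level, and conflicting versions are nested under the packages that need them. Cargo (Rust) uses a backtracking resolver and lets semver-incompatible versions of the same crate coexist in one build; SAT-based solvers are used by, for example, Composer (PHP) and conda. The resulting lock file maps every (package, range) to an exact version, so subsequent installs skip resolution and use the lock file directly.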
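The next sketch shows a lock-file round trip. Its JSON layout is a hypothetical format: entries are keyed by `name@range` and record an exact version and SHA-256. Real formats such as package-lock.json and yarn.lock differ in structure and detail. Sorting keys makes the output byte-for-byte stable, so the same resolution always produces the same file.

```python
import json


def write_lockfile(resolved: dict[tuple[str, str], tuple[str, str]]) -> str:
    """resolved maps (package, requested range) -> (exact version, sha256 hex)."""
    entries = {
        f"{name}@{rng}": {"version": version, "sha256": digest}
        for (name, rng), (version, digest) in resolved.items()
    }
    # sort_keys makes the file identical for identical resolutions.
    return json.dumps(entries, indent=2, sort_keys=True) + "\n"


def install_plan(lock_text: str) -> list[tuple[str, str, str]]:
    """Turn pins back into (package, version, sha256) without any range resolution."""
    plan = set()
    for key, entry in json.loads(lock_text).items():
        name = key.rsplit("@", 1)[0]  # rsplit keeps scoped names like "@acme/ui" intact
        plan.add((name, entry["version"], entry["sha256"]))
    return sorted(plan)  # two ranges pinned to the same version install once


resolved = {
    ("lodash", "^4.17.0"): ("4.17.21", "a" * 64),
    ("lodash", "^4.0.0"): ("4.17.21", "a" * 64),
    ("@acme/ui", "~1.2.0"): ("1.2.5", "b" * 64),
}
lock = write_lockfile(resolved)
print(install_plan(lock))
```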

Package Signing and Supply Chain Security

Package signing lets consumers verify the artifact was published by a trusted party and has not been modified:

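  • GPG signing (traditional): the maintainer signs the artifact with their private key; consumers verify with the maintainer's public key from a keyserver.
  • Sigstore (modern): keyless signing with an ephemeral key pair. Sigstore's certificate authority (Fulcio) issues a short-lived certificate binding the public key to the developer's OIDC identity (for example a GitHub Actions workflow or a Google account), and the signature is recorded in a public transparency log (Rekor). Consumers verify the signature, check that the certificate names the identity they expect, and confirm the entry is in the log, so no long-lived signing keys have to be managed.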
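Checking the log relies partly on a Merkle inclusion proof. The log returns the sibling hashes on the path from an entry to the tree root, and the client recomputes the root. The sketch below follows the hashing and verification rules of RFC 6962 / RFC 9162 (Certificate Transparency), the Merkle-tree design that transparency logs such as Rekor are built on. It covers only the inclusion check. Verifying the signature, the certificate identity and the log's signed tree head are separate steps, and the log entries are placeholder bytes.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()  # 0x00 prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks an interior node


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (n > 1)."""
    return 1 << ((n - 1).bit_length() - 1)


def tree_root(entries: list[bytes]) -> bytes:
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def inclusion_proof(index: int, entries: list[bytes]) -> list[bytes]:
    """Sibling hashes from the leaf up to the root (the log's side)."""
    if len(entries) <= 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(index, entries[:k]) + [tree_root(entries[k:])]
    return inclusion_proof(index - k, entries[k:]) + [tree_root(entries[:k])]


def verify_inclusion(index: int, size: int, leaf: bytes,
                     proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf hash and its proof (the client's side)."""
    if index >= size:
        return False
    fn, sn, r = index, size - 1, leaf
    for sibling in proof:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            r = node_hash(sibling, r)  # sibling is on the left
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, sibling)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


log = [b"entry-0", b"signature for left-pad@1.3.0", b"entry-2", b"entry-3", b"entry-4"]
root = tree_root(log)
proof = inclusion_proof(1, log)
print(verify_inclusion(1, len(log), leaf_hash(log[1]), proof, root))          # True
print(verify_inclusion(1, len(log), leaf_hash(b"forged entry"), proof, root))  # False
```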

Proxying Upstream Registries

Private registries proxy public registries (npm, PyPI, Maven Central) to provide: local caching (faster downloads, insulation from upstream outages), security scanning before artifacts reach developers, and a single endpoint for both private and public packages.

On cache miss: fetch from upstream, verify checksum, store locally, return to client. Cached artifacts are served indefinitely (immutable by version). Metadata (available versions list) is refreshed on TTL or on-demand.
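The following read-through cache sketches that policy. Artifacts are verified once and then kept forever. Each package's version list expires after a TTL, and it is refreshed early when a client asks for a version the cache has not seen. The `fetch_metadata` and `fetch_artifact` callables stand in for hypothetical upstream HTTP calls, and the 300-second TTL is an arbitrary assumption.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProxyCache:
    """Read-through cache in front of an upstream registry (sketch)."""

    fetch_metadata: Callable[[str], dict]        # hypothetical: name -> {version: sha256 hex}
    fetch_artifact: Callable[[str, str], bytes]  # hypothetical: (name, version) -> bytes
    metadata_ttl: float = 300.0                  # seconds; an assumed value
    clock: Callable[[], float] = time.monotonic
    _metadata: dict = field(default_factory=dict, init=False)   # name -> (fetched_at, versions)
    _artifacts: dict = field(default_factory=dict, init=False)  # (name, version) -> bytes

    def versions(self, name: str, refresh: bool = False) -> dict:
        cached = self._metadata.get(name)
        if refresh or cached is None or self.clock() - cached[0] > self.metadata_ttl:
            cached = (self.clock(), self.fetch_metadata(name))
            self._metadata[name] = cached
        return cached[1]

    def artifact(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key in self._artifacts:
            return self._artifacts[key]  # immutable by version: never re-fetched
        known = self.versions(name)
        if version not in known:
            known = self.versions(name, refresh=True)  # on-demand refresh for new versions
        expected = known[version]  # KeyError if upstream does not have it either
        data = self.fetch_artifact(name, version)
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError(f"upstream checksum mismatch for {name}@{version}")
        self._artifacts[key] = data
        return data


blob = b"pkg-1.0.0 tarball"
upstream_calls = []


def fake_metadata(name):
    upstream_calls.append(("metadata", name))
    return {"1.0.0": hashlib.sha256(blob).hexdigest()}


def fake_artifact(name, version):
    upstream_calls.append(("artifact", name, version))
    return blob


cache = ProxyCache(fake_metadata, fake_artifact)
cache.artifact("pkg", "1.0.0")
cache.artifact("pkg", "1.0.0")
print(upstream_calls)  # one metadata fetch and one artifact fetch
```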

Scoped Packages and Namespacing

Scoped packages (@org/package) provide namespacing. All packages under @org/ are owned by that organization. Unscoped names are global. Scopes prevent name squatting on common names and allow organizations to publish private packages alongside public ones.
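Below is a sketch of a publish-permission check that separates the two namespaces. The name pattern is a simplified assumption rather than npm's full naming rules. `org_members` and `owners` are hypothetical lookup tables. For unscoped names, the sketch assumes any user may claim a name nobody owns yet.

```python
from __future__ import annotations

import re

# Simplified "@scope/name" pattern; an assumption, not npm's complete naming rules.
SCOPED_NAME = re.compile(r"^@([a-z0-9][a-z0-9._-]*)/([a-z0-9][a-z0-9._-]*)$")


def scope_of(package_name: str) -> str | None:
    match = SCOPED_NAME.match(package_name)
    return match.group(1) if match else None


def can_publish(user: str, package_name: str,
                org_members: dict[str, set[str]],
                owners: dict[str, set[str]]) -> bool:
    scope = scope_of(package_name)
    if scope is not None:
        # Every name under @scope/ belongs to that organization.
        return user in org_members.get(scope, set())
    # Unscoped names share one global namespace; an unclaimed name is assumed open.
    return package_name not in owners or user in owners[package_name]


org_members = {"acme": {"alice"}}
owners = {"left-pad": {"bob"}}
print(can_publish("alice", "@acme/utils", org_members, owners))    # True
print(can_publish("mallory", "@acme/utils", org_members, owners))  # False
```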

Immutability and Unpublish Policy

The left-pad incident (in 2016, an author unpublished a widely used package, breaking thousands of builds) established the industry norm: packages cannot be unpublished after a short grace period (24 hours in npm's original post-incident policy, since extended to 72 hours). After the grace period, a version can only be deprecated (marked with a warning), not deleted, and the artifact remains available. This trades authors' ability to pull their own releases for build reproducibility. Vulnerable versions are handled by advisories and forced updates rather than deletion, although registry operators do still take down confirmed malware.
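The removal policy can be sketched as a small in-memory ledger. The grace period is a constructor parameter, 24 hours by default; the value is a policy choice. The ledger also never lets an unpublished version number be reused, which npm enforces as well. This keeps a lock file from ever pointing at different bytes under the same name@version.

```python
from datetime import datetime, timedelta, timezone


class VersionLedger:
    """Tracks which (name, version) pairs exist, are deprecated, or are retired."""

    def __init__(self, grace: timedelta = timedelta(hours=24)) -> None:
        self.grace = grace  # policy value; an assumed default
        self.published: dict[tuple[str, str], datetime] = {}
        self.deprecated: set[tuple[str, str]] = set()
        self.retired: set[tuple[str, str]] = set()  # unpublished, never reusable

    def publish(self, name: str, version: str, now: datetime) -> None:
        key = (name, version)
        if key in self.published or key in self.retired:
            raise ValueError(f"{name}@{version} has already been used")
        self.published[key] = now

    def remove(self, name: str, version: str, now: datetime) -> str:
        """Author-initiated removal: delete inside the grace period, else deprecate."""
        key = (name, version)
        if now - self.published[key] <= self.grace:
            del self.published[key]
            self.retired.add(key)
            return "unpublished"
        self.deprecated.add(key)  # artifact stays downloadable, clients see a warning
        return "deprecated"


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ledger = VersionLedger()
ledger.publish("left-pad", "1.0.0", t0)
print(ledger.remove("left-pad", "1.0.0", t0 + timedelta(hours=2)))  # unpublished
ledger.publish("left-pad", "1.0.1", t0)
print(ledger.remove("left-pad", "1.0.1", t0 + timedelta(days=30)))  # deprecated
```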

Search Index

An Elasticsearch index on package metadata (name, description, keywords, maintainers) provides full-text search with ranking by download count, last publish date, and keyword relevance. Autocomplete on package name uses a prefix-optimized edge-n-gram analyzer.
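The following in-memory illustration shows what an edge n-gram analyzer does at index time. Each name token is expanded into its prefixes, so an autocomplete query becomes a single exact lookup. This is plain Python, not Elasticsearch configuration. The `min_gram`/`max_gram` values and the download counts are made-up assumptions, and ranking uses download count alone.

```python
import re
from collections import defaultdict


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Leading substrings of a token, as an edge n-gram filter emits them."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def build_index(names: list[str]) -> dict[str, set[str]]:
    """Index time: split each name into tokens and store every prefix."""
    index: dict[str, set[str]] = defaultdict(set)
    for name in names:
        for token in re.split(r"[^a-z0-9]+", name.lower()):
            for gram in edge_ngrams(token):
                index[gram].add(name)
    return index


def autocomplete(prefix: str, index: dict[str, set[str]],
                 downloads: dict[str, int], limit: int = 5) -> list[str]:
    """Query time: the prefix is looked up as-is (no n-gramming), ranked by downloads."""
    hits = index.get(prefix.lower(), set())
    return sorted(hits, key=lambda name: (-downloads[name], name))[:limit]


downloads = {"react": 100, "react-dom": 90, "redux": 50, "@babel/core": 70, "lodash": 200}
index = build_index(list(downloads))
print(autocomplete("re", index, downloads))  # ['react', 'react-dom', 'redux']
print(autocomplete("co", index, downloads))  # ['@babel/core']
```

Prefixes longer than `max_gram` are never indexed, so in this sketch a query longer than 10 characters finds nothing unless it is truncated first.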

Trade-offs and Failure Modes

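  • Dependency confusion attacks: An attacker publishes a public package with the same name as an internal private package. A client configured with both public and private sources may pick the public one, for example when it takes the highest version across all sources and the attacker publishes a higher version number. Mitigate by resolving internal names only from the private registry, reserving internal package names in the public registry, or using scoped packages exclusively for internal code.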
  • CDN cache poisoning: If a CDN node serves a different artifact for the same package version URL, checksum verification at the client will catch it. Never disable checksum verification.
  • Resolution performance: Deep dependency graphs with many version conflicts make resolution slow (version selection is NP-complete in the worst case). Use memoization and early conflict detection; in practice most dependency graphs resolve quickly.
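Here is a sketch contrasting the vulnerable and mitigated resolution strategies for dependency confusion. `INTERNAL_SCOPES`, `INTERNAL_NAMES` and the two in-memory indexes are hypothetical. The mitigated function picks a single source per name before comparing versions, so it never considers a higher public version of an internal name.

```python
Version = tuple[int, int, int]

# Hypothetical configuration: names that must only come from the private registry.
INTERNAL_SCOPES = {"@acme"}
INTERNAL_NAMES = {"acme-billing-client"}


def source_for(name: str) -> str:
    scope = name.split("/", 1)[0] if name.startswith("@") else None
    return "private" if scope in INTERNAL_SCOPES or name in INTERNAL_NAMES else "public"


def pick_merged(name: str, private: dict[str, list[Version]],
                public: dict[str, list[Version]]) -> tuple[str, Version]:
    """Vulnerable: pool both registries and take the highest version."""
    pool = [(v, "private") for v in private.get(name, [])]
    pool += [(v, "public") for v in public.get(name, [])]
    version, source = max(pool)
    return source, version


def pick_routed(name: str, private: dict[str, list[Version]],
                public: dict[str, list[Version]]) -> tuple[str, Version]:
    """Mitigated: choose the source from the name first, then compare versions."""
    source = source_for(name)
    versions = (private if source == "private" else public).get(name, [])
    if not versions:
        raise LookupError(f"{name} not found in the {source} registry")
    return source, max(versions)


private = {"acme-billing-client": [(1, 4, 0)]}
public = {"acme-billing-client": [(99, 0, 0)], "lodash": [(4, 17, 21)]}  # attacker's 99.0.0
print(pick_merged("acme-billing-client", private, public))  # ('public', (99, 0, 0))
print(pick_routed("acme-billing-client", private, public))  # ('private', (1, 4, 0))
```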

