Package Registry: What It Solves
A package registry is a versioned artifact store for software libraries. Developers publish packages; consumers download specific versions. The system must provide strong immutability guarantees (a published version never changes), fast global distribution, accurate dependency resolution, and supply chain security. npm, PyPI, Maven Central, and crates.io are real-world examples; the largest of them serve billions of downloads per day.
Package Schema
CREATE TABLE packages (
    name TEXT NOT NULL,
    version TEXT NOT NULL, -- semver: 1.2.3
    description TEXT,
    license TEXT,
    maintainers JSONB, -- [{name, email}]
    dependencies JSONB, -- {"lodash": "^4.17.21"}
    dev_dependencies JSONB,
    checksum_sha256 CHAR(64) NOT NULL, -- of the artifact tarball
    download_url TEXT,
    published_at TIMESTAMP,
    published_by BIGINT, -- user_id
    deprecated BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (name, version)
);
The pair (name, version) is the primary key: a package name can have many versions, but each (name, version) pair is unique and immutable once published.
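As a minimal sketch of how the primary key rejects a republish, the example below uses an in-memory SQLite table with a reduced set of columns. The helper names `open_registry` and `publish` are hypothetical, and SQLite stands in for whatever database the registry actually uses.

```python
import sqlite3


def open_registry() -> sqlite3.Connection:
    """Create an in-memory table with the same composite primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE packages (
               name TEXT NOT NULL,
               version TEXT NOT NULL,
               checksum_sha256 TEXT NOT NULL,
               PRIMARY KEY (name, version)
           )"""
    )
    return conn


def publish(conn: sqlite3.Connection, name: str, version: str, checksum: str) -> bool:
    """Insert a new version; return False if (name, version) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO packages (name, version, checksum_sha256) VALUES (?, ?, ?)",
                (name, version, checksum),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already holds this pair: the publish is rejected.
        return False
```

An API layer could map the `False` result to an HTTP 409 response, so the database constraint rather than application logic is the final guard against republishing.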
Artifact Storage
Tarballs (.tgz, .whl, .jar) are stored in S3 with a deterministic object key:
s3://registry-artifacts/{name}/{version}/{name}-{version}.tgz
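Package artifacts are write-once: once written, a key is never overwritten. With S3 object versioning disabled, an overwrite would silently replace the artifact, so the bucket policy denies DELETE and uploads use a conditional PUT (If-None-Match: *) that fails when the key already exists. S3 Object Lock is an alternative, but it requires versioning to be enabled. Either way, immutability is enforced at the storage layer, not just at the API layer. Downloads are served through presigned S3 URLs with a short expiry (15 minutes) or through a CDN.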
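The write-once rule can be illustrated without S3. The sketch below builds the deterministic key from the layout above and uses a local directory as a stand-in for the bucket, relying on Python's exclusive-create file mode so that a second write to the same key fails. `artifact_key` and `put_once` are hypothetical names, and this is not an S3 client.

```python
import os


def artifact_key(name: str, version: str) -> str:
    """Deterministic object key: {name}/{version}/{name}-{version}.tgz."""
    return f"{name}/{version}/{name}-{version}.tgz"


def put_once(root: str, key: str, data: bytes) -> bool:
    """Write data under root/key only if the key does not exist yet."""
    path = os.path.join(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        # Mode "x" is exclusive create: it raises if the file already exists.
        with open(path, "xb") as handle:
            handle.write(data)
    except FileExistsError:
        return False
    return True
```

The same shape applies to an object store: the write either creates the key or is rejected, and there is no code path that replaces existing bytes.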
Checksum Verification
On upload: compute SHA-256 of the artifact and store in the packages table. On download: the client verifies the downloaded artifact's SHA-256 matches the registry's recorded checksum before installation. This detects both accidental corruption and intentional tampering in transit or at the CDN layer.
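A minimal sketch of both halves of this check, using only the standard library. The function names are illustrative, and the expected digest is assumed to come from registry metadata fetched over a trusted channel.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Digest the registry would compute at upload time."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Client-side check before installation.

    compare_digest avoids leaking how many leading characters matched.
    """
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())
```

A client would call `verify_artifact` on the downloaded bytes and refuse to unpack the archive when it returns `False`.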
Semantic Versioning and Range Resolution
Semantic versions follow MAJOR.MINOR.PATCH. Range specifiers in dependency declarations:
- ^1.2.3: compatible within the same major version; accepts >=1.2.3 and <2.0.0
- ~1.2.3: approximately equivalent; accepts >=1.2.3 and <1.3.0
- >=1.0.0 <2.0.0: explicit range
- 1.2.3: exact version pinning
Resolution: given a range, query all published versions for that package, filter to those satisfying the range, return the highest satisfying version. The lock file records the exact resolved version so subsequent installs are deterministic regardless of new publishes.
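The resolution rule above can be sketched in a few lines. This is a simplified illustration, not a full semver implementation: it handles only caret, tilde, and exact specifiers, ignores pre-release tags and explicit ranges, and follows npm's convention for caret ranges on 0.x versions. The names `parse`, `bounds`, and `max_satisfying` are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower bound, exclusive upper bound) for a specifier."""
    if spec.startswith("^"):
        low = parse(spec[1:])
        if low[0] > 0:
            return low, (low[0] + 1, 0, 0)
        if low[1] > 0:  # ^0.2.3 allows only 0.2.x
            return low, (0, low[1] + 1, 0)
        return low, (0, 0, low[2] + 1)  # ^0.0.3 allows only 0.0.3
    if spec.startswith("~"):
        low = parse(spec[1:])
        return low, (low[0], low[1] + 1, 0)
    low = parse(spec)  # exact pin: a range containing one version
    return low, (low[0], low[1], low[2] + 1)


def max_satisfying(published: List[str], spec: str) -> Optional[str]:
    """Highest published version inside the range, or None if none match."""
    low, high = bounds(spec)
    matches = [v for v in published if low <= parse(v) < high]
    return max(matches, key=parse) if matches else None
```

Parsing into integer tuples matters: comparing version strings directly would rank 1.10.0 below 1.9.0.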
Dependency Resolution Algorithm
Dependency resolution is a constraint satisfaction problem. Naive recursive expansion is correct for trees, but real dependency graphs have diamond dependencies (A depends on B and C; B and C both depend on D, but with different version ranges), so a version chosen for one branch can conflict with another.
resolve(package, version_range):
    if already_resolved(package):
        # an earlier choice must also satisfy this range
        if resolved_version(package) satisfies version_range:
            return resolved_version(package)
        raise ConflictError
    candidates = versions_satisfying(package, version_range)
    for candidate in candidates (highest first):
        deps = manifest(package, candidate).dependencies
        if all deps can be resolved without conflict:
            mark_resolved(package, candidate)
            for each (dep, dep_version_range) in deps:
                resolve(dep, dep_version_range)
            return candidate
    raise ConflictError
npm uses a flat node_modules layout with version hoisting: a version that satisfies several dependents is shared at the top level, and conflicting versions are nested under the packages that need them. Cargo (Rust) uses a backtracking resolver, while tools such as Composer (PHP) and DNF (via libsolv) use SAT solvers for precise conflict detection. The lock file maps every (package, range) to an exact version, so subsequent installs skip resolution and use the lock file directly.
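To make the diamond case concrete, here is a small backtracking resolver that allows one version per package. It is a sketch of the general technique, not the algorithm used by npm, Cargo, or any other tool. The in-memory `Index` shape is an assumption, and `satisfies` supports only caret ranges for major versions of 1 or higher and exact pins.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]
# Hypothetical index: package -> version -> {dependency: range}
Index = Dict[str, Dict[str, Dict[str, str]]]


class ConflictError(Exception):
    """Raised when no assignment satisfies every constraint."""


def parse(version: str) -> Version:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies(version: str, spec: str) -> bool:
    if spec.startswith("^"):
        low = parse(spec[1:])
        return low <= parse(version) < (low[0] + 1, 0, 0)
    return parse(version) == parse(spec)


def _solve(
    index: Index, pending: List[Tuple[str, str]], chosen: Dict[str, str]
) -> Optional[Dict[str, str]]:
    if not pending:
        return chosen
    (name, spec), rest = pending[0], pending[1:]
    if name in chosen:
        # Already picked: the earlier choice must satisfy this range too.
        return _solve(index, rest, chosen) if satisfies(chosen[name], spec) else None
    candidates = sorted(
        (v for v in index.get(name, {}) if satisfies(v, spec)),
        key=parse,
        reverse=True,  # try the highest version first
    )
    for candidate in candidates:
        deps = list(index[name][candidate].items())
        solution = _solve(index, rest + deps, {**chosen, name: candidate})
        if solution is not None:
            return solution
        # Otherwise fall through and try the next lower candidate.
    return None


def resolve(index: Index, requirements: Dict[str, str]) -> Dict[str, str]:
    """Return a package -> exact version mapping, which is lock file content."""
    solution = _solve(index, list(requirements.items()), {})
    if solution is None:
        raise ConflictError(f"no solution for {requirements}")
    return solution
```

Because each recursive call receives a fresh copy of `chosen`, abandoning a candidate needs no explicit undo step. The worst case is still exponential, which is why production resolvers add memoization and conflict learning.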
Package Signing and Supply Chain Security
Package signing lets consumers verify the artifact was published by a trusted party and has not been modified:
- GPG signing (traditional): Maintainer signs the artifact with their private key; consumers verify with the maintainer's public key from a keyserver.
- Sigstore (modern): Keyless signing using ephemeral keys bound to a developer's OIDC identity (GitHub Actions, Google account) through a short-lived certificate. Signatures are recorded in a public transparency log (Rekor). Consumers verify the signature and its log entry, so maintainers have no long-lived keys to manage.
Proxying Upstream Registries
Private registries proxy public registries (npm, PyPI, Maven Central) to provide: local caching (faster downloads, insulation from upstream outages), security scanning before artifacts reach developers, and a single endpoint for both private and public packages.
On cache miss: fetch from upstream, verify checksum, store locally, return to client. Cached artifacts are served indefinitely (immutable by version). Metadata (available versions list) is refreshed on TTL or on-demand.
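The cache-miss path can be sketched as a small pull-through cache. The upstream client is injected as a plain function, and its signature (returning the artifact bytes plus the SHA-256 digest advertised by upstream metadata) is an assumption made for this example. Metadata refresh and TTL handling are left out.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical upstream client: (name, version) -> (artifact bytes, sha256 hex)
Fetch = Callable[[str, str], Tuple[bytes, str]]


class ProxyCache:
    """Pull-through cache keyed by (name, version)."""

    def __init__(self, fetch_upstream: Fetch) -> None:
        self._fetch = fetch_upstream
        self._artifacts: Dict[Tuple[str, str], bytes] = {}
        self.upstream_calls = 0

    def get(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key in self._artifacts:
            # Versions are immutable, so a cached artifact never expires.
            return self._artifacts[key]
        self.upstream_calls += 1
        data, expected = self._fetch(name, version)
        if hashlib.sha256(data).hexdigest() != expected:
            # Never cache or serve bytes that fail verification.
            raise ValueError(f"checksum mismatch for {name}@{version}")
        self._artifacts[key] = data
        return data
```

A real proxy would persist artifacts to disk or object storage and run security scanning between verification and storage.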
Scoped Packages and Namespacing
Scoped packages (@org/package) provide namespacing. All packages under @org/ are owned by that organization. Unscoped names are global. Scopes prevent name squatting on common names and allow organizations to publish private packages alongside public ones.
Immutability and Unpublish Policy
The left-pad incident (in 2016, a package author unpublished a widely used package, breaking thousands of builds) established the industry norm: packages cannot be freely unpublished after a grace period (72 hours in npm, after which unpublishing is allowed only under narrow conditions). Beyond that, a maintainer can deprecate a version (mark it with a warning) but not delete it, and the artifact remains available. This trades a maintainer's ability to withdraw a release for build reproducibility. Malicious packages are the exception: registry operators can still take them down, alongside advisories that push consumers to fixed versions.
Search Index
An Elasticsearch index on package metadata (name, description, keywords, maintainers) provides full-text search, with results ranked by download count, last publish date, and keyword relevance. Autocomplete on package names uses an edge n-gram analyzer, which indexes the leading prefixes of each name so that prefix queries become exact term lookups.
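The idea behind edge n-gram autocomplete can be shown without Elasticsearch. The sketch below generates the leading prefixes of each name at index time and answers a prefix query with a single dictionary lookup. The function names and the default gram sizes are illustrative choices, not Elasticsearch settings.

```python
from typing import Dict, List, Set


def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 10) -> List[str]:
    """Leading prefixes of term, from min_gram to max_gram characters."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]


def build_prefix_index(names: List[str]) -> Dict[str, Set[str]]:
    """Map every edge n-gram to the package names that start with it."""
    index: Dict[str, Set[str]] = {}
    for name in names:
        for gram in edge_ngrams(name):
            index.setdefault(gram, set()).add(name)
    return index


def autocomplete(index: Dict[str, Set[str]], prefix: str) -> List[str]:
    """Exact lookup on the typed prefix; no scan over all names."""
    return sorted(index.get(prefix.lower(), set()))
```

The trade-off is index size for query speed: every name is stored once per prefix length, and prefixes shorter than `min_gram` or longer than `max_gram` return nothing.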
Trade-offs and Failure Modes
- Dependency confusion attacks: An attacker publishes a public package with the same name as an internal private package. Package managers may prefer the public version. Mitigate by reserving internal package names in the public registry or using scoped packages exclusively for internal code.
- CDN cache poisoning: If a CDN node serves a different artifact for the same package version URL, checksum verification at the client will catch it. Never disable checksum verification.
- Resolution performance: Deep dependency graphs with many version conflicts make resolution slow (NP-complete in the worst case). Use memoization and early conflict detection. Most practical cases resolve in milliseconds.
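The dependency confusion mitigation in the first bullet can be expressed as a routing rule: names that belong to the organization are only ever resolved against the private registry, and everything else goes to the public one. The sketch below assumes the organization maintains a set of internal scopes and a set of reserved unscoped names. The function and parameter names are hypothetical.

```python
from typing import Set


def choose_source(name: str, internal_scopes: Set[str], internal_names: Set[str]) -> str:
    """Return "private" or "public" for a package name.

    Internal packages never fall back to the public registry, so a
    same-named public package cannot shadow them.
    """
    if name.startswith("@") and "/" in name:
        scope = name.split("/", 1)[0]
        if scope in internal_scopes:
            return "private"
    if name in internal_names:
        return "private"
    return "public"
```

The important property is that the decision depends only on the name, never on which registry happens to offer the higher version.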
Frequently Asked Questions
How does semantic version range resolution work?
A resolver collects all declared version ranges (e.g., ^1.2.0, ~2.3.1) across the dependency graph and intersects them to find the highest version that satisfies every constraint, applying SemVer precedence rules (major.minor.patch). When ranges are disjoint and no single version satisfies all constraints, the resolver raises a dependency conflict and halts, requiring the user to manually pin or override the conflicting constraint.
How is package immutability enforced to prevent supply chain attacks?
Once a package version is published, its tarball is content-addressed by a SHA-256 digest stored in the registry metadata, and subsequent publishes of the same version are rejected with a 409 Conflict response. Clients verify the downloaded tarball's digest against the registry-recorded hash before extraction, ensuring a compromised CDN or storage layer cannot substitute a malicious artifact.
How does upstream proxy caching work in a private registry?
A private registry configured as a proxy intercepts package resolution requests, checks its local cache by package name and version, and on a cache miss fetches the tarball and metadata from the upstream public registry (e.g., npmjs.org), storing them locally before serving. Subsequent requests for the same version are served entirely from the private cache, providing air-gap resilience and reducing dependency on upstream availability.
How is a dependency lock file generated?
After resolving the full dependency tree, the package manager serializes the exact resolved version, download URL, and integrity hash (SHA-512 in npm's case) for every direct and transitive dependency into a lock file (package-lock.json, yarn.lock, etc.). On subsequent installs, the resolver reads the lock file and installs pinned versions directly, bypassing range resolution to guarantee reproducible builds across environments and CI systems.