Object Types
Git stores three core object types in a content-addressed object store (a fourth type, the tag object used for annotated tags, is omitted here):
- Blob: raw file content. The blob has no filename — it is purely the bytes of the file.
- Tree: a directory listing. Each entry maps a filename to a blob SHA (for files) or a tree SHA (for subdirectories), plus a file mode.
- Commit: a snapshot of the root tree plus metadata: tree_sha, parent_commit_sha[] (empty for the initial commit, two or more for merge commits), author, committer, message, and timestamp.
This three-object model allows efficient representation: if two commits share 95% of their files unchanged, those files' blobs and their containing trees are shared — only the changed objects are new.
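The sharing described above can be illustrated with a toy in-memory store. This is a minimal sketch, not Git's on-disk format: the ObjectStore class, the write_tree and write_commit helpers, and the simplified text encoding are all assumptions made for illustration, and the directory is kept flat for brevity.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store (hypothetical; not Git's real encoding)."""
    def __init__(self):
        self.objects = {}

    def put(self, kind, payload):
        data = f"{kind}\n{payload}".encode()
        sha = hashlib.sha1(data).hexdigest()
        self.objects[sha] = data  # identical content maps to the same key
        return sha

def write_tree(store, files):
    """files: dict of filename -> content for a single flat directory."""
    entries = []
    for name in sorted(files):
        blob_sha = store.put("blob", files[name])
        entries.append(f"100644 {name} {blob_sha}")
    return store.put("tree", "\n".join(entries))

def write_commit(store, tree_sha, parents, message):
    lines = [f"tree {tree_sha}"]
    lines += [f"parent {p}" for p in parents]
    lines += ["", message]
    return store.put("commit", "\n".join(lines))

store = ObjectStore()
v1 = {"a.txt": "alpha", "b.txt": "beta", "c.txt": "gamma"}
c1 = write_commit(store, write_tree(store, v1), [], "initial")
count_after_first = len(store.objects)  # 3 blobs + 1 tree + 1 commit

# Second commit changes one file; the other two blobs are reused.
v2 = dict(v1)
v2["c.txt"] = "gamma v2"
c2 = write_commit(store, write_tree(store, v2), [c1], "edit c.txt")
new_objects = len(store.objects) - count_after_first  # 1 blob + 1 tree + 1 commit
```

Only the changed blob, the tree that contains it, and the new commit are added; everything else is shared between the two snapshots.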
Content Addressing
Each object's SHA-1 is computed from a short header (the object type and size) followed by its content. The object is zlib-compressed and stored at .git/objects/{sha[:2]}/{sha[2:]}. Deduplication is automatic: committing the same file content twice produces the same SHA and stores only one blob. Content addressing also provides integrity checking: if an object's content does not hash to its SHA, the object has been corrupted.
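As a small sketch of how a blob ID can be reproduced outside Git: Git hashes the header "blob <size>" plus a NUL byte, followed by the raw bytes. The function names git_blob_sha1 and loose_object_path are hypothetical helpers for this example.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """SHA-1 of a blob as Git computes it: header + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(sha: str) -> str:
    """Path of a loose object: first two hex digits form the directory."""
    return f".git/objects/{sha[:2]}/{sha[2:]}"

empty_sha = git_blob_sha1(b"")
hello_sha = git_blob_sha1(b"hello world\n")
empty_path = loose_object_path(empty_sha)
```

The result for a given file should match what git hash-object prints for the same bytes, which is a quick way to check the sketch against a real repository.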
Branch Model
A branch is simply a mutable reference to a commit SHA, normally a small text file: .git/refs/heads/main contains the SHA of the latest commit on main. (Git may also consolidate refs into a single .git/packed-refs file.) HEAD is a pointer to the current branch, or directly to a commit SHA in detached HEAD state. This makes branch creation O(1) regardless of repository size: just write a new file containing a commit SHA.
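The following sketch resolves HEAD and creates a branch by writing a ref file. It assumes loose ref files only (it ignores packed-refs), and resolve_head and create_branch are hypothetical helper names; the demo runs against a temporary directory standing in for .git, not a real repository.

```python
import os
import tempfile

def resolve_head(git_dir):
    """Return (ref_name or None, commit_sha) for the current HEAD."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        with open(os.path.join(git_dir, ref)) as f:
            return ref, f.read().strip()
    return None, head  # detached HEAD: the file holds a SHA directly

def create_branch(git_dir, name, sha):
    """Creating a branch is just writing one small file."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha + "\n")

# Demo against a throwaway directory that mimics the .git layout.
git_dir = tempfile.mkdtemp()
tip = "a" * 40  # placeholder SHA for illustration
create_branch(git_dir, "main", tip)
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/main\n")
create_branch(git_dir, "feature", tip)  # constant-time branch creation
current = resolve_head(git_dir)
```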
Two merge strategies:
- Fast-forward: if the tip of the target branch is an ancestor of the tip of the source branch, simply advance the target branch pointer to the source tip. No merge commit is created.
- Three-way merge: used when the two branches have diverged. The common ancestor is found and a merge commit with two parents is created.
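Choosing between the two strategies comes down to an ancestry check. A minimal sketch, assuming the commit graph is given as a dict from commit ID to parent IDs (the names is_ancestor and choose_merge are hypothetical):

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False

def choose_merge(parents, target_tip, source_tip):
    """Decide how to merge source into target."""
    if is_ancestor(parents, source_tip, target_tip):
        return "already up to date"
    if is_ancestor(parents, target_tip, source_tip):
        return "fast-forward"
    return "three-way"

# A <- B <- C  (C extends B), and D also branches off B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
ff = choose_merge(parents, "B", "C")
diverged = choose_merge(parents, "C", "D")
```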
Three-Way Merge Algorithm
Given branches B and C to merge, find their common ancestor A (lowest common ancestor in the commit DAG). Compute diffs A→B and A→C. Apply both diffs to A:
- If only one branch changed a region: take that branch's change
- If both branches changed the same region identically: take the change once
- If both branches changed the same region differently: conflict — mark the region with conflict markers and require manual resolution
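The three rules above can be sketched for a single region. This assumes the regions have already been aligned by a diff step, which is the harder part and is not shown; merge_region is a hypothetical name, and each region is a list of lines.

```python
def merge_region(base, ours, theirs):
    """Return (merged_lines, is_conflict) for one aligned region."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only their side changed
    if theirs == base:
        return ours, False    # only our side changed
    conflict = ["<<<<<<< ours", *ours, "=======", *theirs, ">>>>>>> theirs"]
    return conflict, True     # both changed differently

one_sided = merge_region(["x"], ["x"], ["y"])
identical = merge_region(["x"], ["z"], ["z"])
clash = merge_region(["x"], ["y"], ["z"])
```

A full merge would apply this decision to every region and write the result, leaving any marked regions for manual resolution.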
The recursive merge strategy handles criss-cross merges (where there is more than one best common ancestor) by first merging those ancestors with each other to produce a virtual base, then using that base for the three-way merge. The ort strategy is a faster reimplementation of recursive that computes the merge in memory rather than going through the working tree and index.
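To see why a criss-cross history yields more than one candidate base, the sketch below computes the best common ancestors of two commits. It is a simple set-based illustration, not the algorithm Git uses internally, and the names ancestors and merge_bases are hypothetical.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    )

# Criss-cross: X2 merges Y1 into the X line, Y2 merges X1 into the Y line.
parents = {
    "R": [],
    "X1": ["R"], "Y1": ["R"],
    "X2": ["X1", "Y1"],
    "Y2": ["Y1", "X1"],
}
bases = merge_bases(parents, "X2", "Y2")
simple = merge_bases(parents, "X1", "Y1")
```

Here X1 and Y1 are both valid bases for merging X2 and Y2, which is the situation where recursive builds a virtual base from them.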
Pack Files and Delta Compression
As a repository accumulates history, loose objects are consolidated into pack files. Pack file construction:
- Select objects to pack (all loose objects, or objects reachable from certain refs)
- For similar objects (e.g., two versions of the same file), compute a binary delta: store only the delta rather than the full content of both
- Compress each stored object (whether a full object or a delta) individually with zlib
- Write a pack index file for O(log n) object lookup by SHA
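Delta compression can reduce repository size by 10x or more when successive versions of a file differ only slightly. It is most effective on text and on uncompressed binary formats that change incrementally; files that are already compressed tend to delta poorly.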
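The idea behind a binary delta can be sketched with copy and insert instructions, which is the general shape of Git's delta encoding. This toy version uses Python's difflib to find matching ranges and does not reproduce Git's actual byte format; make_delta and apply_delta are hypothetical names.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes):
    """Describe `target` as copies from `base` plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))   # literal new bytes
        # "delete" needs no instruction: the bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

base = b"line one\nline two\nline three\n" * 20
target = base.replace(b"two", b"2", 1) + b"tail\n"
delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
rebuilt = apply_delta(base, delta)
```

Only the literal inserts carry new bytes, so a small edit to a large file costs far less than storing the file twice.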
Garbage Collection
Objects that cannot be reached from any ref (branch or tag) or from any reflog entry are garbage. GC identifies these objects and deletes them after a grace period (default 2 weeks, to protect objects referenced by in-flight operations). GC also packs loose objects and consolidates multiple pack files. It runs automatically on a heuristic (when the number of loose objects exceeds the gc.auto threshold, about 6,700 by default) or explicitly via git gc.
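The reachability rule amounts to a mark-and-sweep over the object graph. A minimal sketch, assuming each object is a dict with a list of outgoing references and a modification time in seconds (the structure and the names reachable and prune are hypothetical):

```python
DAY = 24 * 3600

def reachable(objects, roots):
    """Mark phase: everything reachable from the root set."""
    seen, stack = set(), list(roots)
    while stack:
        sha = stack.pop()
        if sha in seen or sha not in objects:
            continue
        seen.add(sha)
        stack.extend(objects[sha]["refs"])
    return seen

def prune(objects, roots, now, grace_seconds=14 * DAY):
    """Sweep phase: delete unreachable objects older than the grace period."""
    live = reachable(objects, roots)
    doomed = [
        sha for sha, obj in objects.items()
        if sha not in live and now - obj["mtime"] > grace_seconds
    ]
    for sha in doomed:
        del objects[sha]
    return sorted(doomed)

objects = {
    "c2": {"refs": ["t2"], "mtime": 0},       # commit -> tree
    "t2": {"refs": ["b1"], "mtime": 0},       # tree -> blob
    "b1": {"refs": [], "mtime": 0},
    "old": {"refs": [], "mtime": 0},          # unreachable, 30 days old
    "new": {"refs": [], "mtime": 29 * DAY},   # unreachable, 1 day old
}
pruned = prune(objects, roots=["c2"], now=30 * DAY)
```

In a real repository the root set would include every branch, tag, and reflog entry, not a single commit.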
Shallow Clone and Grafts
git clone --depth N fetches only the most recent N commits of each branch it fetches (by default --depth implies --single-branch, so only one branch is fetched). The server sends commits without their full ancestry. The client stores a .git/shallow file listing the commit SHAs whose parents were not fetched; these commits are treated as root commits locally. Deepening (git fetch --deepen=N) or unshallowing (git fetch --unshallow) fetches additional history from the server on demand.
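How the shallow list changes a history walk can be sketched as follows. The shallow file holds one SHA per line; the commit IDs, the parents dict, and the helper names parse_shallow and local_history are hypothetical.

```python
def parse_shallow(text):
    """Parse the contents of a shallow file: one commit SHA per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def local_history(parents, shallow, tip):
    """Walk history from `tip`, treating shallow commits as roots."""
    order, stack, seen = [], [tip], set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        if commit in shallow:
            continue  # parents were never fetched, so stop here
        stack.extend(parents.get(commit, []))
    return order

# Commit metadata still names c2 as the parent of c3, but c2 is absent locally.
parents = {"c5": ["c4"], "c4": ["c3"], "c3": ["c2"]}
shallow = parse_shallow("c3\n")
history = local_history(parents, shallow, "c5")
```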
Reflog
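The reflog records every movement of HEAD and branch pointers with a timestamp and reason. It is a local safety net: if a branch pointer is accidentally reset or deleted, the previous commit SHA can still be found in the reflog until the entry expires (by default 90 days, or 30 days for entries no longer reachable from the current tip). Deleting a branch removes that branch's own reflog, but HEAD's reflog still records the commits that were checked out. git reflog shows this history; git reset --hard HEAD@{3} restores a previous state. Reflogs are local only and are never pushed to remotes.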
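Reflog files are plain text, one entry per line, appended oldest first: old SHA, new SHA, identity, timestamp, and timezone, then a tab and the reason. A rough sketch of parsing them and resolving HEAD@{n} follows; the function names and the sample entries are made up for illustration.

```python
def parse_reflog_line(line):
    """Split one reflog line into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    who, timestamp, tz = rest.rsplit(" ", 2)
    return {"old": old, "new": new, "who": who,
            "time": int(timestamp), "tz": tz, "message": message}

def resolve_at(lines, n):
    """Value of <ref>@{n}: n moves back from the most recent entry."""
    entries = [parse_reflog_line(l) for l in lines if l.strip()]
    return entries[-1 - n]["new"]

# Hypothetical reflog content with shortened placeholder SHAs.
sample = [
    "0000000 aaaaaaa Ada Lovelace <ada@example.com> 1700000000 +0000\tcommit (initial): first",
    "aaaaaaa bbbbbbb Ada Lovelace <ada@example.com> 1700000100 +0000\tcommit: second",
    "bbbbbbb aaaaaaa Ada Lovelace <ada@example.com> 1700000200 +0000\treset: moving to HEAD~1",
]
before_reset = resolve_at(sample, 1)
last = parse_reflog_line(sample[-1])
```

Here the entry one step back still names the commit that the reset moved away from, which is what makes recovery possible.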
Hooks
Git hooks are executable scripts in .git/hooks/ that run at specific lifecycle events. Common hooks:
- pre-commit: run linters or tests before a commit. A non-zero exit aborts the commit.
- commit-msg: validate the commit message format.
- pre-push: run tests before a push. A non-zero exit aborts the push.
- post-receive: server-side; trigger CI or update issue trackers after a push is accepted.
The .git/hooks/ directory is not version-controlled, so teams distribute hooks via a setup script or a hook manager like Husky.
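As an example, a commit-msg hook receives the path of the file holding the proposed message as its first argument and rejects the commit by exiting non-zero. The sketch below shows one possible check; the 72-character limit and the names validate_message and main are assumptions for illustration, not Git rules.

```python
import sys

MAX_SUBJECT = 72  # hypothetical team convention, not enforced by Git

def validate_message(text):
    """Return an error string, or None if the message is acceptable."""
    lines = [l for l in text.splitlines() if not l.startswith("#")]
    subject = lines[0].strip() if lines else ""
    if not subject:
        return "empty commit subject"
    if len(subject) > MAX_SUBJECT:
        return f"subject longer than {MAX_SUBJECT} characters"
    return None

def main(argv):
    """argv[1] is the path to the commit message file passed by Git."""
    with open(argv[1], encoding="utf-8") as f:
        error = validate_message(f.read())
    if error:
        print(f"commit-msg: {error}", file=sys.stderr)
        return 1  # non-zero exit aborts the commit
    return 0

# In the actual hook file (.git/hooks/commit-msg, marked executable),
# the script would end with:
#     sys.exit(main(sys.argv))
```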