How Git Works Under the Hood: Objects, Trees, and Blobs
We use git add, git commit, and git push every day. But have you ever looked inside the .git folder?
Understanding Git's internals demystifies the "magic" and makes you a much more confident user (and better at fixing merge conflicts!).
At its core, Git is a content-addressable filesystem. Essentially, it's a key-value store where:
- Key: SHA-1 hash of the content (plus a short header recording the object's type and size).
- Value: The content itself.
Let's explore the three main objects Git uses to store your history.
1. The Blob (Binary Large Object)
When you git add a file, Git takes the content of that file and stores it as a Blob.
- It doesn't store the filename, only the content.
- If two files have the exact same content, they share the same Blob (deduplication!).
You can see this with git hash-object:
$ echo "hello" | git hash-object -w --stdin
ce013625030ba8dba906f756967f9e9ca394464a
That hash is the key.
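To make the key concrete, here is a minimal Python sketch that reproduces the same hash without calling Git. It assumes the default SHA-1 object format, and the function name git_blob_hash is made up for this example. The one detail to notice is that Git hashes a small header (the word "blob", the size in bytes, and a NUL byte) followed by the content.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git prepends "blob <size in bytes>\0" before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo adds a trailing newline, so the stored content is b"hello\n".
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a

# The filename never enters the hash, so identical content gives one key.
file_a = b"hello\n"  # imagine this is a.txt
file_b = b"hello\n"  # imagine this is b.txt
print(git_blob_hash(file_a) == git_blob_hash(file_b))  # True
```

Because only the bytes are hashed, renaming a file does not create a new Blob.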
2. The Tree
So where are the filenames stored? In a Tree object.
A Tree is like a directory. It contains a list of:
- File permissions (e.g., 100644)
- Object type (blob or tree)
- SHA-1 hash
- Filename
Think of a Tree as a snapshot of a folder structure.
100644 blob ce0136... hello.txt
100644 blob 9a8b7c... README.md
040000 tree 3b1c2d... src
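For the curious, here is a rough Python sketch of how a listing like that could be packed into the bytes Git actually hashes. The helper names hash_object and build_tree and the contents of src are assumptions made for illustration. Two details differ from the pretty-printed listing: the raw object stores the directory mode as 40000 (without the leading zero), and each ID is stored as 20 raw bytes rather than 40 hex characters.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_tree(entries) -> bytes:
    """Serialize (mode, name, hex_id) tuples into a raw tree body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" plus the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body


hello_id = hash_object("blob", b"hello\n")
readme_id = hash_object("blob", b"# Readme\n")
# Hypothetical contents for src: a single file named main.py.
main_id = hash_object("blob", b"print('hi')\n")
src_id = hash_object("tree", build_tree([("100644", "main.py", main_id)]))

root = build_tree([
    ("100644", "hello.txt", hello_id),
    ("100644", "README.md", readme_id),
    ("40000", "src", src_id),
])
print(hash_object("tree", root))
```

Since a Tree refers to its children by hash, changing one file deep inside src changes the ID of src and of every Tree above it, while untouched siblings keep their IDs.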
3. The Commit
Finally, we have the Commit object.
A commit is simply a wrapper that points to a specific Tree (the snapshot of the project at that moment) and adds metadata:
- Author & Committer
- Date
- Commit Message
- Parent Commit hash (none for the very first commit, one for a normal commit, two or more for a merge)
Because each commit points to its parent, history forms a linked list or, once branches and merges come into play, a Directed Acyclic Graph (DAG).
tree 3b1c2d...
parent 8a9b0c...
author John Doe <[email protected]>
committer John Doe <[email protected]>
Second commit
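Here is a small Python sketch of how a commit like this could be serialized and hashed. It assumes the SHA-1 object format; build_commit, hash_object, and the identity string are hypothetical names and values chosen for the example. In the real object, each author and committer line also carries a Unix timestamp and a timezone, and a blank line separates the headers from the message.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree, parents, author, committer, message) -> bytes:
    """Serialize commit headers and message into a raw commit body."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # tree with no entries
who = "Jane Example <jane@example.com> 1700000000 +0000"  # hypothetical identity

root_body = build_commit(EMPTY_TREE, [], who, who, "Initial commit")
root_id = hash_object("commit", root_body)

child_body = build_commit(EMPTY_TREE, [root_id], who, who, "Second commit")
child_id = hash_object("commit", child_body)

print(child_body.decode("utf-8"))
print(root_id != child_id)  # True
```

The parent hash is part of the hashed bytes, so a commit's ID depends on its entire ancestry. That is why rewriting an old commit gives every descendant a new ID.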
Visualizing the Relationship
graph TD
  subgraph Commit
    C[Commit Object] -->|Points to| T[Tree Object]
    C -->|Points to| P[Parent Commit]
  end
  subgraph Tree
    T -->|Contains| B1[Blob: hello.txt]
    T -->|Contains| B2[Blob: README.md]
    T -->|Contains| T2[Tree: src]
  end
  subgraph Blob
    B1 -->|Content| Content1["hello"]
    B2 -->|Content| Content2["# Readme"]
  end
Putting It All Together
When you run git commit:
- Git picks up the Blobs that git add already created for the staged files.
- Git creates Tree objects representing the project structure (one per directory).
- Git creates a Commit object pointing to the root Tree and the previous commit.
- The branch pointer (e.g., main) moves to this new commit hash.
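The whole flow fits in a few lines if we replace the .git folder with two Python dictionaries. This is a toy model under stated assumptions, not how Git is implemented: it only handles a flat directory of regular files, and the names objects, refs, put, and commit, along with the identity string, are invented for the sketch.

```python
import hashlib

objects = {}  # object ID -> (type, body): the key-value store
refs = {}     # branch name -> commit ID


def put(obj_type: str, body: bytes) -> str:
    """Store an object under the SHA-1 of its header plus body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    objects[oid] = (obj_type, body)
    return oid


def commit(files: dict, message: str, branch: str = "main") -> str:
    # Step 1: one blob per file (real Git does this earlier, at git add).
    entries = sorted((name.encode("utf-8"), put("blob", data))
                     for name, data in files.items())
    # Step 2: a single flat tree listing mode, name, and blob ID.
    tree_body = b"".join(b"100644 " + name + b"\0" + bytes.fromhex(oid)
                         for name, oid in entries)
    tree_id = put("tree", tree_body)
    # Step 3: a commit pointing at the tree and the branch's current tip.
    lines = [f"tree {tree_id}"]
    if branch in refs:
        lines.append(f"parent {refs[branch]}")
    who = "Jane Example <jane@example.com> 1700000000 +0000"  # hypothetical
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    commit_id = put("commit", "\n".join(lines).encode("utf-8"))
    # Step 4: move the branch pointer to the new commit.
    refs[branch] = commit_id
    return commit_id


first = commit({"hello.txt": b"hello\n"}, "Add hello")
second = commit({"hello.txt": b"hello\n", "README.md": b"# Readme\n"}, "Add README")

print(refs["main"] == second)  # True
print(len(objects))            # 6
```

Two commits produced six objects, not seven: the unchanged hello.txt hashes to the same key both times, so its Blob is stored once.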
Why This Matters
- Objects are immutable: Once an object is created, it never changes. "Modifying" history actually creates new objects.
- Cheap branching: A branch is just a tiny file containing a 40-character commit hash (41 bytes with the trailing newline). Creating a branch costs almost nothing.
- Integrity: Every ID is derived from the object's content, so if a single bit of data changes, the ID changes. Corruption is caught as soon as the hashes are re-checked (which is what git fsck does).
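Two of these points are easy to check by hand. The sketch below assumes SHA-1 object IDs and uses a temporary directory rather than a real repository; object_id and the branch name feature are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


content = b"hello\n"
original_id = object_id("blob", content)

# Integrity: flip a single bit and the content no longer matches its ID.
corrupted = bytes([content[0] ^ 0x01]) + content[1:]
print(object_id("blob", corrupted) == original_id)  # False

# Cheap branching: a loose branch ref is a file holding an ID and a newline.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "refs" / "heads" / "feature"  # hypothetical branch
    ref.parent.mkdir(parents=True)
    ref.write_bytes((original_id + "\n").encode("ascii"))
    ref_size = ref.stat().st_size

print(ref_size)  # 41
```

Nothing is copied when a branch is created. The new file simply names a commit that already exists.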
Next time you're stuck in "detached HEAD" state, remember: HEAD is just pointing directly at a Commit object instead of at a branch name!
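You can tell the two states apart by looking at the HEAD file itself: it holds either a line starting with "ref:" that names a branch, or a bare commit hash. Here is a small sketch of that check, assuming 40-character SHA-1 hashes; describe_head is a hypothetical helper, not a Git command.

```python
import re


def describe_head(head_text: str) -> str:
    """Classify the contents of a .git/HEAD file."""
    text = head_text.strip()
    if text.startswith("ref: "):
        # Attached: HEAD names a ref, and that ref names a commit.
        ref = text[len("ref: "):]
        prefix = "refs/heads/"
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        return f"attached to {name}"
    if re.fullmatch(r"[0-9a-f]{40}", text):
        # Detached: HEAD holds a commit ID directly.
        return f"detached at {text[:7]}"
    raise ValueError("unrecognized HEAD contents")


print(describe_head("ref: refs/heads/main\n"))
# attached to main
print(describe_head("ce013625030ba8dba906f756967f9e9ca394464a\n"))
# detached at ce01362
```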