
How Git Works Under the Hood: Objects, Trees, and Blobs

We use git add, git commit, and git push every day. But have you ever looked inside the .git folder?

Understanding Git's internals demystifies the "magic" and makes you a much more confident user (and better at fixing merge conflicts!).

At its core, Git is a content-addressable filesystem. Essentially, it's a key-value store where:

  • Key: The SHA-1 hash of the content (prefixed with a short header giving the object's type and size).
  • Value: The content itself.
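
As a rough illustration of the key-value idea, here is a minimal in-memory sketch in Python. The ObjectStore class and its put/get methods are hypothetical names made up for this example, not part of Git, and the sketch hashes the raw content only, whereas Git also hashes a small header.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key is the SHA-1 of the content, so identical content
        # always maps to the same key and is stored only once.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
k1 = store.put(b"hello\n")
k2 = store.put(b"hello\n")   # same content, same key
k3 = store.put(b"world\n")
print(k1 == k2, len(store))  # True 2
```

You never choose the key yourself: it falls out of the content, which is what "content-addressable" means.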

Let's explore the three main objects Git uses to store your history.

1. The Blob (Binary Large Object)

When you git add a file, Git takes the content of that file and stores it as a Blob.

  • It doesn't store the filename, only the content.
  • If two files have the exact same content, they share the same Blob (deduplication!).

You can see this with git hash-object (the -w flag also writes the object into the object store):

$ echo "hello" | git hash-object -w --stdin
ce013625030ba8dba906f756967f9e9ca394464a

That hash is the key.
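
To see where that key comes from, the hash can be reproduced without Git. The sketch below assumes a repository using the default SHA-1 object format; git_blob_hash is a helper name invented for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    # Git hashes a header ("blob", a space, the byte length, a NUL byte)
    # followed by the content itself.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# `echo "hello"` writes the five letters plus a trailing newline.
print(git_blob_hash(b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

On disk, the object is zlib-compressed and stored under .git/objects/ce/013625..., with the first two hex characters used as the directory name.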

2. The Tree

So where are the filenames stored? In a Tree object.

A Tree is like a directory. It contains a list of:

  • File mode (e.g., 100644 for a regular file)
  • Object type (blob or tree)
  • SHA-1 hash
  • Filename

Think of a Tree as a snapshot of a folder structure.

100644 blob ce0136...    hello.txt
100644 blob 9a8b7c...    README.md
040000 tree 3b1c2d...    src
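
The listing above is the human-readable view. As a sketch of how a tree could be serialized and hashed, the Python below builds the binary body for a flat tree. The helper names and the README content are assumptions made for illustration. In the stored form each entry is the mode, a space, the name, a NUL byte, and the raw 20-byte hash, and a directory's mode is written as 40000 without the leading zero.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    # Every Git object is hashed as "<type> <size>", a NUL byte, then its body.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_hash) tuples into a tree object's body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name; directories compare as if
        # their name ended with "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_hash)
    return body


hello = hash_object("blob", b"hello\n")
readme = hash_object("blob", b"# Readme\n")  # hypothetical README content
body = tree_body([("100644", "hello.txt", hello), ("100644", "README.md", readme)])
print(hash_object("tree", body))

# A tree with no entries has a well-known ID.
print(hash_object("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the tree's hash covers the hashes of everything it lists, changing any file changes the hash of every tree above it.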

3. The Commit

Finally, we have the Commit object.

A commit is simply a wrapper that points to a specific Tree (the snapshot of the project at that moment) and adds metadata:

  • Author & Committer
  • Date
  • Commit Message
  • Parent Commit hash

Because each commit points to its parent (and a merge commit points to more than one), commits form a Directed Acyclic Graph (DAG) rather than a simple linked list.

tree 3b1c2d...
parent 8a9b0c...
author John Doe <[email protected]>
committer John Doe <[email protected]>

Second commit
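
A commit's ID is computed the same way as a blob's or a tree's. The sketch below serializes two commits and hashes them. The identity string and timestamps are made up for illustration, and the tree hash is the well-known ID of the empty tree. Note that a real commit's author and committer lines also carry a Unix timestamp and a timezone offset, which the abbreviated listing above omits.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ident = "Jane Example <jane@example.com> 1700000000 +0000"  # hypothetical identity

root = hash_object("commit", commit_body(EMPTY_TREE, [], ident, ident, "Initial commit"))
child_body = commit_body(EMPTY_TREE, [root], ident, ident, "Second commit")
child = hash_object("commit", child_body)
print(root)
print(child)
```

Since the parent's hash is part of the hashed text, a commit's ID depends on its entire ancestry.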

Visualizing the Relationship

graph TD
    subgraph Commit
        C[Commit Object] -->|Points to| T[Tree Object]
        C -->|Points to| P[Parent Commit]
    end
    subgraph Tree
        T -->|Contains| B1[Blob: hello.txt]
        T -->|Contains| B2[Blob: README.md]
        T -->|Contains| T2[Tree: src]
    end
    subgraph Blob
        B1 -->|Content| Content1["hello"]
        B2 -->|Content| Content2["# Readme"]
    end

Putting It All Together

When you run git commit:

  1. Git reuses the Blobs that git add already wrote for the staged files.
  2. Git creates a Tree (one per directory) representing the staged project structure.
  3. Git creates a Commit object pointing to that Tree and the previous commit.
  4. The branch pointer (e.g., main) moves to this new commit hash.
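
The whole sequence can be sketched as a toy in-memory model. This is not how Git is implemented: the objects and refs dictionaries stand in for .git/objects and .git/refs/heads, the identity string is hypothetical, the tree is flat (no subdirectories), and staging is folded into the same function for brevity.

```python
import hashlib

objects = {}  # stands in for .git/objects
refs = {}     # stands in for .git/refs/heads


def write_object(obj_type: str, body: bytes) -> str:
    data = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    sha = hashlib.sha1(data).hexdigest()
    objects[sha] = data  # identical content lands on the same key
    return sha


def commit(files: dict, message: str, branch: str = "main") -> str:
    # 1. One blob per file.
    entries = [(name, write_object("blob", content)) for name, content in files.items()]
    # 2. One tree listing the names and blob hashes.
    tree_body = b"".join(
        b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(sha)
        for name, sha in sorted(entries, key=lambda e: e[0].encode("utf-8"))
    )
    tree = write_object("tree", tree_body)
    # 3. One commit pointing at the tree and, if there is one, the previous commit.
    lines = [f"tree {tree}"]
    if branch in refs:
        lines.append(f"parent {refs[branch]}")
    ident = "Jane Example <jane@example.com> 1700000000 +0000"  # hypothetical
    lines += [f"author {ident}", f"committer {ident}", "", message, ""]
    sha = write_object("commit", "\n".join(lines).encode("utf-8"))
    # 4. Move the branch pointer to the new commit.
    refs[branch] = sha
    return sha


first = commit({"hello.txt": b"hello\n"}, "Initial commit")
second = commit({"hello.txt": b"hello\n", "README.md": b"# Readme\n"}, "Add README")
print(refs["main"] == second, len(objects))  # True 6
```

Two commits produce six objects rather than seven, because the unchanged hello.txt blob is reused by the second commit.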

Why This Matters

  • Objects are immutable: Once an object is created, it never changes. "Modifying" history actually creates new objects and moves pointers to them.
  • Cheap branching: A branch is just a tiny file containing a 40-character commit hash (41 bytes with the trailing newline). Creating a branch costs almost nothing.
  • Integrity: The SHA-1 hash ensures that if a single bit of data changes, the ID changes. Accidental corruption is detected (for example by git fsck), and tampering with history unnoticed would require a deliberate hash collision.
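
The integrity and branching points can be checked with a few lines of Python. This is a sketch under the same SHA-1 assumption as before; object_id and verify are hypothetical helper names.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def verify(expected_id: str, obj_type: str, body: bytes) -> bool:
    # Recompute the ID from the stored bytes and compare it to the key.
    return object_id(obj_type, body) == expected_id


original = b"hello\n"
oid = object_id("blob", original)

# Flip a single bit in the first byte ("h" becomes "i").
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(verify(oid, "blob", original))  # True
print(verify(oid, "blob", tampered))  # False

# A branch file holds a 40-character hash plus a newline.
ref_file = (oid + "\n").encode("ascii")
print(len(ref_file))  # 41
```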

Next time you're stuck in "detached HEAD" state, remember: HEAD is just pointing directly at a Commit object instead of at a branch name!
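
The difference is visible in the .git/HEAD file itself: on a branch it contains a line such as "ref: refs/heads/main", while in detached state it contains a raw commit hash. A small sketch, with describe_head as a hypothetical helper and a made-up hash:

```python
def describe_head(head_text: str) -> str:
    """Classify the contents of a .git/HEAD file."""
    head_text = head_text.strip()
    if head_text.startswith("ref: "):
        ref = head_text[len("ref: "):]
        prefix = "refs/heads/"
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        return "on branch " + name
    # Anything else is treated as a raw commit hash.
    return "detached at " + head_text[:7]


print(describe_head("ref: refs/heads/main\n"))
# on branch main
print(describe_head("1234567890abcdef1234567890abcdef12345678\n"))  # made-up hash
# detached at 1234567
```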
