Understanding Reflogs and Garbage Collection

As projects grow and evolve, your Git repository silently accumulates a surprising amount of data behind the scenes. Understanding how Git manages this data through reflogs and garbage collection is essential for both troubleshooting and optimizing your workflow. This guide explores these often-overlooked aspects of Git's internal operations.

[Image: XKCD comic about Git]

Introduction: When Commit Messages Go Wild

Before we dive in, let's acknowledge a reality many developers face: as projects drag on, our commit discipline tends to deteriorate. As the comic illustrates, we might start with informative messages like "Created main loop & timing control" but eventually devolve into "AAAAAAA" and "MY HANDS ARE TYPING WORDS."

This decline in commit quality is precisely why understanding Git's internal mechanisms becomes so valuable. Even when our commit messages fail us, Git's underlying systems like reflogs can help us recover our work and understand our project's evolution.

Reflogs: Your Safety Net

While Git's commit history records the evolution of your project, reflogs record the evolution of your references (branches, HEAD, etc.). Think of them as "Git's Git" - they track how your repository references change over time.

What Reflogs Track

Reflogs record when references like branches or HEAD change, storing:

  • The previous and new value of the reference
  • Who made the change
  • When it happened
  • Why it happened (the command used)

Every Git reference (HEAD, branches, etc.) has its own reflog, stored in:

  • .git/logs/HEAD for the HEAD reflog
  • .git/logs/refs/heads/<branch-name> for branch reflogs
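
To make the storage format concrete, here is a minimal sketch that builds a throwaway repository and prints the raw reflog files. It assumes the default "files" ref storage (a repository using a different ref backend may not have these paths); the file name, identity, and commit messages are placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -am "second"

# Each raw line holds: old hash, new hash, who, timestamp, timezone, TAB, reason
cat .git/logs/HEAD

# The branch that HEAD points at keeps its own log
branch=$(git symbolic-ref --short HEAD)
cat ".git/logs/refs/heads/$branch"
```

The first entry's old value is all zeros because the reference did not exist before the initial commit.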

Why Reflogs Matter: Recovering "Lost" Work

Reflogs provide a safety net for many common mistakes:

  1. Accidental branch reset: If you reset a branch (or rewrite it before a force-push), your local reflog remembers where it pointed before.
  2. Detached HEAD commits: Created a commit while in detached HEAD state and lost it? Reflog remembers.
  3. Rebase gone wrong: If a rebase causes issues, reflogs can help you find your pre-rebase state.

As illustrated in our comic, when your commit discipline breaks down into "AAAAAAA" and "HAAAAAAAANDS," reflogs can help you piece together what actually happened.

Working with Reflogs

To view the reflog for HEAD:

git reflog

This produces output like:

734713b HEAD@{0}: commit: Fix bug in login form
82f5d1c HEAD@{1}: commit: Add user authentication
4ad3ff9 HEAD@{2}: pull: Fast-forward
...

Each entry shows:

  • The commit hash
  • The reference with a position index (HEAD@{0} is the current state, HEAD@{1} the state before it, and so on)
  • The action that caused the reference change
  • A description (usually the commit message)

To view a specific reference's reflog:

git reflog show main   # Show reflog for the main branch

To recover "lost" commits:

# First, find the commit in the reflog
git reflog

# Then, create a branch at that commit or checkout directly
git branch recovered-work HEAD@{5}
# or
git checkout -b recovered-work HEAD@{5}
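
The recovery steps above can be tried end to end in a scratch repository. This is a sketch only: the repository, file, and messages are made up, and in a real repository the right index is rarely 1 or 5, so read the reflog output first.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -am "second"
lost=$(git rev-parse HEAD)

# Simulate the accident: "second" is no longer on the branch
git reset -q --hard HEAD~1

# HEAD@{0} is the reset itself; HEAD@{1} is where HEAD pointed before it
git reflog
git branch recovered-work 'HEAD@{1}'
recovered=$(git rev-parse recovered-work)
echo "lost=$lost recovered=$recovered"
```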

Reflog Expiration

Reflogs don't last forever:

  • By default, reflog entries expire after 90 days
  • Entries for unreachable commits expire after 30 days
  • These settings can be configured in your Git config
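
As a sketch of what that configuration might look like, the snippet below sets both expiry windows for a single scratch repository and reads them back. The values "180 days" and "60 days" are arbitrary examples, not recommendations.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

# Repository-local settings (no --global), with example values
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "60 days"

# Show every reflog-expiry setting now in effect for this repository
git config --get-regexp '^gc\.reflogexpire'
```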

Garbage Collection: Keeping Git Tidy

As you work with Git, you create objects that eventually become unreferenced:

  • Commits that were reset or amended
  • Blobs from files that were modified
  • Objects from branches that were deleted

Git's garbage collection (GC) process cleans up these unreferenced objects to save space.

When Garbage Collection Happens

Git runs garbage collection:

  • Automatically during certain operations when thresholds are met
  • When you manually run git gc
  • On the server side after it receives a push, if the server is configured to do so

The Garbage Collection Process

When Git runs garbage collection:

  1. Loose objects are packed into packfiles for efficiency
  2. Unreferenced objects that are older than the grace period are removed
  3. Redundant packfiles are consolidated
  4. Reflog entries past their expiration are pruned

You can trigger garbage collection manually:

git gc                  # Normal collection
git gc --aggressive     # More thorough (but slower) collection
git gc --prune=now      # Prune unreachable objects now, skipping the grace period (objects still in a reflog are kept)
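
To watch the packing step happen, the following sketch creates a few commits in a scratch repository and compares loose and packed object counts before and after a collection. The repository contents are invented for the demonstration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Every commit adds a blob, a tree, and a commit object as loose files
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```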

The Safety Period

Git doesn't immediately delete unreferenced objects. By default:

  • Unreachable objects stay in the repository for at least 2 weeks before they can be pruned
  • Objects still referenced by a reflog entry count as reachable, so they survive until that entry expires
  • After both protections lapse, truly unreferenced objects are candidates for removal

This safety period is why you can often recover "lost" work even after operations like reset or rebase.
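
A rough way to see how reflogs and pruning interact is to amend a commit in a scratch repository and then collect aggressively. In this sketch (all names are placeholders) the replaced commit survives an immediate prune while the reflog still mentions it, and disappears only once the reflog entries are expired as well. Do not run the expire step in a repository you care about.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "draft"
old=$(git rev-parse HEAD)
git commit -q --amend -m "final"   # "draft" is now referenced only by the reflog

# Immediate prune: the reflog entry still protects the old commit
git gc -q --prune=now
git cat-file -e "$old" 2>/dev/null && survived_gc=yes || survived_gc=no

# Drop all reflog entries, then prune again: nothing protects it anymore
git reflog expire --expire=now --all
git gc -q --prune=now
git cat-file -e "$old" 2>/dev/null && still_there=yes || still_there=no

echo "survived first gc: $survived_gc, present after reflog expiry: $still_there"
```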

Hands-on Exploration: Peering Into Git's Internals

Now let's explore these concepts with practical commands. Like the comic shows, sometimes the best way to understand Git is to get your hands dirty!

Examining Git Objects

To inspect any Git object:

git cat-file -p <object-hash>  # Print the object's contents
git cat-file -t <object-hash>  # Show the object's type

For example, to see your latest commit:

git cat-file -p HEAD

To look at the tree referenced by that commit:

git cat-file -p HEAD^{tree}
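
Here is a small self-contained walk from a commit down to file contents, using a scratch repository so the commands have something to inspect. The file name and message are placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo one > file.txt && git add file.txt && git commit -q -m "first"

commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
echo "HEAD is a $commit_type pointing at a $tree_type"

# The tree lists mode, type, hash, and name for each entry
git cat-file -p 'HEAD^{tree}'

# Resolve the blob behind file.txt and print its contents
blob=$(git rev-parse HEAD:file.txt)
content=$(git cat-file -p "$blob")
echo "file.txt contains: $content"
```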

Exploring the .git Directory

The .git directory contains all of Git's internal data:

ls -la .git/          # List everything in the .git directory
ls -la .git/objects/  # See loose objects and packfiles
ls -la .git/refs/     # Explore branch references
ls -la .git/logs/     # Find the reflogs

To see if objects are packed or loose:

git count-objects -v

This command shows statistics about your repository's objects:

  • count: Number of loose objects
  • size: Size of loose objects (in KB)
  • in-pack: Number of objects in packfiles
  • size-pack: Size of packfiles (in KB)

Reflog Archaeology

To dig into your repository's history through reflogs:

# View every recorded operation affecting HEAD (entries stay until they expire)
git reflog

# Look at a specific branch's reflog
git reflog show main

# Get more details about reflog entries
git log -g --date=relative

# List objects that no ref or reflog entry can reach
git fsck --unreachable
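
When the default reflog output is too noisy, git log -g accepts a custom format. The sketch below uses the %gd (reflog selector) and %gs (reflog message) placeholders in a scratch repository; the commits are invented for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -am "second"

# One line per reflog entry: selector, then the recorded reason
git log -g --format='%gd %gs'

# Only the most recent entry
latest=$(git log -g -1 --format='%gd %gs')
echo "latest entry: $latest"
```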

Just like in our comic where commit messages deteriorate over time ("MORE CODE" → "HERE HAVE CODE" → "AAAAAAA"), your reflog might reveal actions you forgot taking when you were in the depths of coding.

Simulating Object Cleanup

To understand what Git's garbage collection would remove:

# Show objects that would be pruned, without removing them
git prune --dry-run

# git gc has no --dry-run option; list the unreachable objects that are candidates for pruning instead
git fsck --unreachable
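
To see a prune preview with actual output, this sketch writes a blob that nothing references and then asks git prune what it would delete. The blob content is arbitrary, and the repository is a throwaway.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo one > file.txt && git add file.txt && git commit -q -m "first"

# Store a blob in the object database without adding it to any commit
orphan=$(echo "never committed" | git hash-object -w --stdin)

# Preview only: prints "<hash> <type>" for each object that would go
preview=$(git prune --dry-run)
echo "$preview"

# The object is still there because nothing was actually deleted
git cat-file -e "$orphan" && echo "orphan blob still present"
```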

Practical Scenarios: When These Tools Save Your Day

Scenario 1: Recovering from Bad Rebase

Imagine you've rebased a branch and realized it was a mistake:

# Find the pre-rebase state in the reflog
git reflog
# Example output: abcd123 HEAD@{5}: checkout: moving from main to feature

# Restore the branch to its pre-rebase state (this discards uncommitted changes)
git reset --hard abcd123
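
An alternative to hunting for a hash is the branch's own reflog, where a completed rebase is normally recorded as a single step. The sketch below stages a small rebase in a scratch repository and undoes it with feature@{1}; the branch names, files, and messages are all hypothetical.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature
echo feat > feature.txt && git add feature.txt && git commit -q -m "feature work"
before=$(git rev-parse feature)

git checkout -q main
echo more > main.txt && git add main.txt && git commit -q -m "main moves on"

git checkout -q feature
git rebase -q main
rebased=$(git rev-parse feature)

# feature@{1} is the tip of the branch just before the rebase finished
git reset -q --hard 'feature@{1}'
after=$(git rev-parse feature)
echo "before=$before rebased=$rebased after=$after"
```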

Scenario 2: Finding a "Lost" Commit

Like our comic's "MY HANDS ARE TYPING WORDS" commit that you might want to revisit:

# Search reflog for commits with specific content
git reflog | grep -i "typing words"

# Or, list all commits not reachable from any ref, ignoring what the reflogs still remember
git fsck --no-reflogs --unreachable | grep commit
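
The --no-reflogs flag matters because fsck treats reflog entries as reachability roots by default. This sketch amends a commit in a scratch repository and counts how often the replaced commit shows up with and without the flag; names and messages are placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "draft"
old=$(git rev-parse HEAD)
git commit -q --amend -m "final"

# Default: the reflog still reaches the old commit, so it is not reported
with_reflogs=$(git fsck --unreachable 2>/dev/null | grep -c "$old" || true)

# Ignoring reflogs: the old commit is reported as unreachable
without_reflogs=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$old" || true)

echo "reported with reflogs: $with_reflogs, without: $without_reflogs"
```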

Scenario 3: Repository Optimization

For large repositories with long histories (perhaps full of nonsensical commits like in our comic):

# Comprehensive cleanup and optimization
git gc --aggressive --prune=now

# For even more thorough optimization
git repack -a -d --depth=250 --window=250

Best Practices: Maintaining Repository Health

  1. Regular Maintenance: Run git gc periodically on large repositories
  2. Meaningful Commits: Unlike the comic, try to maintain informative commit messages
  3. Backup Reflogs: Before major operations, consider backing up .git/logs/
  4. Custom Expiration: Configure reflog and GC expiration times based on your project needs:
# Keep reflogs for 1 year
git config --global gc.reflogExpire "1 year"

# Never expire unreachable objects (careful, uses more disk space)
git config --global gc.pruneExpire "never"

Key Git Commands for Reflogs & Garbage Collection

Viewing & Recovering Lost Work

git reflog                         # View reflog history
git log -g                         # Show detailed reflog history
git checkout -b recovery HEAD@{5}  # Restore a lost commit
git fsck --unreachable             # Find unreachable commits
git rev-list --all --reflog --objects | grep "<filename>"  # Find files in any ref or reflog entry

Cleaning & Optimizing Git Storage

git gc                             # Standard garbage collection
git gc --aggressive                # Thorough cleanup (⚠️ Slow for large repos)
git gc --prune=now                 # Prune unreachable objects now (reflog-referenced objects are kept)
git prune --dry-run                # Preview objects to be removed
git count-objects -v               # Show repo size & object counts

Configuring Git Maintenance

git config --global gc.auto 500           # Run GC after 500 loose objects
git config --global gc.reflogExpire "90 days"  # Set reflog expiration
git config --global gc.pruneExpire "14 days"   # Set unreferenced object expiration