Git Pack Files and Delta Compression


The Problem: Git's Storage Challenge

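Conceptually, Git stores a snapshot of your entire project with each commit. Unchanged files cost nothing extra, because identical content hashes to the same blob and is shared between snapshots, but every modified file is written as a complete new blob. Without further optimization, even a one-line change to a large file would store a full new copy of that file.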


Enter Pack Files

To optimize storage, Git combines many objects (commits, trees, blobs, and tags) into compressed collections called pack files.

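  • New objects are first written individually as "loose objects", each zlib-compressed on its own.
  • Loose objects are stored in .git/objects, in subdirectories named after the first two hex digits of each object's ID.
  • Eventually, Git consolidates them into .pack files for efficiency.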

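Check loose objects (the ?? pattern matches only the two-character object directories, so pack/ and info/ are skipped):

find .git/objects -type f -path '*/objects/??/*' | head -n 5
git count-objects -v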
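To see where a loose object's name and path come from, here is a minimal sketch assuming the default SHA-1 object format: Git hashes a short header ("blob", a space, the size, a NUL byte) followed by the content, uses the hex digest as the object ID, and stores the zlib-compressed bytes under a directory named after the first two hex digits. The function loose_object is a hypothetical name for illustration, and Git may use a different zlib compression level, so the compressed bytes can differ even though they inflate to the same data.

```python
import hashlib
import zlib


def loose_object(content: bytes, obj_type: str = "blob") -> tuple[str, str, bytes]:
    """Return (object ID, path under .git/objects, compressed bytes)."""
    # Git hashes a header "<type> <size>\0" followed by the raw content.
    store = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory; the other 38 name the file.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    # Each loose object is compressed on its own (level may differ from Git's).
    return oid, path, zlib.compress(store)


oid, path, data = loose_object(b"hello\n")
print(oid)   # ce013625030ba8dba906f756967f9e9ca394464a
print(path)  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The same ID is what git hash-object reports for a file containing "hello" and a newline, which is why identical content in different commits is stored only once.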

When Does Git Pack Objects?

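Git packs objects when:

  • You run git gc (or git repack) manually.
  • Objects are transferred over the network: push and fetch send data as a pack, and the receiving repository keeps it as a pack if it contains at least transfer.unpackLimit objects (default 100), otherwise unpacking it into loose objects.
  • Automatic maintenance decides it is time: commands such as git commit, git merge, and git fetch run git gc --auto, which packs when loose objects exceed gc.auto (default 6700) or packs exceed gc.autoPackLimit (default 50).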
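The automatic check is cheap because it does not count every loose object. As we understand Git's gc code, it looks only inside .git/objects/17 and, since object IDs are evenly distributed, treats that directory as roughly 1/256 of the total. The sketch below models that estimate plus the gc.autoPackLimit test; both function names are hypothetical, and real Git applies further conditions (for example, packs marked with a .keep file are not counted).

```python
import os
import tempfile

HEX_DIGITS = set("0123456789abcdef")


def too_many_loose_objects(objects_dir: str, gc_auto: int = 6700) -> bool:
    """Simplified model of the loose-object test behind git gc --auto.

    Instead of counting every loose object, estimate the total from a
    single fan-out directory (objects/17), which holds about 1/256 of them.
    """
    if gc_auto <= 0:  # gc.auto = 0 disables automatic packing
        return False
    threshold = -(-gc_auto // 256)  # ceiling division: 6700 -> 27
    try:
        names = os.listdir(os.path.join(objects_dir, "17"))
    except FileNotFoundError:
        return False
    # Loose files are named by the remaining 38 hex digits of the object ID.
    count = sum(1 for n in names if len(n) == 38 and set(n) <= HEX_DIGITS)
    return count > threshold


def too_many_packs(pack_count: int, auto_pack_limit: int = 50) -> bool:
    """Model of the gc.autoPackLimit test: too many packs triggers a repack."""
    return auto_pack_limit > 0 and pack_count > auto_pack_limit


# Demo: a fake objects directory with 30 loose-looking files in objects/17.
with tempfile.TemporaryDirectory() as objects_dir:
    os.makedirs(os.path.join(objects_dir, "17"))
    for i in range(30):
        open(os.path.join(objects_dir, "17", f"{i:038x}"), "w").close()
    with_default = too_many_loose_objects(objects_dir)            # 30 > 27
    with_higher_limit = too_many_loose_objects(objects_dir, 10000)  # 30 > 40 fails

print(with_default, with_higher_limit, too_many_packs(51))  # True False True
```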

Loose objects are cheap to create one at a time, while pack files provide better storage and network efficiency.


Delta Compression Explained

Git uses delta compression in pack files to store only differences between similar objects:

  • Stores one object whole (the base) and represents similar objects as deltas: instructions for rebuilding them from the base.
  • Avoids storing near-identical content repeatedly, which can shrink a repository dramatically.

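Git's delta format has just two kinds of instruction: copy (a range of bytes from the base) and insert (new bytes carried in the delta itself). For example, a delta might say:

  • "Copy 100 bytes from the base, starting at offset 0"
  • "Insert these 20 new bytes"
  • "Copy 70 bytes from the base, starting at offset 130"

There is no separate skip instruction; the 30 base bytes at offsets 100 to 129 are skipped simply because no copy refers to them.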
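The copy/insert scheme can be sketched as a decoder for Git's binary delta format, following our reading of Git's pack-format documentation: a delta starts with the base size and result size as little-endian base-128 integers; a command byte with the high bit set is a copy whose low bits say which offset and size bytes follow; any other nonzero command byte is an insert of that many literal bytes. The encoders encode_varint, copy_op, and insert_op are hypothetical helpers used only to hand-build the example delta from the list of instructions.

```python
def encode_varint(n: int) -> bytes:
    """Little-endian base-128 integer, as used for the two delta header sizes."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def copy_op(offset: int, size: int) -> bytes:
    """Hypothetical helper: copy `size` bytes (1 to 2**24 - 1) from base[offset:]."""
    cmd, args = 0x80, bytearray()
    for i in range(4):  # bits 0-3 flag which offset bytes are present
        part = (offset >> (8 * i)) & 0xFF
        if part:
            cmd |= 1 << i
            args.append(part)
    for i in range(3):  # bits 4-6 flag which size bytes are present
        part = (size >> (8 * i)) & 0xFF
        if part:
            cmd |= 0x10 << i
            args.append(part)
    return bytes([cmd]) + bytes(args)


def insert_op(data: bytes) -> bytes:
    """Hypothetical helper: insert 1 to 127 literal bytes."""
    if not 1 <= len(data) <= 127:
        raise ValueError("an insert holds 1 to 127 bytes")
    return bytes([len(data)]) + data


def apply_delta(base: bytes, delta: bytes) -> bytes:
    src_size, pos = read_varint(delta, 0)
    dst_size, pos = read_varint(delta, pos)
    if src_size != len(base):
        raise ValueError("delta was made against a different base")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:  # copy from the base
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            out += base[offset:offset + (size or 0x10000)]  # size 0 means 65536
        elif cmd:  # insert `cmd` literal bytes
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result has the wrong size")
    return bytes(out)


# The three instructions from the list above, against a 200-byte base.
base = bytes(range(200))
new_bytes = b"+" * 20
delta = (encode_varint(len(base)) + encode_varint(190)
         + copy_op(0, 100) + insert_op(new_bytes) + copy_op(130, 70))
result = apply_delta(base, delta)
print(len(delta), len(result))  # 30 190
```

The 30-byte delta rebuilds a 190-byte object; only the 20 inserted bytes are stored literally, and copy offsets cost a byte or two each.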

Finding Similar Objects

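Git finds similar objects with heuristics rather than an exhaustive search:

  1. It sorts objects by type (commit, tree, blob, tag).
  2. Within a type, it groups objects by a hash of their path name that is weighted toward the final characters, so files with the same name or extension land next to each other.
  3. Within those groups, it sorts by size, largest first.

Git then slides a window over the sorted list (pack.window, default 10 objects) and, for each object, picks the candidate in the window that yields the smallest delta, storing the object whole if no delta is small enough.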
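The ordering-and-window idea can be sketched as follows. This is a simplified model, not Git's pack-objects code: name_hash is modeled on Git's path-name hash, delta_cost is a hypothetical stand-in that uses Python's difflib to count target bytes that cannot be copied from a candidate base, and the half-size cutoff approximates Git's rule that a delta must be substantially smaller than the object it replaces. The example data mirrors the data.txt edits used in this article's demo.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


def name_hash(path: str) -> int:
    """Modeled on Git's pack_name_hash(): the last characters weigh most,
    so files sharing a name or extension get nearby values."""
    h = 0
    for ch in path:
        if ch.isspace():
            continue
        h = ((h >> 2) + (ord(ch) << 24)) & 0xFFFFFFFF
    return h


@dataclass
class Obj:
    name: str
    type: str
    path: str
    data: bytes


def delta_cost(base: bytes, target: bytes) -> int:
    """Hypothetical proxy for delta size: target bytes not copyable from base."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - matched


def choose_bases(objects: list[Obj], window: int = 10) -> dict[str, Optional[str]]:
    # Roughly Git's order: type, then name hash, then size (largest first).
    order = sorted(objects, key=lambda o: (o.type, name_hash(o.path), -len(o.data)))
    bases: dict[str, Optional[str]] = {}
    for i, target in enumerate(order):
        # Assumed cutoff: a delta must be under half the target's size.
        best, best_cost = None, len(target.data) // 2
        for candidate in order[max(0, i - window):i]:
            if candidate.type != target.type:
                continue
            cost = delta_cost(candidate.data, target.data)
            if cost < best_cost:
                best, best_cost = candidate.name, cost
        bases[target.name] = best  # None means "store whole"
    return bases


v1 = b"".join(b"Line %d of content\n" % i for i in range(1, 101))
v2 = v1.replace(b"Line 50 of content", b"This line was changed")
v3 = v2.replace(b"Line 75 of content", b"This line was also modified")
objects = [
    Obj("v1", "blob", "data.txt", v1),
    Obj("v2", "blob", "data.txt", v2),
    Obj("v3", "blob", "data.txt", v3),
    Obj("logo", "blob", "logo.png", bytes(range(256))),
]
print(choose_bases(objects))
```

Because the largest version sorts first, v3 is stored whole and the smaller, older versions become deltas against their closest neighbor, while the unrelated logo.png finds no useful base.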


Delta Chains

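Git further optimizes storage with delta chains, in which a delta's base may itself be a delta:

  • Version 3 → delta → Version 2 → delta → Version 1 (stored whole)
  • The maximum chain length is set by pack.depth (default 50).

Longer chains compress better but make reads slower, because every delta in the chain must be applied. In practice, Git often does the reverse of this example, storing the newest version whole and older versions as deltas, since recent versions are read most often.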
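Reading an object at the end of a chain means walking back to the whole object and then applying each delta in turn. The sketch below uses a hypothetical in-memory store and a simplified delta representation, ("copy", offset, size) and ("insert", data) tuples rather than Git's binary encoding, laid out like the Version 1 to Version 3 chain in the list above.

```python
def apply_ops(base: bytes, ops: list) -> bytes:
    """Apply a simplified delta made of ("copy", offset, size) / ("insert", data)."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


def resolve(store: dict, oid: str, max_depth: int = 50) -> tuple[bytes, int]:
    """Rebuild an object by following its delta chain back to a whole object.

    Returns the data and the chain depth (number of deltas applied).
    """
    chain = []
    while store[oid][0] == "delta":
        _, base_oid, ops = store[oid]
        chain.append(ops)
        if len(chain) > max_depth:  # analogous to the pack.depth limit
            raise ValueError(f"delta chain longer than {max_depth}")
        oid = base_oid
    data = store[oid][1]
    # Apply deltas starting from the one closest to the whole object.
    for ops in reversed(chain):
        data = apply_ops(data, ops)
    return data, len(chain)


# Hypothetical store: v1 whole, v2 a delta on v1, v3 a delta on v2.
v1 = b"alpha\nbeta\ngamma\n"
store = {
    "v1": ("full", v1),
    "v2": ("delta", "v1", [("copy", 0, 6), ("insert", b"BETA\n"), ("copy", 11, 6)]),
    "v3": ("delta", "v2", [("copy", 0, 17), ("insert", b"delta\n")]),
}
print(resolve(store, "v3"))  # (b'alpha\nBETA\ngamma\ndelta\n', 2)
```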


Pack File Structure

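A .pack file consists of:

  1. Header: the 4-byte signature "PACK", a 4-byte version number (Git writes version 2), and a 4-byte object count
  2. Packed objects, each with:
    • A variable-length header giving the object type (commit, tree, blob, tag, or one of two delta types) and its uncompressed size; a delta object also names its base, either by offset within the pack or by object ID
    • Zlib-compressed contents or delta instructions
  3. Trailer: a checksum of everything before it (SHA-1 in default repositories)

Each .pack file is accompanied by an .idx file that maps object IDs to their offsets in the pack, for fast object lookups.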
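The layout above is simple enough to write and read back by hand. This sketch, based on our reading of Git's pack-format documentation, builds a tiny version-2 pack containing only whole blobs and parses it again; build_pack and read_pack are hypothetical names, and the reader deliberately rejects delta entries (types 6 and 7), which carry a base reference before their zlib data.

```python
import hashlib
import struct
import zlib

TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}


def encode_object_header(obj_type: int, size: int) -> bytes:
    # First byte: continuation bit, 3-bit type, low 4 bits of the size;
    # later bytes carry 7 more size bits each.
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_object_header(buf: bytes, pos: int) -> tuple[int, int, int]:
    byte = buf[pos]
    pos += 1
    obj_type, size, shift = (byte >> 4) & 0x07, byte & 0x0F, 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def build_pack(blobs: list[bytes]) -> bytes:
    body = bytearray(b"PACK" + struct.pack(">II", 2, len(blobs)))
    for data in blobs:
        body += encode_object_header(3, len(data)) + zlib.compress(data)
    return bytes(body) + hashlib.sha1(body).digest()  # SHA-1 trailer


def read_pack(pack: bytes) -> tuple[int, list]:
    sig, version, count = struct.unpack(">4sII", pack[:12])
    if sig != b"PACK" or version not in (2, 3):
        raise ValueError("not a pack file")
    if hashlib.sha1(pack[:-20]).digest() != pack[-20:]:
        raise ValueError("bad trailing checksum")
    pos, objects = 12, []
    for _ in range(count):
        obj_type, size, pos = decode_object_header(pack, pos)
        if obj_type not in (1, 2, 3, 4):
            raise NotImplementedError("delta entries are not handled here")
        inflater = zlib.decompressobj()
        data = inflater.decompress(pack[pos:-20])
        # The next entry starts where this zlib stream ended.
        pos = len(pack) - 20 - len(inflater.unused_data)
        objects.append((TYPE_NAMES[obj_type], size, data))
    return version, objects


pack = build_pack([b"hello\n", b"x" * 300])
print(read_pack(pack)[1][0])  # ('blob', 6, b'hello\n')
```

The size in each entry header is the uncompressed size, which lets a reader allocate the output before inflating; the .idx file is what lets Git jump straight to an entry instead of scanning like this.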


Pack Files in Action

Demonstration of pack file creation and delta compression:

# Create and initialize a demo repository (bash; GNU sed syntax below)
mkdir pack-demo && cd pack-demo
git init

# Initial commit (requires user.name and user.email to be configured)
for i in {1..100}; do echo "Line $i of content" >> data.txt; done
git add data.txt
git commit -m "Initial version"

# Make incremental changes (with BSD/macOS sed, use: sed -i '' ...)
sed -i '50s/.*/This line was changed/' data.txt
git commit -am "Change line 50"

sed -i '75s/.*/This line was also modified/' data.txt
git commit -am "Change line 75"

# Count loose objects before packing (3 commits, 3 trees, 3 blobs)
echo "Loose objects before packing:"
find .git/objects -type f -path '*/objects/??/*' | wc -l

# Force packing with garbage collection
git gc

# Count loose objects after packing
echo "Loose objects after packing:"
find .git/objects -type f -path '*/objects/??/*' | wc -l

# View created pack files (each .pack has a matching .idx)
ls -lh .git/objects/pack/

# Map blob IDs to file names (verify-pack output contains no paths)
git rev-list --objects --all | grep data.txt

# Inspect pack contents: deltified entries add chain depth and base ID columns
git verify-pack -v .git/objects/pack/pack-*.idx | grep blob

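In the verify-pack output, deltified objects carry two extra columns (chain depth and base object ID), and their size in the pack is far smaller than their full size; with three similar versions of data.txt, typically two of the blobs are stored as deltas. The absolute savings are modest for a file this small but grow with file size and history length.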
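Because verify-pack prints per-object numbers rather than a summary, a small parser can total them. The sketch assumes the column layout documented for git verify-pack -v (five fields per object, seven for deltified objects); summarize_verify_pack is a hypothetical helper, and the sample lines are invented placeholders in that layout, not output from the demo above. In real use you would pass it the captured text of the command.

```python
OBJECT_TYPES = {"commit", "tree", "blob", "tag"}


def summarize_verify_pack(output: str) -> dict:
    """Tally object lines from git verify-pack -v output.

    Object lines have 5 fields (ID, type, size, size-in-pack, offset) or,
    for deltified objects, 7 fields (adding chain depth and base ID).
    """
    stats = {"objects": 0, "deltas": 0, "size": 0, "size_in_pack": 0, "max_depth": 0}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) not in (5, 7) or fields[1] not in OBJECT_TYPES:
            continue  # skip the chain-length histogram and the final "ok" line
        stats["objects"] += 1
        stats["size"] += int(fields[2])
        stats["size_in_pack"] += int(fields[3])
        if len(fields) == 7:
            stats["deltas"] += 1
            stats["max_depth"] = max(stats["max_depth"], int(fields[5]))
    return stats


# Invented placeholder lines in verify-pack's format (not real output).
sample = """\
<blob-a> blob 1000 500 12
<blob-b> blob 1010 40 512 1 <blob-a>
<blob-c> blob 1020 45 552 2 <blob-b>
non delta: 1 object
chain length = 1: 1 object
chain length = 2: 1 object
pack-example.pack: ok"""

print(summarize_verify_pack(sample))
# {'objects': 3, 'deltas': 2, 'size': 3030, 'size_in_pack': 585, 'max_depth': 2}
```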


Performance Implications

Git's storage method balances several performance concerns:

  • Fast Writes: New objects are written as loose files without any delta search.
  • Efficient Storage: Pack files and deltas save disk space.
  • Optimized Network Transfers: Push and fetch send packs, reducing the data transferred.
  • Costlier Reads: Reconstructing a deltified object requires applying every delta in its chain, back to a whole base object.
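The read cost is easy to quantify with a toy model. The sketch below counts how many delta applications it takes to read every object in a single chain of 50 deltas (the default pack.depth), with and without remembering already-reconstructed bases. The model and read_all are hypothetical; Git does keep a cache of recently used delta bases (sized by core.deltaBaseCacheLimit), but that cache is bounded, unlike this unbounded dictionary.

```python
def read_all(chain_length: int, use_cache: bool) -> int:
    """Count delta applications needed to read every object in one chain.

    The object at depth 0 is stored whole; the object at depth d is a delta
    against the object at depth d - 1.
    """
    applied = 0
    cache: dict[int, bytes] = {}

    def reconstruct(depth: int) -> bytes:
        nonlocal applied
        if use_cache and depth in cache:
            return cache[depth]
        if depth == 0:
            data = b"whole object"
        else:
            base = reconstruct(depth - 1)
            applied += 1  # stand-in for applying one delta
            data = base + b"+delta"
        if use_cache:
            cache[depth] = data
        return data

    for depth in range(chain_length + 1):
        reconstruct(depth)
    return applied


print(read_all(50, use_cache=False), read_all(50, use_cache=True))  # 1275 50
```

Without reuse, reading the whole chain costs 0 + 1 + ... + 50 applications; keeping reconstructed bases brings it down to one application per delta.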

Pack File Maintenance

Maintain optimal repository performance with these commands:

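# Incremental repacking: pack loose objects into a new pack
git repack

# Consolidate everything into a single pack and delete the now-redundant packs
git repack -a -d

# Full housekeeping: repack, prune old unreachable objects, pack refs, and more
git gc

Git normally runs this maintenance automatically through git gc --auto; manual repacking mostly helps after large imports or history rewrites.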


Under-the-Hood Insights

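Important insights about pack files and delta compression:

  • To read an object, Git consults both the pack indexes and the loose-object directories; current versions search the packs first and fall back to loose objects.
  • The delta search is tunable: pack.window (default 10) sets how many candidates each object is compared against, and pack.windowMemory caps the memory the window may use (default 0, meaning no limit).
  • Delta chains can run forwards or backwards: a base may be an older or a newer version of the same file.
  • Base selection is driven by the name and size heuristics and the resulting delta size, not by commit order.
  • Pack files are never modified in place; repacking writes new packs and then deletes the old ones.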

Understanding pack files and delta compression explains how Git keeps full-snapshot history compact, and where its performance trade-offs come from.