Git Pack Files and Delta Compression
The Problem: Git's Storage Challenge
Conceptually, Git snapshots your entire project on each commit. Stored naively, that would mean keeping a full copy of a file for every change, however minor, which wastes a great deal of space.
Enter Pack Files
To optimize storage, Git combines multiple objects (commits, trees, blobs) into compressed collections called pack files.
- Initially, Git creates individual "loose objects".
- Loose objects are stored in .git/objects.
- Eventually, Git consolidates these into .pack files for efficiency.
Check loose objects:
find .git/objects -type f -not -path "*/pack/*" | head -n 1
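To count loose objects rather than list one, a small helper can be sketched as below. The function name count_loose_objects is hypothetical. It assumes the usual layout in which each loose object lives in a two-hex-digit subdirectory of .git/objects, so the pack and info directories are skipped.

```shell
# count_loose_objects [DIR]: count files that look like loose objects,
# i.e. DIR/xx/... where xx is a two-hex-digit fan-out directory.
# DIR defaults to .git/objects.
count_loose_objects() {
    objects_dir=${1:-.git/objects}
    find "$objects_dir" -type f -path "$objects_dir/[0-9a-f][0-9a-f]/*" | wc -l | tr -d ' '
}

# Usage, from the top of a repository:
#   count_loose_objects
```

Git's own git count-objects -v reports a comparable number in its "count" field.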
When Does Git Pack Objects?
Git packs objects when:
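- You run git gc manually.
- You push to a remote repository (the objects are sent as a pack).
- The number of loose objects exceeds a threshold (gc.auto, 6700 by default).
- Certain Git operations trigger automatic garbage collection (git gc --auto).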
Loose objects offer fast individual access, while pack files provide better storage and network efficiency.
Delta Compression Explained
Git uses delta compression in pack files to store only differences between similar objects:
- Stores a base version and instructions (deltas) to recreate new versions.
- Reduces redundancy, significantly optimizing storage.
Example delta instructions:
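- "Copy 100 bytes from the base, starting at offset 0"
- "Insert these 20 new bytes"
- "Copy 50 bytes from the base, starting at offset 130"
Git's delta format has only these two instruction kinds, copy and insert. Base content is skipped simply by never copying it: here, bytes 100 to 129 of the base are left out.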
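The idea behind these instructions can be sketched as a tiny interpreter. This is a simplified, human-readable model, not Git's binary delta encoding: the function name apply_delta and the "copy OFFSET LENGTH" / "insert TEXT" syntax are assumptions made for illustration. Skipping base content needs no instruction of its own in this sketch, because the next copy simply starts at a later offset.

```shell
# apply_delta BASE_FILE: read instructions on stdin, write the rebuilt
# content to stdout. Instruction syntax (hypothetical, for illustration):
#   copy OFFSET LENGTH   emit LENGTH bytes of the base, starting at OFFSET
#   insert TEXT          emit TEXT literally
apply_delta() {
    base=$1
    while read -r op rest; do
        case $op in
            copy)
                set -- $rest    # split into OFFSET and LENGTH
                tail -c "+$(($1 + 1))" "$base" | head -c "$2"
                ;;
            insert)
                printf '%s' "$rest"
                ;;
        esac
    done
}

# Example: rebuild a new version from a base plus three instructions.
base_file=$(mktemp)
printf 'Hello, world! Goodbye.' > "$base_file"
printf 'copy 0 7\ninsert Git\ncopy 12 10\n' | apply_delta "$base_file"
# prints: Hello, Git! Goodbye.
rm -f "$base_file"
```

Only the three-character insert is new data here; everything else is recovered from the base, which is where the space saving comes from.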
Finding Similar Objects
Git finds similar objects by:
- Sorting objects by type (commit, tree, blob).
- Grouping blobs by pathname.
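- Sorting within groups by size (largest first).
Git then compares each object against its neighbors in this sorted list, within a sliding window, and keeps the base that yields the smallest delta. This is a heuristic: it finds a good base quickly, not a guaranteed optimal one.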
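The ordering step can be approximated with sort on a small table. This is only a sketch under stated assumptions: the "TYPE PATH SIZE" input format and the name order_candidates are made up for illustration, and the largest-first size order is an assumption about Git's behavior. Git itself works on a hash of the file name rather than the literal path.

```shell
# order_candidates: read "TYPE PATH SIZE" lines on stdin and print them
# sorted by type, then by path, then by size (largest first), so that
# likely delta partners end up next to each other.
order_candidates() {
    LC_ALL=C sort -k1,1 -k2,2 -k3,3nr
}

# Usage:
#   printf 'blob src/a.c 120\nblob src/a.c 300\n' | order_candidates
```

After sorting, versions of the same file sit side by side, so a window of a few neighboring entries is enough to find a reasonable base.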
Delta Chains
Git further optimizes using delta chains, where objects reference other delta-compressed objects:
- Version 3 → delta → Version 2 → delta → Version 1 (stored fully)
- The maximum chain length is set by pack.depth (default: 50).
This balances compression and object retrieval performance.
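To make the cost of a chain concrete, the sketch below computes how many deltas must be applied to rebuild each object. The "OBJECT BASE" input format (with "-" marking an object stored in full) and the name chain_depths are hypothetical. The resulting depth is the quantity that pack.depth caps.

```shell
# chain_depths: read "OBJECT BASE" pairs on stdin (BASE is "-" for an
# object stored in full) and print "OBJECT DEPTH", where DEPTH is the
# number of deltas that must be applied to rebuild OBJECT.
chain_depths() {
    awk '
        { base[$1] = $2; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                obj = order[i]; depth = 0
                # Follow base links until a fully stored object is reached;
                # the depth < NR guard stops on malformed (cyclic) input.
                for (cur = obj; base[cur] != "-" && depth < NR; cur = base[cur])
                    depth++
                print obj, depth
            }
        }'
}

# Usage:
#   printf 'v1 -\nv2 v1\nv3 v2\n' | chain_depths
```

In a real repository the configured cap can be inspected with git config --get pack.depth, which prints nothing when the default applies.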
Pack File Structure
A .pack file contains:
- Header (the signature "PACK", a version number, and the object count)
- Packed objects:
  - Object headers (type, size, delta information)
  - Compressed contents or delta instructions
- Checksum (a trailing hash of everything before it)
Each .pack file is accompanied by an .idx file for fast object lookups.
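The header can be read directly from the first 12 bytes of a pack. The sketch below assumes the layout described in Git's pack format documentation: a 4-byte signature, a 4-byte version, and a 4-byte object count, the last two stored big-endian. The helper name pack_header is hypothetical.

```shell
# pack_header FILE: print the signature, version, and object count stored
# in the first 12 bytes of a pack file.
pack_header() {
    sig=$(head -c 4 "$1")
    # od prints the remaining 8 header bytes as unsigned decimal values
    set -- $(head -c 12 "$1" | tail -c 8 | od -An -tu1)
    version=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    count=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
    echo "signature=$sig version=$version objects=$count"
}

# Usage, from the top of a repository that has been packed:
#   for p in .git/objects/pack/*.pack; do pack_header "$p"; done
```

The object count printed here should agree with the number of object lines that git verify-pack -v lists for the same pack.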
Pack Files in Action
Demonstration of pack file creation and delta compression:
# Create and initialize a demo repository
mkdir pack-demo && cd pack-demo
git init
# Initial commit
for i in {1..100}; do echo "Line $i of content" >> data.txt; done
git add data.txt
git commit -m "Initial version"
# Make incremental changes (GNU sed syntax; BSD/macOS sed needs -i '')
sed -i '50s/.*/This line was changed/' data.txt
git commit -am "Change line 50"
sed -i '75s/.*/This line was also modified/' data.txt
git commit -am "Change line 75"
# Check loose objects before packing
echo "Loose objects before packing:"
find .git/objects -type f -not -path "*/pack/*" -not -path "*/info/*" | wc -l
# Force packing with garbage collection
git gc
# Check objects after packing (info/ is excluded because gc writes bookkeeping files there)
echo "Loose objects after packing:"
find .git/objects -type f -not -path "*/pack/*" -not -path "*/info/*" | wc -l
# View created pack files
ls -lh .git/objects/pack/
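# Inspect pack contents (verify-pack lists object IDs, not file names)
git verify-pack -v .git/objects/pack/pack-*.idx | grep blob
In the output, a blob stored as a delta should show two extra columns: its chain depth and the ID of its base object. Blobs stored in full have neither.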
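To summarize the verify-pack listing rather than read it line by line, a small filter can count full objects and deltas. This sketch assumes the documented -v output, in which an object line has five fields and a delta line has two more (depth and base object ID). The name delta_summary is hypothetical.

```shell
# delta_summary: read `git verify-pack -v` output on stdin and count how
# many objects are stored in full versus as deltas. Object lines start
# with a hex object ID; the summary lines at the end do not match.
delta_summary() {
    awk '
        $1 ~ /^[0-9a-f]+$/ && length($1) >= 40 && NF == 5 { full++ }
        $1 ~ /^[0-9a-f]+$/ && length($1) >= 40 && NF == 7 { delta++ }
        END { printf "full=%d delta=%d\n", full, delta }'
}

# Usage:
#   git verify-pack -v .git/objects/pack/pack-*.idx | delta_summary
```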
Performance Implications
Git's storage method balances various performance aspects:
- Fast Writes: Quickly creates loose objects.
- Efficient Storage: Pack files and deltas save disk space.
- Optimized Network Transfers: Reduced data size.
- Complex Reads: Object reconstruction may involve following deltas.
Pack File Maintenance
Maintain optimal repository performance with these commands:
# Basic repacking
git repack
# Thorough repacking (puts everything into a single pack and removes the now-redundant packs)
git repack -a -d
# Full garbage collection
git gc
Regular repacking ensures efficient storage.
Under-the-Hood Insights
Important insights about pack files and delta compression:
- When looking up an object, Git checks pack files first (through their .idx indexes), then falls back to loose objects.
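- The delta search is tunable via pack.window (how many neighboring objects are tried as bases) and pack.windowMemory (how much memory the window may use).
- Delta chains can run forwards or backwards: a newer version can serve as the base for an older one.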
- Base object selection emphasizes content similarity over historical order.
- Pack files are immutable: Git always creates new ones instead of modifying existing ones.
Understanding Git's pack files and delta compression deepens your grasp of Git’s sophisticated storage design and its performance characteristics.