(The best answer I've seen to this problem is: https://stackoverflow.com/a/42544963/714112 , copied here since this thread appears high in Google search rankings but that other one doesn't)
This shell script displays all blob objects in the repository, sorted from smallest to largest.
For my sample repo, it ran about 100 times faster than the other ones found here.
On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
When you run above code, you will get nice human-readable output like this:
...
0d99bb931299 530KiB path/to/some-image.jpg
2ba44098e28f 12MiB path/to/hires-image.png
bd1741ddce0d 63MiB path/to/some-video-1080p.mp4
Suppose you then want to remove the files a
and b
from every commit reachable from HEAD
, you can use this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD