How to remove/delete a large file from commit history in Git repository?

Question

I accidentally dropped a DVD-rip into a website project, then carelessly ran git commit -a -m ..., and, zap, the repo was bloated by 2.2 gigs. Next time I made some edits, deleted the video file, and committed everything, but the compressed file is still there in the repository, in history.

I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge together the 2 commits so that the big file doesn't show in the history and is cleaned up by the garbage collection procedure?

User · Answer

Why not use this simple but powerful command?

git filter-branch --tree-filter 'rm -f DVD-rip' HEAD

The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.

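If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2..HEAD to avoid rewriting too much history, thus avoiding diverging commits if you haven't pushed yet. This comment courtesy of @alpha_989 seems too important to leave out here. Note that a range A..B excludes A itself, so if 35dsa2 is the commit that added the file, start the range at its parent (35dsa2^..HEAD) so that this commit is rewritten too.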
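To see the --tree-filter approach end to end, here is a minimal sketch that builds a throwaway repository and rewrites it. The temporary directory, the commit messages, and the 1 MiB stand-in for the DVD-rip are all assumptions made for illustration; FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning pause that newer Git versions add to filter-branch.

```shell
#!/bin/sh
# Throwaway demo: every path and file name below is hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "<h1>Index</h1>" > index.html
git add index.html
git commit -q -m "Index"

# The careless commit: a 1 MiB stand-in for the accidentally added video.
head -c 1048576 /dev/zero > DVD-rip
echo "<h1>Other</h1>" > other.html
git add DVD-rip other.html
git commit -q -m "Careless"

git rm -q DVD-rip
git commit -q -m "Remove DVD-rip"

# Rewrite every commit reachable from HEAD, deleting DVD-rip from each checkout.
git filter-branch --tree-filter 'rm -f DVD-rip' HEAD >/dev/null 2>&1

# Count commits on the rewritten branch whose tree still lists the file.
remaining=0
for c in $(git rev-list HEAD); do
    if git ls-tree -r --name-only "$c" | grep -qx 'DVD-rip'; then
        remaining=$((remaining + 1))
    fi
done
echo "commits still containing DVD-rip: $remaining"
```

The old commits are still reachable through refs/original and the reflog at this point, so the repository does not shrink until those references are removed and garbage collection runs.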


User · Answer

git filter-branch is a powerful command that you can use to delete a huge file from the commit history. The file will stay for a while, and Git will remove it in the next garbage collection.

Below is the full process for deleting files from commit history. For safety, the process below runs the commands on a new branch first. If the result is what you need, then reset it back to the branch you actually want to change.

# Do it in a new testing branch
git checkout -b test

# Remove file-name from every commit on the new branch
# --index-filter: rewrite the index without checking out
# --cached: remove it from the index but not from the working tree
# --ignore-unmatch: ignore if files to be removed are absent in a commit
# HEAD: execute the specified command for each commit reached from HEAD by parent link
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD

# The output is OK, reset it to the prior branch master
git checkout master
git reset --soft test

# Remove test branch
git branch -d test

# Push it with force
git push --force origin master
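The test-branch workflow above can be tried safely in a scratch repository. The following is a sketch under stated assumptions: the repository lives in a temporary directory, the large file is a 1 MiB placeholder named file-name, and the push step is left out because there is no remote. The name of the initial branch is read from Git instead of being assumed to be master.

```shell
#!/bin/sh
# Scratch-repo walkthrough; names and sizes are illustrative only.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration

echo "page" > index.html
head -c 1048576 /dev/zero > file-name
git add index.html file-name
git commit -q -m "Add page and, by mistake, a large file"
git rm -q file-name
git commit -q -m "Delete the large file"

# Rewrite history on a test branch first.
git checkout -q -b test
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch file-name' HEAD >/dev/null 2>&1

# Inspect the result before touching the real branch.
in_test=$(git log --oneline test -- file-name | wc -l | tr -d ' ')

# Happy with it: point the original branch at the rewritten history.
git checkout -q "$main_branch"
git reset -q --soft test
git branch -q -d test

in_main=$(git log --oneline -- file-name | wc -l | tr -d ' ')
echo "commits touching file-name: test=$in_test, $main_branch=$in_main"
```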

User · Answer

When you run into this problem, git rm will not suffice, as git remembers that the file existed once in our history, and thus will keep a reference to it.

To make things worse, rebasing is not easy either, because any references to the blob will prevent the git garbage collector from cleaning up the space. This includes remote references and reflog references.

I put together git forget-blob, a little script that tries removing all these references, and then uses git filter-branch to rewrite every commit in the branch.

Once your blob is completely unreferenced, git gc will get rid of it.

The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!

User · Answer

I basically did what was on this answer: https://stackoverflow.com/a/11032521/1286423

(for history, I'll copy-paste it here)

git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch YOURFILENAME' HEAD
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
git push origin master --force

It didn't work, because I like to rename and move things a lot. So some big files were in folders that had been renamed, and I think the gc couldn't delete the reference to those files because of references in tree objects pointing to those files. My ultimate solution to really kill it was to:

# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:

# Go back at the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapply patches,
# it will really forget about the references to hidden big files.
git rebase newinit

# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit,
# so we remove all the references.

# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/

# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive

My repo (the .git) changed from 32MB to 388KB, which even filter-branch couldn't clean.

User · Answer

Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

Carefully follow the usage instructions; the core part is just this:

java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

git gc --prune=now --aggressive

The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

User · Answer

After trying virtually every answer in SO, I finally found this gem that quickly removed and deleted the large files in my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories

CD to your local working folder and run the following command:

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all

Replace FOLDERNAME with the file or folder you wish to remove from the given git repository.

Once this is done, run the following commands to clean up the local repository:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

Now push all the changes to the remote repository:

git push --all --force

This will clean up the remote repository.
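The cleanup commands matter because filter-branch alone leaves the old objects reachable. A minimal sketch that checks this in a scratch repository follows. The temporary directory and the 1 MiB file FOLDERNAME/big.bin are assumptions for illustration, and the script assumes Git's default on-disk ref storage, where the backup refs live under .git/refs/original/. It records the blob's object name, rewrites history, and tests whether the blob still exists before and after the cleanup.

```shell
#!/bin/sh
# Scratch-repo check that the blob only disappears after the cleanup steps.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "page" > index.html
mkdir FOLDERNAME
head -c 1048576 /dev/zero > FOLDERNAME/big.bin
git add index.html FOLDERNAME
git commit -q -m "Add site and, by mistake, a big folder"
blob=$(git rev-parse HEAD:FOLDERNAME/big.bin)   # object name of the large file

git filter-branch -f --index-filter \
    'git rm -rf --cached --ignore-unmatch FOLDERNAME' -- --all >/dev/null 2>&1

# The rewritten branch no longer lists the folder, but the object is still stored.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

rm -rf .git/refs/original/               # drop filter-branch's backup refs
git reflog expire --expire=now --all     # drop reflog entries pointing at old commits
git gc -q --prune=now                    # delete objects that are now unreachable

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "blob before cleanup: $before, after cleanup: $after"
```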

User · Answer

This works perfectly for me: in Git Extensions, right click on the selected commit, choose "reset current branch to here", then "hard reset".

It's surprising nobody else is able to give this simple answer.

User · Answer

The best answer I've seen to this problem is https://stackoverflow.com/a/42544963/714112, copied here since this thread appears high in Google search rankings but that other one doesn't.

A blazingly fast shell one-liner

This shell script displays all blob objects in the repository, sorted from smallest to largest.

For my sample repo, it ran about 100 times faster than the other ones found here. On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.

The Base Script

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

When you run the code above, you will get nice human-readable output like this:

...
0d99bb931299  530KiB path/to/some-image.jpg
2ba44098e28f   12MiB path/to/hires-image.png
bd1741ddce0d   63MiB path/to/some-video-1080p.mp4

Fast File Removal

Suppose you then want to remove the files a and b from every commit reachable from HEAD. You can use this command:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD

User · Answer

You can do this using the branch filter command:

git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD

User · Answer

What you want to do is highly disruptive if you have published history to other developers. See "Recovering From Upstream Rebase" in the git rebase documentation for the necessary steps after repairing your history.

You have at least two options: git filter-branch and an interactive rebase, both explained below.

Using git filter-branch

I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.

Say your git history is:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Note that git lola is a non-standard but highly useful alias. With the --name-status switch, we can see tree modifications associated with each commit.

In the "Careless" commit (whose SHA1 object name is ce36c98) the file oops.iso is the DVD-rip added by accident and removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is:

git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all

Options:

- --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
- -d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
- --index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn't present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
- --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
- -- specifies the end of options to git filter-branch.
- --all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.

After some churning, the history is now:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A   login.html
| * cb14efd Remove DVD-rip
| | D   oops.iso
| * ce36c98 Careless
|/  A   oops.iso
|   A   other.html
|
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Notice that the new "Careless" commit adds only other.html and that the "Remove DVD-rip" commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in "Checklist for Shrinking a Repository."

$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now

For a simpler alternative, clone the repository to discard the unwanted bits.

$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo

Using a file:///... clone URL copies objects rather than creating hardlinks only.

Now your history is:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

The SHA1 object names for the first two commits ("Index" and "Admin page") stayed the same because the filter operation did not modify those commits. "Careless" lost oops.iso and "Login page" got a new parent, so their SHA1s did change.

Interactive rebase

With a history of:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

you want to remove oops.iso from "Careless" as though you never added it, and then "Remove DVD-rip" is useless to you. Thus, our plan going into an interactive rebase is to keep "Admin page," edit "Careless," and discard "Remove DVD-rip."

Running $ git rebase -i 5af4522 starts an editor with the following contents.

pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

Executing our plan, we modify it to:

edit ce36c98 Careless
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
# ...

That is, we delete the line with "Remove DVD-rip" and change the operation on "Careless" to be edit rather than pick.

Save-quitting the editor drops us at a command prompt with the following message.

Stopped at ce36c98... Careless
You can amend the commit now, with

        git commit --amend

Once you are satisfied with your changes, run

        git rebase --continue

As the message tells us, we are on the "Careless" commit we want to edit, so we run the following commands.

$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue

The first removes the offending file from the index. The second modifies or amends "Careless" to be the updated index, and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.

This gives a history of:

$ git lola --name-status
* 93174be (HEAD, master) Login page
| A     login.html
* a570198 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

which is what you want.
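The interactive-rebase plan can also be rehearsed without opening an editor. The sketch below is an illustration under several assumptions: it recreates the five-commit history in a temporary directory with tiny placeholder files, and it uses the GIT_SEQUENCE_EDITOR environment variable with a sed expression to make the same two todo-list edits described above (mark "Careless" as edit, delete the "Remove DVD-rip" line). The sed patterns match on the commit subjects, which only works because the subjects in this demo are unique.

```shell
#!/bin/sh
# Scripted rehearsal of the interactive rebase; all content is placeholder data.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Recreate the example history.
echo index > index.html; git add index.html; git commit -q -m "Index"
echo admin > admin.html; git add admin.html; git commit -q -m "Admin page"
base=$(git rev-parse HEAD)               # plays the role of 5af4522
echo other > other.html
head -c 1048576 /dev/zero > oops.iso
git add other.html oops.iso
git commit -q -m "Careless"
git rm -q oops.iso
git commit -q -m "Remove DVD-rip"
echo login > login.html; git add login.html; git commit -q -m "Login page"

# Edit the todo list with sed instead of an interactive editor:
# "Careless" becomes edit, the "Remove DVD-rip" line is deleted.
export GIT_SEQUENCE_EDITOR="sed -i.bak -e '/Careless\$/s/^pick/edit/' -e '/Remove DVD-rip\$/d'"
# The rebase stops at "Careless"; "|| true" keeps set -e from aborting
# in case the stop is reported with a non-zero status.
git rebase -i "$base" >/dev/null 2>&1 || true

# Same commands as in the answer, run while stopped at "Careless".
git rm -q --cached oops.iso
git commit -q --amend -C HEAD
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

subjects=$(git log --format=%s | tr '\n' ',')
touching=$(git log --oneline -- oops.iso | wc -l | tr -d ' ')
echo "history: $subjects"
echo "commits touching oops.iso: $touching"
```

Because git rm --cached leaves the file in the working tree, oops.iso remains on disk as an untracked file after the rebase; only the history is cleaned.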

User · Answer

git filter-branch --tree-filter 'rm -f path/to/file' HEAD worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.

The pro-git book has an entire chapter on rewriting history - have a look at the filter-branch/Removing a File from Every Commit section.

User · Answer

Just note that these commands can be very destructive. If more people are working on the repo, they'll all have to pull the new tree. The three middle commands are not necessary if your goal is NOT to reduce the size, because filter-branch creates a backup of the removed file and it can stay there for a long time.

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
git push origin master --force

User · Answer

Other than git filter-branch (slow but pure git solution) and BFG (easier and very performant), there is also another tool to filter with good performance:

https://github.com/xoofx/git-rocket-filter

From its description:

The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:

- Fast rewriting of commits and trees (by an order of x10 to x100).
- Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
- Use of .gitignore like pattern for tree-filtering
- Fast and easy C# Scripting for both commit filtering and tree filtering
- Support for scripting in tree-filtering per file/directory pattern
- Automatically prune empty/unchanged commit, including merge commits

User · Answer

This will remove it from your history:

git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch bigfile.txt' --prune-empty --tag-name-filter cat -- --all

User · Answer

According to GitHub Documentation, just follow these steps:

Get rid of the large file

Option 1: You don't want to keep the large file:

rm path/to/your/large/file        # delete the large file

Option 2: You want to keep the large file in an untracked directory:

mkdir large_files                       # create directory large_files
touch .gitignore                        # create .gitignore file if needed
echo '/large_files/' >> .gitignore      # untrack directory large_files
mv path/to/your/large/file large_files/ # move the large file into the untracked directory

Save your changes

git add path/to/your/large/file   # add the deletion to the index
git commit -m 'delete large file' # commit the deletion

Remove the large file from all commits

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/your/large/file" \
  --prune-empty --tag-name-filter cat -- --all
git push <remote> <branch>

User · Answer

100 times faster than git filter-branch and simpler

There are very good answers in this thread, but meanwhile many of them are outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories.

git-filter-repo is much faster and simpler to use.

git-filter-repo is a Python script, available at GitHub: https://github.com/newren/git-filter-repo. When installed, it looks like a regular git command and can be called by git filter-repo.

You need only one file: the Python3 script git-filter-repo. Copy it to a path that is included in the PATH variable. On Windows you may have to change the first line of the script (refer to INSTALL.md). You need Python3 installed on your system, but this is not a big deal.

First you can run:

git filter-repo --analyze

This helps you to determine what to do next.

You can delete your DVD-rip file everywhere:

git filter-repo --invert-paths --path-match DVD-rip

Filter-repo is really fast. A task that took around 9 hours on my computer with filter-branch was completed in 4 minutes by filter-repo. You can do many more nice things with filter-repo. Refer to the documentation for that.

Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and all their descendants down to the last commits!

User · Answer

If you know your commit was recent, instead of going through the entire tree do the following (rm -f keeps the filter from failing on commits in the range that do not contain the file):

git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~10..HEAD
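A quick way to convince yourself that a range-limited rewrite leaves older commits alone is to compare commit IDs before and after. This sketch assumes a scratch repository in a temporary directory with three placeholder commits, so the range is HEAD~2..HEAD in place of HEAD~10..HEAD; the file names are made up for the demo.

```shell
#!/bin/sh
# Range-limited rewrite in a scratch repo; only commits inside the range change.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo index > index.html; git add index.html; git commit -q -m "Index"
old_root=$(git rev-parse HEAD)

echo other > other.html
head -c 1048576 /dev/zero > LARGE_FILE.zip
git add other.html LARGE_FILE.zip
git commit -q -m "Add page and, by mistake, an archive"

echo login > login.html; git add login.html; git commit -q -m "Login page"
old_tip=$(git rev-parse HEAD)

# Rewrite only the last two commits; the root commit lies outside the range.
git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~2..HEAD >/dev/null 2>&1

new_root=$(git rev-parse HEAD~2)
new_tip=$(git rev-parse HEAD)
touching=$(git log --oneline -- LARGE_FILE.zip | wc -l | tr -d ' ')

echo "root unchanged: $([ "$old_root" = "$new_root" ] && echo yes || echo no)"
echo "tip rewritten:  $([ "$old_tip" != "$new_tip" ] && echo yes || echo no)"
echo "commits touching LARGE_FILE.zip: $touching"
```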

User · Answer

I ran into this with a bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.

git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all

Replace MY-BIG-DIRECTORY-OR-FILE with the folder in question to completely rewrite your history (including tags).

source: https://web.archive.org/web/20170727144429/http://naleid.com:80/blog/2012/01/17/finding-and-purging-big-files-from-git-history

User · Answer

Use Git Extensions; it's a UI tool. It has a plugin named "Find large files" which finds large files in repositories and allows removing them permanently.

Don't use 'git filter-branch' before using this tool, since it won't be able to find files removed by 'filter-branch' (although 'filter-branch' does not remove files completely from the repository pack files).

User · Answer

These commands worked in my case:

git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

It is a little different from the above versions.

For those who need to push this to github/bitbucket (I only tested this with bitbucket):

# WARNING!!!
# this will rewrite completely your bitbucket refs
# will delete all branches that you didn't have in your local

git push --all --prune --force

# Once you pushed, all your teammates need to clone repository again
# git pull will not work
