I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for capistrano).
I know I can add these filenames to .gitignore, but this would not remove their history within Git.
I also don't want to start over again by deleting the /.git directory.
Is there a way to remove all traces of a particular file in your Git history?
This question is related to
git
git-commit
git-filter-branch
git-rewrite-history
To be clear: The accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.
An alternative would be:
This will of course remove all commit history branches, and issues from both your github repo, and your local git repo. If this is unacceptable you will have to use an alternate approach.
Call this the nuclear option.
Use filter-branch:
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch *file_path_relative_to_git_repo*' --prune-empty --tag-name-filter cat -- --all
git push origin *branch_name* -f
I recommend this script by David Underhill, worked like a charm for me.
It adds these commands in addition natacado's filter-branch to clean up the mess it leaves behind:
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
Full script (all credit to David Underhill)
#!/bin/bash
set -o errexit
# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
if [ $# -eq 0 ]; then
exit 0
fi
# make sure we're at the root of git repo
if [ ! -d .git ]; then
echo "Error: must run this script from the root of a git repository"
exit 1
fi
# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $files" HEAD
# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune
The last two commands may work better if changed to the following:
git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
I've had to do this a few times to-date. Note that this only works on 1 file at a time.
Get a list of all commits that modified a file. The one at the bottom will the the first commit:
git log --pretty=oneline --branches -- pathToFile
To remove the file from history use the first commit sha1 and the path to file from the previous command, and fill them into this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>..
Here is my solution in windows
git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD
git push --force
make sure that the path is correct otherwise it won't work
I hope it helps
In my android project I had admob_keys.xml as separated xml file in app/src/main/res/values/ folder. To remove this sensitive file I used below script and worked perfectly.
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch app/src/main/res/values/admob_keys.xml' \
--prune-empty --tag-name-filter cat -- --all
Changing your passwords is a good idea, but for the process of removing password's from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch
explicitly designed for removing private data from Git repos.
Create a private.txt
file listing the passwords, etc, that you want to remove (one entry per line) and then run this command:
$ java -jar bfg.jar --replace-text private.txt my-repo.git
All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc
to clean away the dead data:
$ git gc --prune=now --aggressive
The BFG is typically 10-50x faster than running git-filter-branch
and the options are simplified and tailored around these two common use-cases:
Full disclosure: I'm the author of the BFG Repo-Cleaner.
You can use git forget-blob
.
The usage is pretty simple git forget-blob file-to-forget
. You can get more info here
It will disappear from all the commits in your history, reflog, tags and so on
I run into the same problem every now and then, and everytime I have to come back to this post and others, that's why I automated the process.
Credits to contributors from Stack Overflow that allowed me to put this together
So, It looks something like this:
git rm --cached /config/deploy.rb
echo /config/deploy.rb >> .gitignore
Remove cache for tracked file from git and add that file to
.gitignore
list
If you pushed to GitHub, force pushing is not enough, delete the repository or contact support
Even if you force push one second afterwards, it is not enough as explained below.
The only valid courses of action are:
is what leaked a changeable credential like a password?
yes: modify your passwords immediately, and consider using more OAuth and API keys!
no (naked pics):
do you care if all issues in the repository get nuked?
no: delete the repository
yes:
Force pushing a second later is not enough because:
GitHub keeps dangling commits for a long time.
GitHub staff does have the power to delete such dangling commits if you contact them however.
I experienced this first hand when I uploaded all GitHub commit emails to a repo they asked me to take it down, so I did, and they did a gc
. Pull requests that contain the data have to be deleted however: that repo data remained accessible up to one year after initial takedown due to this.
Dangling commits can be seen either through:
One convenient way to get the source at that commit then is to use the download zip method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip
It is possible to fetch the missing SHAs either by:
type": "PushEvent"
. E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)There are scrappers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly pool GitHub data and store it elsewhere.
I could not find if they scrape the actual commit diff, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only stuff linked to people or commits of interest.
If you delete the repository instead of just force pushing however, commits do disappear even from the API immediately and give 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824 This works even if you recreate another repository with the same name.
To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and did:
git init
git remote add origin [email protected]:cirosantilli/test-dangling.git
touch a
git add .
git commit -m 0
git push
touch b
git add .
git commit -m 1
git push
touch c
git rm b
git add .
git commit --amend --no-edit
git push -f
See also: How to remove a dangling commit from GitHub?
git filter-repo
is now officially recommended over git filter-branch
This is mentioned in the manpage of git filter-branch
in Git 2.5 itself.
With git filter repo, you could either remove certain files with: Remove folder and its contents from git/GitHub's history
pip install git-filter-repo
git filter-repo --path path/to/remove1 --path path/to/remove2 --invert-paths
This automatically removes empty commits.
Or you can replace certain strings with: How to replace a string in a whole Git history?
git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')
Source: Stackoverflow.com