[git] Fully backup a git repo?

Is there a simple way to backup an entire git repo including all branches and tags?

This question is related to git backup

The answer is


Expanding on the great answers by KingCrunch and VonC

I combined them both:

git clone --mirror [email protected]/reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all

After that you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from that using git clone reponame.bundle reponame.

Note that git bundle only copies commits that lead to some reference (branch or tag) in the repository. So tangling commits are not stored to the bundle.


You can backup the git repo with git-copy at minimum storage size.

git copy /path/to/project /backup/project.repo.backup

Then you can restore your project with git clone

git clone /backup/project.repo.backup project

git bundle

I like that method, as it results in only one file, easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command

git bundle create /tmp/foo-all --all

is detailed:

git bundle will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.


For using that bundle, you can clone it, specifying a non-existent folder (outside of any git repo):

git clone /tmp/foo-all newFolder

As far as i know you can just make a copy of the directory your repo is in, that's it!

cp -r project project-backup

use git bundle, or clone

copying the git directory is not a good solution because it is not atomic. If you have a large repository that takes a long time to copy and someone pushes to your repository, it will affect your back up. Cloning or making a bundle will not have this problem.


This thread was very helpful to get some insights how backups of git repos could be done. I think it still lacks some hints, information or conclusion to find the "correct way" (tm) for oneself. Therefore sharing my thoughts here to help others and put them up for discussions to enhance them. Thanks.

So starting with picking-up the original question:

  • Goal is to get as close as possible to a "full" backup of a git repository.

Then enriching it with the typical wishes and specifiying some presettings:

  • Backup via a "hot-copy" is preferred to avoid service downtime.
  • Shortcomings of git will be worked around by additional commands.
  • A script should do the backup to combine the multiple steps for a single backup and to avoid human mistakes (typos, etc.).
  • Additionally a script should do the restore to adapt the dump to the target machine, e.g. even the configuration of the original machine may have changed since the backup.
  • Environment is a git server on a Linux machine with a file system that supports hardlinks.

1. What is a "full" git repo backup?

The point of view differs on what a "100%" backup is. Here are two typical ones.

#1 Developer's point of view

  • Content
  • References

git is a developer tool and supports this point of view via git clone --mirror and git bundle --all.

#2 Admin's point of view

  • Content files
    • Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
  • git configuration
  • Optional: OS configuration (file system permissions, etc.)

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be seen as separated from the backup of the content.

2. Techniques

  • "Cold-Copy"
    • Stop the service to have exclusive access to its files. Downtime!
  • "Hot-Copy"
    • Service provides a fixed state for backup purposes. On-going changes do not affect that state.

3. Other topics to think about

Most of them are generic for backups.

  • Is there enough space to hold the full backups? How many generations will be stored?
  • Is an incremental approach wanted? How many generations will be stored and when to create a full backup again?
  • How to verify that a backup is not corrupted after creation or over time?
  • Does the file system support hardlinks?
  • Put backup into a single archive file or use directory structure?

4. What git provides to backup content

  • git gc --auto

    • docs: man git-gc
    • Cleans up and compacts a repository.
  • git bundle --all

    • docs: man git-bundle, man git-rev-list
    • Atomic = "Hot-Copy"
    • Bundles are dump files and can be directly used with git (verify, clone, etc.).
    • Supports incremental extraction.
    • Verifiable via git bundle verify.
  • git clone --mirror

    • docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
    • Atomic = "Hot-Copy"
    • Mirrors are real git repositories.
    • Primary intention of this command is to build a full active mirror, that periodically fetches updates from the original repository.
    • Supports hardlinks for mirrors on same file system to avoid wasting space.
    • Verifiable via git fsck.
    • Mirrors can be used as a basis for a full file backup script.

5. Cold-Copy

A cold-copy backup can always do a full file backup: deny all accesses to the git repos, do backup and allow accesses again.

  • Possible Issues
    • May not be easy - or even possible - to deny all accesses, e.g. shared access via file system.
    • Even if the repo is on a client-only machine with a single user, then the user still may commit something during an automated backup run :(
    • Downtime may not be acceptable on server and doing a backup of multiple huge repos can take a long time.
  • Ideas for Mitigation:
    • Prevent direct repo access via file system in general, even if clients are on the same machine.
    • For SSH/HTTP access use git authorization managers (e.g. gitolite) to dynamically manage access or modify authentication files in a scripted way.
    • Backup repos one-by-one to reduce downtime for each repo. Deny one repo, do backup and allow access again, then continue with the next repo.
    • Have planned maintenance schedule to avoid upset of developers.
    • Only backup when repository has changed. Maybe very hard to implement, e.g. list of objects plus having packfiles in mind, checksums of config and hooks, etc.

6. Hot-Copy

File backups cannot be done with active repos due to risk of corrupted data by on-going commits. A hot-copy provides a fixed state of an active repository for backup purposes. On-going commits do not affect that copy. As listed above git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.

"100% admin" hot-copy backup

  • Option 1: use git bundle --all to create full/incremental dump files of content and copy/backup configuration files separately.
  • Option 2: use git clone --mirror, handle and copy configuration separately, then do full file backup of mirror.
    • Notes:
    • A mirror is a new repository, that is populated with the current git template on creation.
    • Clean up configuration files and directories, then copy configuration files from original source repository.
    • Backup script may also apply OS configuration like file permissions on the mirror.
    • Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.

7. Restore

  • Check and adopt git configuration to target machine and latest "way of doing" philosophy.
  • Check and adopt OS configuration to target machine and latest "way of doing" philosophy.

Everything is contained in the .git directory. Just back that up along with your project as you would any file.


cd /path/to/backupdir/
git clone /path/to/repo
cd /path/to/repo
git remote add backup /path/to/backupdir
git push --set-upstream backup master

this creates a backup and makes the setup, so that you can do a git push to update your backup, what is probably what you want to do. Just make sure, that /path/to/backupdir and /path/to/repo are at least different hard drives, otherwise it doesn't make that much sense to do that.


The correct answer IMO is git clone --mirror. This will fully backup your repo.

Git clone mirror will clone the entire repository, notes, heads, refs, etc. and is typically used to copy an entire repository to a new git server. This will pull down an all branches and everything, the entire repository.

git clone --mirror [email protected]/your-repo.git
  • Normally cloning a repo does not include all branches, only Master.

  • Copying the repo folder will only "copy" the branches that have been pulled in...so by default that is Master branch only or other branches you have checked-out previously.

  • The Git bundle command is also not what you want: "The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email to someone or put on a flash drive, then unbundle into another repository." (From What's the difference between git clone --mirror and git clone --bare)


Expanding on some other answers, this is what I do:

Setup the repo: git clone --mirror user@server:/url-to-repo.git

Then when you want to refresh the backup: git remote update from the clone location.

This backs up all branches and tags, including new ones that get added later, although it's worth noting that branches that get deleted do not get deleted from the clone (which for a backup may be a good thing).

This is atomic so doesn't have the problems that a simple copy would.

See http://www.garron.me/en/bits/backup-git-bare-repo.html


Here are two options:

  1. You can directly take a tar of the git repo directory as it has the whole bare contents of the repo on server. There is a slight possibility that somebody may be working on repo while taking backup.

  2. The following command will give you the bare clone of repo (just like it is in server), then you can take a tar of the location where you have cloned without any issue.

    git clone --bare {your backup local repo} {new location where you want to clone}
    

If it is on Github, Navigate to bitbucket and use "import repository" method to import your github repo as a private repo.

If it is in bitbucket, Do the otherway around.

It's a full backup but stays in the cloud which is my ideal method.