[git] Finding a branch point with Git?

I have a repository with branches master and A and lots of merge activity between the two. How can I find the commit in my repository when branch A was created based on master?

My repository basically looks like this:

-- X -- A -- B -- C -- D -- F  (master) 
          \     /   \     /
           \   /     \   /
             G -- H -- I -- J  (branch A)

I'm looking for revision A, which is not what git merge-base (--all) finds.

This question is related to git branch

The answer is


After a lot of research and discussions, it's clear there's no magic bullet that would work in all situations, at least not in the current version of Git.

That's why I wrote a couple of patches that add the concept of a tail branch. Each time a branch is created, a pointer to the original point is created too, the tail ref. This ref gets updated every time the branch is rebased.

To find out the branch point of the devel branch, all you have to do is use devel@{tail}, that's it.

https://github.com/felipec/git/commits/fc/tail


I've used git rev-list for this sort of thing. For example, (note the 3 dots)

$ git rev-list --boundary branch-a...master | grep "^-" | cut -c2-

will spit out the branch point. Now, it's not perfect; since you've merged master into branch A a couple of times, that'll spit out a couple of possible branch points (basically, the original branch point and then each point at which you merged master into branch A). However, it should at least narrow down the possibilities.

I've added that command to my aliases in ~/.gitconfig as:

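[alias]
    # Wrapped in a shell function so that $1 and $2 receive the alias arguments
    # (with a bare sh -c '...', the first argument would land in $0 instead).
    diverges = "!f() { git rev-list --boundary \"$1\"...\"$2\" | grep '^-' | cut -c2-; }; f"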

so I can call it as:

$ git diverges branch-a master
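
For what it's worth, here is a minimal sketch of the same rev-list call on a throwaway repository with no cross-merges, where it yields exactly one boundary commit. The scratch directory, the demo identity and the commit messages are assumptions made up for the example, not part of the answer above.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "X"
git commit -q --allow-empty -m "A"        # the branch point
fork=$(git rev-parse HEAD)
git branch branch-a
git commit -q --allow-empty -m "B"        # master moves on
git checkout -q branch-a
git commit -q --allow-empty -m "G"        # branch-a moves on

# Three dots select commits reachable from exactly one side; --boundary also
# prints the frontier commits, each prefixed with "-".
found=$(git rev-list --boundary branch-a...master | grep "^-" | cut -c2-)
echo "$found"
```

Once the two branches have been merged into each other, expect several lines of output instead of one, as described above.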

I believe I've found a way that deals with all the corner-cases mentioned here:

branch=branch_A
merge=$(git rev-list --min-parents=2 --grep="Merge.*$branch" --all | tail -1)
git merge-base $merge^1 $merge^2

Charles Bailey is quite right that solutions based on the order of ancestors have only limited value; at the end of the day you need some sort of record of "this commit came from branch X". But such a record already exists: by default 'git merge' uses a commit message such as "Merge branch 'branch_A' into master". This tells you that all the commits from the second parent (commit^2) came from 'branch_A' and were merged into the first parent (commit^1), which is 'master'.

Armed with this information you can find the first merge of 'branch_A' (which is when 'branch_A' really came into existence), and find the merge-base, which would be the branch point :)

I've tried with the repositories of Mark Booth and Charles Bailey and the solution works; how couldn't it? The only way this wouldn't work is if you have manually changed the default commit message for merges so that the branch information is truly lost.

For usefulness:

[alias]
    # Shell function form, so that $1 is the branch name passed to the alias.
    branch-point = "!f() { merge=$(git rev-list --min-parents=2 --grep=\"Merge.*$1\" --all | tail -1) && git merge-base \"$merge^1\" \"$merge^2\"; }; f"

Then you can do 'git branch-point branch_A'.

Enjoy ;)


In general, this is not possible. In the commit history, a branch-and-merge that happened before a named branch was branched off looks the same as an intermediate branch between two named branches.

In git, branches are just the current names of the tips of sections of history. They don't really have a strong identity.

This isn't usually a big issue as the merge-base (see Greg Hewgill's answer) of two commits is usually much more useful, giving the most recent commit which the two branches shared.

A solution relying on the order of parents of a commit obviously won't work in situations where a branch has been fully integrated at some point in the branch's history.

git commit --allow-empty -m root # actual branch point
git checkout -b branch_A
git commit --allow-empty -m  "branch_A commit"
git checkout master
git commit --allow-empty -m "More work on master"
git merge -m "Merge branch_A into master" branch_A # identified as branch point
git checkout branch_A
git merge --ff-only master
git commit --allow-empty -m "More work on branch_A"
git checkout master
git commit --allow-empty -m "More work on master"

This technique also falls down if an integration merge has been made with the parents reversed (e.g. a temporary branch was used to perform a test merge into master and then fast-forwarded into the feature branch to build on further).

git commit --allow-empty -m root # actual branch point
git checkout -b branch_A
git commit --allow-empty -m  "branch_A commit"
git checkout master
git commit --allow-empty -m "More work on master"
git merge -m "Merge branch_A into master" branch_A # identified as branch point
git checkout branch_A
git commit --allow-empty -m "More work on branch_A"

git checkout -b tmp-branch master
git merge -m "Merge branch_A into tmp-branch (master copy)" branch_A
git checkout branch_A
git merge --ff-only tmp-branch
git branch -d tmp-branch

git checkout master
git commit --allow-empty -m "More work on master"

Here's an improved version of my previous answer. It relies on the commit messages from merges to find where the branch was first created.

It works on all the repositories mentioned here, and I've even addressed some tricky ones that spawned on the mailing list. I also wrote tests for this.

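find_merge ()
{
    # $1: branch that was merged, $2: optional target branch, $3: revisions to search
    local extra
    test "$2" && extra=" into $2"
    # Oldest matching merge commit; $3 is left unquoted on purpose so that
    # "a b" expands to two revisions.
    git rev-list --min-parents=2 --grep="Merge branch '$1'$extra" --topo-order ${3:---all} | tail -1
}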

branch_point ()
{
    local first_merge second_merge merge
    first_merge=$(find_merge $1 "" "$1 $2")
    second_merge=$(find_merge $2 $1 $first_merge)
    merge=${second_merge:-$first_merge}

    if [ "$merge" ]; then
        git merge-base $merge^1 $merge^2
    else
        git merge-base $1 $2
    fi
}

If you like terse commands,

git rev-list $(git rev-list --first-parent ^branch_name master | tail -n1)^^! 

Here's an explanation.

The following command gives you the list of all commits in master that occurred after branch_name was created

git rev-list --first-parent ^branch_name master 

Since you only care about the earliest of those commits you want the last line of the output:

git rev-list ^branch_name --first-parent master | tail -n1

The parent of the earliest commit that's not an ancestor of "branch_name" is, by definition, in "branch_name," and is in "master" since it's an ancestor of something in "master." So you've got the earliest commit that's in both branches.

The command

git rev-list commit^^!

is just a way to show the parent commit reference. You could use

git log -1 commit^

or whatever.

PS: I disagree with the argument that ancestor order is irrelevant. It depends on what you want. For example, in this case

_C1___C2_______ master
  \    \_XXXXX_ branch A (the Xs denote arbitrary cross-overs between master and A)
   \_____/ branch B

it makes perfect sense to output C2 as the "branching" commit. This is when the developer branched out from "master." When he branched, branch "B" wasn't even merged in his branch! This is what the solution in this post gives.

If what you want is the last commit C such that all paths from origin to the last commit on branch "A" go through C, then you want to ignore ancestry order. That's purely topological and gives you an idea of since when you have two versions of the code going at the same time. That's when you'd go with merge-base based approaches, and it will return C1 in my example.


How about something like

git log --pretty=oneline master > 1
git log --pretty=oneline branch_A > 2

git rev-parse `diff 1 2 | tail -1 | cut -c 3-42`^

Sometimes it is effectively impossible (with some exceptions where you might be lucky enough to have additional data) and the solutions here won't work.

Git doesn't preserve ref history (which includes branches). It only stores the current position for each branch (the head). This means you can lose some branch history in git over time. Whenever you branch for example, it's immediately lost which branch was the original one. All a branch does is:

git checkout branch1    # refs/branch1 -> commit1
git checkout -b branch2 # branch2 -> commit1

You might assume that the first one committed to is the branch. This tends to be the case, but it's not always so. There's nothing stopping you from committing to either branch first after the above operation. Additionally, git timestamps aren't guaranteed to be reliable. It's not until you commit to both that they truly become branches structurally.
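
A small sketch of that point, assuming a throwaway repository (the directory, identity and commit message are invented for the example): right after checkout -b, both names resolve to the same commit, and nothing in either ref says which one was the original.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/branch1  # start on a branch called branch1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "C1"
git checkout -q -b branch2                # branch2 -> same commit as branch1

# Each ref stores one commit id and nothing about where it came from.
git for-each-ref --format='%(refname) %(objectname)' refs/heads
b1=$(git rev-parse branch1)
b2=$(git rev-parse branch2)
```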

While in diagrams we tend to number commits conceptually, git has no real stable concept of sequence when the commit tree branches. In this case you can assume the numbers (indicating order) are determined by timestamp (it might be fun to see how a git UI handles things when you set all the timestamps to the same).

This is what a human expects conceptually:

After branch:
       C1 (B1)
      /
    -
      \
       C1 (B2)
After first commit:
       C1 (B1)
      /
    - 
      \
       C1 - C2 (B2)

This is what you actually get:

After branch:
    - C1 (B1) (B2)
After first commit (human):
    - C1 (B1)
        \
         C2 (B2)
After first commit (real):
    - C1 (B1) - C2 (B2)

You would assume B1 to be the original branch, but it could in fact simply be a dead branch (someone did checkout -b but never committed to it). It's not until you commit to both that you get a legitimate branch structure within git:

Either:
      / - C2 (B1)
    -- C1
      \ - C3 (B2)
Or:
      / - C3 (B1)
    -- C1
      \ - C2 (B2)

You always know that C1 came before C2 and C3, but you never reliably know whether C2 came before C3 or C3 came before C2 (because you can set the time on your workstation to anything, for example). B1 and B2 are also misleading, as you can't know which branch came first. You can make a very good and usually accurate guess at it in many cases. It is a bit like a race track. All things generally being equal with the cars, you can assume that a car that comes in a lap behind started a lap behind. We also have conventions that are very reliable; for example, master will nearly always represent the longest-lived branch, although sadly I have seen cases where even this is not true.

The example given here is a history preserving example:

Human:
    - X - A - B - C - D - F (B1)
           \     / \     /
            G - H ----- I - J (B2)
Real:
            B ----- C - D - F (B1)
           /       / \     /
    - X - A       /   \   /
           \     /     \ /
            G - H ----- I - J (B2)

Real here is also misleading because we as humans read it left to right, root to leaf (ref). Git does not do that. Where we do (A->B) in our heads, git does (A<-B or B->A). It reads it from ref to root. Refs can be anywhere but tend to be leaves, at least for active branches. A ref points to a commit, and commits only contain a link to their parent(s), not to their children. When a commit is a merge commit it will have more than one parent. The first parent is always the original commit that was merged into. The other parents are always commits that were merged into the original commit.
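
To make the parent ordering concrete, here is a hedged sketch on a throwaway repository (branch names, identity and messages are made up for the example). It records both tips, performs a real merge, and then reads the parents back with the ^1 and ^2 suffixes.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git checkout -q -b B2
git commit -q --allow-empty -m "C"
tip_b2=$(git rev-parse HEAD)
git checkout -q master
git commit -q --allow-empty -m "B"
tip_master=$(git rev-parse HEAD)
git merge -q -m "D: merge B2 into master" B2

first=$(git rev-parse HEAD^1)             # where master was before the merge
second=$(git rev-parse HEAD^2)            # the tip that was merged in
git rev-list --parents -n 1 HEAD          # prints: merge, first parent, second parent
```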

Paths:
    F->(D->(C->(B->(A->X)),(H->(G->(A->X))))),(I->(H->(G->(A->X))),(C->(B->(A->X)),(H->(G->(A->X)))))
    J->(I->(H->(G->(A->X))),(C->(B->(A->X)),(H->(G->(A->X)))))

This is not a very efficient representation, rather an expression of all the paths git can take from each ref (B1 and B2).

Git's internal storage looks more like this (note that A as a parent appears twice):

    F->D,I | D->C | C->B,H | B->A | A->X | J->I | I->H,C | H->G | G->A

If you dump a raw git commit you'll see zero or more parent fields. If there are zero, it means no parent and the commit is a root (you can actually have multiple roots). If there's one, it means there was no merge and it's not a root commit. If there is more than one, it means that the commit is the result of a merge, and all of the parents after the first are the commits that were merged in.
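
One way to look at those parent fields is git cat-file -p. The sketch below builds a throwaway repository (names, identity and messages are assumptions for the example) and counts the parent header lines of a root commit, an ordinary commit and a merge; count_parents is a hypothetical helper, not a git command.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "root"
root=$(git rev-parse HEAD)
git checkout -q -b topic
git commit -q --allow-empty -m "topic work"
git checkout -q master
git commit -q --allow-empty -m "master work"
plain=$(git rev-parse HEAD)
git merge -q -m "merge topic" topic
merge=$(git rev-parse HEAD)

count_parents() {
    # The header ends at the first blank line; count its "parent" lines.
    git cat-file -p "$1" | sed '/^$/q' | grep -c '^parent ' || true
}

git cat-file -p "$merge"                  # raw dump: tree, parents, author, committer, message
n_root=$(count_parents "$root")
n_plain=$(count_parents "$plain")
n_merge=$(count_parents "$merge")
```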

Paths simplified:
    F->(D->C),I | J->I | I->H,C | C->(B->A),H | H->(G->A) | A->X
Paths first parents only:
    F->(D->(C->(B->(A->X)))) | F->D->C->B->A->X
    J->(I->(H->(G->(A->X)))) | J->I->H->G->A->X
Or:
    F->D->C | J->I | I->H | C->B->A | H->G->A | A->X
Paths first parents only simplified:
    F->D->C->B->A | J->I->H->G->A | A->X
Topological:
    - X - A - B - C - D - F (B1)
           \
            G - H - I - J (B2)

When both hit A their chains will be the same; before that their chains will be entirely different. The first commit that two other commits have in common is their common ancestor, the point from which they diverged. There might be some confusion here between the terms commit, branch and ref. You can in fact merge a commit; this is what merge really does. A ref simply points to a commit, and a branch is nothing more than a ref in the folder .git/refs/heads; the folder location is what determines that a ref is a branch rather than something else such as a tag.
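
As a rough illustration, on a throwaway repository (the names topic and v0 are invented for the example), you can ask git for the full ref name behind a short name and see that only the namespace differs. Note that git may also keep refs packed rather than as loose files, so it is safer to query them than to read the folder directly.

```shell
# Throwaway repository; directory, identity and names are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
git branch topic                          # a branch: a ref under refs/heads/
git tag v0                                # a tag: a ref under refs/tags/

git for-each-ref --format='%(refname) -> %(objectname)'
branch_ref=$(git rev-parse --symbolic-full-name topic)
tag_ref=$(git rev-parse --symbolic-full-name v0)
same_commit=$(git rev-parse topic)
tag_commit=$(git rev-parse v0)
```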

Where you lose history is that merge will do one of two things depending on circumstances.

Consider:

      / - B (B1)
    - A
      \ - C (B2)

In this case a merge in either direction will create a new commit with the first parent as the commit pointed to by the current checked out branch and the second parent as the commit at the tip of the branch you merged into your current branch. It has to create a new commit as both branches have changes since their common ancestor that must be combined.

      / - B - D (B1)
    - A      /
      \ --- C (B2)

At this point D (B1) has both sets of changes (its own and those of B2). However, the second branch doesn't have the changes from B1. If you merge the changes from B1 into B2 so that they are synchronised, you might expect something that looks like this (and you can force git merge to do it this way with --no-ff):

Expected:
      / - B - D (B1)
    - A      / \
      \ --- C - E (B2)
Reality:
      / - B - D (B1) (B2)
    - A      /
      \ --- C

You will get that even if B1 has additional commits. As long as there aren't changes in B2 that B1 doesn't have, git simply moves B2 forward to B1's commit. It does a fast-forward, which is like a rebase (rebases also eat or linearise history), except that, because only one branch has new changes, it doesn't have to apply a changeset from one branch on top of that from another.

From:
      / - B - D - E (B1)
    - A      /
      \ --- C (B2)
To:
      / - B - D - E (B1) (B2)
    - A      /
      \ --- C
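
Here is a hedged sketch of the difference, using a throwaway repository in which only B1 has new work (the branch B2_noff, the identity and the messages are made up for the comparison). A plain merge fast-forwards and leaves no trace; --no-ff records a merge commit with two parents.

```shell
# Throwaway repository; directory, identity and names are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/B1       # the initial branch plays the role of B1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git branch B2                             # stale branch, stays at A
git branch B2_noff                        # second stale branch for comparison
git commit -q --allow-empty -m "B"        # only B1 gains work

git checkout -q B2
git merge -q B1                           # fast-forward: no new commit is created
ff_tip=$(git rev-parse B2)
b1_tip=$(git rev-parse B1)

git checkout -q B2_noff
git merge -q --no-ff -m "Merge B1 into B2_noff" B1
# rev-list --parents prints the commit followed by its parents.
noff_parents=$(( $(git rev-list --parents -n 1 B2_noff | wc -w) - 1 ))
echo "--no-ff merge has $noff_parents parents"
```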

If you cease work on B2 then things are largely fine for preserving history in the long run. Typically only B1 (which might be master) will advance, so the location of B2 in B1's history successfully represents the point at which it was merged into B1. This is what git expects you to do: branch B from A, then merge A into B as often as you like as changes accumulate; however, once you merge B back into A, it's not expected that you will work on B any further. If you carry on working on your branch after fast-forward merging it back into the branch it came from, then you're erasing B's previous history each time. You're really creating a new branch each time: fast-forward merge to source, then commit to branch. What you end up with is lots of branches/merges that you can see in the history and structure, but without the ability to determine what the name of each branch was, or whether what looks like two separate branches is really the same branch.

         0   1   2   3   4 (B1)
        /-\ /-\ /-\ /-\ /
    ----   -   -   -   -
        \-/ \-/ \-/ \-/ \
         5   6   7   8   9 (B2)

0 to 3 and 5 to 8 are structural branches that show up if you follow the history of either 4 or 9. There's no way in git to know which of these unnamed and unreferenced structural branches belongs to which of the named and referenced branches at the end of the structure. You might assume from this drawing that 0 to 4 belongs to B1 and 5 to 9 belongs to B2, but apart from 4 and 9 we can't know which structural branch belongs to which named branch; I've simply drawn it in a way that gives the illusion of that. 0 might belong to B2 and 5 might belong to B1. There are 16 different possibilities in this case for which named branch each of the structural branches could belong to. This is assuming that none of these structural branches came from a deleted branch or resulted from merging a branch into itself when pulling from master (the same branch name on two repos is in fact two branches; a separate repository is like branching all branches).

There are a number of git strategies that work around this. You can force git merge to never fast-forward and always create a merge commit. A horrible way to preserve branch history is with tags and/or branches (tags are really recommended) according to some convention of your choosing. I really wouldn't recommend a dummy empty commit in the branch you're merging into. A very common convention is to not merge into an integration branch until you genuinely want to close your branch. This is a practice that people should attempt to adhere to, as otherwise you're working around the point of having branches. However, in the real world the ideal is not always practical, meaning that doing the right thing is not viable in every situation. If what you're doing on a branch is isolated, that can work; otherwise you might be in a situation where multiple developers working on something need to share their changes quickly (ideally you might want to be working on one branch, but not all situations suit that either, and generally two people working on a branch is something you want to avoid).


To find commits from the branching point, you could use this.

git log --ancestry-path master..topicbranch

A simple way to just make it easier to see the branching point in git log --graph is to use the option --first-parent.

For example, take the repo from the accepted answer:

$ git log --all --oneline --decorate --graph

*   a9546a2 (HEAD -> master, origin/master, origin/HEAD) merge from topic back to master
|\  
| *   648ca35 (origin/topic) merging master onto topic
| |\  
| * | 132ee2a first commit on topic branch
* | | e7c863d commit on master after master was merged to topic
| |/  
|/|   
* | 37ad159 post-branch commit on master
|/  
* 6aafd7f second commit on master before branching
* 4112403 initial commit on master

Now add --first-parent:

$ git log --all --oneline --decorate --graph --first-parent

* a9546a2 (HEAD -> master, origin/master, origin/HEAD) merge from topic back to master
| * 648ca35 (origin/topic) merging master onto topic
| * 132ee2a first commit on topic branch
* | e7c863d commit on master after master was merged to topic
* | 37ad159 post-branch commit on master
|/  
* 6aafd7f second commit on master before branching
* 4112403 initial commit on master

That makes it easier!

Note if the repo has lots of branches you're going to want to specify the 2 branches you're comparing instead of using --all:

$ git log --decorate --oneline --graph --first-parent master origin/topic

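In simple cases, the following command can reveal the SHA-1 of commit A. Note that --fork-point consults the reflog of the ref you name and, when no second commit is given, compares it against HEAD, so it only helps while that reflog still records the relevant history: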

git merge-base --fork-point A


I seem to be getting some joy with

git rev-list branch...master

The last line you get is the first commit on the branch, so then it's a matter of getting the parent of that. So

git rev-list -1 `git rev-list branch...master | tail -1`^

Seems to work for me and doesn't need diffs and so on (which is helpful as we don't have that version of diff)

Correction: This doesn't work if you are on the master branch, but I'm doing this in a script so that's less of an issue


You can examine the reflog of branch A to find from which commit it was created, as well as the full history of which commits that branch pointed to. Reflogs are in .git/logs.
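
A minimal sketch of that idea, assuming a throwaway repository (names, identity and messages are invented for the example): the oldest reflog entry of the branch is the commit it was created from. Keep in mind that reflogs are local to one clone and that old entries eventually expire, so this only works where the branch was created and while the entry still exists.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "X"
git commit -q --allow-empty -m "A"
created_at=$(git rev-parse HEAD)
git checkout -q -b branch_A               # the reflog records the creation
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

git reflog show branch_A                  # newest entry first
# Walk the reflog instead of the ancestry and keep the last (oldest) line.
oldest=$(git log -g --format=%H branch_A | tail -n 1)
echo "$oldest"
```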


You could use the following command to return the oldest commit in branch_a, which is not reachable from master:

git rev-list branch_a ^master | tail -1

Perhaps with an additional sanity check that the parent of that commit is actually reachable from master...
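
That sanity check could look something like the sketch below, which uses git merge-base --is-ancestor on a throwaway repository without cross-merges (names, identity and messages are assumptions for the example).

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "X"
git commit -q --allow-empty -m "A"
fork=$(git rev-parse HEAD)
git branch branch_a
git commit -q --allow-empty -m "B"
git checkout -q branch_a
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

# Oldest commit on branch_a that master cannot reach.
first=$(git rev-list branch_a ^master | tail -1)
candidate=
# Accept its parent only if that parent really is part of master's history.
if git merge-base --is-ancestor "$first^" master; then
    candidate=$(git rev-parse "$first^")
    echo "$candidate"
else
    echo "parent of $first is not reachable from master" >&2
fi
```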


I recently needed to solve this problem as well and ended up writing a Ruby script for this: https://github.com/vaneyckt/git-find-branching-point


Surely I'm missing something, but IMO, all the problems above are caused because we are always trying to find the branch point by going back in the history, and that causes all sorts of problems because of the merging combinations available.

Instead, I've followed a different approach, based on the fact that both branches share a lot of history; all the history before branching is exactly the same. So instead of going back, my proposal is to go forward (from the first commit), looking for the first difference between the two branches. The branch point will be, simply, the parent of the first difference found.

In practice:

#!/bin/bash
diff <( git rev-list "${1:-master}" --reverse --topo-order ) \
     <( git rev-list "${2:-HEAD}" --reverse --topo-order) \
--unified=1 | sed -ne 's/^ //p' | head -1

And it's solving all my usual cases. Sure there are border ones not covered but... ciao :-)


The problem appears to be to find the most recent, single-commit cut between both branches on one side, and the earliest common ancestor on the other (probably the initial commit of the repo). This matches my intuition of what the "branching off" point is.

With that in mind, this is not at all easy to compute with normal git shell commands, since git rev-list -- our most powerful tool -- doesn't let us restrict the path by which a commit is reached. The closest we have is git rev-list --boundary, which can give us a set of all the commits that "blocked our way". (Note: git rev-list --ancestry-path is interesting but I don't know how to make it useful here.)

Here is the script: https://gist.github.com/abortz/d464c88923c520b79e3d. It's relatively simple, but due to a loop it's complicated enough to warrant a gist.

Note that most other solutions proposed here can't possibly work in all situations for a simple reason: git rev-list --first-parent isn't reliable in linearizing history because there can be merges with either ordering.

git rev-list --topo-order, on the other hand, is very useful -- for walking commits in topological order -- but doing diffs is brittle: there are multiple possible topological orderings for a given graph, so you are depending on a certain stability of the orderings. That said, strongk7's solution probably works damn well most of the time. However, it's slower than mine as a result of having to walk the entire history of the repo... twice. :-)


Given that so many of the answers in this thread do not give the answer the question was asking for, here is a summary of the results of each solution, along with the script I used to replicate the repository given in the question.

The log

Creating a repository with the structure given, we get the git log of:

$ git --no-pager log --graph --oneline --all --decorate
* b80b645 (HEAD, branch_A) J - Work in branch_A branch
| *   3bd4054 (master) F - Merge branch_A into branch master
| |\  
| |/  
|/|   
* |   a06711b I - Merge master into branch_A
|\ \  
* | | bcad6a3 H - Work in branch_A
| | * b46632a D - Work in branch master
| |/  
| *   413851d C - Merge branch_A into branch master
| |\  
| |/  
|/|   
* | 6e343aa G - Work in branch_A
| * 89655bb B - Work in branch master
|/  
* 74c6405 (tag: branch_A_tag) A - Work in branch master
* 7a1c939 X - Work in branch master

My only addition is the tag, which makes explicit the point at which we created the branch and thus the commit we wish to find.

The solution which works

The only solution which works is the one provided by lindes, which correctly returns A:

$ diff -u <(git rev-list --first-parent branch_A) \
          <(git rev-list --first-parent master) | \
      sed -ne 's/^ //p' | head -1
74c6405d17e319bd0c07c690ed876d65d89618d5

As Charles Bailey points out though, this solution is very brittle.

If you merge branch_A into master and then merge master into branch_A without intervening commits, then lindes' solution only gives you the most recent first divergence.

That means that for my workflow, I think I'm going to have to stick with tagging the branch point of long-running branches, since I can't guarantee that they can reliably be found later.

This really all boils down to Git's lack of what hg calls named branches. The blogger jhw calls these lineages vs. families in his article Why I Like Mercurial More Than Git and his follow-up article More On Mercurial vs. Git (with Graphs!). I would recommend people read them to see why some Mercurial converts miss having named branches in Git.

The solutions which don't work

The solution provided by mipadi returns two answers, I and C:

$ git rev-list --boundary branch_A...master | grep ^- | cut -c2-
a06711b55cf7275e8c3c843748daaa0aa75aef54
413851dfecab2718a3692a4bba13b50b81e36afc

The solution provided by Greg Hewgill returns I:

$ git merge-base master branch_A
a06711b55cf7275e8c3c843748daaa0aa75aef54
$ git merge-base --all master branch_A
a06711b55cf7275e8c3c843748daaa0aa75aef54

The solution provided by Karl returns X:

$ diff -u <(git log --pretty=oneline branch_A) \
          <(git log --pretty=oneline master) | \
       tail -1 | cut -c 2-42
7a1c939ec325515acfccb79040b2e4e1c3e7bbe5

The script

mkdir $1
cd $1
git init
git commit --allow-empty -m "X - Work in branch master"
git commit --allow-empty -m "A - Work in branch master"
git branch branch_A
git tag branch_A_tag     -m "Tag branch point of branch_A"
git commit --allow-empty -m "B - Work in branch master"
git checkout branch_A
git commit --allow-empty -m "G - Work in branch_A"
git checkout master
git merge branch_A       -m "C - Merge branch_A into branch master"
git checkout branch_A
git commit --allow-empty -m "H - Work in branch_A"
git merge master         -m "I - Merge master into branch_A"
git checkout master
git commit --allow-empty -m "D - Work in branch master"
git merge branch_A       -m "F - Merge branch_A into branch master"
git checkout branch_A
git commit --allow-empty -m "J - Work in branch_A branch"

I doubt the git version makes much difference to this, but:

$ git --version
git version 1.7.1

Thanks to Charles Bailey for showing me a more compact way to script the example repository.


The following implements git equivalent of svn log --stop-on-copy and can also be used to find branch origin.

Approach

  1. Get the head of every branch
  2. Collect the mergeBase of the target branch with each other branch
  3. Run git.log on the target branch and iterate
  4. Stop at the first commit that appears in the mergeBase list

Like all rivers run to the sea, all branches run to master, and therefore we find a merge-base between seemingly unrelated branches. As we walk back from the branch head through its ancestors, we can stop at the first potential merge base, since in theory it should be the origin point of this branch.
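
The four steps can be sketched in plain shell roughly as follows. This is not the code from the linked answer; branch_origin is a hypothetical helper written for this example, and the throwaway repository (names, identity, messages) is an assumption as well.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "X"
git commit -q --allow-empty -m "A"
fork=$(git rev-parse HEAD)
git branch branch_A
git commit -q --allow-empty -m "B"
git checkout -q branch_A
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

branch_origin() {
    target=$1
    # Steps 1 and 2: merge base of the target with every other local branch.
    bases=$(git for-each-ref --format='%(refname:short)' refs/heads |
        while read -r other; do
            [ "$other" = "$target" ] || git merge-base "$target" "$other"
        done)
    # Steps 3 and 4: walk the target's history and stop at the first commit
    # that appears in the list (grep treats each line of $bases as a pattern).
    git rev-list "$target" | grep -xF "$bases" | head -n 1
}

origin=$(branch_origin branch_A)
echo "$origin"
```

As with the original approach, this has not been tried on histories where sibling branches were merged into each other.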

Notes

  • I haven't tried this approach where sibling and cousin branches merged between each other.
  • I know there must be a better solution.

details: https://stackoverflow.com/a/35353202/9950


You may be looking for git merge-base:

git merge-base finds best common ancestor(s) between two commits to use in a three-way merge. One common ancestor is better than another common ancestor if the latter is an ancestor of the former. A common ancestor that does not have any better common ancestor is a best common ancestor, i.e. a merge base. Note that there can be more than one merge base for a pair of commits.
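
To see why this is not always the commit the question asks for, here is a small sketch on a throwaway repository (names, identity and messages are made up for the example): before any merge, merge-base returns the branch point, but after master has been merged into the branch it returns master's tip instead.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "X"
git commit -q --allow-empty -m "A"
fork=$(git rev-parse HEAD)
git branch branch_A
git commit -q --allow-empty -m "B"
git checkout -q branch_A
git commit -q --allow-empty -m "G"

before=$(git merge-base master branch_A)  # still the original branch point
git merge -q -m "Merge master into branch_A" master
after=$(git merge-base master branch_A)   # now the tip of master
master_tip=$(git rev-parse master)
```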


Not quite a solution to the question, but I thought it was worth noting the approach I use when I have a long-living branch:

At the same time I create the branch, I also create a tag with the same name but with an -init suffix, for example feature-branch and feature-branch-init.
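
In practice that might look like the sketch below, on a throwaway repository (identity and commit messages are assumptions for the example). The tag keeps pointing at the starting commit even after master has been merged into the branch.

```shell
# Throwaway repository; directory, identity and messages are illustrative.
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git checkout -q -b feature-branch
git tag feature-branch-init               # marks the commit the branch started from
git commit -q --allow-empty -m "feature work"
git checkout -q master
git commit -q --allow-empty -m "master work"
git checkout -q feature-branch
git merge -q -m "Merge master into feature-branch" master

# The tag is unaffected by later commits and merges.
start=$(git rev-parse 'feature-branch-init^{commit}')
git log --oneline feature-branch-init..feature-branch
```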

(It is kind of bizarre that this is such a hard question to answer!)