git revisions

08 Nov 2020 | categories: blog

prev: git: ours and theirs | next: how SSL/TLS works

The last few of my posts have been about Git and this one is no different. I want to shift gears though: instead of talking about specific git commands and their behaviour I would like to talk about gitrevisions. Gitrevisions are the arguments you pass to commands to specify either…

Which command you are using will make a difference to which of the above you will want to work with. I say commit here but there are commands that will work with other objects like blobs and trees. If you don’t know what blobs or trees are then check out my post on How Git Works.

Right, enough waffling, there is a lot to get through so let’s dive in.

revisions

Here is a small repo I threw together so I can show you examples of these revisions. There are three branches: master and dev are both local branches, and origin/master is a remote-tracking branch.

For the most part HEAD will be on commit F in the following examples.

*   3b3d402 (HEAD -> master, origin/master) F
|\
| * 69fdef7 (dev) D
| * 30a4fb3 C
* | dd81b17 E
|/
* 6bfe3ba B
* 32f3315 A

<sha1>

This is the one people are very familiar with, passing the SHA1 hash of a commit object to a command.

> git --no-pager show --oneline 69fdef7
69fdef7 (dev) D
diff --git a/file b/file
index b1e6722..8422d40 100644
--- a/file
+++ b/file
@@ -1,3 +1,4 @@
 A
 B
 C
+D

There isn’t much else to talk about with this one. You can pass the entire hash for the commit, in this case 69fdef755908451625c665fe1b8fa06ecec718f9, but just a few characters will be enough provided the prefix is unique within the repo. Git needs at least 4 characters to work with, i.e. 69fd would work.
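If you’d like to try this yourself, here is a minimal sketch in a throwaway repo. The temp directory, the identity, and the commit helper are all made up for the example; --short=4 asks git for the shortest unique abbreviation that is at least 4 characters long.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this post
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
full=$(git rev-parse HEAD)              # the full 40 character hash
short=$(git rev-parse --short=4 HEAD)   # shortest unique prefix, at least 4 characters
resolved=$(git rev-parse "$short")      # the prefix resolves back to the full hash
echo "$short -> $resolved"
```

In a repo this small the prefix will most likely be exactly 4 characters, but git lengthens it whenever that is needed to keep it unique.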

<refname>

A refname is a symbolic name that points to a commit, for example the name of your branch. This is a very common way people reference commit objects, for example master points to the commit F which is the head of that branch. When passed as a gitrevision to a command master in this instance would be taken by git to mean refs/heads/master.

master, heads/master, and refs/heads/master all refer to the same thing.

> git rev-parse master
3b3d4022eb580f9ace87e5bf4ed35e627fd94289

> git rev-parse heads/master
3b3d4022eb580f9ace87e5bf4ed35e627fd94289

> git rev-parse refs/heads/master
3b3d4022eb580f9ace87e5bf4ed35e627fd94289

The refs are all found in .git/refs:

> ls -l .git/refs
total 12
drwxr-x--- 2 skip skip 4096 Nov  7 07:20 heads
drwxr-x--- 3 skip skip 4096 Nov  7 07:28 remotes
drwxr-x--- 2 skip skip 4096 Nov  7 07:17 tags

It stands to reason then that we could write refs/remotes/<refname> to have a git command work on a remote branch. In my small repo I’ve set up a remote called origin and I have master tracking it.

> ls -l .git/refs/remotes/origin/
total 4
-rw-r----- 1 skip skip 41 Nov  7 07:28 master

Let’s use git show to look at the remote ref:

> git --no-pager show --oneline refs/remotes/origin/master
3b3d402 (HEAD -> master, origin/master) F

diff --cc file
index 72f43f8,8422d40..8fda00d
--- a/file
+++ b/file
@@@ -1,3 -1,4 +1,5 @@@
  A
  B
+ C
+ D
 +E

I don’t have any tags in this repo but we could also use refs/tags/<refname> to look at a commit that is annotated with a tag.

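When we specified master earlier git assumed we meant the heads version, and we’ve seen that we can be more specific with our ref if we want to look at a remote or a tag. How does git deal with ambiguous references? After all there are two master branches, my local one and the remote one. Rules of course! Git goes through this rule set, which comes from man gitrevisions, and uses whichever matches first:

1. $GIT_DIR/<refname> (usually only useful for names like HEAD, FETCH_HEAD, ORIG_HEAD and MERGE_HEAD)
2. refs/<refname>
3. refs/tags/<refname>
4. refs/heads/<refname>
5. refs/remotes/<refname>
6. refs/remotes/<refname>/HEAD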
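To watch the lookup order in action, here is a rough sketch that deliberately creates a tag with the same name as the branch. The throwaway repo, the identity, and the commit helper are assumptions for the example. Because refs/tags is checked before refs/heads, the bare name should resolve to the tag (git also prints an ambiguity warning, which I send to /dev/null).

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
first=$(git rev-parse HEAD)
commit B
git tag master "$first"   # a tag deliberately named like the branch, pointing at A

by_name=$(git rev-parse master 2>/dev/null)   # ambiguous short name
by_tag=$(git rev-parse refs/tags/master)      # commit A
by_head=$(git rev-parse refs/heads/master)    # commit B
echo "master -> $by_name"
```

Spelling out refs/heads/master or refs/tags/master sidesteps the ambiguity entirely.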

Simple really. Let’s move onto more obscure ways to reference a commit.

@

This one is really basic, @ just means HEAD, it is used to build more complex references though so remember this one.

> git rev-parse @
3b3d4022eb580f9ace87e5bf4ed35e627fd94289

> git rev-parse HEAD
3b3d4022eb580f9ace87e5bf4ed35e627fd94289

[<refname>]@{<date>}

This is probably one of the cooler things I’ve learned about git in the past year, git works really well with temporal strings! You can pass things like 5 minutes ago or last year in as <date> and git will find what you want. The only stipulation is there needs to be some sort of history for <refname> in .git/logs.

For example dev@{'10 minutes ago'} resolves to whatever dev was pointing at 10 minutes ago, according to its log. Notice I have to quote the temporal string otherwise my shell complains.

> git --no-pager log --oneline -1 dev@{'10 minutes ago'}
69fdef7 (dev) D

If I ask for a date or time that git doesn’t have logs for it’ll complain but return the oldest entry it can find.

> git --no-pager log --oneline -1 dev@{'25 years ago'}
warning: Log for 'dev' only goes back to Sat, 7 Nov 2020 07:18:05 +0000.
6bfe3ba B

And if you leave off the <refname> git will use the log of the current branch.

> git --no-pager log --oneline -1 @{'10 minutes ago'}
3b3d402 (HEAD -> master, origin/master) F

<refname>@{<n>}

This version allows you to get the n-th prior value for <refname>.

> git --no-pager log --oneline -1 dev
69fdef7 (dev) D

> git --no-pager log --oneline -1 dev@{1}
30a4fb3 C

Like [<refname>]@{<date>} there needs to be a log for this reference in .git/logs. And like the previous syntax you can leave off <refname> if you want to reference the current branch. For example with master checked out @{1} finds the first prior value of master.

@{-<n>}

This syntax allows you to reference the n-th branch or commit that was checked out before the current one. As an example we are currently on master and before that I was in dev making some changes so @{-1} should show us the tip of dev:

> git --no-pager log -1 --oneline @{-1}
69fdef7 (dev) D
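Both of the log-based forms, <refname>@{<n>} and @{-<n>}, can be checked with a small sketch like the one below. The throwaway repo, the identity, and the commit helper are made up for the example, and I am relying on git writing logs by default in a normal (non-bare) repo.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
a=$(git rev-parse HEAD)
commit B
git checkout -q -b dev    # hop over to dev...
git checkout -q master    # ...and straight back, so dev is the previous checkout

prior=$(git rev-parse 'master@{1}')                     # where master pointed before B
prior_short=$(git rev-parse '@{1}')                     # same thing via the current branch
previous_branch=$(git rev-parse --abbrev-ref '@{-1}')   # name of the last checkout
echo "$previous_branch"
```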

[<branchname>]@{upstream}

Now we are getting into the funky syntax that you’ll no doubt remember exists but will need to look up whenever you actually want to use it. I sure as hell do (man gitrevisions is very handy).

This syntax allows you to reference the upstream branch for <branchname>, and if you leave off <branchname> then the current branch is used. In other words you can reference the branch that <branchname> is set to build on top of.

> git --no-pager log -1 --oneline @{upstream}
3b3d402 (HEAD -> master, origin/master) F

The master branch has origin/master as its upstream branch and origin/master happens to be pointing to the same commit as master. To make this example crystal clear I will create a new commit on our local master branch:

> git --no-pager log --oneline --graph

* 93ffd00 (HEAD -> master) G
*   3b3d402 (origin/master) F
|\
| * 69fdef7 (dev) D
| * 30a4fb3 C
* | dd81b17 E
|/
* 6bfe3ba B
* 32f3315 A

Running the same command as before the result should still show origin/master pointing to commit F:

> git --no-pager log -1 --oneline @{u}
3b3d402 (origin/master) F

I guess it’s a good time to mention that u is accepted too, and the keyword is case insensitive so @{upstream}, @{UPSTREAM}, @{u}, and @{U} are all fair game.

[<branchname>]@{push}

The previous command let you see the upstream so this one naturally lets us see what the push branch is up to. For the majority of people the push branch will probably be the same as upstream.

> git --no-pager log -1 --oneline @{push}
3b3d402 (origin/master) F

If we perhaps changed the repo so master was pulling from origin (i.e. our upstream) but we pushed to a fork of the project, that is when this would be useful. This is what’s known as a triangular workflow.

<rev>^[<n>]

This is where things get interesting, the ^ character allows you to specify the n-th parent. To explain this we need a good idea of the git history of our repo, so I’ll repost it here.

* 93ffd00 (HEAD -> master) G
*   3b3d402 (origin/master) F
|\
| * 69fdef7 (dev) D
| * 30a4fb3 C
* | dd81b17 E
|/
* 6bfe3ba B
* 32f3315 A

Most commits in this history have a single parent, for example B has A as its parent.

> git --no-pager log -1 --oneline 6bfe3ba^
32f3315 A

Where things get interesting is with merge commits because they have multiple parents. When you leave off <n> then git uses ^1, i.e. the first parent.

Using master’s remote branch as our rev, which is currently pointing to commit F, we can look at either of F’s parents, commit E or commit D:

> git --no-pager log -1 --oneline origin/master^
dd81b17 E
> git --no-pager log -1 --oneline origin/master^2
69fdef7 (dev) D

And of course rev can be a branch name, a commit hash, a tag etc.

^ can also be chained together, ^^^ means ^1^1^1 and in plain English it means “the first parent of the first parent of the first parent” but you could use any combination of numbers to really hone in on what you’re after, like master^3^^^2 which means “the second parent of the first parent of the first parent of the third parent of master”.

One final example before we move on, master^^2^ should show us commit C because ^^2^ means “the first parent of the second parent of the first parent”.

> git --no-pager log -1 --oneline master^^2^
30a4fb3 C

The first ^ selected the merge commit F, ^2 selected the second parent of F which was D, and the final ^ selected D’s parent C.

* 93ffd00 (HEAD -> master) G   # <- master
*   3b3d402 (origin/master) F  # <- master^
|\
| * 69fdef7 (dev) D            # <- master^^2
| * 30a4fb3 C                  # <- master^^2^
* | dd81b17 E
|/
* 6bfe3ba B
* 32f3315 A

And to really drive the point home master^^^ would show us commit B.

> git --no-pager log -1 --oneline master^^^
6bfe3ba B

<rev>~[<n>]

The sexy cousin of ^ is the tilde character ~, it is similar to ^ in that it allows you to specify ancestors but ~ only follows first parents. ~3 means exactly the same as ^^^.

master~3 will show us commit B:

> git --no-pager log -1 --oneline master~3
6bfe3ba B

We can mix and match ~ and ^ too, master^^2^ is the same as master~^2~.

> git --no-pager log -1 --oneline master~^2~
30a4fb3 C
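If you want to convince yourself that ~ and ^ really do line up like this, the sketch below rebuilds the same shape of history (A to G with a merge at F) in a throwaway repo. The identity and the commit helper are made up, and each commit writes its own file so the merge doesn’t conflict, which differs from the single file used in my repo.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b dev
commit C; commit D
git checkout -q master
commit E
git merge -q --no-ff -m F dev   # F has parents E (first) and D (second)
commit G

subject() { git log -1 --format=%s "$1"; }
tilde=$(subject 'master~3')      # three first-parent hops: F, E, B
carets=$(subject 'master^^^')    # the same walk spelled with carets
mixed=$(subject 'master~^2~')    # F, then its second parent D, then C
second=$(subject 'master^^2^')   # same as above
echo "$tilde $carets $mixed $second"
```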

<rev>^{<type>}

So ^ can help us find the ancestors of a revision, cool, what else can it do? If you use curly braces after ^ and pass it a type like commit, tree, or blob, then git will recursively dereference the revision until it finds an object of that type, otherwise it returns an error.

It’s good to keep in mind that this syntax doesn’t look at a parent of <rev> like the previous use cases, it looks at <rev> itself. master is currently pointing to commit G so master^{commit} should show us the commit that master is pointing at.

> git --no-pager show --oneline master^{commit}
93ffd00 (HEAD -> master) G
diff --git a/file b/file
index 8fda00d..eaf1c36 100644
--- a/file
+++ b/file
@@ -3,3 +3,4 @@ B
 C
 D
 E
+G

Commit objects make use of tree objects so this is a handy way to see what tree object a commit is using. The tree object isn’t very interesting in this repo as we are only working with a single file, still it’s nice to have the option to look at it.

> git --no-pager show --oneline master^{tree}
tree master^{tree}

file

> git rev-parse master^{tree}
2ccb2a1893ea7f93abe3645d2511ecd83b555606
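The dereferencing is easier to see with an annotated tag, because a tag object points at a commit which in turn points at a tree. My example repo has no tags, so the v1 tag below is invented for the sketch, as are the throwaway repo, the identity, and the commit helper.

```shell
set -eu
# Throwaway repo; the identity, the commit helper and the v1 tag are assumptions.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git tag -a v1 -m "annotated tag for the example"

t_tag=$(git cat-file -t v1)                 # the tag object itself
t_commit=$(git cat-file -t 'v1^{commit}')   # peeled once: tag -> commit
t_tree=$(git cat-file -t 'v1^{tree}')       # peeled twice: tag -> commit -> tree

# A commit can't be peeled down to a blob, so this form is an error.
if git rev-parse -q --verify 'v1^{blob}' >/dev/null 2>&1; then
  t_blob=ok
else
  t_blob=error
fi
echo "$t_tag $t_commit $t_tree $t_blob"
```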

Objects are lovely and all but what if we want to find a commit based on what’s in the commit message?

<rev>^{/<text>}

This syntax allows you to search for the youngest commit that is reachable from <rev> that has <text> in the commit message. Very handy for tracking down when something was introduced to a code base, like a pesky bug. <text> is actually a regular expression so you don’t even need to know the exact text you’re looking for.

Unfortunately we don’t have any particularly exciting commit messages in this example repo but here’s a simple example looking for commit B:

> git --no-pager log -1 --oneline master^{/B}
6bfe3ba B

Bear in mind that had there been two commits with B in their commit message then master^{/B} would’ve returned the youngest commit, i.e. the most recent.

:/<text>

This syntax is the bigger more powerful brother of <rev>^{/<text>}, :/<text> doesn’t need a revision specified because it checks all commits reachable from any ref.

To demonstrate I’ve added two more commits, H and I, to the dev branch:

> git --no-pager log --oneline --graph --all
* 8c93571 (dev) I
* d70837d H
| * 93ffd00 (HEAD -> master) G
| *   3b3d402 (origin/master) F
| |\
| |/
|/|
* | 69fdef7 D
* | 30a4fb3 C
| * dd81b17 E
|/
* 6bfe3ba B
* 32f3315 A

Okay, this graph business is getting a little hard to follow so I’ll lay it out slightly differently which hopefully makes it easier to understand.

A--B-----E--F--G     <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

We still have master checked out. If we use the previous syntax to look for commit H then git won’t be able to find it because it’s not reachable from master.

> git --no-pager log -1 --oneline master^{/H}
fatal: ambiguous argument 'master^{/H}': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

This is where :/<text> comes in handy because it searches commits reachable from any ref.

> git --no-pager branch
  dev
* master

> git --no-pager log -1 --oneline :/H
d70837d H
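Here is a small sketch of the difference between the two search forms. The throwaway repo, the identity, and the commit helper are assumptions for the example; the trailing -- just tells git log that :/H is a revision and not a path.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b dev
commit H                 # only reachable from dev
git checkout -q master

from_master=$(git log -1 --format=%s 'master^{/B}')

# H is not an ancestor of master, so the scoped search has nothing to find.
if git rev-parse -q --verify 'master^{/H}' >/dev/null 2>&1; then
  reachable=yes
else
  reachable=no
fi

from_any=$(git log -1 --format=%s ':/H' --)   # searches from every ref
echo "$from_master $reachable $from_any"
```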

There is a special sequence for this syntax too: :/! is reserved for modifiers. !- performs a negative match, so :/!-A finds the youngest commit whose message doesn’t contain A, which in our case would be the latest commit on dev, commit I.

> git --no-pager log -1 --oneline :/'!-A'
8c93571 (dev) I

Note that I had to wrap my <text> in quotes otherwise my shell (zsh) performs history expansion on the text before it gets to git.

To match a literal ! at the start of <text> you double it up, so :/!!foo searches for !foo.

That about does it for searching for text, let’s move on to specifying paths.

<rev>:<path>

With this syntax you can reference the blob or tree that <path> points to, remember tree objects are essentially directories and blobs are files. To demonstrate this syntax I’ll use the cat-file command which lets us inspect objects.

I need to add a new directory and a new file to our repo in order to make explaining this syntax a little easier.

> tree
.
├── dir
│   └── other_file
└── file

1 directory, 2 files

And here is how the commit history is looking, commit J adds the new directory and file:

A--B-----E--F--G--J  <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

Let’s see what type (-t) of object file is.

> git cat-file -t HEAD:file
blob

Git looks for file at the root of our repo, i.e. the top level. If we want to look at other_file we need to be more specific.

> git cat-file -t HEAD:dir/other_file
blob

If <path> starts with ./ or ../ it is treated as relative to the current working directory, and git converts it to a path relative to the root of the project. Let’s cd into dir and look at other_file.

> cd dir
> git cat-file -t HEAD:./other_file
blob

We could also look at the tree object that represents dir by using ./:

> basename $(pwd)
dir
> git cat-file -t HEAD:./
tree

And seeing as I mentioned ../ let’s use it to look at file which is in the root of our repo:

> git cat-file -t HEAD:../file
blob
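The whole tour of <rev>:<path> fits in a short sketch. The throwaway repo, the identity, and the file contents are made up; the layout mirrors the dir/other_file and file pair from above.

```shell
set -eu
# Throwaway repo; the identity and the file contents are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
echo one > file
echo two > dir/other_file
git add .
git commit -q -m J

top=$(git cat-file -t HEAD:file)   # a file at the root of the repo
sub=$(git cat-file -t HEAD:dir)    # a directory is a tree object

cd dir
rel=$(git cat-file -t HEAD:./other_file)   # ./ is relative to the current directory
up=$(git cat-file -p HEAD:../file)         # ../ climbs back to the root; -p prints the content
echo "$top $sub $rel $up"
```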

I think I’ve thoroughly explained this syntax so let’s look at something a little more complicated that uses paths.

:<n>:<path>

This one requires knowledge of stages, I went over this in my previous post called git: ours and theirs. I’ll give a brief overview here.

Git keeps track of things using the index file, if we look inside the index file now using the ls-files command we see this:

> git ls-files --stage -v
H 100644 c8f145fff1f95011192cd044373759315c22c587 0     dir/other_file
H 100644 eaf1c36af3f83330127a0178e3a79a9198794cd9 0     file
                                                  ^

I added the ^ to the output and it is pointing to the stage number for the objects, the stage number can range from 0 to 3 and in this instance 0 just means they are normal files.

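Stages 1 to 3 are reserved for merges and conflicts: stage 1 is the common ancestor (the base), stage 2 is the version from the current branch (ours), and stage 3 is the version from the branch being merged in (theirs).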

Alright, clear as mud. Let’s try and merge master and dev to get a merge conflict, both branches have been editing file so we will definitely get some conflicts.

As a reminder here is the current state of our commit history:

A--B-----E--F--G--J  <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

Here we go.

> git merge dev
Auto-merging file
CONFLICT (content): Merge conflict in file
Recorded preimage for 'file'
Automatic merge failed; fix conflicts and then commit the result.

> git status --short --branch
## master...origin/master [ahead 2]
UU file

Grand, now what’s in the index file?

> git ls-files --stage -v
H 100644 c8f145fff1f95011192cd044373759315c22c587 0     dir/other_file
M 100644 8422d40f12717e1ebd5cef2449f6c09d1f775969 1     file
M 100644 eaf1c36af3f83330127a0178e3a79a9198794cd9 2     file
M 100644 0a02bcfcceb224ee1733c9e21b8d053a9514c09d 3     file

There are now three versions of file in the index, now that we have some stages to work with we can finally turn our attention to the gitrevision syntax :<n>:<path>.

You may have already sussed out how this works: <n> is the stage number you want to look at and <path> is the path to the file you are interested in.

Let’s have a look at the base version of file:

> git --no-pager show :1:file
A
B
C
D

We can do the same with 2 and 3 to look at the different versions of the file.

> git --no-pager show :2:file
A
B
C
D
E
G

> git --no-pager show :3:file
A
B
C
D
H
I

The <n>: part is optional. If it is left off git looks for stage 0 of <path>, which right now doesn’t exist in the index, so we get an error.

> git --no-pager show :file
fatal: Path 'file' is in the index, but not at stage 0.
Did you mean ':1:file'?

Let’s abort this merge and try running that command again. Remember master is pointing to commit J, which didn’t touch file, so we should see the contents from commit G.

> git merge --abort
> git --no-pager show :file
A
B
C
D
E
G

Now that there are no conflicts only normal stage 0 entries exist in the index, so :file shows us the blob for file.
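To reproduce the stages without my repo, here is a minimal sketch that forces a conflict on a one line file. The throwaway repo, the identity, and the contents base, ours, and theirs are all invented so that each stage is easy to recognise.

```shell
set -eu
# Throwaway repo; the identity and the file contents are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file; git add file; git commit -q -m base
git checkout -q -b dev
echo theirs > file; git commit -q -am theirs
git checkout -q master
echo ours > file; git commit -q -am ours

git merge dev >/dev/null 2>&1 || true   # expected to stop with a conflict

stage1=$(git show :1:file)   # common ancestor
stage2=$(git show :2:file)   # current branch (master)
stage3=$(git show :3:file)   # branch being merged in (dev)

git merge --abort
stage0=$(git show :file)     # back to a normal stage 0 entry
echo "$stage1 $stage2 $stage3 $stage0"
```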

reachable commits

I’ve been using git log throughout this post to look at commits but that command actually shows us sets of commits, this is why I’ve been using the -<n> option, i.e. -1, to tell it to only return one commit instead of the whole set.

Leaving off that option git log will show us the set reachable from the revision we specify. Reachable means any commit in the revision’s ancestor chain including the commit itself.

Using one of our old friends I’ll show what I mean:

> git --no-pager log --oneline :/C
30a4fb3 C
6bfe3ba B
32f3315 A

The revisions I’ve shown so far will return all the commits reachable right back as far as git can go. In our small example repo that’s initial commit A.

excluding commits

It is possible to exclude commits reachable from a commit and we can do that by prefixing a revision with ^, for example let’s rerun the last command but exclude the commit set.

> git --no-pager log --oneline ^:/C
>

We get nothing back, on the surface that doesn’t seem very useful but if we give the command another revision like HEAD it’ll return the commits reachable by HEAD but not those reachable from C.

> git --no-pager log --oneline ^:/C HEAD
4b78cbd (HEAD -> master) J
93ffd00 G
3b3d402 (origin/master) F
dd81b17 E
69fdef7 D

Now that is actually useful! In fact it’s so handy there is an alternative syntax for it, this leads us nicely into the last section: ranges.

<rev1>..<rev2>

This syntax is the shorthand for ^<rev1> <rev2>. A double dot range will show us all the commits reachable by <rev2> but exclude any commits reachable by <rev1>.

> git --no-pager log --oneline :/C..HEAD
4b78cbd (HEAD -> master) J
93ffd00 G
3b3d402 (origin/master) F
dd81b17 E
69fdef7 D

I’ll use the commit diagram from earlier to give a different perspective on what’s happening.

*--*-----E--F--G--J  <- master (HEAD)
    \      /
      *--D-----*--*  <- dev
      ^
      └ commit C

:/C..HEAD has left us with any commits reachable by HEAD and excluding commit C and any commits reachable from C. HEAD is pointing at commit J so commits J, G, F, E, and D are returned.

Let’s do one more example.

> git --no-pager log --oneline :/E..:/I
8c93571 (dev) I
d70837d H
69fdef7 D
30a4fb3 C

Looking at our trusty commit diagram we get this:

         ┌ commit E
*--*-----*--*--*--*  <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

:/E..:/I has left us with commits reachable by I but excluded any reachable by E, which has given us I, H, D, and C.

I just want to point out while I’m using :/<text> as my revisions you can use any of the revisions we’ve spoken about. You can mix and match too:

> git --no-pager log --oneline 32f33..dev^
d70837d H
69fdef7 D
30a4fb3 C
6bfe3ba B

<rev1>...<rev2>

Instead of two dots we can use three dots. This very similar syntax gives us the symmetric difference between the two revisions. It’s slightly harder to wrap your head around but an example or two should clear things up. In essence it’s the commits reachable from either <rev1> or <rev2> but not both.

> git --no-pager log --oneline :/G...:/I
8c93571 (dev) I
d70837d H
93ffd00 G
3b3d402 (origin/master) F
dd81b17 E

The commit diagram looks like this:

*--*-----E--F--G--*  <- master (HEAD)
    \      /
      *--*-----H--I  <- dev

Let’s take this one revision at a time, all the commits reachable from :/G are:

A--B-----E--F--G--*  <- master (HEAD)
    \      /
      C--D-----*--*  <- dev

And all commits reachable from :/I are:

A--B-----*--*--*--*  <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

Here are the commits reachable from both:

A--B-----*--*--*--*  <- master (HEAD)
    \      /
      C--D-----*--*  <- dev

And these are the commits reachable from either revision but not from both, AKA the result of :/G...:/I:

*--*-----E--F--G--*  <- master (HEAD)
    \      /
      *--*-----H--I  <- dev

With both the double and triple dot range notation you can leave either revision blank and it will default to HEAD. So :/B... means :/B...HEAD and ...:/B means HEAD...:/B.
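Both range notations can be sanity checked with git rev-list, which prints the same sets git log walks. This is only a sketch on a cut-down history (A and B shared, E on master, C and D on dev, no merge); the throwaway repo, the identity, and the commit helper are assumptions.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b dev
commit C; commit D
git checkout -q master
commit E

# Double dot is shorthand for the ^<rev1> <rev2> form.
two_dot=$(git rev-list 'dev..master' | sort)
caret=$(git rev-list '^dev' master | sort)

# Triple dot: reachable from either side but not from both.
three_dot=$(git log --format=%s 'master...dev' | sort | paste -s -d ' ' -)
echo "$three_dot"
```

I sort the output before comparing because the order of the commits isn’t the point here, only which ones are in the set.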

misc set shorthands

To finish off there are a couple of useful shorthands for referencing a commit and its parents, you’d probably get the most use of these when working with merge commits.

I’ll repost the commit diagram again for reference.

A--B-----E--F--G--J  <- master (HEAD)
    \      /
      C--D-----H--I  <- dev

rev^@

This one selects all the parents of rev. Nice and simple! Here are all the parents of the merge commit F.

> git --no-pager show --oneline --name-only :/F^@
dd81b17 E
file
69fdef7 D
file

rev^!

This one includes rev but excludes all of its parents, so on its own it just represents the single commit rev. I’ll be honest I’m not sure how useful this one is but it’s good to know I guess.

> git --no-pager show --oneline --name-only :/F^!
3b3d402 (origin/master) F

file

rev^-<n>

This one includes rev but excludes its n-th parent, along with everything reachable from that parent. If you leave off <n> it defaults to 1. Unlike the previous two it returns a set, i.e. rev plus whatever is reachable through its other parents. It is shorthand for rev^<n>..rev.

> git --no-pager log --oneline :/F^-1
3b3d402 (origin/master) F
69fdef7 D
30a4fb3 C

:/F^-1 returns F plus all the commits reachable from its second parent but not those reachable from the first.

*--*-----*--F--*--*  <- master (HEAD)
    \      /
      C--D-----*--*  <- dev

Naturally if we exclude the second parent we should only get back F and E:

> git --no-pager log --oneline :/F^-2
3b3d402 (origin/master) F
dd81b17 E

And for completeness here is the commit diagram:

*--*-----E--F--*--*  <- master (HEAD)
    \      /
      *--*-----*--*  <- dev
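All three shorthands can be checked against a merge commit with the sketch below. The throwaway repo, the identity, and the commit helper are made up, and each commit writes its own file so the merge is clean.

```shell
set -eu
# Throwaway repo; the identity and the commit helper are assumptions for this sketch.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b dev
commit C; commit D
git checkout -q master
commit E
git merge -q --no-ff -m F dev   # parents: E (first) and D (second)
f=$(git rev-parse HEAD)

join() { sort | paste -s -d ' ' -; }
parents=$(git log --no-walk --format=%s "$f^@" | join)   # just the parents
only_f=$(git log --format=%s "$f^!")                     # F without its ancestry
minus_first=$(git log --format=%s "$f^-1" | join)        # drop everything behind E
as_range=$(git log --format=%s "$f^1..$f" | join)        # the long-hand equivalent
echo "$parents | $only_f | $minus_first"
```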

conclusion

Git certainly gives you enough tools to hone in on the exact commits you want to work with or to look at underlying tree and blob objects and I hope I’ve explained them well enough for you to understand them. If you’re ever stuck or need a refresher definitely check out man gitrevisions, nothing beats reading the fucking manual.

@skipcloud