July 23, 2014

Working with Git Commit IDs

Git at Scale


Git commit IDs are often confusing to those new to Git. When committing changes, Git responds with the name of the current branch, a string of numbers and letters, and the commit message. The string between the recognizable branch name and message is the shorthand version of the Git commit ID. Consider the following example:

[master f3abe64] Added a new readme file to illustrate commit IDs.
1 file changed, 1 insertion(+)
create mode 100644 myreadme

The shorthand commit ID is f3abe64. It’s shorthand because the real thing is forty hexadecimal characters that spell out a 160-bit SHA-1 hash of the commit object, which in turn identifies the new, post-commit state of the repository and the history leading up to it. The git log command shows the entire value:

commit f3abe64fc121b75f3f0566c73f2f1a4e8fffd68e
Author: jwilliston@perforce.com
Date:   Thu Jul 3 16:12:01 2014 -0700
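As a rough illustration of where those forty characters come from, the sketch below hashes content the way Git does for SHA-1 repositories: a short header naming the object type and size, a NUL byte, then the raw content. The function name git_object_id is made up for this example, and a blob is used instead of a commit because the full text of the commit object above is not reproduced in this article. A commit ID is computed the same way over the commit's own text (tree, parents, author, committer, and message).

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the 40-character hex ID Git assigns to an object (SHA-1 repos)."""
    # Git hashes a header of the form "<type> <size>\0" followed by the content.
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


full_id = git_object_id("blob", b"hello world\n")
print(full_id)      # the full 40-character ID
print(full_id[:7])  # the seven-character shorthand form
```

Because the ID is derived from the content rather than from a counter, two repositories that hold the same object always agree on its ID, which is also why the IDs carry no ordering.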

Git differs in its commit IDs from most other version control systems (VCS), which tend to number things with monotonically increasing integers.1 The difference is the result of an implementation decision that provides no small fraction of Git’s value, but it also causes some headaches.

Pain Points

One of those pain points, for example, is the obvious disconnect to the human eye and mind. When faced with a list of numbers like 1, 2, 3, I don’t have any trouble putting them in increasing order. But toss out a bunch of forty-character commit IDs and my eyes simply glaze over. Still worse can be the extra effort of integrating Git with other development tools that expect revisions, builds, etc. to be stamped with integers.

Thankfully, Git provides two mechanisms that mitigate this pain. The first is that commands accept the shorthand version of a full commit ID. In small repositories this addresses the problem nicely, but in large repositories the Git-specific variant of the “Birthday Problem” 2 will eventually rear its head and require users to supply more characters. Those worried about the problem can configure Git to display their preferred number of characters using the config command.3 For example, the following command configures Git to use twelve characters:

git config --global core.abbrev 12
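To get a feel for why twelve characters is so much safer than seven, here is a minimal sketch of the standard birthday-problem approximation applied to hexadecimal prefixes. It assumes IDs behave like uniformly random values, and the function name collision_probability is invented for this example. Note that Git checks abbreviations against every object in the repository (commits, trees, and blobs), so the object count is usually much larger than the commit count.

```python
import math


def collision_probability(num_objects: int, abbrev_len: int) -> float:
    """Approximate chance that at least two of num_objects IDs share a prefix."""
    space = 16 ** abbrev_len  # number of distinct hex prefixes of this length
    pairs = num_objects * (num_objects - 1) / 2
    # Birthday-problem approximation: p ~= 1 - exp(-pairs / space)
    return -math.expm1(-pairs / space)


for length in (7, 12):
    p = collision_probability(100_000, length)
    print(f"{length} characters, 100,000 objects: {p:.6f}")
```

Under these assumptions, a repository with 100,000 objects is all but certain to contain at least one ambiguous seven-character prefix, while an ambiguous twelve-character prefix remains very unlikely.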

The second mechanism is tags, which most VCS call labels. Git makes it possible to apply semantic names (e.g., “v1.0”) to any point in repo history. Tags are a bit more complicated in Git than in other systems, offering features such as annotations and PGP signing, but they otherwise work as one might expect.

A tag uniquely identifies a point in history and may be used with Git commands in place of the corresponding long or shorthand commit ID. For example, the following command will add an annotated, unsigned tag with a message to the aforementioned commit:

git tag -m "My Very First Release Tag" v1.0 f3abe64

Pain Relief

Of course neither of these mechanisms makes Git commit IDs as simple as more familiar integers, and this can be particularly irritating when trying to integrate Git with other tools that expect simple numbers. Happily, using Git with the Perforce platform can solve this issue.

Git Fusion processes each Git push by submitting the changes to the back-end Perforce server, which will assign a simple changelist number to the resulting work. The changelist description also includes the full Git commit ID, which makes it possible to tie any Perforce changelist in the back end to the Git commit ID from which it was pushed. The result is that developers can use Git as they like, while development operations and others can easily tie builds, bugs, and the like back to simple integers.

For example, after a one-line change to the file previously added to the project, a simple push to the Git server also submits the change to the back-end Perforce repository. The resulting Perforce changelist description includes quite a bit of data for the sake of future reference:

Imported from Git
Author: jwilliston@perforce.com 1405380564 -0700
Committer: jwilliston@perforce.com 1405380564 -0700
sha1: b201b948d7cd8aa2efa877725999bc9c1c0806b2
push-state: complete
parent-changes: f3abe64fc121b75f3f0566c73f2f1a4e8fffd68e =[2]

This indicates that the commit originated from Git and provides the full commit ID for the changes as well as that of the parent commit. And while Git users do not need to worry about such details, Perforce users can continue to work with all the files using simple revision and changelist numbers, which addresses the aforementioned pain points quite nicely. See for yourself what it's like to enjoy the best of both worlds: give Git Fusion a try today.
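For build or bug-tracking tools that want to walk from a changelist back to its Git commit, the description shown earlier can be parsed with a few lines of code. The sketch below is only that: the field layout is inferred from the single sample in this article, the helper names parse_description and parent_change are invented here, and reading the bracketed number in parent-changes as the parent's changelist number is an assumption.

```python
import re

# Sample text copied from the changelist description shown in this article.
DESCRIPTION = """\
Imported from Git
Author: jwilliston@perforce.com 1405380564 -0700
Committer: jwilliston@perforce.com 1405380564 -0700
sha1: b201b948d7cd8aa2efa877725999bc9c1c0806b2
push-state: complete
parent-changes: f3abe64fc121b75f3f0566c73f2f1a4e8fffd68e =[2]
"""


def parse_description(text: str) -> dict[str, str]:
    """Collect the 'key: value' lines of a description into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'key: value' shape
            fields[key.strip().lower()] = value.strip()
    return fields


def parent_change(fields: dict[str, str]):
    """Return (parent commit ID, number in brackets), or None if absent."""
    # Assumption: the bracketed number is the parent's changelist number.
    match = re.search(r"([0-9a-f]{40})\s*=\s*\[(\d+)\]",
                      fields.get("parent-changes", ""))
    return (match.group(1), int(match.group(2))) if match else None


fields = parse_description(DESCRIPTION)
print(fields["sha1"])         # full Git commit ID of this change
print(parent_change(fields))  # parent commit ID and its bracketed number
```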


1 For those who don’t remember math, integers are numbers such as 1, 2, 3, and so forth.

2 In the general sense, the birthday problem is the observation that in a surprisingly small random group of people, at least two are likely to share a birthday (with 23 people the odds are already better than even). The Git version of the problem is that as commits accumulate, a short prefix eventually matches more than one ID, making “shorthand” commit IDs ambiguous. Those interested in the general details should consult the Wikipedia article on the subject.

3 Specifically, the core.abbrev setting controls the number of characters used.