April 30, 2008

Post Hoc, Ergo Propter Hoc

Surround SCM
We’re investigating adding an annotate/blame feature in Surround SCM. If you’re not familiar with this concept, it allows you to mark every line in a file with the date, version and person who last changed it. For example, if the current version of your file looks something like this:
int bar;
bar=7;
bar++;
if (bar>7)
    printf("Big bar");
Then after running annotate, it might look like:
Version     User        Date        Line
25          gatesw      1/12/2008   int bar;
23          gatesw      2/25/2007   bar=7;
25          gatesw      1/12/2008   bar++;
8           ballmers    1/11/2006   if (bar>7)
7           gatesw      5/15/2005       printf("Big bar");
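For the curious, here is a rough sketch of how output like the above could be computed. This is not how Surround SCM does it (or will do it). It is just one simple approach, written in Python: walk the file's history from oldest to newest, diff each version against the previous one, and credit every new or changed line to the version that introduced it. The `Revision` type, the `annotate` function and the intermediate file contents in `history` are all made up for illustration; only the final annotated result comes from the example above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Revision:
    """Hypothetical record of one checked-in version of a file."""
    version: int
    user: str
    date: str
    lines: tuple  # full file contents at this version, one entry per line


def annotate(history):
    """Return (version, user, date, line) for each line of the newest revision.

    `history` must be ordered oldest first.
    """
    annotated = []   # annotation for each line of the previous revision
    prev_lines = []
    for rev in history:
        cur_lines = list(rev.lines)
        matcher = SequenceMatcher(a=prev_lines, b=cur_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever touched them last.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines are credited to this revision.
                # For a pure delete, j1 == j2, so nothing is added.
                updated.extend(
                    (rev.version, rev.user, rev.date, line)
                    for line in cur_lines[j1:j2]
                )
        annotated = updated
        prev_lines = cur_lines
    return annotated


# Assumed history: these intermediate versions are invented so that the
# final result matches the annotated example in the post.
history = [
    Revision(7, "gatesw", "5/15/2005", ('    printf("Big bar");',)),
    Revision(8, "ballmers", "1/11/2006",
             ("if (bar>7)", '    printf("Big bar");')),
    Revision(23, "gatesw", "2/25/2007",
             ("bar=7;", "if (bar>7)", '    printf("Big bar");')),
    Revision(25, "gatesw", "1/12/2008",
             ("int bar;", "bar=7;", "bar++;", "if (bar>7)",
              '    printf("Big bar");')),
]

for version, user, date, line in annotate(history):
    print(f"{version:<12}{user:<12}{date:<12}{line}")
```

Note what this does and does not tell you: a line is credited to the last version whose diff touched it, even if that change was only whitespace or a rename. That is worth keeping in mind before you go knocking on someone's door.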
In looking at this feature, I discovered there was some “unease” about it, specifically around the name. It centers on that word, blame. No one likes to be blamed for something. No one wants to play the “blame game”. “Let’s not finger point, let’s solve problems!” Blech. I think I threw up in my mouth a little just writing that. Early systems generally seem to have called it blame, but later systems started using the name annotate. Much nicer.

While the actual name might not be critical, how people use the feature is, and I think the “blame” word makes more sense. I’ve used functionality like this (or simulated it) for a while, and I have to say it was most often for blaming someone. I saw some code, and I wanted to find out who was responsible for it. Often it was to have a reasonable discussion. “Can you help me understand what this code does?” “We’re not using that approach any more, can you change it?” Certainly there were occasions when it ran more along the lines of “Who’s responsible for this mess?” or “Who dared to touch the perfection that is my code?” Though, as the title of this post reminds us, just because someone changed something doesn’t mean the problem you’ve run into is because of that change.

You can certainly use this kind of feature for code reviews and related activities to understand the history of a file. But I believe that the primary use case is when you see some changes that you question and want to track down the responsible party. I don’t think there is any issue with that, and thinking of this feature in that way may drive different functionality than assuming it’s most often used for code reviews and such. For example, if the primary use is for code reviews, then you might want to really make sure you have excellent printing and exporting capabilities. On the other hand, if blame is your game, then you want to make sure it’s very interactive. You might want to easily see what other changes went in with that version, or what other files were changed by that user at the same time.

I guess we could call it “responsibility”, but that doesn’t exactly roll off the tongue either. The fundamental question still remains, though: what is the primary use for functionality which allows you to identify the person who last modified a set of lines in a file? I’d be interested in people’s feedback on how they might use this. And if you don’t like how we implement it, please don’t blame me.