December 13, 2010

p4 annotate -I: going deeper


Since the "p4 annotate" command was introduced in the 2002.2 release, it has been a wonderful tool for determining when a particular content change entered a file's history, which in turn makes it easy to look up the author of that change, the changelist description, and a wealth of other relevant information.

When a change reached a file from some other file via integration, however, "annotate" was less useful, because it would tell you about the changelist that did the integrating, not the earlier changelist in the originating branch. The 2005.2 release added a "-i" flag that addressed this problem to some extent by functionally prepending the branch ancestor's revisions to the history (giving the effect of annotating one file with additional earlier revisions), but changes that were introduced by merging from different branches were still problematic.

As of the 2010.2 release, this problem should be solved by the new "-I" flag. I had fun working on this feature, and I thought our blog readers might have fun seeing how it all works.

The "annotate -I" command traces all of the integration history of a file to find a complete set of its integration "ancestors", which represent all of the possible original sources for any given line in the starting file. The normal "annotation" of a file already indicates which revision any given line originated in; if this revision was created by an integration operation, the "from" of that integration indicates a likely earlier source for lines introduced by that revision. Diffing the source and result of the integration allows these lines to be matched up, and the process can then be repeated on the earlier file to find yet earlier sources.
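To make the starting point concrete, here is a minimal Python sketch of plain per-revision annotation: each revision is diffed against its predecessor, and lines that survive unchanged keep the revision that first introduced them. It uses Python's difflib rather than the server's own diff code, and the `annotate` function and the sample revisions are hypothetical illustrations, not the actual implementation.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Return (revision_number, text) pairs for the newest revision.

    `revisions` is a list of line lists, oldest first (an assumed
    in-memory stand-in for reading each revision from disk).
    """
    annotated = [(1, text) for text in revisions[0]]
    for rev, lines in enumerate(revisions[1:], start=2):
        previous = [text for _, text in annotated]
        matcher = SequenceMatcher(None, previous, lines, autojunk=False)
        result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                result.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or changed lines are credited to this revision.
                result.extend((rev, text) for text in lines[j1:j2])
            # "delete": the lines are gone, so nothing is carried forward.
        annotated = result
    return annotated


# Hypothetical three-revision file: two blank lines are edited in turn.
revisions = [
    ["------", "", "", "", "------"],
    ["------", "edit 1", "", "", "------"],
    ["------", "edit 1", "", "edit 3", "------"],
]
for rev, text in annotate(revisions):
    print(f"{rev}: {text}")
```

The "-I" processing described here builds on this kind of per-file result and adds a further diff wherever a revision was created by an integration.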

As a very simple (but not completely trivial) example, let's look at a set of three files (representing three branches) that have had a change propagated through them via a set of cherry-pick operations. Here is the basic "annotate" output for each file at the head revision:

    b1#3             b2#3             b3#2
1: ------        1: ------        1: ------
2: edit 1        1:               1:
1:               2: edit 2        1:
3: edit 3        3: edit 3        2: edit 3
1: ------        1: ------        1: ------

And here is a Revision Graph showing the integration relationships:

[Image: Revision Graph]

In getting the "annotate -I" output for b3 we're going to need to look at each revision and follow the integration records back to look for earlier sources of the lines introduced in those revisions. The file only has two revisions, each of which was created by some sort of integration, so we'll need to look back at the sources for each of those.

Revision 1 of b3 was a branch from b2#1, so we'll compare the annotations of b2#1 and b3#1 to determine which lines correspond to each other. Since in a "branch" or "copy" operation the source is always identical to the target, this step is a bit trivial, but it tells us what we need to know:

    b2#1           b3#1
1: ------  ==  1: ------ 
1:         ==  1: 
1:         ==  1: 
1:         ==  1: 
1: ------  ==  1: ------

In turn, b2#1 was a branch of b1#1, so we quickly build a chain for each of these lines that goes back to the original source of the file:

    b1#1           b2#1           b3#1
1: ------  ==  1: ------  ==  1: ------ 
1:         ==  1:         ==  1: 
1:         ==  1:         ==  1: 
1:         ==  1:         ==  1: 
1: ------  ==  1: ------  ==  1: ------

Since b3#2 was a "merge", the source and target revisions are not identical, and we have to rely on our diff logic to match up as many lines as possible with b2#3.

    b2#3           b3#2
1: ------  ==  1: ------
1:         ==  1:
2: edit 2      1:
3: edit 3  ==  2: edit 3
1: ------  ==  1: ------

The diff tells us that the "edit 3" line in b3 is a match for a line in b2, and we can also see that this line was introduced as of b2#3, which is a source revision of the integration that added this line to b3#2. This is adequate to convince us that this line was propagated from b2 to b3 by means of that merge.
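As a rough illustration of this check, the Python sketch below diffs two annotated files and credits a target line to a source line only when both conditions hold: the diff pairs the lines up, and the source line was introduced within the revision range that the integration pulled in. The function name `credit_to_source`, the tuple layout, and the assumption that the merge into b3#2 took exactly b2#3 are mine; difflib stands in for the server's diff logic.

```python
from difflib import SequenceMatcher


def credit_to_source(source, target, target_rev, first_src_rev, last_src_rev):
    """Map target line indexes to source line indexes for one integration.

    `source` and `target` are annotated files: lists of (revision, text).
    Only lines introduced by `target_rev` are candidates, and only source
    lines introduced within first_src_rev..last_src_rev can take credit.
    """
    matcher = SequenceMatcher(
        None,
        [text for _, text in source],
        [text for _, text in target],
        autojunk=False,
    )
    credited = {}
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            src_rev = source[a + k][0]
            tgt_rev = target[b + k][0]
            if tgt_rev == target_rev and first_src_rev <= src_rev <= last_src_rev:
                credited[b + k] = a + k
    return credited


# Annotations taken from the table above (indexes are zero-based).
b2_head = [(1, "------"), (1, ""), (2, "edit 2"), (3, "edit 3"), (1, "------")]
b3_head = [(1, "------"), (1, ""), (1, ""), (2, "edit 3"), (1, "------")]

print(credit_to_source(b2_head, b3_head, target_rev=2,
                       first_src_rev=3, last_src_rev=3))
# prints {3: 3}
```

Only the "edit 3" line qualifies: the other matched lines were not introduced by b3#2, so they are traced through the integration that created b3#1 instead.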

We can then continue on back to b1 to find a still earlier source:

    b1#3           b2#3           b3#2
1: ------  ==  1: ------  ==  1: ------
2: edit 1      1:         ==  1:
1:             2: edit 2      1:
3: edit 3  ==  3: edit 3  ==  2: edit 3
1: ------  ==  1: ------  ==  1: ------

At this point we have a complete chain of ancestry for each line in b3#2. Note that the actual command line output just gives changelist numbers, much like "annotate -i", since the full depot paths wouldn't fit nicely on a terminal window line, but I've added shortened paths here for clarity. The tagged form (-Ztag) of "annotate -I" will give you the depot paths as well.

            b3#2
(b1#1) 19: ------
(b1#1) 19:
(b1#1) 19:
(b1#3) 31: edit 3
(b1#1) 19: ------
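Putting the steps together, the following Python sketch repeats the walk for every line of b3#2 and arrives at the same origins as the output above. The intermediate revision contents and the integration table are reconstructed from the annotate output and the description of the cherry-picks, so treat them as assumptions; `annotate`, `origin`, `HISTORY`, and `INTEGRATIONS` are hypothetical names, and difflib stands in for the server's diff code.

```python
from difflib import SequenceMatcher

BASE = ["------", "", "", "", "------"]
# Assumed revision contents for each branch, oldest first.
HISTORY = {
    "b1": [BASE, ["------", "edit 1", "", "", "------"],
           ["------", "edit 1", "", "edit 3", "------"]],
    "b2": [BASE, ["------", "", "edit 2", "", "------"],
           ["------", "", "edit 2", "edit 3", "------"]],
    "b3": [BASE, ["------", "", "", "edit 3", "------"]],
}
# (target file, target rev) -> (source file, first source rev, last source rev)
INTEGRATIONS = {
    ("b2", 1): ("b1", 1, 1),
    ("b2", 3): ("b1", 3, 3),
    ("b3", 1): ("b2", 1, 1),
    ("b3", 2): ("b2", 3, 3),
}


def annotate(revisions):
    """Plain annotate: (revision, text) for each line of the last revision."""
    annotated = [(1, text) for text in revisions[0]]
    for rev, lines in enumerate(revisions[1:], start=2):
        old = [text for _, text in annotated]
        result = []
        ops = SequenceMatcher(None, old, lines, autojunk=False).get_opcodes()
        for tag, i1, i2, j1, j2 in ops:
            if tag == "equal":
                result.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                result.extend((rev, text) for text in lines[j1:j2])
        annotated = result
    return annotated


def origin(name, rev, index):
    """Follow integrations back from one line of name#rev to its source."""
    target = annotate(HISTORY[name][:rev])
    line_rev = target[index][0]
    record = INTEGRATIONS.get((name, line_rev))
    if record is None:
        return name, line_rev  # introduced by an ordinary edit or add
    src_name, first, last = record
    source = annotate(HISTORY[src_name][:last])
    matcher = SequenceMatcher(None, [t for _, t in source],
                              [t for _, t in target], autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        if b <= index < b + size:
            src_index = a + (index - b)
            if first <= source[src_index][0] <= last:
                return origin(src_name, last, src_index)
    return name, line_rev  # no matching source line; credit stays here


for i, text in enumerate(HISTORY["b3"][-1]):
    src_name, src_rev = origin("b3", 2, i)
    print(f"({src_name}#{src_rev}) {text}")
```

This sketch re-annotates files on every call for brevity; a real implementation would presumably annotate each file once and reuse the result.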

If you scroll back up to the Revision Graph, you'll see one way of reading this output: for each line of text, it gives the origin point of the chain of arrows on the graph that propagated that line to the file of interest.

In terms of performance, the whole process is roughly equivalent to doing a normal "p4 annotate" on each file involved (the more branches you have, the more files are likely to be involved). That entails reading each revision from disk and diffing it with its predecessor, plus one additional in-memory diff for each connecting integration record. The diffs are done after db locks are released, and diffing reasonable-sized text files is pretty quick in any case (2010.2 introduces a configurable max filesize limit for annotate that defaults to 10MB), so we're not anticipating that this new command will introduce any performance problems.