Convergence vs. Divergence: Purposeful Merging with Perforce


Table of Contents

What You're Reading
Overview
Branching...
Branches over time...
Branches diverge when...
Branches converge when...
Merging files
Integration history “arrows”
Three-way file merging
The essence of a three-way merge
Three-way file merge tools vary...
Perforce's merge tools
What makes a good merge base?
The effect of base selection
Base selection through the ages
Arrow types and base selection
Preserving divergence
“Inherited” divergence
Unintentional divergence
The effect of “edit” arrows
Guaranteeing convergence
Assuring a correct copy
In a nutshell...

What You're Reading

This is essentially the narrative that accompanies the slides presented at the European Perforce User Conference in September of 2006.

Note that the images shown here are just thumbnails to help you find your place in the slide show. You'll need to view the slides in PowerPoint to appreciate the animations and the details.

Overview

(Slide 2)

In this talk:

  • First, I'll explain what I mean by branching and merging, and I'll go over the essence of 3-way file merging.

  • That will give us context to look at convergence vs. divergence. We'll see that some branches converge, and some branches diverge, and that merging for convergence is not the same as merging for divergence.

  • And we'll finish by looking at how to use Perforce to get the convergence or divergence you want.

I assume you are somewhat familiar with certain aspects of SCM[1] and version control as applied to the software development lifecycle. Especially:

  • The mainline model

  • The "merge down, copy up" method

  • The difference between "soft" and "firm" branches (the "tofu scale").

Also, I assume you know something about:

  • Perforce's integrate and resolve commands

You'll enjoy these slides even if you aren't familiar with the preceding. To flesh out your understanding, you can read:

Branching...

(Slide 3)

So, what is branching? Branching is making a branch. And a branch (a.k.a. a codeline or a stream) is a collection of files evolving together in the same phase of the development/release cycle.

In Perforce, as you know, a branch is created by cloning a collection of files. Once branched, files in each collection can evolve independently.

File content can be merged from branch to branch. In Perforce-ese, we call this integrating. E.g., Rel1, which was branched from the mainline, can be merged back to the mainline. Thus, the changes made to Rel1 can be propagated to Main.

In fact, any branch can be merged into any branch. And they can be merged as often as needed.

Now, not all changes should be merged into all branches. For example, we wouldn't make a practice of merging all the Main changes into Rel1. (That would defeat the purpose of branching.)

So, over time, some branches tend to diverge. That is, their content gets more and more different.

Other branches may diverge for a while, and then converge. A development branch, for example, might be really different from the mainline for a while, and then when its development is done, it gets merged into the mainline and is then very much like the mainline.

Branches over time...

(Slide 4)

Let's try to visualize this with the miracle of PowerPoint... (I'm depending on animation here -- your printed handouts may not make this point.)

On the preceding slide, we saw the typical timeline diagram of branches.

On this slide, we're looking at a flowchart of branches, the same branches shown on the previous slide. It's as if we were looking at the timeline diagram "end on" -- we've rotated it 90 degrees on the Y axis:

  • Each arrow is pointing at our nose, and looks like a bubble from this view.

  • The branches are connected by lines that show the flow of change between branches.

Over time, the contents of a branch will either diverge or converge with its parent.

In the mainline model, for example, a release branch tends to diverge from its parent. In other words, the content of a release branch - which is relatively stable - becomes less and less like the content of its parent branch - the mainline in this diagram.

It's not the release branch that is changing, mind you - it's the release branch's parent that is changing. That's the cause of the divergence between a release branch and its parent. E.g., Release 1 was branched from main; over time, the content of the two branches becomes less and less alike. We rarely merge from main to release 1. If and when we do, we merge very selectively so that we don't bring untested or inappropriate new development into the release.

By contrast, development branches tend to converge with their parents. Here, for example, we see a few development branches - they're branched from the mainline, and eventually their content is merged back into the mainline. (Again, we're using the mainline model to illustrate this.) While a dev branch may evolve in leaps and bounds, it eventually converges with main because:

  • First of all, mainline changes are constantly being merged to it

  • Second, when a dev task or project is complete, its content is delivered to the mainline.

The same thing applies to dev sub-branches - they eventually converge with their parents.

Now, merging for divergence is not the same as merging for convergence, it turns out. And we're going to look at that in a minute. But first let's compare the cases where branches diverge to the cases where they converge.

Branches diverge when...

(Slide 5)

Branches necessarily diverge when:

  • We're back-porting to release lines. Release lines, as we saw, always diverge from the trunk, because the trunk moves on while release lines are somewhat frozen in time.

    When we back-port, we preserve this divergence by merging specific changes, rather than all changes.

  • Cherry-picking is a way we preserve divergence. For example, when we do hunt-and-peck feature packaging, our packaging and distribution branch necessarily diverges from other branches -- its content is not the same as the mainline or any of its donor branches.

  • And the same thing happens with customization -- that is, when we're using per-customer or per-platform branches.

  • And as developers, we also preserve divergence when trading changes with one another, or when raiding other source code for boilerplates and code snippets. That is, we don't try to make our files look exactly like the files we're raiding. Instead, we take only the code snippets we want.

Finally, one obvious case for divergence that I forgot to call out on this slide is that of:

  • Isolating new development from a stable mainline or trunk. A dev line diverges from its trunk as more and more new, risky, and untested changes are piled into it.

Branches converge when...

(Slide 6)

And when we want convergence is when:

  • Development tasks are completed. That is, when we're delivering or promoting completed work from a dev line to mainline, we first want to make sure the dev line is completely up to date with the mainline, and then we want to make sure everything in the dev line itself gets into the mainline.

    (This is the “merge down, copy up” strategy. If you're not familiar with the rationale behind it, you definitely should take a look at "How Software Evolves" in Practical Perforce.)

We see the same kind of convergence when:

  • Isolated development sub-projects are rejoined -- that is, when we're delivering private branch work into a shared development branch.

  • Distributed development is reconciled -- that is, when developers working in separate repositories update their repositories and merge in one another's changes.

Merging files

(Slide 7)

Now let's talk about merging. If you're familiar with Perforce, you know that the integrate command selects files to merge, and the resolve command does the actual merging.

The integrate command can take a pair of arguments that we refer to as the source and target branches.

But the integrate command isn't really operating on branches. Instead, it operates on individual source-target file pairs. In other words, merging between branches is really a matter of merging each file in a source branch into its counterpart in the target branch.

And when we submit the merged results, Perforce creates "integration history" between source-target file pairs.

For example, in Slide 3 (the section called “Branching...”), we saw a timeline of branches. If each of those branches had three files, an exploded view of the branches might look like what you see on this slide. (This graphic is produced by P4V's Revision Graph, by the way.)

We see that integration history connects individual files, not branches.

Integration history “arrows”

(Slide 8)

Integration history is what makes Perforce smart about merging -- it's how Perforce keeps us from having to merge the same changes over and over again.

Another word for "integration history" is "merge arrows". This term comes from ClearCase, I think, but when you look at the revision graph, you can see why people call it that.

A "merge arrow" connects a pair of file revisions, and points from the source rev to the target rev. With Perforce you have more than just "merge" arrows -- you also have "branch arrows", "copy arrows", "edit arrows", and "ignore arrows".

We'll see more on these in later slides. For now, I'd just like to point out that Perforce uses these arrows to determine:

  • which changes, if any, still need integrating from source to target

  • which revision will be used as a base for a three-way merge from source to target

(And actually, it's incorrect to say that integration history connects a pair of file revisions. Perforce's integration history connects a range of source revisions with a single target revision. Each arrow represents changes in the source branch that are known to have been accounted for in the target branch, and the target branch change that accounts for them. You can see this for yourself in the output of commands that show integration history. It's also going to be visible, in a subtle way, in the Revision Graph of the 2006.2 release of P4V.)
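To make the "range of source revisions into one target revision" idea concrete, here is a minimal Python sketch of integration history as data. The `Arrow` record, the `ArrowKind` names, and the helper `still_needed` are hypothetical illustrations, not Perforce's actual schema or server logic, and the sketch follows only direct source-to-target arrows, whereas Perforce also accounts for indirect ones. It answers the first of the two questions above: which source revisions still need integrating.

```python
from dataclasses import dataclass
from enum import Enum


class ArrowKind(Enum):
    # Arrow flavors named in the talk (hypothetical modeling, not p4 output).
    BRANCH = "branch"
    MERGE = "merge"
    COPY = "copy"
    EDIT = "edit"
    IGNORE = "ignore"


@dataclass(frozen=True)
class Arrow:
    # One integration record: a RANGE of source revisions accounted for
    # in a SINGLE target revision.
    source_file: str
    first_rev: int  # inclusive
    last_rev: int   # inclusive
    target_file: str
    target_rev: int
    kind: ArrowKind


def still_needed(arrows, source_file, source_head, target_file):
    """Return source revisions 1..source_head not yet accounted for in target.

    Every arrow kind counts, including IGNORE: an ignored revision has been
    considered and deliberately left out, so it is not offered again.
    Only direct source->target arrows are examined in this sketch.
    """
    done = set()
    for a in arrows:
        if a.source_file == source_file and a.target_file == target_file:
            done.update(range(a.first_rev, a.last_rev + 1))
    return [r for r in range(1, source_head + 1) if r not in done]


# Hypothetical history shaped like the A-to-B example later in the talk:
# B branched from A#2, A#3 and A#5 merged, A#4 and A#6 ignored, A#7 pending.
history = [
    Arrow("A", 1, 2, "B", 1, ArrowKind.BRANCH),
    Arrow("A", 3, 3, "B", 3, ArrowKind.MERGE),
    Arrow("A", 4, 4, "B", 4, ArrowKind.IGNORE),
    Arrow("A", 5, 5, "B", 5, ArrowKind.MERGE),
    Arrow("A", 6, 6, "B", 6, ArrowKind.IGNORE),
]

print(still_needed(history, "A", 7, "B"))  # [7]
```

The B target revision numbers are invented; only the shape of the history matters here.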

Three-way file merging

(Slide 9)

Now let's take a moment to review three-way file merging.

Perforce, like many other SCM systems, uses three-way file merging. To merge a file into another, you need three files:

  • One is the "source" - the version of the file on the branch you are merging from. (In this diagram, the upper line shows the history of the file we're merging from. The source in this case is the head revision -- that is, revision 4.)

  • One is the "target" - the version of the file on the branch you are merging to. In Perforce, we typically merge into the head revision of the target file. (That was revision 3 of the target file -- the lower line -- in this example. But since then, revision 4 has become the head revision.)

  • And one is the "base", a file version selected by Perforce. (In this example, the base happens to be revision 3 of the source, the precursor to the source revision itself.)

Typically, we submit the merged results into the target branch, creating new revisions of the target files. (In the slide we're looking at, the result of the merge was checked in as revision 4 of the target.)

(Perforce commands and dialogs often refer to "theirs" and "yours" instead of "source" and "target", but I'll stick with "source" and "target" here to avoid confusion.)

As I mentioned, Perforce uses integration history to select the base:

  • The base is usually an earlier version of the file on either the source or the target branch.

  • Sometimes the base is not on either the source or target branches, but is instead on a branch that is an ancestor of both the source and target branches.

  • Sometimes the base is on another branch entirely - it's a version that is on an indirect "path" from source to target.

What's important is the merge result. We expect the result to be a correct and usable file that contains elements from both the source and the target.

But that's not always what we get. As it turns out, the result of a 3-way merge operation is a consequence of:

  • the merge tool we're using, and...

  • the file that is chosen as the base.

The essence of a three-way merge

(Slide 10)

The essential logic of a three-way merge tool is simple:

  • Compare base, source, and target files

  • Identify the "chunks" in the source and target files:

    • Chunks that don't match the base

    • Chunks that do match the base

  • Then, put together a merged result consisting of:

    • The chunks that match one another in all 3 files

    • The chunks that differ from the base in either the source or the target, but not in both

    • The chunks that don't match the base but that do match each other (i.e., they've been changed the same way in both the source and the target)

    • Placeholders for the chunks that conflict, to be resolved by the user.
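The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the three files have already been split into aligned chunks (chunk i of the base corresponds to chunk i of the source and of the target); a real merge tool has to compute that alignment by diffing, and that is where tools differ. The names `merge3` and `Conflict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    # Placeholder for a chunk that both sides changed differently.
    base: str
    source: str
    target: str


def merge3(base, source, target):
    """Three-way merge of pre-aligned chunk lists (a simplification)."""
    if not len(base) == len(source) == len(target):
        raise ValueError("this sketch assumes the chunks are already aligned")
    result = []
    for b, s, t in zip(base, source, target):
        if s == t:
            # Same in all three, or changed the same way on both sides.
            result.append(s)
        elif t == b:
            # Changed only in the source.
            result.append(s)
        elif s == b:
            # Changed only in the target.
            result.append(t)
        else:
            # Changed differently on both sides: leave it for the user.
            result.append(Conflict(b, s, t))
    return result


merged = merge3(
    base=["a", "b", "c", "d"],
    source=["a", "B", "c", "D1"],
    target=["a", "b", "C", "D2"],
)
print(merged)
# ['a', 'B', 'C', Conflict(base='d', source='D1', target='D2')]
```

Notice that the base does all the deciding: comparing only source and target, the `b`/`B` and `c`/`C` chunks merely differ, and nothing tells you which side made the change.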

Note that the "chunks" in this illustration are purely symbolic. Each could represent lines in a file, or nodes in a hierarchy, or even files in a directory. It all depends on what a particular merge tool is capable of.

(You may be asking what advantage a 3-way merge offers over a 2-way merge. Actually, there is no such thing as a two-way merge, only tools that diff two files and allow you to "merge" by picking chunks from one file or the other. Only a 3-way merge gives you the ability to know whether or not a chunk is a change from the origin and whether or not changes conflict.)

Three-way file merge tools vary...

(Slide 11)

There are lots of 3-way merge tools. Their capabilities vary by:

  • File format: 

    Whether they handle text or binary files

    Whether they handle plain text or proprietary text formats (e.g., Word, PDF)

  • Granularity: 

    Whether they can distinguish and compare line by line, word by word, character by character, or by any other data format

  • Syntax awareness: 

    Whether they can parse XML, C++, Java, etc.

    Whether they can compare structures and elements

    Can they tell significant diffs from insignificant ones?

    Can they detect moved or duplicated elements?

    Can they detect global substitutions?

  • Conflict detection and automatic conflict resolution: 

    Whether they are "lax" or "diligent" about resolving conflicts.

    Lax algorithms leave more conflicts unresolved. More human intervention is needed to resolve conflicts.

    Diligent algorithms resolve more conflicts on their own. But without syntax-awareness they may not give acceptable results, and should only be used with post-merge testing and validation.

Perforce lets you use any merge tool you want, so if you know of a tool that uses an algorithm you like, you can substitute it for Perforce's merge tools.

(Note that you can't substitute your own merge tool for operations performed on the Perforce Server's machine. That means that you can't use your own merge tool for batch merging with p4 resolve -am. However, you can write your own program or script to do batch merges with the merge tool of your choice. See, for example, //guest/kyle_turner/perforce/resolve/ in the Perforce Public Depot.)

Perforce's merge tools

(Slide 12)

The Perforce-provided merge tools (p4 resolve, p4merge, p4winmerge) share the same underlying three-way merge code:

  • They merge text files only

  • They have no syntax awareness (except for whitespace and linefeeds)

  • They do line-by-line comparisons only

  • They are fairly diligent about resolving conflicts automatically.

    (Early releases of Perforce did less automatic conflict resolution, but users complained that they had to resolve too many conflicts manually. The current release, 2006.1, has a more diligent merge algorithm, and now users complain that too many conflicts are getting resolved automatically.)

Note that automatically merged results are not guaranteed to be correct. In other words, Perforce's merge tools do not validate syntax or logic.

(So why do users want a more diligent merge tool if the merged result can't be relied on? Because it's cheap to do auto-merging first and then check correctness with compilers and regression tests. With a lax merge tool, users have to manually resolve conflicts first before they can even begin to check correctness with compilers and regression tests.)

Clearly these features of Perforce's merge tools -- or of any merge tool -- will have an impact on the results we get when we merge files. But given the same merge tool, what else affects the merge result?

Base selection, that's what.

What makes a good merge base?

(Slide 13)

Given a source file and a target file with a long and varied integration history, which file revision makes a good merge base?

Here again is a revision graph showing arrows between files related by branching and merging. What you see here is the integration history of these files.

What if we now want to merge from Z into X? Let's say our source file is Z#7 and our target is X#9. Now, out of all the other file revisions shown here, which one (or ones) make a good base?

  • A good base is a revision with “qualifying history”. That is, every revision in its history is a contributor to both the source and the target.

  • A good base also has lots of history. That is, it's a file revision that is as close as possible to both the source and the target.
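The two criteria can be expressed as a small graph computation. In the hypothetical Python sketch below, each revision maps to the revisions it was derived from (its predecessor plus any integration sources), a revision's "history" is itself plus everything reachable from it, and the base is the qualifying candidate with the largest history. This is essentially a closest-common-ancestor search that ignores arrow types and source ranges, so treat it as an illustration of the two criteria, not as Perforce's algorithm. The graph is invented for the example; it is not the one on the slide.

```python
def history(parents, rev):
    """Return rev plus every revision it was derived from."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, ()))
    return seen


def pick_base(parents, source, target):
    common = history(parents, source) & history(parents, target)
    # Qualifying: everything in the candidate's history also contributed
    # to both source and target. (With plain reachability every common
    # ancestor qualifies; the check is kept to mirror the definition.)
    qualifying = [r for r in sorted(common) if history(parents, r) <= common]
    # Prefer the candidate with the most history; because the list is
    # sorted, ties go to the alphabetically first name.
    return max(qualifying, key=lambda r: len(history(parents, r)))


# Hypothetical graph: Y branched from X#2, X#3 merged into Y#3,
# Z branched from X#4.
parents = {
    "X#2": ["X#1"], "X#3": ["X#2"], "X#4": ["X#3"],
    "Y#1": ["X#2"], "Y#2": ["Y#1"], "Y#3": ["Y#2", "X#3"],
    "Z#1": ["X#4"], "Z#2": ["Z#1"],
}

print(pick_base(parents, "Y#3", "Z#2"))  # X#3
```

X#4 is younger and closer to the target, but it never contributed to the source, so it does not qualify.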

The effect of base selection

(Slide 14)

Now, it's easy to tell from the merge arrows which candidates have qualifying history. But why is qualifying history so important? In other words, how does it affect the merge outcome?

Here is a simple case. We'll step through it and see what the merge result would be if each revision shown were used as the base.

In this case we're merging from the Y branch to the Z branch. The source is revision 5 on Y and the target is revision 2 on Z. [Animation: click to show the direction of merging, and to highlight the Y#5 and Z#2 revisions.]

So far there's never been a direct integration between Y and Z. Both Y and Z, it's clear to see, were branched from X, so they have that in common. [Animation: click to highlight the X#5 revision.]

And in fact, the best choice for base seems to be X#5:

  • It qualifies because every revision in its history has contributed to both the source and the target.

  • And it's got more history than any of the other qualifying revisions.

Although file content has no bearing whatsoever on base selection, it's enlightening to look at examples of how base file content affects the merge outcome. So first, let's look at the file content at each revision. [Animation: click to make the sample content of each revision appear.]

The original edits to these files are visible in underlined green. The merges are visible in bold.

Next, let's consider: what is the merge result we expect? What's in the source that needs to go into the target?

  • The source is different from the closest common ancestor in two of the three chunks: Where the base had A1, the source has A3. And where the base had C3, the source has C4.

  • You'll notice that the target is different from the base in one place only, and that is in the B4 chunk. Where the base had B3, the target has B4.

So what we'd expect, after merging the source into the target, is to get a file that has all three changes, A3, B4, and C4.

And if we apply the three-way merge algorithm to X#5, Y#5, and Z#2, we see that this is, in fact, exactly what we do get. [Animation: click to see the result of a merge using X#5 as the base.]

Note that Z#1's content is the same as X#5's. That's because Z was branched from X#5. So Z#1 is just as good a base as X#5. [Animation: click to see the result of a merge using Z#1 as the base.]

I mention this because sometimes Perforce picks a base that doesn't seem to qualify, but is in fact identical in content to one that does. (By the way, did you know that both p4 integrate and p4 resolve have a -o option that will show you the base that was selected?)

What would happen if we picked the oldest qualifying revision, X#1, as the base? We'd get more conflicts, simply because so much has changed in both the source and target since the base. [Animation: click to see the result of a merge using X#1 as the base.]

The same can be said for X#2, and its branched look-alike, Y#1. [Animation: click to see the results using either X#2 or Y#1 as the base.]

(Yes, this slide is getting cluttered! Confused about what you're seeing? Each dotted orange box shows the results of a hypothetical 3-way merge using the indicated revision as the base. In all cases, the source and target files are the same, Y#5 and Z#2.)

Okay, so what if we stepped outside of the qualifying zone and picked a really “young” file as the base, like X#6? No conflicts, but look -- the result is missing a change. The C4 change does not appear in the merged result.

What if we use the source's predecessor as the base? Same effect -- the C4 change is left out of the merged result. We also get a conflict between chunks A1, A2, and A3 because this is where a lot of change has happened on the source. [Animation: click to see the results using X#6 as the base.]

We can look at what the merged result would be using a few other revisions as base. [Animation: click to see the results using Y#4, Y#3, and Y#2 as bases.]

Notice that:

  • If we use the source itself as the base, the merged result will look exactly like the target. [Animation: click to see the results using Y#5 as the base.]

  • And if we use the target itself as the base, the merged result will look exactly like the source. [Animation: click to see the results using Z#2 as the base.]
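These outcomes can be reproduced with the chunk content the text gives for this slide: base X#5 holds A1, B3, C3; the source Y#5 holds A3, B3, C4; the target Z#2 holds A1, B4, C3. The Python below is a minimal sketch that treats each chunk as one aligned unit (an assumption; real tools align chunks by diffing) and applies the three-way rule described earlier. It covers only the bases whose content the text spells out.

```python
def merge3(base, source, target):
    """Chunk-aligned three-way merge; conflicts become ("CONFLICT", b, s, t)."""
    out = []
    for b, s, t in zip(base, source, target):
        if s == t or t == b:
            out.append(s)  # unchanged, same change, or source-only change
        elif s == b:
            out.append(t)  # target-only change
        else:
            out.append(("CONFLICT", b, s, t))
    return out


X5 = ["A1", "B3", "C3"]  # the qualifying base (Z#1 has the same content)
Y5 = ["A3", "B3", "C4"]  # source
Z2 = ["A1", "B4", "C3"]  # target

print(merge3(X5, Y5, Z2))  # ['A3', 'B4', 'C4']: all three changes
print(merge3(Y5, Y5, Z2))  # ['A1', 'B4', 'C3']: source as base, looks like the target
print(merge3(Z2, Y5, Z2))  # ['A3', 'B3', 'C4']: target as base, looks like the source
```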

And, we can take a look at how the merge would turn out if any of these other revisions were used as the base... [Animation: click to see the remaining hypothetical merge results.]

Now, whether or not a particular base produces conflicts is entirely circumstantial. It depends on what has changed in the files in question. There's no base-selection rule that can eliminate conflicts.

But what we can say is:

  • The farther a base is from the source and target, the more changed content you end up merging at once. This is likely to increase the number of conflicts you're going to have to resolve.

  • And the closer a base is to the source and target, the less changed content there is to merge. This is likely to reduce conflicts.

  • A non-qualifying base -- that is, a base that has history that hasn't already contributed to both source and target -- can leave some changes effectively unmerged. (As we saw when we used Y#4 as the base.)

Base selection through the ages

(Slide 15)

At the dawn of Perforce (i.e., when Perforce was first released, circa 1996), you simply couldn't merge changes between files that had no direct arrows between them already. In other words, you couldn't merge changes between files not directly related by branching.

Back then you could only "copy" or "ignore" between indirectly related files. If you'd tried to integrate from Y to Z here, for example, you'd have been given a choice of copying Y#5 to Z, or "ignoring" Y#5. (We'll get to "ignore" arrows in a moment.)

This wasn't as horrible as it sounds, because once you'd done that, you'd have created an arrow from Y to Z, and subsequent integrations in the same direction would use that to calculate a more effective base. So in the olden days, we'd do one integration to create a direct arrow, and from there on things would be okay for the most part.

  • The first improvement in base selection was to get rid of that complication. By Release 99.2 Perforce would let you merge between branches with no previous direct arrows by using #1 on the source line as the base.

    Here, for example, it would have used Y#1 as the base. This would have produced lots of diffs and conflicts, but again, it wasn't so horrible because subsequent merges could then use the direct arrows to calculate a better base.

  • The next improvement was in Release 2002.2 when Perforce could detect the source revs that had already been merged, directly or indirectly, to the target. So in Release 2002.2, Perforce would have picked Y#2 as the base for the merge we want to do here. This reduced the diffs and conflicts we tended to see the first time we merged between indirectly related branches.

    Also, at this point in its evolution, Perforce became able to pick a merge base on a branch other than the source branch. It could now pick a base on any of the branches that the source rev was descended from. So if we were trying to merge from Z to Y, for example, Perforce could pick a base on X. In some cases this really reduced the diffs and conflicts we'd see.

  • And the most recent improvement, in Release 2006.1, is that Perforce can now pick a base on any path between source and target. This is the ideal situation, because it means that the best base can be chosen. In the case we see here, for example, Perforce picks X#5 as the base for merging Y into Z.

Now, Release 2006.1 doesn't always meet our expectation of picking the best base:

  • Sometimes it picks a less-than-ideal base as a trade-off between performance and computing the ideal base.

  • Sometimes the types of merge arrows connecting source and target branches confuse ideal base selection. That's what we'll be looking at in a moment.

  • And also, there are subtleties to picking a base on an indirect path that are yet to be determined by Perforce (i.e., by its server developers). In other words, the 2006.1 base-selection algorithm is still "settling in" -- watch this space for news.

People often ask me "Why doesn't Perforce just use a closest-common ancestor (CCA) algorithm to pick a merge base?" Or, "Why doesn't Perforce just do what ClearCase does?"

The answer is that Perforce is supporting two kinds of merging, convergent and divergent, whereas other SCM systems support only convergent merging. Only Perforce allows you to reconcile a pair of branches with one another while preserving divergence between them.

So, where an SCM system like ClearCase, for example, has one kind of merge arrow, Perforce's "arrows" come in several flavors. And where ClearCase's arrows connect pairs of source-to-target revisions, Perforce's arrows connect ranges of source revisions to target revisions.

These features allow Perforce to record the ways in which we want our branches to diverge, and to select merge bases that are most likely to preserve the divergence we've created.

Arrow types and base selection

(Slide 16)

On the previous slide we saw a simple example where all arrows were either "branch" or "merge" arrows. Let's mix it up a little and look at an example that shows a few more arrows. We've got the legend up here, too, so we can decode this diagram.

What we've got here is:

  • A "branch" arrow going from X#2 to Y#1. From this we know that Y#1 is identical to X#2.

    Also, notice the tail of the arrow: it tells us that Y#1 inherits the range of revisions up to and including X#2.

  • A "merge" arrow going from Y#2 to X#4. This tells us that Y#2 was auto-merged into X#4 - that is, no conflict resolution or editing took place.

  • A "merge with edit" arrow going from X#5 to Y#4. Meaning that Y#4 was edited by the user before or after X#5 was merged into it -- maybe to resolve a conflict, maybe for another reason.

  • A "branch with edit" arrow, going from X#5 to Z#1. This tells us that when Z#1 was branched from X#5, it was also edited by the user. So Z#1 is not identical to X#5, the file it was branched from.

  • A "copy" arrow going from Z#2 to X#6, meaning that X#6 is identical to Z#2.

  • And finally, there's an "ignore" arrow going from X#3 to Y#3. The ignore arrow tells us that the user chose to record the fact that the X#3 change should not be merged into Y, ever.

    There are other cases that can result in ignore arrows - we'll look at them in a moment.

What revision would make the best base?

So, given Perforce's variety of merge arrows, and the typically complicated histories of files in real-life software development, how easy is it to determine the best base? Not very. So I leave the finer points of base determination to our server developers.

But let's take a look at its effect. Say we merge from Y to Z here. Our source revision would be Y#5 and our target would be Z#2. Perforce will pick Y#3 as the base.

Y#3 qualifies because all the changes that have gone into it have also made their way into both the source and the target. And of the revisions that qualify, Y#3 has the most history.

But is it a good choice for a base? Well, let's throw some example file content up here to look at. What are we actually trying to merge from Y to Z?

As we see, the only change in Y that is not already in Z is A2. And in fact, using Y#3 as the merge base yields a result that contains A2. (Unfortunately the merge hits a conflict on B3, B4, and B6, so we'd have to resolve that manually.)

There's lots more to say about whether and why Y#3 is the right base, so maybe we can go into this in the Q and A afterward. But for now I want to get back to talking about "ignore" arrows.

Preserving divergence

(Slide 17)

"Ignore" arrows are the key to preserving divergence between branches, if you use them the right way. They are created when we "ignore-integrate" changes from one branch into another.

What is "ignore-integrating"? It is what you're doing when you run p4 integrate followed by p4 resolve -ay. The "ay" means "accept yours" -- i.e., keep the content of the target file ("yours") as is, without merging content from the source file ("theirs").

Nothing is merged from source to target when you do this. However, Perforce does record the fact that you have taken the source's changes into account, and decided to ignore them, as far as the target goes. In other words, it records the intention to diverge.

"Ignore" arrows allow you to keep on merging between a pair of branches that are diverging intentionally.

Continuous, incremental merging preserves divergence

For example, say you've branched a development line (B) from the mainline (A). As dev line B evolves, it diverges further and further from the mainline. Meanwhile, the mainline is changing, too:

  • bug fixes are coming into it from release lines

  • completed development is coming into it from other dev lines

Each mainline change has to be integrated to the dev line. (See "The Flow of Change" for the whys and wherefores of that requirement.) But not all mainline changes can be applied to the dev line - some will not apply because the dev line has changed so much.

The A changes that are inapplicable to B are "ignore-integrated" from A to B. By ignore-integrating the inapplicable changes we build integration history that:

  • will be used for subsequent base selection

  • prevents inapplicable changes from being revisited the next time we merge from A to B

For example, notice that all of A's changes except for #7 have already been integrated to B. And both A#4 and A#6 were ignore-integrated into B -- that is, they were intentionally not merged into B.

For the next A-to-B merge, Perforce will pick A#6 as the base. This allows the A#7 change to be merged in without bringing along the A#6 change.

Ignore arrows serve their purpose well when you're merging changes continually, in order, from one branch to another.

After-the-fact, incremental merging preserves divergence

They also work well with after-the-fact merging, as long as you merge changes incrementally. Here, for example, D was branched from C and changed a few times. Then C's changes were integrated incrementally, and both C#4 and C#6 were ignore-integrated. Now when we merge from C to D, Perforce picks the base that preserves the divergence already established by previous integration.

Cherry-picking before mass-merging does not preserve divergence

But there's a gotcha here. This doesn't work very well (that is, it doesn't do what you expect) if you're merging changes en masse and you've got ignored changes wedged in between other as-yet unmerged changes.

Here we've got E branched to F, just like C and D and A and B above them. None of E's changes have been merged to F yet. But we did go ahead and ignore-integrate the changes we don't want to pull down to F, E#4 and E#6.

Now we want to merge changes E#3, E#5, and E#7 to F. Unfortunately, if we attempt to do that in one fell swoop, Perforce picks a base that causes all of E's changes to appear in the merged result, despite the ignore arrows. That's because it can pick only one base, and, for better or for worse, it picks the base that assures all as-yet unintegrated changes will be merged.
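The single-base constraint can be modeled with a few lines of Python. This is a simplified sketch consistent with the behavior described here, not Perforce's actual algorithm: a base must precede every source revision not yet accounted for in the target, so the latest usable base is the end of the unbroken run of accounted-for revisions after the branch point. The function name is hypothetical, and so are the branch points (A#3, E#2), since they aren't stated above.

```python
from typing import Set


def latest_safe_base(branch_point: int, head: int, accounted: Set[int]) -> int:
    """Return the latest source revision that can serve as the merge base.

    `accounted` holds source revisions already merged, copied, or ignored
    into the target; revisions up to `branch_point` are accounted for by the
    branching itself. The base can't move past the first unaccounted
    revision, or that revision's change would be silently dropped.
    """
    base = branch_point
    for rev in range(branch_point + 1, head + 1):
        if rev not in accounted:
            break
        base = rev
    return base


# A -> B: everything but A#7 is accounted for (A#4 and A#6 via ignore arrows).
print(latest_safe_base(branch_point=3, head=7, accounted={4, 5, 6}))  # 6

# E -> F: only the ignored E#4 and E#6 are accounted for; E#3 is not.
print(latest_safe_base(branch_point=2, head=7, accounted={4, 6}))  # 2

# E -> F again, after merging E#3 and then E#5 one at a time, in order.
print(latest_safe_base(branch_point=2, head=7, accounted={3, 4, 5, 6}))  # 6
```

The second call is why the en masse merge brings E#4 and E#6 along: the base is stuck before E#3. The third call shows how merging the wanted changes incrementally lets the ignore arrows take effect.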

“Inherited” divergence

(Slide 18)

Divergence, whether intentional or not, can also be "inherited" from branch to branch.

Take a look at this example:

  • B was branched from A (to do some development work, let's say)

  • And C was branched from B (to isolate a risky aspect of the development going on in B, say)

At some point, the owner of the C branch decided to merge down recent bug fixes from the A branch. But there was only one unintegrated change to pull down, and that change turned out to be inapplicable to the C branch. So that change was ignore-integrated from A to C. This creates the ignore arrow we see from A#4 to C#3.

Later, the work in C is completed and promoted into the B branch. This creates the copy arrow we see from C#5 to B#5.

Now the owner of the B branch is going to merge down bug fixes from A. At this point, two changes have been made in A since B was branched from it: A#4 and A#5. The source revision for this merge will be A#5 and the target will be B#5.

Thanks to the ignore arrow from A to C, the base will be A#4. As a consequence, the result of this merge will contain the A#5 change but not the A#4 change. Perforce assumes that since all of C was merged into B, and C intentionally diverged from A at A#4, B should remain divergent from A in the same respect.
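One way to picture this inheritance is as reachability in the integration graph. The sketch below is a deliberately simplified model that treats every arrow alike and ignores arrow types: it walks backward from B#5 through each revision's predecessor and through integration arrows, then reports the newest revision of A it reaches. The helper names are hypothetical, as is C's branch point (B#2), which isn't stated above.

```python
from typing import Dict, List, Set, Tuple

Rev = Tuple[str, int]  # (branch name, revision number), e.g. ("A", 4) for A#4


def reachable(start: Rev, integrations: Dict[Rev, List[Rev]]) -> Set[Rev]:
    """Every revision whose content flows into `start`, directly or indirectly."""
    seen: Set[Rev] = set()
    stack: List[Rev] = [start]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        branch, num = rev
        if num > 1:
            stack.append((branch, num - 1))      # each revision builds on its predecessor
        stack.extend(integrations.get(rev, []))  # plus any integration arrows into it
    return seen


def newest_accounted(target: Rev, source_branch: str,
                     integrations: Dict[Rev, List[Rev]]) -> int:
    """Newest revision of `source_branch` already accounted for in `target`."""
    return max((num for branch, num in reachable(target, integrations)
                if branch == source_branch), default=0)


# Arrows into each revision, following the example above.
integrations: Dict[Rev, List[Rev]] = {
    ("B", 1): [("A", 3)],  # B branched from A#3 (A#4 and A#5 came later)
    ("C", 1): [("B", 2)],  # C branched from B (B#2 is hypothetical)
    ("C", 3): [("A", 4)],  # ignore arrow: A#4 -> C#3
    ("B", 5): [("C", 5)],  # copy arrow: C#5 -> B#5
}

print(newest_accounted(("B", 5), "A", integrations))  # 4, so A#4 is the base

# Without the copy arrow from C, B inherits nothing from C's ignore arrow.
without_copy = {k: v for k, v in integrations.items() if k != ("B", 5)}
print(newest_accounted(("B", 5), "A", without_copy))  # 3
```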

Unintentional divergence

(Slide 19)

A side-effect of ignore arrows is that they can cause unintentional divergence. For one thing, ignore-integrating a change is not the same as backing out a change.

Not the same as backing out

For example, here A#5 was ignore-integrated to B. The result was checked in as B#2; B#2 is thus identical to B#1. Then B was edited, resulting in B#3. Merging B back to A now won't make the A branch file look like B.

And why would you expect it to? Because you thought that you could merge that "ignore" from B back into A. In other words, you're thinking B#2 is an "undo" of A#5, and that this "undo" can now be merged from B to A. Thus you expect the new A#6 to come out looking just like B#3.

But what happens is that Perforce preserves the divergence you sanctioned by ignoring A#5. It picks a base that will cause the new A to contain both the B#3 change and the A#5 change. That's great if the divergence was intentional, but not if you thought you were backing out the A#5 change by ignoring it. Hopefully your compiler will catch it now.

Sometimes this can happen when you don't even realize it. Let's say that when you integrated A into B, you manually resolved the file. You looked at the merged result in the editor, made a change to correct what looked like a merge error, and saved the file. But what you didn't realize was that your editing made the result file look exactly like the target file. So Perforce recorded an ignore arrow when you submitted B#2.
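The rule at work can be sketched as a function of content alone. This is a hypothetical, simplified model of the behavior just described, not Perforce's code: the arrow recorded at submit time depends on what the resolved file ends up looking like, not on what you meant to do.

```python
def recorded_arrow(source: str, target: str, auto_merged: str, result: str) -> str:
    """Classify a resolved file by comparing its final content with its inputs."""
    if result == source:
        return "copy"    # the target now matches the source exactly
    if result == target:
        return "ignore"  # the target is unchanged: the source change was discarded
    if result == auto_merged:
        return "merge"   # the result is exactly what the merge produced
    return "edit"        # the result has hand edits beyond the merge


# A hand-edited result that happens to match the old target content is
# recorded as an ignore, whatever your intent was.
print(recorded_arrow(source="B#1 + A#5 fix", target="B#1",
                     auto_merged="B#1 + A#5 fix, merged", result="B#1"))  # ignore
```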

“Merge down, copy up” can be foiled by ignore arrows

And there's an even more pathological case. Here C#5 was ignore-integrated into D, either because you thought that you could back out the C#5 change by merging D back to C, or because you unwittingly created an ignored change when you resolved and edited the file manually.

Of course, this particular D file is just one of many files you've been working on in the D branch. So let's say your development work in the D branch is now done. You've done the due diligence and merged everything down from C, your tests run beautifully, and you're ready to copy your D files back up to the C branch.

In Perforce, as you know, you use the integrate command to copy as well as to branch and to merge. Unfortunately, when the integrate command looks at this particular file, it will report that there's "nothing to integrate". The only change you made to it since you branched it from C was, as far as Perforce knows, an intentional point of divergence.

The effect of this, of course, is that when you think you're "copying up", you may not be. In a moment we'll see how to assure that you're really copying the source to the target.

The effect of “edit” arrows

(Slide 20)

But first, let's take a look at "edit" arrows and their effect. There are two ways to create "edit" arrows:

  • As you resolve a file, reply to resolve's prompt with "ae" ("accept edited"). (You must resolve the file interactively to do this.)

  • Auto-resolve the file and then "re-open" it with p4 edit before submitting it.

An edit arrow points to a revision whose content is assumed to have been edited before or after it was integrated and resolved. Here, B#2 is such a revision. If we were to merge B back to A now, B#2 would be considered an original change, above and beyond any change that may have been merged (or ignored) from A#5.

However, "edit" arrows don't guarantee convergence. They also carry with them the record of how a file was resolved. So, even though B#2 will be a candidate for merging back to A, the base for the merge will depend on how B#2 was resolved.

  • The base could be B#1, which would in fact preserve the divergence between the two branches...

  • ...or it could be A#5, which would cause the two branches to converge. (And a tight convergence it would be: B#3 would be copied into A.)
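A minimal sketch makes the two outcomes concrete. It uses the same simplified, line-aligned three-way merge idea as a real merge tool minus the diff alignment step, with hypothetical contents: B#1 and B#2 are identical because A#5 was ignored, B#3 changes line 2, and A#5 changes line 1.

```python
from typing import List


def three_way_merge(base: List[str], source: List[str], target: List[str]) -> List[str]:
    """Merge position-aligned versions: keep whichever side changed each line."""
    merged = []
    for b, s, t in zip(base, source, target):
        if s == t or s == b:   # source agrees with target, or source didn't change
            merged.append(t)
        elif t == b:           # only the source changed this line
            merged.append(s)
        else:                  # both changed it differently
            merged.append("<<conflict>>")
    return merged


b1 = ["line 1", "line 2"]      # B#1 (B#2 is identical: A#5 was ignored)
a5 = ["A#5 change", "line 2"]  # A#5, the merge target
b3 = ["line 1", "B#3 change"]  # B#3, the merge source

# Base B#1: both changes survive, so the divergence is preserved.
print(three_way_merge(b1, b3, a5))  # ['A#5 change', 'B#3 change']

# Base A#5: the result is identical to B#3, i.e. a copy.
print(three_way_merge(a5, b3, a5))  # ['line 1', 'B#3 change']
```

Whenever the base equals the target, a three-way merge reduces to taking the source wholesale, which is why choosing A#5 as the base makes the branches converge.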

For the pathological case shown on the previous slide, involving branches C and D, this at least assures that the D branch will have something that needs merging back to C. But we can't guarantee that it will be a convergent merge. (In other words, we can't guarantee that merging D back to C will effectively back out the C#5 change.)

But look: convergence, in these cases, could be achieved simply by copying the source to the target. We could make A and B converge by copying source B to target A. And we could make C and D converge by copying source D to target C.

We can do that because all the target changes are already in the source. That's the beauty of the "merge down, copy up" method.

Guaranteeing convergence

(Slide 21)

So as it turns out, if you're sticking with the "merge down, copy up" method, Perforce is effectively doing the right thing. Perforce is assuming that:

  • If you are merging, you want to preserve divergence (per "ignore" arrows)

  • If you want complete convergence, you'll copy instead of merge

(Actually, Perforce isn't assuming anything at all; it isn't trying to second-guess you. But it does behave as if it had "merge down, copy up" in mind.)

Here we see the integration history of a pair of files in a pair of branches where "merge down, copy up" is being used. Changes are continually merged down into the softer branch, and at points of completion, the softer branch is copied up to the firmer branch.

However, as we saw a couple of slides ago, "ignore" arrows can foil your attempt to copy files if you're not careful. There's a trick to assuring a correct copy.

Assuring a correct copy

(Slide 22)

A correct copy is an integrate/resolve operation that produces a target branch whose files are identical to their counterparts in the source branch. The goal is to create "copy" arrows between source and target file revisions, without creating gratuitous revisions of files that are already identical, and without skipping over files connected only by "ignore" arrows.

So here's the recipe. You'll probably want to do this in a pristine client workspace (i.e., a workspace that has no opened files):

  1. Run integ -n in the opposite direction first, to make sure there aren't any target branch changes not yet accounted for in the source branch.

  2. Run integ -f to cause all target files to be opened for integrate regardless of their integration history.

    (You may have to throw in the -d flag to handle existential changes.)

  3. Run resolve -at to copy source file content to target files.

  4. Now, run diff -sr to identify the opened files whose content hasn't actually changed, and revert them.

  5. Run integ again (without -f) to re-open any files that were reverted in the last step but still have source changes not accounted for in the target.

  6. Run resolve -at so these files can be submitted.

  7. And finally, submit the opened files.
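The recipe above can be scripted. Below is a minimal Python sketch that drives the p4 command line through `subprocess`; it illustrates the sequence under stated assumptions and is not a supported tool. It assumes p4 is on the PATH and already set up for a pristine client workspace, that `source` and `target` are depot paths such as `//depot/dev/...`, and that any standard output from the reverse dry run means the target has unmerged changes. It also does not check p4's exit status, because "nothing to integrate" style messages are expected here. The names `copy_up`, `run_p4`, and `handle_deletes` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes p4 arguments (without the leading "p4") and returns stdout.
Runner = Callable[[Sequence[str]], str]


def run_p4(args: Sequence[str]) -> str:
    """Run one p4 command and return its standard output (status unchecked)."""
    result = subprocess.run(["p4", *args], capture_output=True, text=True)
    return result.stdout


def copy_up(source: str, target: str, description: str,
            run: Runner = run_p4, handle_deletes: bool = False) -> List[str]:
    """Copy `source` onto `target`; returns the local paths reverted in step 4."""
    # 1. Dry run in the opposite direction. Any output means the target has
    #    changes the source hasn't accounted for yet, so stop here.
    pending = run(["integrate", "-n", target, source])
    if pending.strip():
        raise RuntimeError("merge these target changes down first:\n" + pending)

    # 2. Open every target file for integrate, regardless of history.
    #    -d (optional) lets integrate handle files deleted on one side.
    run(["integrate", "-f", *(["-d"] if handle_deletes else []), source, target])

    # 3. Accept the source ("theirs") content for every opened file.
    run(["resolve", "-at"])

    # 4. List opened files whose content is unchanged, and revert them.
    unchanged = [p for p in run(["diff", "-sr"]).splitlines() if p.strip()]
    if unchanged:
        run(["revert", *unchanged])

    # 5. Re-open files that still have unaccounted-for source changes.
    run(["integrate", source, target])

    # 6. Resolve those as copies, too.
    run(["resolve", "-at"])

    # 7. Submit everything that is still open.
    run(["submit", "-d", description])
    return unchanged
```

Injecting `run` keeps the sequence testable: a fake runner that records its arguments lets you review the exact commands before pointing the sketch at a real depot.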

Be aware that for extremely large branches, this recipe can dim the lights. If performance is more of a constraint than the integrity of the target branch (i.e., if the target branch can tolerate brief periods of instability), you can:

  • Use this recipe on subsets of the branch, rather than operating on the entire branch at once.

  • Do a normal "copy-integrate" first, then use p4 diff2 (or P4V's Folder Diff) to look for non-matching files that were skipped because of ignore arrows. The outliers can then be integrated with p4 integ -f and resolve -at to make the branches identical.
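As a local stand-in for p4 diff2 or Folder Diff, you could sync both branches into separate directories and compare them with Python's `filecmp` module. This swaps in a different technique (comparing synced file trees rather than depot revisions), and the function name `mismatched_files` is hypothetical. Files that exist on only one side are reported too, since they are outliers as well.

```python
import filecmp
import os
from typing import List


def mismatched_files(source_dir: str, target_dir: str) -> List[str]:
    """Relative paths whose content differs, or that exist on only one side."""
    outliers: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # Entries present in only one of the two trees.
        outliers.extend(os.path.join(prefix, n) for n in cmp.left_only + cmp.right_only)
        # Compare common files byte for byte (dircmp alone compares shallowly).
        _, mismatch, errors = filecmp.cmpfiles(cmp.left, cmp.right,
                                               cmp.common_files, shallow=False)
        outliers.extend(os.path.join(prefix, n) for n in mismatch + errors)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(source_dir, target_dir), "")
    return sorted(outliers)
```

The returned paths are the candidates to feed to p4 integ -f followed by resolve -at.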

In a nutshell...

(Slide 23)

So, what have we learned?

  • We've learned that not all branches are alike. Some diverge, some converge, and Perforce is designed to support both.

  • We've learned that Perforce uses three-way merging to merge files, and that the base used for a merge operation is what determines whether we'll get convergent or divergent results.

  • We've learned that Perforce's ability to pick a base has been improving over the years, for both convergent and divergent merging. However, due to the effect of "ignore" arrows, it's still slightly biased toward preserving divergence.

  • And we've learned that if you want control over when and how branches converge, the "merge down, copy up" method works nicely with Perforce, as long as you're really copying when you think you are.



[1] For the uninitiated, SCM means “software configuration management”.