May 12, 2009

Perforce Anti-Patterns Part 2: Overuse of branching

Hi, a few weeks ago I posted an article on Perforce anti-patterns, citing the overuse of labels as an example. Here I would like to highlight another anti-pattern on a similar theme: the overuse of branching.

This may seem an odd topic, given that Perforce is famous for its efficient and powerful approach to branching, with inter-file branching and space- and time-saving lazy copies. But herein lies the pitfall: too much of a good thing can still be a bad thing. Let me elaborate:

Branching in Perforce is cheap, but not free. For every file branched, two records are added to the database table db.integed, one for each direction of the integration. Over time these records add up and start to hamper the performance of certain operations, such as "p4 integ".

I encountered an extreme example at one customer's site. A typical codeline there contained about 45,000 files. For every bug fix, and for every programmer working on it, a tool automatically created a new branch containing all 45,000 files. In the new branch a handful of files would be modified, and the changes integrated back to the parent branch. Given that the customer created several branches a day on average, it is a credit to Perforce's architecture that this worked well for several years before real performance problems appeared. In the end, creating a branch, which used to take several seconds, took an hour.
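
To put these numbers into perspective, here is a rough back-of-the-envelope estimate in Python. The two records per branched file come from the db.integed behaviour described above; taking "several branches a day" as 3, a year as 250 working days, and "a handful of files" as 5 are my own assumptions for illustration, not data from the customer site.

```python
# Rough estimate of db.integed growth for a branch-per-bug-fix workflow.
# Assumptions (hypothetical, not measured at the customer site):
#   - each branched file adds 2 db.integed records
#   - "several branches a day" = 3, over 250 working days per year
#   - a sparse branch contains only the ~5 files actually changed

RECORDS_PER_FILE = 2


def integed_records(files_per_branch: int, branches_per_day: int,
                    working_days: int) -> int:
    """Return the number of db.integed records added over the period."""
    per_branch = files_per_branch * RECORDS_PER_FILE
    return per_branch * branches_per_day * working_days


if __name__ == "__main__":
    full = integed_records(45_000, 3, 250)   # full branch of the codeline
    sparse = integed_records(5, 3, 250)      # only the changed files
    print(f"full branches:   {full:,} records per year")    # 67,500,000
    print(f"sparse branches: {sparse:,} records per year")  # 7,500
```

Even with these modest assumptions, a full branch per fix adds tens of millions of records a year, whereas branching only the files that are actually changed keeps the growth negligible.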

In this situation there are only two ways out: upgrade your hardware to cope with the huge db.integed table, or obliterate every file in each branch that has only one revision and was created by a branch rather than an add. The obliterate option is, in effect, retrospective sparse branching. Depending on the Perforce version and the number of files involved, obliterating all these revisions can take a long time, and it is definitely not recommended practice.
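
If you do end up having to clean up, the first step is finding the candidates. The Python sketch below assumes you have captured the output of "p4 files" for a branch (for example "p4 files //depot/dev/fix123/...", where the path is hypothetical) and picks out the files whose head revision is #1 with the action "branch". It only builds a list for review; it does not obliterate anything.

```python
import re

# Matches one line of `p4 files` output, e.g.
#   //depot/dev/fix123/src/main.c#1 - branch change 4711 (text)
FILES_LINE = re.compile(
    r"^(?P<path>.+)#(?P<rev>\d+) - (?P<action>\S+) "
    r"change (?P<change>\d+) \((?P<type>[^)]+)\)$"
)


def obliterate_candidates(files_output: str) -> list[str]:
    """Return depot paths whose only revision (#1) was created by a branch."""
    candidates = []
    for line in files_output.splitlines():
        match = FILES_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in the expected format
        # Head revision 1 means the file was never changed after branching;
        # action "branch" excludes files that were newly added in the branch.
        if match["rev"] == "1" and match["action"] == "branch":
            candidates.append(match["path"])
    return candidates


if __name__ == "__main__":
    sample = (
        "//depot/dev/fix123/src/main.c#1 - branch change 4711 (text)\n"
        "//depot/dev/fix123/src/util.c#3 - edit change 4730 (text)\n"
        "//depot/dev/fix123/docs/new.txt#1 - add change 4725 (text)\n"
    )
    print(obliterate_candidates(sample))  # ['//depot/dev/fix123/src/main.c']
```

Review the resulting list carefully before doing anything with it. Running "p4 obliterate" without the -y flag only previews what would be removed, which is a sensible next check.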

So how can you avoid overusing branches? Three ideas come to mind:

  1. Trust workspaces and your colleagues.
  2. Use sparse branching where applicable.
  3. Re-use branches.

In many cases, a separate branch for every bug fix is not necessary. A fix can be created and verified in a workspace, and if the submitted code really does break something, well ... that is what an SCM system is for: the change can always be rolled back.

If many people are working on the same branch and bug fixes need to be verified independently, sparse branching, that is, branching only the files you actually intend to change, can be used instead. There is currently a slight procedural overhead for a sparse branch, since the branch has to be created and the client workspace adjusted accordingly, but the time and space saved thanks to the smaller db.integed table can be enormous.
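
As a rough illustration of that overhead, the Python sketch below generates the two views involved: a branch spec View that branches only the listed files, and a client View that maps the whole parent codeline and then maps the branched files on top of it. It relies on Perforce's rule that when two view lines map to the same workspace file, the later line wins. The depot paths, file names and the workspace name bob-fix123 are hypothetical.

```python
# Hypothetical names: parent //depot/main, sparse branch //depot/dev/fix123,
# workspace bob-fix123. Substitute your own paths.


def sparse_branch_view(parent: str, branch: str, files: list[str]) -> list[str]:
    """Branch spec View lines that branch only the listed files."""
    return [f"{parent}/{f} {branch}/{f}" for f in files]


def sparse_client_view(parent: str, branch: str, client: str,
                       files: list[str]) -> list[str]:
    """Client View: the whole parent, with the branched files mapped on top.

    Conflicting view lines are resolved in favour of the later line, so the
    per-file branch mappings must come after the parent mapping.
    """
    view = [f"{parent}/... //{client}/..."]
    view += [f"{branch}/{f} //{client}/{f}" for f in files]
    return view


if __name__ == "__main__":
    files = ["src/parser.c", "src/parser.h"]  # the files the fix will touch
    print("\n".join(sparse_branch_view("//depot/main", "//depot/dev/fix123", files)))
    print()
    print("\n".join(sparse_client_view("//depot/main", "//depot/dev/fix123",
                                       "bob-fix123", files)))
```

The generated lines would go into the View fields of "p4 branch" and "p4 client"; integrating with "p4 integ -b" and that branch spec then creates only the listed files, so db.integed grows by a handful of records instead of tens of thousands.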

It is also possible to reuse branches, keeping only one branch per developer or developer team. There are ways to ensure that the reused branch does not differ from its parent branch (hint: "p4 diff2" is your friend).
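
One possible way to use "p4 diff2" for this check, sketched in Python: run "p4 diff2 -q" on the parent and the reused branch (for example "p4 diff2 -q //depot/main/... //depot/dev/bob/...", with hypothetical paths) and pass the output to the function below, which treats every header line that does not end in "identical" as a difference. The header format in the docstring reflects my understanding of "p4 diff2" output and is worth checking against your own server.

```python
def difference_headers(diff2_output: str) -> list[str]:
    """Return the `p4 diff2` header lines that report a difference.

    Assumed header format (one per file pair):
      ==== //depot/main/a.c#3 (text) - //depot/dev/bob/a.c#2 (text) ==== content
      ==== <none> - //depot/dev/bob/new.c#1 ====
    A header ending in "identical" means the two files match; anything else
    (different content, different type, file missing on one side) counts.
    """
    return [
        line for line in diff2_output.splitlines()
        if line.startswith("==== ") and not line.rstrip().endswith("identical")
    ]


if __name__ == "__main__":
    sample = (
        "==== //depot/main/a.c#3 (text) - //depot/dev/bob/a.c#3 (text) ==== identical\n"
        "==== //depot/main/b.c#4 (text) - //depot/dev/bob/b.c#2 (text) ==== content\n"
    )
    for header in difference_headers(sample):
        print(header)
```

If the function returns an empty list, the branch is back in step with its parent and can be reused for the next fix.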

I will describe how to use sparse branches in a separate blog post soon.

To conclude: branching in Perforce is great, but it is possible to overwhelm it. There are techniques that avoid creating too much metadata without skimping on functionality. Why not invite your favourite consultant around to show you how? :-)

Happy hacking

Sven Erik