August 24, 2009

Mining Perforce for Refactoring Candidates

Perforce records a lot of information about our software development process, but recently I began wondering whether we are asking it all the questions we could. For example, can it help us figure out which files are candidates for refactoring?

Truth be told, this tool started with a visualization idea and ended with figuring out what it could be used for. I recently read Stephen Few’s new book, Now You See It, which covers how to use a number of data visualization tools. One in particular, the stem and leaf plot, caught my fancy, and I was determined to find a use for it.

A stem and leaf plot is a glorified text-based bar chart that statisticians commonly use for quick-and-dirty distribution analysis. I admire the fact that it can show both the overall distribution and the individual values in one chart. The version I ended up using for my tool doesn’t strictly match the classic stem and leaf plot definition, but I think it keeps the soul of it.
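For readers who have not met the classic form, here is a minimal Ruby sketch of it, assuming non-negative integers, with the tens as the stem and the ones digit as the leaf. The `stem_and_leaf` helper and the sample numbers are made up for illustration and are not part of the tool.

```ruby
# Classic stem and leaf plot: the stem is the tens part of each value,
# and each leaf is a ones digit. Assumes non-negative integers.
def stem_and_leaf(values)
  groups = values.sort.group_by { |v| v / 10 }
  return [] if groups.empty?

  # Walk every stem in range so that empty stems still get a row.
  (groups.keys.min..groups.keys.max).map do |stem|
    leaves = groups.fetch(stem, []).map { |v| v % 10 }.join
    format("%2d | %s", stem, leaves).rstrip
  end
end

# Made-up sample data, purely for illustration.
sample = [12, 15, 21, 21, 28, 40, 43]
puts stem_and_leaf(sample)
```

Each row reads as a bar (its length shows how many values fall under that stem) while the digits preserve the values themselves.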

Now on to the tool! The question I wanted to answer was: can Perforce show me files that are making people resolve frequently? My assumption is that files that multiple people change frequently may be overloaded and candidates for splitting up. I decided that any file with submissions by more than one user on a given day would be a candidate.
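The rule is simple enough to sketch in a few lines of Ruby. This is not the script itself, just an illustration of the metric under some assumptions: the `Submission` struct, the `overlaps` method, the user names, and the depot paths are all hypothetical, and how the records are pulled out of Perforce is left aside.

```ruby
require 'set'

# Hypothetical record shape: one entry per file in a submitted change.
Submission = Struct.new(:date, :user, :file)

# Returns { date => { file => distinct_user_count } }, keeping only files
# that at least min_users different users submitted on that date.
def overlaps(submissions, min_users = 2)
  users = Hash.new { |hash, key| hash[key] = Set.new }
  submissions.each { |s| users[[s.date, s.file]] << s.user }

  result = Hash.new { |hash, key| hash[key] = {} }
  users.each do |(date, file), set|
    result[date][file] = set.size if set.size >= min_users
  end
  result
end

# Made-up records, purely for illustration.
records = [
  Submission.new("2009/07/17", "alice", "//depot/example/a.cc"),
  Submission.new("2009/07/17", "bob",   "//depot/example/a.cc"),
  Submission.new("2009/07/17", "alice", "//depot/example/a.cc"), # same user again
  Submission.new("2009/07/17", "alice", "//depot/example/b.cc")
]
p overlaps(records)
```

Using a set per day and file means a user who submits the same file twice in one day is still counted once, which matches the "different users" part of the rule.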

Here's an example of the tool showing four weeks of development on a project. Each digit in the output stands for one file and gives the number of unique users who submitted changes to that file on that day.

ruby overlaps.rb -w 4

2009/06/23 22222222222222
2009/06/24 2222
2009/06/25 2222222
2009/06/26 4222222
2009/06/27
2009/06/28
2009/06/29 22222
2009/06/30 2
2009/07/01 222222222
2009/07/02 22
2009/07/03
2009/07/04
2009/07/05
2009/07/06 22
2009/07/07 222222222
2009/07/08 222222222
2009/07/09 3222
2009/07/10 2
2009/07/11
2009/07/12
2009/07/13 322
2009/07/14 32
2009/07/15 222
2009/07/16 2222
2009/07/17 3222222
2009/07/18
2009/07/19
2009/07/20 4222
2009/07/21 3

Looking at the output for 2009/07/17, we can see that one file was changed by 3 separate users, while another 6 were each changed by 2 different users.
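To make that reading concrete, here is a small Ruby sketch of how one row could be built from per-file user counts and read back again. The `render_day` and `read_day` names are hypothetical, and putting the largest count first is an assumption inferred from the sample output rather than something taken from the script.

```ruby
# counts: the distinct-user count for each overlapping file on one day.
# Largest counts go first so that spikes stand out at the left edge.
def render_day(date, counts)
  leaves = counts.sort.reverse.join
  "#{date} #{leaves}".rstrip
end

# Reads the digits of a row back into { user_count => number_of_files }.
def read_day(leaves)
  leaves.chars.each_with_object(Hash.new(0)) { |c, tally| tally[c.to_i] += 1 }
end

puts render_day("2009/07/17", [2, 2, 3, 2, 2, 2, 2])
puts render_day("2009/06/27", [])
p read_day("3222222")
```

One digit per file only works while no file has ten or more users in a day; beyond that the row would become ambiguous.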

In general, the output above shows a few small overlaps per day with an occasional "spike" to 3 or 4. Let's look in more detail at the first day on which a file was changed by four users, using the ‘-v’ flag to view the list of files.

ruby overlaps.rb -d 2009/06/26 -v

2009/06/26 4222222
2 //depot/main/doc/guinotes.txt
2 //depot/main/doc/apinotes.txt
2 //depot/main/src/msgs/help.cc
2 //depot/main/src/net/tcpip.cc
2 //depot/rel2.0/doc/xmlnotes.txt
4 //depot/rel2.0/doc/guinotes.txt
2 //depot/rel2.0/doc/usage.txt

Thankfully, most of the overlaps were just release notes. Maybe a release is coming up soon? Let's take a look at our src path for the last week to see how many overlaps are in our source code.

ruby overlaps.rb -v //depot/main/src/...

2009/07/15 222
2 //depot/main/src/lib/rlib.cc
2 //depot/main/src/net/tcpip.cc
2 //depot/main/src/net/udp.cc
2009/07/16 2
2 //depot/main/src/net/tcpip.cc
2009/07/17 3222222
2 //depot/main/src/conf/local.cc
2 //depot/main/src/conf/remote.cc
2 //depot/main/src/lang/eng.cc
2 //depot/main/src/lang/frn.cc
2 //depot/main/src/lang/ger.cc
2 //depot/main/src/lang/rus.cc
3 //depot/main/src/net/tcpip.cc
2 //depot/main/src/msgs/help.cc
2009/07/18
2009/07/19
2009/07/20 4222
2 //depot/main/src/db/tables.cc
2 //depot/main/src/msgs/usage.cc
2 //depot/main/src/msgs/help.cc
4 //depot/main/src/net/tcpip.cc
2009/07/21 3
3 //depot/main/src/net/tcpip.cc

tcpip.cc appears to be a popular file. It may be worth looking into why so many people have been touching it recently. There may be no problem at all, or the file could be the source of a lot of developer headaches.
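Spotting a repeat offender by eye works for one week, but for longer ranges it may help to count how many days each file shows up. The Ruby sketch below does that over text shaped like the verbose listing above. The `days_per_file` helper is hypothetical and not part of the script, and the sample is just an excerpt of that listing.

```ruby
# Counts, for each file in a verbose listing, the number of days on which
# it appears. File lines look like "<count> <depot path>"; date lines do
# not match the pattern and are skipped.
def days_per_file(listing)
  days = Hash.new(0)
  listing.each_line do |line|
    match = line.match(%r{\A\s*(\d+)\s+(//\S+)})
    days[match[2]] += 1 if match
  end
  # Most frequent files first, ties broken by path.
  days.sort_by { |file, count| [-count, file] }
end

# Excerpt of the listing above.
listing = <<~LISTING
  2009/07/15 222
  2 //depot/main/src/lib/rlib.cc
  2 //depot/main/src/net/tcpip.cc
  2 //depot/main/src/net/udp.cc
  2009/07/16 2
  2 //depot/main/src/net/tcpip.cc
  2009/07/21 3
  3 //depot/main/src/net/tcpip.cc
LISTING

days_per_file(listing).each { |file, count| puts "#{count} #{file}" }
```

Because each file is listed at most once per day, counting its lines is the same as counting its days.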

What did I learn from running the tool on our source here at the Perfortress? It confirmed that the files that I thought were hot spots really are, but I also learned that, by this tool’s metric, our current development process is successfully minimizing developer collisions. It would have been more exciting in some ways to find a hidden trouble spot, but I guess no news is good news!

The tool is available in the public depot as a Ruby script. It uses P4Ruby, but on the off chance you don't have P4Ruby installed already, there are builds for a few platforms in the P4Ruby directory along with the script. I’d love to hear from anyone about their experiences using it.