September 19, 2013

Measuring Code Stability – Part I

Version Control

I've been thinking a lot recently about how we know when software development is going well and was reminded of a sketch from one of my favourite comedy shows: ‘Definite Article’ by Eddie Izzard (if you haven’t seen it, you really should). There’s a part of the show where he talks about shopping for fruit, and how we all squeeze the fruit to check if it’s ripe; Eddie asks how we know how to do this? “Is that good? I’m squeezing about this much - is that good squeezy?” (It’s about 4m 45s into this YouTube video)

Knowing whether or not your software development project is going well can be very similar, and people have come up with many different ways to assess the maturity/readiness/quality/etc. of products and projects. The point on which all of them stumble is context, and rightly so in my view: no quantitative measure of a project can possibly account for all of the context surrounding the development.

But does it therefore follow that all metrics are useless? Not at all. To my mind, metrics offer us one key thing: insights into the questions we should be asking. The path to understanding looks something like this:

Data + Context = Questions + Answers = Understanding!

As in so many areas of life, with the right questions come the right answers. So, what we need is not metrics that claim to have all the answers, but metrics that prompt the right questions.

As a Version Management product, Perforce is well placed to provide metrics that give rise to some very interesting questions, and as many of our customers have been using Perforce for a number of years, there is a well of information just waiting to be tapped. This information is not just about what changed and when; if you’ve been using jobs to track your bug reports (either directly, or through one of our integration partners), then Perforce also knows what was fixed and when, and those are very powerful data points.
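To make that a little more concrete, here’s a minimal sketch of how that raw history might be pulled out of a Perforce server using the standard p4 command-line client. The depot path, the date range and the lightweight text parsing are illustrative assumptions only; a real implementation would more likely use P4Python or the tagged output formats.

#!/usr/bin/env python
"""Rough sketch: pull change and fix history out of Perforce.

Assumes the 'p4' command-line client is on the PATH and already logged in.
'//depot/project/...' and the date range are placeholders for your project.
"""
import re
import subprocess
from collections import Counter

DEPOT_PATH = "//depot/project/..."   # hypothetical project path
RANGE = "@2013/01/01,@now"           # revision range to examine

# Both 'p4 changes' and 'p4 fixes' print '... on YYYY/MM/DD ...' in their
# default output, so a simple date regex is enough for a rough count.
DATE_RE = re.compile(r"\bon (\d{4}/\d{2}/\d{2})\b")

def per_day(command):
    """Run a p4 command and count its output lines per date."""
    output = subprocess.check_output(command, universal_newlines=True)
    return Counter(m.group(1) for m in DATE_RE.finditer(output))

# Submitted changelists per day: what changed, and when.
changes = per_day(["p4", "changes", "-s", "submitted", DEPOT_PATH + RANGE])

# Job fixes per day: what was fixed, and when (this relies on jobs being
# attached to changelists, directly or via a defect-tracker integration).
fixes = per_day(["p4", "fixes", DEPOT_PATH + RANGE])

for day in sorted(set(changes) | set(fixes)):
    print("%s  changes: %3d  fixes: %3d" % (day, changes[day], fixes[day]))

Grouping the output by day like this gives the basic ‘what changed, what was fixed, and when’ series that the ideas below build on.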

When I look at the metrics for a project, it’s not often the case that there’s one metric that tells me all I need to know, but as a manager, that’s really what I want: the Red/Amber/Green status for my projects so that I know when to start looking deeper and asking the difficult questions. In my experience, many of those questions are around bugs and the level of change still being made to a project – is it ready to ship yet?

So could we combine data about the defects in a project with data about the changes being made to that project to get a view of how ‘stable’ the code is? (Note that I said code there, not product!) I think we can, and here I’ll share our ideas about how we might go about it.

Stability Index

We’re notionally calling this idea the ‘Stability Index’, and essentially it’s a combination of three factors (there’s a rough sketch of how they might fit together just after the list):

  1. The weighted defect raise and fix rates
  2. The proportion of the code-base that is changing
  3. The proportion of the code-base that is changing repeatedly
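The exact inputs, weightings and scaling aren’t spelled out in this post, so the sketch below is purely illustrative: the function name, the equal weights and the 0-to-1 range are all assumptions, chosen only to show how three factors of this kind might fold into a single number.

def stability_index(weighted_fixes, weighted_raises,
                    files_changed, files_changed_repeatedly, total_files):
    """Illustrative only: one way the three factors *might* be combined.

    Higher means more stable; the equal weights and scaling are placeholders,
    not the actual formula.
    """
    # Factor 1: are we fixing (weighted) defects as fast as we raise them?
    if weighted_raises == 0:
        defect_factor = 1.0
    else:
        defect_factor = min(1.0, weighted_fixes / float(weighted_raises))

    # Factor 2: the smaller the proportion of the code-base changing,
    # the more stable it looks.
    churn_factor = 1.0 - files_changed / float(total_files)

    # Factor 3: the same again, but for files changing repeatedly.
    repeat_factor = 1.0 - files_changed_repeatedly / float(total_files)

    return (defect_factor + churn_factor + repeat_factor) / 3.0

# A young, churning project scores noticeably lower...
print(stability_index(5, 12, 400, 150, 1000))    # ~0.62
# ...than a settled one approaching release.
print(stability_index(9, 10, 30, 5, 1000))       # ~0.96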

What does it measure?

The Stability Index is an aggregate measure of the rate at which your code is changing, combined with a measure of how well it is coping with that rate of change.

It may therefore offer some indication as to the level of risk associated with the project.

It can also be interpreted as a measure of the maturity of the codeline: new projects will typically have a very low Stability Index, while the value for older projects will usually be higher.

It’s primarily intended to be calculated for projects on a daily basis over time, as it’s the trend for a project that is likely to be more significant than any single value. The trend line may tell you whether or not a project is approaching readiness to ship.
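As a rough illustration of that point, here’s a small sketch that looks at the trend of a daily series rather than at any single value; the numbers and the simple least-squares slope are made up for demonstration.

def trend(values):
    """Least-squares slope of a series; positive means stability is rising."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Hypothetical daily Stability Index values for one project.
daily = [0.41, 0.44, 0.43, 0.47, 0.52, 0.55, 0.54, 0.58, 0.61, 0.63]

slope = trend(daily)
print("Stability Index trend: %+.3f per day" % slope)
if slope > 0:
    print("Rising: the project may be approaching readiness to ship.")
else:
    print("Flat or falling: time to start asking those difficult questions.")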

Comparison between projects is likely to be heavily context dependent and should only be performed with that context in mind.

What does it NOT measure?

In no way does the Stability Index measure the quality of the product or code. As bugs are factored into the calculation, one can argue that there is a measure of the stability of the product in there, but it is perfectly possible for a rapidly evolving product to have a low Stability Index, yet be robust and fully functional. Context is everything.

Conclusion

We’ve been running this analysis on our own code, and the Stability Index is generating interesting insights for different projects at different levels of maturity; as a result, we’re starting to ask some different questions.