Driving up quality with automated review
I've recently been involved in refactoring an application, moving it from its successful 'tactical' implementation towards a more 'strategic' codebase. Broadly speaking, this means improving the quality of the code.
This 'quality' requirement was obviously key and, like many non-functional requirements, was initially defined a bit too vaguely. We expanded it into more tangible goals - increased developer productivity, reduced UAT overhead, and so on. It was very tempting (and I've certainly done it in the past) to simply define the actions that would meet these goals based on personal bug-bears in the code. Instead, we opted to identify the metrics that represented these NFRs and commit to improving those. This would help identify where we should spend time and, crucially, provide a measure demonstrating that we were being effective.
The following metrics, some of which I'll cover in a bit more detail later, were applied:
- total lines of code
- non-commenting source statements (NCSS)
- cyclomatic complexity number (CCN)
- lines of duplicate code
- unit test coverage
- coding standards violations
Some were effective, some not. Overall, though, I've been pleasantly surprised by how easy they were to work into the automated build and by the insight even a simple set of metrics can provide. They've also proven to be a very useful way of communicating progress to (non-technical) stakeholders in an empirical way. I don't need to waffle on about code changes any more - the line on the graph drops over time, the line represents code, and code represents cost of ownership.
Total lines of code
This is the metric that's often dismissed as too simplistic and misleading. The raw value of the line count doesn't really express cost of ownership very well, but I've found it useful nonetheless. The shape of the total line count graph doesn't change much when comments and whitespace are removed, so it demonstrates the overall trend effectively. The reason I still use it alongside NCSS is that absolutely everyone understands it!
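To make that comparison concrete, here's a minimal sketch of the two counts involved. The class and file handling are my own illustration, and it only strips blank lines and `//` comments; a real tool would handle block comments and strings properly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class LineCounts {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        long total = lines.size();                    // the raw line count
        long stripped = lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())        // drop whitespace-only lines
                .filter(line -> !line.startsWith("//")) // drop single-line comments
                .count();
        System.out.println("total=" + total + " stripped=" + stripped);
    }
}
```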
Non-Commenting Source Statements (NCSS)
As its name implies, NCSS counts everything that isn't commentary. You could view it as the weight of the compiled code, or the amount of source code that actually does something.
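For anyone who hasn't met the metric, here's roughly how the counting works on a fragment of Java. The per-line tallies follow the rules as I understand them (the exact rules vary a little between tools), and the `Order` class is just an illustration.

```java
import java.util.List;                           // +1: import declaration

public class Order {                             // +1: class declaration

    private final List<String> items;            // +1: field declaration

    public Order(List<String> items) {           // +1: constructor declaration
        this.items = items;                      // +1: statement
    }

    /** Javadoc contributes nothing to the count. */
    public boolean isEmpty() {                   // +1: method declaration
        // neither do comments or blank lines
        return items.isEmpty();                  // +1: statement
    }
}                                                // NCSS = 7
```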
I must admit that I've not found it much more insightful than counting lines of code, although it has had its moments. No one outside the development team seems to care about it very much when a simple line count metric is available.
Cyclomatic Complexity Number (CCN)
In essence, CCN measures how many paths there are through a particular piece of code. Each condition and loop adds to the CCN of a method, so it can be read as the minimum number of test cases required to cover the method's branches.
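As an illustration, here's how a method racks up its CCN. The method is a made-up example, and note that whether boolean operators like `&&` add to the count depends on the tool.

```java
public class CcnExample {

    // CCN starts at 1 for the method itself; each decision point adds 1.
    static int retryDelayMillis(int attempt, boolean slowNetwork) {
        if (attempt < 0) {                          // +1: if
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        int delay = 100;
        for (int i = 0; i < attempt; i++) {         // +1: for
            delay *= 2;
        }
        if (slowNetwork && delay < 1000) {          // +1: if (some tools add +1 for &&)
            delay = 1000;
        }
        return delay > 30000 ? 30000 : delay;       // +1: ternary
    }                                               // CCN = 5 (6 counting the &&)

    public static void main(String[] args) {
        System.out.println(retryDelayMillis(4, true)); // prints 1600
    }
}
```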
I'm not sold on an argument as simplistic as 'reducing average method CCN equates to a reduction in TCO'. After all, TCO is about the total cost of ownership, not the average cost of ownership. High complexity is clearly something to be avoided, but not necessarily because it will result in less code or fewer tests!
CCN has definitely been worthwhile graphing over time. It does a very effective job of pointing out where the code has turned to spaghetti or has multiple responsibilities. It's a metric that's appealing to developers as there is a clear correlation between the numbers and how much they like the code. It was strangely fun to pick one of the top ten methods (by CCN) and try to get it out of the top ten.
Duplicate lines of code
I use a tool called CPD (the Copy-Paste Detector) to identify where duplication occurs in the code. It's not particularly sophisticated: it has little awareness of the language and is essentially doing a textual comparison. Despite this limitation, it revealed that about a fifth of my codebase had been copied verbatim from somewhere else.
Since the remedy for most copy/paste duplication is to delete one of the copies, this metric often improves very quickly as you work through the report, finding references and deleting. I found the improvement dramatic but short-lived, however: once the duplicate code had been identified and removed, there was apparently nothing more to do.
CPD also missed quite a lot of duplication because it was fooled by differences in commentary and string literals. Fortunately, in many cases, we'd already been tipped off by duplicated neighbouring code. Unfortunately, removing this duplication didn't move the metric, which perhaps gave the impression of it being a waste of time. Fortunately, *phew*, removal of code was always reflected in the line count and NCSS metrics.
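Here's a made-up example of the kind of thing that slipped through: the two method bodies are copies of each other, but the differing string literal is enough to break the match for a comparison that doesn't ignore literal values.

```java
import java.util.ArrayList;
import java.util.List;

public class AuditLog {
    private final List<String> entries = new ArrayList<>();

    public void recordCreate(int orderId) {
        entries.add("order created: " + orderId);   // literal differs from below
        System.out.println(entries.get(entries.size() - 1));
    }

    public void recordCancel(int orderId) {
        entries.add("order cancelled: " + orderId); // ...otherwise a verbatim copy
        System.out.println(entries.get(entries.size() - 1));
    }
}
```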
I've found code metrics to be effective, especially when you pick a very small set of demonstrably worthwhile ones. Duplication is sometimes a quick win, but CCN is my current favourite, especially if you can explain it to your audience without them glazing over.
I'd be interested to hear if anyone has any particular favourites.