Testing Foundations
There are many kinds of code coverage. Some of the types have more than one common name, plus several uncommon ones. Discussions of coverage can get hung up on name debates, with endless claims about what the right name is and what, precisely, different names mean.
As someone who's written four coverage tools, those discussions pain me. They make the whole topic seem over-complicated. To the ordinary person, it's really simple. There are four relevant types of coverage. They're all easy to explain. The particular variants implemented by a particular tool have no important practical consequences and are (usually) easily understood.
I describe the four types here.
Given that there's so much confusion, what good is another set of explanations? Fair question. After the explanations, I describe how to use them.
If you want to use a coverage tool, you'll find the additional reading at the end helpful.
Here's some Java code I wrote recently (or, rather, cribbed from Wirth's Algorithms + Data Structures = Programs):
1: private Expr orArgument() throws ParseError {
2:     Expr retval = factor();
3:     while (tokenType == AND) {
4:         getsym();
5:         retval = new ExprAnd(retval, factor());
6:     }
7:     tr.debugm(retval.toString());
8:     return retval;
9: }
I have a test suite for the application in which this code lives. Suppose I ran that test suite. A line coverage tool could then tell me which of these lines of code had never been reached by the execution of any of my tests.
"Line 4 in file Parser.java was never executed."
(If you're familiar with source-level debuggers, think of breakpoints. A line coverage tool essentially puts a breakpoint on every line of code. When the breakpoint is hit, it records that fact.)
It's good to know what lines haven't been executed. Maybe line 4 should read "factor()" instead of "getsym()". If I never execute it, I'll never have the chance to find out. (If I do execute it, I might still not find the bug, depending on the test, but at least I have a shot at it.)
In this example, once you know that line 4 hasn't been executed, you really don't care to know that line 5 hasn't been either. It's obvious. For that reason, some line coverage tools would not tell you about line 5 in that case. (But such a tool has to be careful not to assume too much: what if getsym() throws a ParseError? Then line 4 has been executed but line 5 has not.)
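As a rough illustration of the breakpoint analogy, here is a minimal hand-instrumented sketch. It is not how any particular tool works. The class LineCoverageSketch, the hit() probes, and the parse() stand-in for orArgument() are all hypothetical names invented for this example; a real tool would insert its probes automatically.

```java
import java.util.BitSet;

// Hypothetical, hand-instrumented stand-in for a line coverage tool.
class LineCoverageSketch {
    // The numbered lines of orArgument() that contain executable code.
    static final int[] INSTRUMENTED = {2, 3, 4, 5, 7, 8};

    // Bit n is set once line n has been reached by any test.
    static final BitSet reached = new BitSet();

    // The probe a tool would place in front of each line.
    static void hit(int line) {
        reached.set(line);
    }

    // Probe for a line holding a loop test: record the line, pass the test through.
    static boolean hit(int line, boolean test) {
        hit(line);
        return test;
    }

    // Stand-in for orArgument(): 'ands' plays the role of the number
    // of AND tokens left in the input.
    static int parse(int ands) {
        hit(2); int factors = 1;
        while (hit(3, ands > 0)) {
            hit(4); ands--;
            hit(5); factors++;
        }
        hit(7);
        hit(8); return factors;
    }

    // One message per instrumented line that no test reached.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (int line : INSTRUMENTED) {
            if (!reached.get(line)) {
                sb.append("Line ").append(line).append(" was never executed.\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        parse(0); // a "test suite" whose input never contains AND
        System.out.print(report());
    }
}
```

Run as written, the sketch reports lines 4 and 5. A tool that suppresses the obvious would mention only line 4.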
Most common programming languages aren't defined in terms of text lines. Instead, their reference manuals talk about statements. If you've reached every line of code in the program, have you exercised every statement? Not quite. Consider this code:
54: if (i < 0) i = 0;
There are two statements there: an IF statement and an assignment statement. A line coverage tool would tell you whether the IF had been reached, but I know of none that would tell you whether the assignment had. I believe that's for the same reason that most source-level debuggers won't let you set a breakpoint on the assignment: figuring out a workable user interface is more trouble than it's worth. If you care about such cases, you want branch coverage (the next type).
Consider:
...
startup();
if (isFun(debating)) {
    debate();
    debate();
    debate();
    shutdown();
}
...
If the IF statement is taken false, shutdown() will never be called. That's probably bad.
A line coverage tool won't tell you that you've never taken the ELSE branch because there's no code in the ELSE branch for it to measure.
So there's a type of coverage to help. It's usually called "branch coverage". It checks whether all IF statements have been taken in both the THEN and ELSE directions. Ditto for WHILE statements:
while (i < 33) {
    ...
    if (whatever)
        break;
    ...
}
post_loop_cleanup();
A branch coverage tool will tell you whether the WHILE's test has ever been false. That tells you whether you've ever stopped iterating because you "fell out of the loop". A line coverage tool could only tell you that you'd executed post_loop_cleanup(), not whether you got there because you fell out of the loop or because the internal break statement "broke out of the loop". Maybe you always broke out of the loop. In that case, you have less reason to be confident that the WHILE test is right.
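The difference between falling out of a loop and breaking out of it can be made concrete with a small sketch. It assumes a hypothetical probe, branch(), wrapped by hand around each test; the class name, the scan() routine, and the stopAt parameter are inventions for the example, not part of any real tool.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, hand-instrumented stand-in for a branch coverage tool.
class BranchCoverageSketch {
    // For each decision: [0] = has been false, [1] = has been true.
    static final Map<String, boolean[]> seen = new TreeMap<>();

    // Probe wrapped around a decision's test: record the outcome, pass it through.
    static boolean branch(String id, boolean outcome) {
        seen.computeIfAbsent(id, k -> new boolean[2])[outcome ? 1 : 0] = true;
        return outcome;
    }

    // Stand-in for the loop in the text: count up to 33, or break early at stopAt.
    static int scan(int stopAt) {
        int i = 0;
        while (branch("while (i < 33)", i < 33)) {
            if (branch("if (i == stopAt)", i == stopAt)) {
                break;
            }
            i++;
        }
        return i; // post_loop_cleanup() would go here
    }

    // One message per direction never taken. A decision that was never
    // reached at all does not appear; a real tool would list it too.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, boolean[]> e : seen.entrySet()) {
            boolean[] s = e.getValue();
            if (!s[0]) sb.append(e.getKey()).append(" was never false\n");
            if (!s[1]) sb.append(e.getKey()).append(" was never true\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        scan(5);  // breaks out of the loop
        scan(10); // breaks out again
        System.out.print(report());
    }
}
```

Both calls reach the code after the loop, so a line coverage tool would have nothing to say. The sketch reports that the WHILE test was never false; a call such as scan(100), which never hits the break, would silence it.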
Ditto for FOR, DO/WHILE, SWITCH/CASE, the ? operator (in C-like languages), etc.
(Different coverage tools handle switches in different ways, but all tools I've seen that purport to do branch coverage do something with them.)
Except for some special cases, if you exercise all the branches, you'll exercise all the statements and all the lines. One special case is trivial: what if the program doesn't contain any branches? (If so, you shouldn't need a coverage tool to tell you to execute it at least once.) Another is not: what about an exception handler without any branches in its body? I hope that any tool that says it does branch coverage will also tell you about unentered exception handlers.
What about this:
if (a && b) {...}
Suppose you tested it with (a true, b true) and (a false, b true). In the first case, the THEN branch would be taken. In the second, the ELSE branch would be taken. A branch coverage tool would be satisfied.
But it seems odd that b has always been true. Those tests would have the same result if the statement were written like this:
if (a) {...}
It seems as if tests that can't even tell the difference between those two statements must be somehow lacking. For that reason, there are a variety of closely related forms of coverage that insist that every boolean condition evaluate to both true and false.
For example, one type of such coverage would be happy with these three tests:
(a true, b true)
(a true, b false)
(a false, b can have either value)
Whereas another would prefer all four possibilities:
(a true, b true)
(a true, b false)
(a false, b true)
(a false, b false)
If I had a choice, I'd prefer the first way. The second requires more work and doesn't (in my opinion) provide any benefit in terms of bug-finding power. But if I'm contractually required to do the second, I'm not going to argue about it. It's not that much extra work, not unless the code is chock-full of big ugly IF statements.
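Here is a minimal sketch of the first, three-test variant, assuming a hypothetical cond() probe wrapped by hand around each operand. The class and method names are invented for the example. Because && short-circuits in Java, the probe on b does not run when a is false, which is why the third test may use either value for b.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, hand-instrumented stand-in for a condition coverage tool.
class ConditionCoverageSketch {
    // For each operand: [0] = has been false, [1] = has been true.
    static final Map<String, boolean[]> seen = new HashMap<>();

    // Probe wrapped around one operand: record its value, pass it through.
    static boolean cond(String id, boolean value) {
        seen.computeIfAbsent(id, k -> new boolean[2])[value ? 1 : 0] = true;
        return value;
    }

    // if (a && b) {...} with each operand probed. The probe on b is
    // skipped whenever a is false, because && short-circuits.
    static boolean decide(boolean a, boolean b) {
        return cond("a", a) && cond("b", b);
    }

    // One message per operand value that no test produced.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String id : new String[] {"a", "b"}) {
            boolean[] s = seen.getOrDefault(id, new boolean[2]);
            if (!s[1]) sb.append(id).append(" was never true\n");
            if (!s[0]) sb.append(id).append(" was never false\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        decide(true, true);  // THEN branch taken
        decide(false, true); // ELSE branch taken: branch coverage is satisfied
        System.out.print(report());
    }
}
```

With only the two tests that satisfy branch coverage, the sketch still complains that b was never false. Adding decide(true, false) completes the three-test set and leaves nothing to report.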
Supposedly, the variants of this type of coverage are well described and given unique names. In practice, people continue to be confused by the meanings of terms like "decision/condition" coverage. Either all those people are stupid and careless, or the definitions are inadequate. I believe the latter. In any case, don't trust that the tool vendor means the same thing by "decision/condition" coverage as you do. Check.
The above types of coverage are mainly useful to programmers. To know what to do when a coverage tool says "The code on line 34 has never been executed", you have to look at the code.
Suppose you're an independent tester (the type of person who tests big chunks of code, perhaps entire products, in the way a user would). How would you most likely use coverage?
You don't have time to look at line 34. Aggregate numbers (percentages) are of some use. If, for example, you measure the coverage of the whole test suite and discover that no lines of code from the entire subdirectory /src/yp have ever been executed, maybe you should think about adding some simple tests for that code. (I speak from bitter experience. Not mine, but that of a onetime officemate.)
Or, if you look at the percent of coverage achieved for various source files, you may want to pay extra attention to files that have relatively low coverage. So, if most of the code in the product has 50% branch coverage, but absolutely-critical-code.c has only 10%, you might start asking what that code does and how to test it. But if the file that has 10% coverage is debugging-code-only.c, you're more likely to be happy with low coverage.
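The per-file comparison amounts to a little arithmetic, sketched below. The branch counts are made-up numbers chosen to match the percentages in the text, the third file name and the 25% threshold are equally hypothetical, and the class is not taken from any real tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-file summary of branch coverage counts.
class CoverageSummarySketch {
    // Percentage of branches exercised, rounded down.
    // A file with no branches counts as fully covered.
    static int percent(int covered, int total) {
        return total == 0 ? 100 : (100 * covered) / total;
    }

    // Made-up counts: file name -> {branches covered, branches total}.
    static Map<String, int[]> sample() {
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("absolutely-critical-code.c", new int[] {4, 40});   // 10%
        counts.put("debugging-code-only.c", new int[] {3, 30});        // 10%
        counts.put("everything-else.c", new int[] {50, 100});          // 50%
        return counts;
    }

    // Names of the files whose coverage is below 'threshold' percent.
    static List<String> below(Map<String, int[]> counts, int threshold) {
        List<String> low = new ArrayList<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int[] c = e.getValue();
            if (percent(c[0], c[1]) < threshold) {
                low.add(e.getKey());
            }
        }
        return low;
    }

    public static void main(String[] args) {
        System.out.println(below(sample(), 25));
    }
}
```

The sketch flags both low-coverage files. Deciding that one of them matters and the other doesn't is still a judgment the numbers can't make for you.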
If this is the way you're using coverage, you might not want to bother with something like branch coverage. You're not using the detail. You might be happier just measuring which routines have not been entered. That makes it easy to interact with the programmers in the following way:
"I notice that file absolutely-critical-code.c hasn't been exercised much. Would you tell me what I need to know to write some tests? In particular, what do these unexercised routines do? - mumble_froz(), frobulate(), bogosify()..."
Another advantage of measuring routine entry coverage is that it doesn't slow the program down as much as other types. It also takes up less space in the running program. Given a competently implemented coverage tool, that shouldn't matter much (if at all) for most applications.
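Routine entry coverage needs only one probe per routine, which is part of why it is cheap. A minimal sketch, assuming a hypothetical enter() probe placed by hand at the top of each routine; the routine names come from the example question above, and their bodies are left empty.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, hand-instrumented stand-in for routine entry coverage.
class RoutineEntrySketch {
    // Every routine the tool knows about.
    static final String[] ROUTINES = {"mumble_froz", "frobulate", "bogosify"};

    // Names of the routines entered at least once.
    static final Set<String> entered = new TreeSet<>();

    // The single probe at the top of each routine.
    static void enter(String routine) {
        entered.add(routine);
    }

    // Routine bodies are omitted; only the entry probe matters here.
    static void mumble_froz() { enter("mumble_froz"); }
    static void frobulate()   { enter("frobulate"); }
    static void bogosify()    { enter("bogosify"); }

    // The routines no test has entered, in alphabetical order.
    static Set<String> neverEntered() {
        Set<String> missing = new TreeSet<>(Arrays.asList(ROUTINES));
        missing.removeAll(entered);
        return missing;
    }

    public static void main(String[] args) {
        frobulate(); // the only routine this "test suite" reaches
        System.out.println(neverEntered());
    }
}
```

The resulting list of names is the kind of report a tester can hand straight to a programmer.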
There are other types of code coverage. Some of them languish in well-deserved obscurity. Others would be useful, but aren't supported in coverage tools available to you. If your goal is to learn how to use coverage well, I'd much rather you got a tool and experimented with these four types than spend time now learning about additional types.
If you're an independent tester, you'd probably like a coverage tool to have routine entry coverage or some other type of coverage that reports in "chunks" bigger than a line. The most important thing is that the tool's reports be easy for programmers to understand. They should be able to flip through a report and say, "Aha... This missed coverage here means you must be missing tests that do X."
If you're shopping for a coverage tool that programmers are going to use, the first task may be to decipher the vendor's terminology. What the heck do they mean by "segment coverage" or "path coverage" or "basic block coverage"? Remember: it's not your job to decipher. It's theirs. Send mail to sales@covtoolvendor.com, give them this URL, and ask them to fit their tool's coverage into the first three categories.
After that, which type should you choose? Programmers seem to find line coverage tools easiest to understand. Condition coverage tools are more powerful. I lean toward the latter, but I frankly think that issues like the tool's documentation, support, flexibility, reliability and so on are more important than the type of coverage it implements.
There's a site devoted to coverage tools: http://www.codecoveragetools.com/.
Coverage tools are often misused. I wrote a paper about that: "How to misuse code coverage".
Cem Kaner's "Software negligence and testing coverage" talks about the relationship between responsible testing and coverage. It also highlights the fact that "coverage" is an idea that can be applied to more than code structure.
When I use coverage as a programmer, it occupies maybe 2% of my total coding, testing, and debugging effort. My paper "Experience with the cost of different coverage goals for testing" talks about cost and describes how I approach the use of coverage.
The "GCT Tutorial" is a decent example of the use of a coverage tool on a small program.